Slovenska bibliografija

NUK

SUTRS record

MATERIAL TYPE	analytical level (component part), text material, printed, 1.01 - original scientific article
COUNTRY OF PUBLICATION	Slovenia
YEAR OF PUBLICATION	2004
LANGUAGE OF TEXT/ORIGINAL	Slovenian
SCRIPT	Latin
AUTHOR	Rotovnik, Tomaž - author
RESPONSIBILITY	Horvat, Bogomir - author // Kačič, Zdravko - author
TITLE	Zasnova sistema za avtomatsko razpoznavanje govora s podporo mere zaupanja
IN PUBLICATION	Elektrotehniški vestnik. - ISSN 0013-5852. - Vol. 71, no. 3 (2004), pp. 128-133.
ABSTRACT	This article discusses an automatic speech recognition (ASR) system that includes support for evaluating the recognized hypothesis (a confidence measure). We studied the influence of the confidence measure on recognition performance. We used several confidence measures and, with their help and a non-linear classifier, improved the rejection of out-of-vocabulary words. We evaluated the confidence measure with the confusion error rate (CER - Confusion Error Rate) and the ROC (Receiver Operating Characteristic) curve. Using the acoustic confidence measure, we achieved a CER of 12.5% on the Slovenian speech database SpeechdatII. Using a classifier based on a neural network, we further reduced the CER by 2.2%. // The paper is concerned with an architecture for an ASR (Automatic Speech Recognition) system with integrated confidence measure (CM) support. The system was designed to be modular, upgradeable and scalable, meaning that any module can be improved without affecting the other modules. Figure 1 shows the architecture of this system. The CM is defined as the a posteriori probability of word correctness, given the values of some set of confidence indicators [6]. A CM can be used in several applications throughout the recognition process. In the presented system, it is used for detecting Out Of Vocabulary (OOV) words, where it belongs to the acoustic CM category [5]. Equation (2) presents the derived CM equation, which is based on statistics derived directly from the maximum-likelihood Viterbi beam search decoder. Prob(Wi) denotes the acoustic probability of the recognized word Wi in the interval [ts, te]. In the search process, there is no information on the recognized sequence of states for word Wi (word-level alignment). Because Prob(Wi) includes intra-model probabilities, the second element compensates for their influence on the CM, since the CM decreases as the length of word Wi increases.
Intra-model probabilities were set to 0.5 and had only a minor impact on the recognition accuracy (less than 1%). The last element is the [...]. Experiments were performed on the Slovenian speech database SpeechdatII [3] and were divided into three parts. First, we investigated the influence of the search space size and the number of Gaussian mixtures on the acoustic CM. Reducing the search space slightly decreases the ability to differentiate between correct and OOV words. The number of Gaussian mixtures showed no influence on the acoustic CM in our experiments. Then we evaluated the acoustic CM. We used the ROC (Receiver Operating Characteristic) curve to show the efficiency of the acoustic CM (Figure 4), since it shows the relation between correctly accepted correct hypotheses and incorrectly accepted incorrect hypotheses. In the last part, non-linear classifiers rejected OOV words and incorrectly recognized words. The CM efficiency was estimated with the CER - Confusion Error Rate (Eq. 3), ER1 (Eq. 4) and ER2 (Eq. 5) error rates. Calculating the acoustic CM took less than 3% of the total recognition time; the CM correctly rejected 75% of OOV words and achieved a 12.5% CER. The best results were achieved with a non-linear classifier based on neural networks (Table 2a). The system decreased the CER by 2.2% absolute compared to the baseline system with only the acoustic confidence measure (Table 1b).
NOTES	Summary ; Abstract // Bibliography: p. 133
OTHER TITLES	Architecture of ASR System with confidence measure support
SUBJECT HEADINGS	// automatic speech recognition // errors // noise // algorithms
UDC	004.9:007.5
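
The abstract evaluates a confidence measure by accepting or rejecting hypotheses against a threshold and plotting an ROC curve. The sketch below illustrates that general idea only. The function names, the toy scores and the error-rate definition (the fraction of hypotheses the threshold labels wrongly) are assumptions made for illustration; the record does not reproduce the paper's Eq. 3 to 5, which may define CER, ER1 and ER2 differently.

```python
from typing import List, Tuple


def roc_points(scores: List[float], is_correct: List[bool]) -> List[Tuple[float, float]]:
    """Return (false_accept_rate, true_accept_rate) pairs, one per threshold.

    Assumes at least one correct and one incorrect hypothesis is present.
    """
    n_correct = sum(is_correct)
    n_incorrect = len(is_correct) - n_correct
    points = []
    # Use every observed score as a threshold; accept when score >= threshold.
    for threshold in sorted(set(scores), reverse=True):
        true_accepts = sum(1 for s, c in zip(scores, is_correct) if s >= threshold and c)
        false_accepts = sum(1 for s, c in zip(scores, is_correct) if s >= threshold and not c)
        points.append((false_accepts / n_incorrect, true_accepts / n_correct))
    return points


def threshold_error_rate(scores: List[float], is_correct: List[bool], threshold: float) -> float:
    """Hypothetical error rate: share of hypotheses labelled wrongly by the threshold.

    An error is a correct hypothesis that is rejected or an incorrect one that is accepted.
    """
    errors = sum(1 for s, c in zip(scores, is_correct) if (s >= threshold) != c)
    return errors / len(scores)


# Toy confidence scores (made up) and whether each hypothesis was actually correct.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
is_correct = [True, True, False, True, False, False]

for far, tar in roc_points(scores, is_correct):
    print(f"false accept rate {far:.2f}, true accept rate {tar:.2f}")
print(f"error rate at threshold 0.5: {threshold_error_rate(scores, is_correct, 0.5):.3f}")
```

Lowering the threshold accepts more correct hypotheses but also more incorrect ones, which is the trade-off an ROC curve makes visible.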

implementation, ownership and rights: NUK 2010