SUTRS record

TYPE OF MATERIAL: analytical level (component part), textual material, printed, 1.01 - original scientific article
COUNTRY OF PUBLICATION: Slovenia
YEAR OF PUBLICATION: 2002
LANGUAGE OF TEXT/ORIGINAL: Slovenian
SCRIPT: Latin
AUTHOR: Rotovnik, Tomaž - author
RESPONSIBILITY: Horvat, Bogomir - author // Kačič, Zdravko - author
TITLE: Analiza uspešnosti razpoznavalnikov ISIP, HTK in Julius pri razpoznavanju tekočega slovenskega govora
IN PUBLICATION: Elektrotehniški vestnik. - ISSN 0013-5852. - Vol. 69, no. 5 (2002), pp. 253-258.
SUMMARY: In this article we discuss the use of three recognizers, HTK, ISIP and Julius, for recognition of continuous Slovenian speech. First we present the general design of the recognizers and their differences. We then describe the procedure for building a large-vocabulary system for Slovenian, which belongs to the group of inflectional languages. Two different types of language models and vocabularies are used. Word-based language models, which are most commonly used for modeling non-inflectional languages (e.g. English) and are less suitable for inflectional ones, serve as the reference system. As a novelty, we used sub-word models (stem-ending) for comparison; their advantage is most evident in recognizing inflectional languages. Finally, we present the recognition results, comparing recognition accuracy, memory consumption and recognition speed. The results show that sub-word models are more successful at recognizing continuous Slovenian speech. With the Julius recognizer we achieved the fastest recognition (3.6 RT) while preserving recognition accuracy. // The paper is concerned with Slovenian continuous speech recognition and compares recognition results of different speech decoders (ISIP, HTK and Julius). Slovenian is, like other Slavic languages, a highly inflectional language. Its rich morphology poses a big problem in Large Vocabulary Continuous Speech Recognition (LVCSR). Experiments have shown that to set up the Slovenian corpus a vocabulary of 600K words is needed to achieve 99% training-set coverage, whereas in English 60K words achieve the same order of coverage. At the moment the most advanced speech recognition systems can handle from 20K to 60K words; thus the only possible approach for inflectional languages is to restrict the vocabulary size. However, these systems had been developed mainly for recognition of English, which is a weakly inflectional language.
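The 600K-vs-60K comparison above rests on measuring how many of the most frequent word types are needed to cover a target fraction of corpus tokens. A minimal sketch of that coverage computation (toy corpus and function name are illustrative, not from the paper):

```python
from collections import Counter

def tokens_for_coverage(corpus_tokens, target=0.99):
    """Return how many of the most frequent word types are needed
    to cover `target` fraction of all tokens in the corpus."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    covered = 0
    # Walk word types from most to least frequent, accumulating coverage.
    for rank, (_, c) in enumerate(counts.most_common(), start=1):
        covered += c
        if covered / total >= target:
            return rank
    return len(counts)

# Toy, highly skewed distribution: 2 types already cover 98% of tokens.
corpus = ["a"] * 90 + ["b"] * 8 + ["c", "d"]
print(tokens_for_coverage(corpus, 0.98))  # 2
```

For an inflectional language the frequency curve is much flatter (each lemma surfaces in many forms), so the vocabulary size needed for the same coverage grows by an order of magnitude.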
In this paper we therefore present these systems in speech recognition on the example of an inflectional language, Slovenian. Our experiments were based on new language models (namely morphological models). An inflectional change to a word mostly affects its ending, whereas the stem remains unchanged. Smaller lexical units (stems and endings) were used to solve the problem of a high OOV rate [1]. To compare the decoders, we employed a two-pass decoding strategy for all of them. In the first recognition pass, all three decoders used a derivative of the standard time-synchronous Viterbi beam search decoder with a pre-compiled static recognition network [3]. A word-graph algorithm was applied to the ISIP and HTK decoders in the second pass. The word graph is constructed by keeping more than the 1-best hypothesis in the first pass. The first pass defines possible word strings, which can be set as grammar constraints in the second-pass decoding. The Julius decoder works in the first pass on the basis of the word-trellis index method [9]. The second pass performs a best-first stack decoding search [10] in the backward direction, using the word trellis index as both heuristics and word prediction. The decoder also enables the use of Phone-Tied Models (PTM) [11]. Training of acoustic models was performed with the SNABI speech database [2]. Acoustic models were based on Hidden Markov Models (HMM) with three emitting states and a left-right topology [12]. An acoustic optimiser based on expectation-maximisation (EM) with the Baum-Welch algorithm was used for rough parameter estimation. The basic procedure for building context-dependent models is shown in Figure 1. Word-internal triphone models with 16 Gaussian mixtures per state and model were built for all recognition decoders. A modified Perl script for training is described in [13]. Bigram, trigram, reversed-trigram, and fourgram backoff language models were applied for recognition.
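The time-synchronous Viterbi beam search named above as the common first pass can be sketched on a toy HMM: at each frame every active hypothesis is extended, and hypotheses scoring too far below the frame's best are pruned. This is a minimal log-domain illustration with hypothetical two-state parameters, not the paper's actual recognition network:

```python
import math

def viterbi_beam(obs, states, log_init, log_trans, log_emit, beam=10.0):
    """Time-synchronous Viterbi search with beam pruning: at each
    frame, hypotheses more than `beam` (in log probability) below
    the current best are discarded before the next frame."""
    # active: state -> (log score of best path ending here, that path)
    active = {s: (log_init[s] + log_emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        nxt = {}
        for s, (score, path) in active.items():
            for t in states:
                cand = score + log_trans[s][t] + log_emit[t][o]
                if t not in nxt or cand > nxt[t][0]:
                    nxt[t] = (cand, path + [t])
        best = max(v[0] for v in nxt.values())
        active = {s: v for s, v in nxt.items() if v[0] >= best - beam}
    return max(active.values(), key=lambda v: v[0])[1]

# Toy two-state HMM with made-up probabilities.
lg = math.log
states = ["h", "c"]
log_init = {"h": lg(0.8), "c": lg(0.2)}
log_trans = {"h": {"h": lg(0.7), "c": lg(0.3)},
             "c": {"h": lg(0.4), "c": lg(0.6)}}
log_emit = {"h": {1: lg(0.2), 3: lg(0.4)},
            "c": {1: lg(0.5), 3: lg(0.1)}}
best_path = viterbi_beam([3, 1, 3], states, log_init, log_trans, log_emit)
print(best_path)  # ['h', 'h', 'h']
```

Keeping more than the single best hypothesis per frame, instead of only the winner, is what yields the word graph used as a grammar constraint in the second pass.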
The trigram, reversed-trigram, and fourgram versions were used only for rescoring, whereas with the bigram language model we generated word graphs and a word trellis index. The words in the pronunciation dictionary were taken from the 20000 most common words in the Večer text corpus. Their transcriptions were made automatically following basic grammatical principles, using 30 phones. For the second set of experiments, words were decomposed into smaller units based on stems and endings; 8497 different basic units were obtained. The out-of-vocabulary rate decreased, as shown in Table 1. The stem-ending vocabulary also contained 134 homographs from stems and 31 homographs from endings (Table 2). Homographs are words or parts of words with the same orthographic but different phonetic transcription. The decomposition of words into stems and endings is shown in Table 3. Recognition performed at the sub-word level in Slovenian showed many advantages over word-level recognition. The Julius decoder met most of the requirements important for large-vocabulary continuous speech recognition: accuracy (Figure 4), memory consumption (Figure 2) and real-time performance (Figure 3). With the sub-word-level models these requirements, as well as the OOV rate, are significantly improved.
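The stem-ending decomposition described above can be illustrated with a longest-stem-first split. The stem and ending sets below are tiny hypothetical examples; the paper derives its 8497 units from the corpus:

```python
def decompose(word, stems, endings):
    """Split a word into (stem, ending), preferring the longest
    matching stem; unknown words fall back to a whole-word unit."""
    for i in range(len(word), 0, -1):
        stem, end = word[:i], word[i:]
        if stem in stems and (end == "" or end in endings):
            return stem, end
    return word, ""  # out-of-vocabulary as a single unit

# Hypothetical toy lexicon of Slovenian stems and endings.
stems = {"govor", "model"}
endings = {"a", "i", "om", "ov"}
print(decompose("govora", stems, endings))  # ('govor', 'a')
print(decompose("modelov", stems, endings))  # ('model', 'ov')
```

Because one stem set combines with one ending set, a vocabulary of a few thousand units can generate far more surface forms than an equally sized full-word vocabulary, which is why the OOV rate drops.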
NOTES: Summary; Abstract // Bibliography: p. 258
OTHER TITLES: Using HTK, ISIP and Julius decoders in Slovenian large vocabulary continuous speech recognition
SUBJECT HEADINGS: speech recognition // modelling // models // search algorithms // Slovenian language
UDC: 004.9:612.78

production, ownership and rights: NUK 2010