Published by Titania Papageorgiou · Modified 9 years ago
Language Models For Speech Recognition
Speech Recognition
- u: sequence of acoustic vectors
- Find the word sequence Ŵ so that: Ŵ = argmax_W P(W | u) = argmax_W p(u | W) · P(W)
- The task of a language model is to make available to the recognizer adequate estimates of the probabilities P(W)
Language Models
- P(W) = P(w_1, w_2, …, w_n)
- By the chain rule: P(W) = ∏_{i=1}^{n} P(w_i | w_1, …, w_{i-1})
N-gram models
- Make the Markov assumption that only the prior local context – the last (N−1) words – affects the next word: P(w_i | w_1, …, w_{i-1}) ≈ P(w_i | w_{i-N+1}, …, w_{i-1})
- N=3: trigrams
- N=2: bigrams
- N=1: unigrams
Parameter estimation
- Maximum Likelihood Estimator
  – N=3, trigrams: p(w_3 | w_1, w_2) = c(w_1 w_2 w_3) / c(w_1 w_2)
  – N=2, bigrams: p(w_2 | w_1) = c(w_1 w_2) / c(w_1)
  – N=1, unigrams: p(w_1) = c(w_1) / N
- This will assign zero probabilities to unseen events
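The ML bigram estimator above can be sketched in a few lines of Python (the toy corpus and the word choices are illustrative, not from the slides):

```python
from collections import Counter

def mle_bigram_prob(tokens, w1, w2):
    """Maximum-likelihood estimate p(w2 | w1) = c(w1 w2) / c(w1)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    if unigrams[w1] == 0:
        return 0.0
    return bigrams[(w1, w2)] / unigrams[w1]

# Toy corpus (illustrative)
corpus = "the cat sat on the mat the cat ran".split()
```

Note that any bigram absent from the corpus, such as "cat mat" here, gets probability exactly zero – the sparse-data problem the next slides address.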
Number of Parameters
- For a vocabulary of size V, a 1-gram model has V−1 independent parameters
- A 2-gram model has V²−1 independent parameters
- In general, an n-gram model has Vⁿ−1 independent parameters
- Typical values for a moderate-size vocabulary of 20,000 words:

  Model    Parameters
  1-gram   20,000
  2-gram   20,000² = 400 million
  3-gram   20,000³ = 8 trillion
Number of Parameters
- |V| = 60,000, N = 35M (Eleftherotypia daily newspaper)

  Count   1-grams   2-grams     3-grams
  1       160,273   3,877,976   13,128,073
  2       51,725    784,012     1,802,348
  3       27,171    314,114     562,264
  >0      390,796   5,834,632   16,515,051
  ≥0      390,796   36×10⁸      216×10¹²

- In a typical training text, roughly 80% of trigrams occur only once
- Good-Turing estimate: ML estimates will be zero for 37.5% of the 3-grams and for 11% of the 2-grams in new text
Problems
- Data sparseness: we do not have enough data to train the model parameters
Solutions
- Smoothing techniques: accurately estimate probabilities in the presence of sparse data
  – Good-Turing, Jelinek-Mercer (linear interpolation), Katz (backing-off)
- Build compact models: they have fewer parameters to train and thus require less data
  – equivalence classification of words (e.g. grammatical categories (noun, verb, adjective, preposition), semantic labels (city, name, date))
Smoothing
- Make distributions more uniform
- Redistribute probability mass from higher to lower probabilities
Additive Smoothing
- For each n-gram that occurs r times, pretend that it occurs r+1 times
- e.g. bigrams: p(w_2 | w_1) = (c(w_1 w_2) + 1) / (c(w_1) + V)
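The add-one bigram estimate can be sketched as follows (toy corpus illustrative; the vocabulary size V is taken as the number of word types seen):

```python
from collections import Counter

def add_one_bigram_prob(tokens, vocab_size, w1, w2):
    """Add-one smoothed bigram: (c(w1 w2) + 1) / (c(w1) + V)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab_size)

corpus = "the cat sat on the mat the cat ran".split()
V = len(set(corpus))  # 6 word types in this toy corpus
```

Unlike the ML estimate, the unseen bigram "cat mat" now gets a small nonzero probability.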
Good-Turing Smoothing
- For any n-gram that occurs r times, pretend that it occurs r* times: r* = (r+1) · n_{r+1} / n_r, where n_r is the number of n-grams which occur r times
- To convert this count to a probability we just normalize: p = r* / N
- Total probability of unseen n-grams: n_1 / N
Example

  r (=MLE)   n_r             r* (=GT)
  0          3,594,165,368   0.001078
  1          3,877,976       0.404
  2          784,012         1.202
  3          314,114         2.238
  4          175,720         3.187
  5          112,006         4.199
  6          78,391          5.238
  7          58,661          6.270
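The r* column can be reproduced directly from the n_r column with the Good-Turing formula:

```python
def good_turing_r_star(n, r):
    """Adjusted count r* = (r + 1) * n_{r+1} / n_r."""
    return (r + 1) * n[r + 1] / n[r]

# Counts-of-counts from the table above (n[r] = number of bigrams occurring r times)
n = {0: 3_594_165_368, 1: 3_877_976, 2: 784_012, 3: 314_114,
     4: 175_720, 5: 112_006, 6: 78_391, 7: 58_661}
```

For example, bigrams seen once are discounted from count 1 to about 0.404, matching the table.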
Jelinek-Mercer Smoothing (linear interpolation)
- Interpolate a higher-order model with a lower-order model: p_interp(w_i | w_{i-2} w_{i-1}) = λ · p_ML(w_i | w_{i-2} w_{i-1}) + (1−λ) · p_interp(w_i | w_{i-1})
- Given fixed p_ML, it is possible to search efficiently for the λ that maximizes the probability of some data using the Baum-Welch algorithm
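The interpolation for a trigram model can be sketched as below; the λ values and the component probabilities are made-up numbers (in practice the λ are trained on held-out data, e.g. with Baum-Welch as the slide says):

```python
def jm_trigram(lam3, lam2, p3_ml, p2_ml, p1_ml):
    """p = lam3*p_ML(w|u,v) + (1-lam3)*[lam2*p_ML(w|v) + (1-lam2)*p_ML(w)]."""
    return lam3 * p3_ml + (1 - lam3) * (lam2 * p2_ml + (1 - lam2) * p1_ml)

# Even if the trigram was never seen (p3_ml = 0), the result is nonzero:
p = jm_trigram(0.6, 0.7, 0.0, 0.02, 0.001)
```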
Katz Smoothing (backing-off)
- For those events which have been observed in the training data we assume some reliable estimate of the probability (e.g. the Good-Turing estimate)
- For the remaining unseen events we back off to some less specific distribution, scaled by a back-off weight α
- α is chosen so that the total probability sums to 1
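A toy back-off bigram in this spirit (a fixed discount D stands in for the slide's Good-Turing discounting, the back-off distribution is the unigram model, and the corpus is illustrative):

```python
from collections import Counter

def katz_bigram(tokens, w1, w2, D=0.5):
    """Toy Katz-style back-off: discounted ML estimate for seen bigrams;
    for unseen ones, back off to the unigram distribution, scaled by
    alpha(w1) so that the probabilities over the vocabulary sum to 1."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = sum(unigrams.values())
    if bigrams[(w1, w2)] > 0:
        return (bigrams[(w1, w2)] - D) / unigrams[w1]
    seen = {w for (a, w) in bigrams if a == w1}   # words observed after w1
    alpha = D * len(seen) / unigrams[w1]          # mass freed by discounting
    unseen_mass = sum(c for w, c in unigrams.items() if w not in seen) / total
    return alpha * (unigrams[w2] / total) / unseen_mass

corpus = "the cat sat on the mat the cat ran".split()
```

The key property is the normalization: for a given history, the seen (discounted) and unseen (backed-off) probabilities together sum to 1.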
Witten-Bell Smoothing
- Model the probability of new events, estimating it from how often a new event was seen while proceeding through the training corpus (i.e. from the total number of word types in the corpus)
Absolute Discounting
- Subtract a constant D from each nonzero count: p(w_2 | w_1) = max(c(w_1 w_2) − D, 0) / c(w_1), with the discounted mass redistributed to unseen events
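An interpolated variant of absolute discounting can be sketched as follows (D = 0.75 is a typical but illustrative value; the corpus is illustrative, and redistribution via the unigram model is one common choice):

```python
from collections import Counter

def abs_discount_bigram(tokens, w1, w2, D=0.75):
    """p(w2|w1) = max(c(w1 w2) - D, 0)/c(w1) + beta(w1)*p_uni(w2),
    where beta(w1) returns the discounted mass via the unigram model."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    seen = {w for (a, w) in bigrams if a == w1}   # words observed after w1
    beta = D * len(seen) / unigrams[w1]           # mass freed by discounting
    p_uni = unigrams[w2] / len(tokens)
    return max(bigrams[(w1, w2)] - D, 0) / unigrams[w1] + beta * p_uni

corpus = "the cat sat on the mat the cat ran".split()
```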
Kneser-Ney
- Lower-order distribution not proportional to the number of occurrences of a word, but to the number of different words that it follows
Modified Kneser-Ney
Measuring Model Quality
- Consider the language as an information source L, which emits a sequence of symbols w_i from a finite alphabet (the vocabulary)
- The quality of a language model M can be judged by its cross entropy with regard to the distribution P_T(x) of some hitherto unseen text T: H(P_T; M) = −Σ_x P_T(x) · log₂ P_M(x)
- Intuitively speaking, cross entropy is the entropy of T as “perceived” by the model M
Perplexity
- Perplexity: PP = 2^H
- In a language with perplexity X, every word can be followed by X different words with equal probabilities
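Cross entropy and perplexity over a test text can be computed from the model's per-word probabilities (the probability list in the usage example is illustrative):

```python
import math

def cross_entropy(word_probs):
    """H = -(1/n) * sum(log2 p(w_i)): average surprise per word of the test text."""
    return -sum(math.log2(p) for p in word_probs) / len(word_probs)

def perplexity(word_probs):
    """PP = 2^H."""
    return 2 ** cross_entropy(word_probs)
```

If the model assigns every test word probability 1/8, the perplexity is 8 – as if each word could be followed by 8 equally likely words.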
Elements of Information Theory
- Entropy: H(X) = −Σ_x p(x) · log₂ p(x)
- Mutual Information: I(X;Y) = Σ_{x,y} p(x,y) · log₂ [ p(x,y) / (p(x)·p(y)) ]
  – pointwise: log₂ [ p(x,y) / (p(x)·p(y)) ]
- Kullback-Leibler (KL) divergence: D(p‖q) = Σ_x p(x) · log₂ [ p(x) / q(x) ]
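These quantities translate directly into code (the distributions passed in the tests are illustrative):

```python
import math

def entropy(p):
    """H(X) = -sum p(x)*log2 p(x)."""
    return -sum(px * math.log2(px) for px in p if px > 0)

def kl_divergence(p, q):
    """D(p || q) = sum p(x)*log2(p(x)/q(x))."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

def pointwise_mi(p_xy, p_x, p_y):
    """Pointwise mutual information: log2( p(x,y) / (p(x)*p(y)) )."""
    return math.log2(p_xy / (p_x * p_y))
```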
The Greek Language
- Highly inflectional language
- A Greek vocabulary of 220K words is needed in order to achieve 99.6% lexical coverage

                     English               French     Greek            German
  Source             Wall Street Journal   Le Monde   Eleftherotypia   Frankfurter Rundschau
  Corpus size        37.2 M                37.7 M     35 M             31.5 M
  Distinct words     165 K                 280 K      410 K            500 K
  Vocabulary size    60 K                  60 K       60 K             60 K
  Lexical coverage   99.6 %                98.3 %     96.5 %           95.1 %
Perplexity

                    English   French   Greek   German
  Vocabulary size   20 K      20 K     64 K    64 K
  2-gram PP         198       178      232     430
  3-gram PP         135       119      163     336
Experimental Results

                         1M            5M            35M
  Smoothing              PP    WER     PP    WER     PP    WER
  Good-Turing            341   27.71   248   23.48   163   19.59
  Witten-Bell            354   27.42   251   24.17   163   19.84
  Absolute Discounting   344   28.47   256   24.25   169   20.78
  Modified Kneser-Ney    328   26.78   237   21.91   156   18.57

        1M      5M      35M
  OOV   4.75%   3.46%   3.17%
Hit Rate

           hit rate % (1M)   hit rate % (5M)   hit rate % (35M)
  1-gram   27.3              16.4              7.4
  2-gram   52.5              49.9              40
  3-gram   20.2              33.7              52.6
Class-based Models
- Some words are similar to other words in their meaning and syntactic function
- Group words into classes
  – Fewer parameters
  – Better estimates
Class-based n-gram models
- Suppose that we partition the vocabulary into G classes
- This model produces text by first generating a string of classes g_1, g_2, …, g_n and then converting them into the words w_i, i = 1, 2, …, n with probability p(w_i | g_i)
- An n-gram model has Vⁿ−1 independent parameters (216×10¹²)
- A class-based model has Gⁿ−1+V−G parameters (~10⁹):
  – Gⁿ−1 of an n-gram model for a vocabulary of size G
  – V−G of the form p(w_i | g_i)
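A class-based bigram then factors as p(w_i | w_{i-1}) = p(g_i | g_{i-1}) · p(w_i | g_i); a sketch, with hypothetical class names and probabilities:

```python
def class_bigram_prob(w2, g1, g2, p_class_trans, p_word_in_class):
    """p(w2 | w1) ~= p(g2 | g1) * p(w2 | g2): class transition times membership."""
    return p_class_trans[(g1, g2)] * p_word_in_class[(w2, g2)]

# Hypothetical class-transition and class-membership probabilities
p_class_trans = {("DET", "NOUN"): 0.5}
p_word_in_class = {("cat", "NOUN"): 0.1}
```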
Relation to n-grams
Defining Classes
- Manually
  – Use part-of-speech labels by linguistic experts or a tagger
  – Use stem information
- Automatically
  – Cluster words as part of an optimization method, e.g. maximize the log-likelihood of test text
Agglomerative Clustering
- Bottom-up clustering
- Start with a separate cluster for each word
- Merge that pair for which the loss in average mutual information is least
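A brute-force sketch of one greedy merge step, scoring candidate merges by the average mutual information of the class-bigram distribution as the slide describes (the class names and counts are illustrative):

```python
import math
from collections import Counter
from itertools import combinations

def avg_mutual_info(pair_counts, total):
    """I = sum p(c1,c2) * log2( p(c1,c2) / (p_left(c1)*p_right(c2)) )."""
    left, right = Counter(), Counter()
    for (c1, c2), n in pair_counts.items():
        left[c1] += n
        right[c2] += n
    return sum((n / total) * math.log2((n / total) /
               ((left[c1] / total) * (right[c2] / total)))
               for (c1, c2), n in pair_counts.items())

def merge_classes(pair_counts, a, b):
    """Relabel class b as a in the class-bigram counts."""
    merged = Counter()
    for (c1, c2), n in pair_counts.items():
        merged[(a if c1 == b else c1, a if c2 == b else c2)] += n
    return merged

def best_merge(pair_counts):
    """Greedy step: pick the merge whose resulting average MI is highest
    (i.e. whose loss in average MI is least)."""
    total = sum(pair_counts.values())
    classes = sorted({c for pair in pair_counts for c in pair})
    return max(combinations(classes, 2),
               key=lambda ab: avg_mutual_info(merge_classes(pair_counts, *ab), total))

# "a" and "b" have identical right contexts, so merging them loses no MI
pair_counts = {("a", "x"): 2, ("b", "x"): 2, ("c", "y"): 4}
```

A real implementation would cache the pairwise MI losses instead of rescoring every candidate merge from scratch, but the greedy criterion is the same.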
Example
- Syntactic classes
  – verbs, past tense: άναψαν, επέλεξαν, κατέλαβαν, πλήρωσαν, πυροβόλησαν
  – nouns, neuter: άλογο, δόντι, δέντρο, έντομο, παιδί, ρολόι, σώμα
  – adjectives, masculine: δημοκρατικός, δημόσιος, ειδικός, εμπορικός, επίσημος
- Semantic classes
  – last names: βαρδινογιάννης, γεννηματάς, λοβέρδος, ράλλης
  – countries: βραζιλία, βρετανία, γαλλία, γερμανία, δανία
  – numerals: δέκατο, δεύτερο, έβδομο, εικοστό, έκτο, ένατο, όγδοο
- Some not so well defined classes
  – ανακριβής, αναμεταδίδει, διαφημίσουν, κομήτες, προμήθευε
  – εξίσωση, έτρωγαν, και, μαλαισία, νηπιαγωγών, φεβρουάριος
Stem-based Classes
- άγνωστ: άγνωστος, άγνωστου, άγνωστο, άγνωστον, άγνωστοι, άγνωστους, άγνωστη, άγνωστης, άγνωστες, άγνωστα
- βλέπ: βλέπω, βλέπεις, βλέπει, βλέπουμε, βλέπετε, βλέπουν
- εκτελ: εκτελεί, εκτελούν, εκτελούσε, εκτελούσαν, εκτελείται, εκτελούνται
- εξοχικ: εξοχικό, εξοχικά, εξοχική, εξοχικής, εξοχικές
- ιστορικ: ιστορικός, ιστορικού, ιστορικό, ιστορικοί, ιστορικών, ιστορικούς, ιστορική, ιστορικής, ιστορικές, ιστορικά
- καθηγητ: καθηγητής, καθηγητή, καθηγητές, καθηγητών
- μαχητικ: μαχητικός, μαχητικού, μαχητικό, μαχητικών, μαχητική, μαχητικής, μαχητικά
Experimental Results

  G              PP (1M)   PP (5M)   PP (35M)
  1              1309      1461      1503
  133 (POS)      1047      1143      1167
  500            -         -         314
  1000           -         -         266
  2000           -         -         224
  30000 (stem)   383       299       215
  60000          328       237       156
Example
- Interpolate class-based and word-based models: p(w_i | h) = λ · p_word(w_i | h) + (1−λ) · p_class(w_i | h)
Experimental Results

                 1M            5M            35M
  G              PP    WER     PP    WER     PP    WER
  133 (POS)      325   27.11   236   22.00   156   18.52
  500            -     -       -     -       151   18.63
  1000           -     -       -     -       150   18.61
  2000           -     -       -     -       149   18.65
  30000 (stem)   319   26.99   232   22.04   154   18.44
  60000          328   26.78   237   21.91   156   18.57
Hit Rate

           hit rate % (1M)   hit rate % (5M)   hit rate % (35M)
  1-gram   21.3              12.1              5.1
  2-gram   56                50.4              37.6
  3-gram   22.7              37.6              57.4

For comparison (word-based model, repeated from the earlier Hit Rate slide):

           hit rate % (1M)   hit rate % (5M)   hit rate % (35M)
  1-gram   27.3              16.4              7.4
  2-gram   52.5              49.9              40
  3-gram   20.2              33.7              52.6
Experimental Results

                       1M            5M            35M
  Model                PP    WER     PP    WER     PP    WER
  ME 3gram             331   26.83   239   21.94   158   18.60
  ME 3gram+stem        320   26.54   227   21.66   143   18.29

                       1M            5M            35M
  Model                PP    WER     PP    WER     PP    WER
  BO 3gram             328   26.78   237   21.91   156   18.57
  Interp. 3gram+stem   319   26.99   232   22.04   154   18.44
Where do we go from here?
- Use syntactic information: "The dog on the hill barked"
- Constraints