
Language Models For Speech Recognition

Speech Recognition
- A: a sequence of acoustic vectors
- Find the word sequence W* such that W* = argmax_W P(W | A) = argmax_W P(A | W) P(W)
- The task of a language model is to make available to the recognizer adequate estimates of the probabilities P(W)

Language Models
- A language model assigns a probability P(W) to every word sequence W = w_1, w_2, ..., w_n
- Chain rule: P(W) = P(w_1) P(w_2 | w_1) ... P(w_n | w_1, ..., w_{n-1})
- The conditioning history grows with the sentence, so it cannot be estimated directly
- Practical models truncate the history to a few words

N-gram models
- Make the Markov assumption that only the prior local context, the last (N−1) words, affects the next word: P(w_i | w_1, ..., w_{i-1}) ≈ P(w_i | w_{i-N+1}, ..., w_{i-1})
- N=3: trigrams, P(w_i | w_{i-2}, w_{i-1})
- N=2: bigrams, P(w_i | w_{i-1})
- N=1: unigrams, P(w_i)

Parameter estimation
- Maximum likelihood estimator: relative frequencies in the training data
- N=3: trigrams, P(w_i | w_{i-2}, w_{i-1}) = C(w_{i-2} w_{i-1} w_i) / C(w_{i-2} w_{i-1})
- N=2: bigrams, P(w_i | w_{i-1}) = C(w_{i-1} w_i) / C(w_{i-1})
- N=1: unigrams, P(w_i) = C(w_i) / N
- This will assign zero probabilities to unseen events
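
As a concrete illustration, here is a minimal Python sketch of the maximum-likelihood trigram estimate; the function name and the toy corpus are illustrative, not from the slides:

    from collections import Counter

    def mle_trigram(corpus):
        # P(w3 | w1, w2) = C(w1 w2 w3) / C(w1 w2)
        tri = Counter(zip(corpus, corpus[1:], corpus[2:]))
        bi = Counter(zip(corpus, corpus[1:]))
        return {(w1, w2, w3): c / bi[(w1, w2)] for (w1, w2, w3), c in tri.items()}

    corpus = "the dog barked at the dog on the hill".split()
    print(mle_trigram(corpus)[("the", "dog", "barked")])
    # 0.5 -- "the dog" occurs twice but is followed by "barked" only once;
    # any trigram absent from the corpus gets probability zero (the sparseness problem)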

Number of Parameters
- For a vocabulary of size V, a 1-gram model has V − 1 independent parameters
- A 2-gram model has V^2 − 1 independent parameters
- In general, an n-gram model has V^n − 1 independent parameters
- Typical values for a moderate-size vocabulary of 20,000 words:

    Model    Parameters
    1-gram   20,000 − 1 ≈ 20 thousand
    2-gram   20,000^2 − 1 ≈ 400 million
    3-gram   20,000^3 − 1 ≈ 8 trillion

Number of Parameters
- |V| = 60,000, N = 35M words (Eleftherotypia daily newspaper)
- Counts of distinct 1-grams, 2-grams, and 3-grams observed in the corpus versus the possible ones [numeric values lost in transcription]
- In a typical training text, roughly 80% of trigrams occur only once
- Good-Turing estimate: ML estimates will be zero for 37.5% of the 3-grams and for 11% of the 2-grams

Problems
- Data sparseness: we do not have enough data to train the model parameters

Solutions
- Smoothing techniques: accurately estimate probabilities in the presence of sparse data (Good-Turing, Jelinek-Mercer (linear interpolation), Katz (backing-off))
- Build compact models: they have fewer parameters to train and thus require less data (equivalence classification of words, e.g. grammatical classes such as noun, verb, adjective, preposition, or semantic labels such as city, name, date)

Smoothing
- Make distributions more uniform
- Redistribute probability mass from higher to lower probabilities

Additive Smoothing
- For each n-gram that occurs r times, pretend that it occurs r+1 times
- e.g. for bigrams: P(w_i | w_{i-1}) = (C(w_{i-1} w_i) + 1) / (C(w_{i-1}) + V)
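
A minimal sketch of add-one smoothing for bigrams (names are illustrative; the unigram count of w1 stands in for the context count C(w1)):

    from collections import Counter

    def add_one_bigram(corpus, V):
        # P(w2 | w1) = (C(w1 w2) + 1) / (C(w1) + V): every count r becomes r + 1
        bi = Counter(zip(corpus, corpus[1:]))
        uni = Counter(corpus)
        return lambda w1, w2: (bi[(w1, w2)] + 1) / (uni[w1] + V)

With a realistic vocabulary size V, adding one to every count moves too much probability mass to unseen events, which motivates the estimators on the following slides.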

Good-Turing Smoothing
- For any n-gram that occurs r times, pretend that it occurs r* times: r* = (r + 1) n_{r+1} / n_r, where n_r is the number of n-grams which occur r times
- To convert this count to a probability we just normalize: p = r* / N
- Total probability of unseen n-grams: n_1 / N

Example
- Table of raw counts r (= ML), count-of-counts n_r, and adjusted counts r* (= GT) [numeric values lost in transcription]
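
A minimal sketch of the Good-Turing recipe, assuming the count-of-counts n_r are used raw (real implementations smooth n_r so that r* is defined even when n_{r+1} = 0):

    from collections import Counter

    def good_turing(counts):
        # counts: Counter mapping each n-gram to its observed count r
        n = Counter(counts.values())   # n[r] = number of n-grams seen exactly r times
        N = sum(counts.values())
        # r* = (r + 1) * n_{r+1} / n_r; undefined here when n_{r+1} = 0
        adjusted = {g: (r + 1) * n[r + 1] / n[r]
                    for g, r in counts.items() if n[r + 1] > 0}
        p_unseen = n[1] / N            # total probability mass reserved for unseen n-grams
        return adjusted, p_unseen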

Jelinek-Mercer Smoothing (linear interpolation)
- Interpolate a higher-order model with a lower-order model: p_interp(w_i | w_{i-1}) = λ p_ML(w_i | w_{i-1}) + (1 − λ) p_ML(w_i)
- Intuitively, the higher-order estimate is more informative but less reliable, and λ balances the two
- Given fixed p_ML, it is possible to search efficiently for the λ that maximizes the probability of some data using the Baum-Welch algorithm
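
A sketch of the interpolation; as a simple stand-in for the Baum-Welch search mentioned above, it grid-searches λ on held-out bigrams (and assumes every held-out word has nonzero unigram probability):

    import math

    def interp(p_bi, p_uni, lam):
        # p(w2 | w1) = lam * p_ML(w2 | w1) + (1 - lam) * p_ML(w2)
        return lambda w1, w2: lam * p_bi.get((w1, w2), 0.0) + (1 - lam) * p_uni.get(w2, 0.0)

    def best_lambda(p_bi, p_uni, heldout):
        # pick the lambda maximizing held-out log-likelihood
        def loglik(lam):
            p = interp(p_bi, p_uni, lam)
            return sum(math.log(p(w1, w2)) for w1, w2 in zip(heldout, heldout[1:]))
        return max((l / 100 for l in range(1, 100)), key=loglik)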

Katz Smoothing (backing-off)
- For those events which have been observed in the training data we assume some reliable estimate of the probability (e.g. the Good-Turing estimate p*)
- For the remaining unseen events we back off to some less specific distribution: p(w_i | w_{i-1}) = α(w_{i-1}) p(w_i)
- α(w_{i-1}) is chosen so that the total probability sums to 1
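
A sketch of the back-off structure for bigrams, assuming the discounted estimates p* (e.g. from Good-Turing) are given:

    def katz_bigram(p_star, p_uni, seen_after):
        # p_star[(w1, w2)] : discounted probability for bigrams seen in training
        # p_uni[w]         : unigram distribution
        # seen_after[w1]   : set of words observed after w1
        def alpha(w1):
            seen = seen_after.get(w1, set())
            left = 1.0 - sum(p_star[(w1, w)] for w in seen)   # mass freed by discounting
            return left / (1.0 - sum(p_uni[w] for w in seen))
        def p(w1, w2):
            if (w1, w2) in p_star:
                return p_star[(w1, w2)]
            return alpha(w1) * p_uni[w2]   # back off, scaled so totals sum to one
        return p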

Witten-Bell Smoothing
- Model the probability of new events, estimating the probability of seeing such a new event as we proceed through the training corpus (i.e. from the total number of word types in the corpus)

Absolute Discounting
- Subtract a constant D from each nonzero count: p(w_i | w_{i-1}) = max(C(w_{i-1} w_i) − D, 0) / C(w_{i-1}) + λ(w_{i-1}) p(w_i)
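
A minimal sketch for bigrams, redistributing the freed mass over the unigram distribution (the usual formulation; the slide's exact formula was not preserved):

    from collections import Counter

    def absolute_discount_bigram(corpus, D=0.5):
        # P(w2 | w1) = max(C(w1 w2) - D, 0) / C(w1) + lambda(w1) * P_uni(w2),
        # where lambda(w1) = D * |{w : C(w1 w) > 0}| / C(w1) returns the discounted mass
        bi = Counter(zip(corpus, corpus[1:]))
        uni = Counter(corpus)   # uni[w1] approximates the context count C(w1)
        N = len(corpus)
        followers = Counter(w1 for (w1, _) in bi)   # distinct bigram types starting with w1
        def p(w1, w2):
            lam = D * followers[w1] / uni[w1]
            return max(bi[(w1, w2)] - D, 0) / uni[w1] + lam * uni[w2] / N
        return p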

Kneser-Ney
- The lower-order distribution is not proportional to the number of occurrences of a word, but to the number of different words that it follows
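
A sketch of this lower-order "continuation" distribution for bigram contexts:

    def continuation_unigram(corpus):
        # p_cont(w) is proportional to the number of distinct words w follows,
        # not to its raw frequency
        predecessors = {}
        for w1, w2 in zip(corpus, corpus[1:]):
            predecessors.setdefault(w2, set()).add(w1)
        total = sum(len(s) for s in predecessors.values())   # number of distinct bigram types
        return {w: len(s) / total for w, s in predecessors.items()}

The standard illustration: a word like "Francisco" may be frequent, but since it follows almost only "San", its continuation probability is low.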

Modified Kneser-Ney
- As Kneser-Ney, but with separate discounts D_1, D_2, and D_3+ for n-grams seen once, twice, and three or more times

Measuring Model Quality
- Consider the language as an information source L, which emits a sequence of symbols w_i from a finite alphabet (the vocabulary)
- The quality of a language model M can be judged by its cross entropy with regard to the distribution P_T(x) of some hitherto unseen text T: H(P_T; M) = − Σ_x P_T(x) log_2 P_M(x)
- Intuitively speaking, cross entropy is the entropy of T as "perceived" by the model M

Perplexity
- Perplexity: PP = 2^{H(P_T; M)}
- In a language with perplexity X, every word can be followed by X different words with equal probabilities
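
A minimal sketch computing perplexity via the per-word cross entropy; model is any callable returning P(w | history) and is illustrative, not from the slides:

    import math

    def perplexity(model, text):
        # H = -(1/N) * sum_i log2 p(w_i | w_1 .. w_{i-1});  PP = 2^H
        # model(history, word) -> probability (must be > 0 for every test word)
        H = -sum(math.log2(model(text[:i], w)) for i, w in enumerate(text)) / len(text)
        return 2 ** H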

Elements of Information Theory
- Entropy: H(X) = − Σ_x p(x) log_2 p(x)
- Mutual information: I(X; Y) = Σ_{x,y} p(x, y) log_2 [p(x, y) / (p(x) p(y))]; pointwise mutual information of a pair (x, y): log_2 [p(x, y) / (p(x) p(y))]
- Kullback-Leibler (KL) divergence: D(p || q) = Σ_x p(x) log_2 [p(x) / q(x)]

The Greek Language
- Highly inflectional language
- A Greek vocabulary of 220K words is needed in order to achieve 99.6% lexical coverage

                      English               French     Greek            German
    Source            Wall Street Journal   Le Monde   Eleftherotypia   Frankfurter Rundschau
    Corpus size       37.2 M                37.7 M     35 M             31.5 M
    Distinct words    165 K                 280 K      410 K            500 K
    Vocabulary size   60 K                  60 K       60 K             60 K
    Lexical coverage  99.6 %                98.3 %     96.5 %           95.1 %

Perplexity
- 2-gram and 3-gram perplexities for English, French, Greek, and German at vocabulary sizes 20 K and 64 K [numeric values lost in transcription]

Experimental Results
- PP and WER for Good-Turing, Witten-Bell, Absolute Discounting, and Modified Kneser-Ney smoothing at 1M, 5M, and 35M training words [numeric values lost in transcription]

    Training size   1M      5M      35M
    OOV rate        4.75%   3.46%   3.17%

Hit Rate
- Hit rates (%) of 1-gram, 2-gram, and 3-gram contexts at 1M, 5M, and 35M training words [numeric values lost in transcription]

Class-based Models
- Some words are similar to other words in their meaning and syntactic function
- Group words into classes: fewer parameters, better estimates

Class-based n-gram models
- Suppose that we partition the vocabulary into G classes
- This model produces text by first generating a string of classes g_1, g_2, ..., g_n and then converting them into the words w_i, i = 1, 2, ..., n with probability p(w_i | g_i)
- An n-gram model has V^n − 1 independent parameters (216 × 10^12 for V = 60,000, n = 3)
- A class-based model has G^n − 1 + V − G parameters (on the order of 10^9): G^n − 1 of an n-gram model over a vocabulary of size G, plus V − G of the form p(w_i | g_i)
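
A minimal sketch of the class-based bigram probability; the dictionaries and the cls mapping are illustrative, not from the slides:

    def class_bigram(p_trans, p_emit, cls):
        # p(w2 | w1) = p(w2 | g(w2)) * p(g(w2) | g(w1))
        # cls: word -> class, p_emit[(w, g)] = p(w | g), p_trans[(g1, g2)] = p(g2 | g1)
        def p(w1, w2):
            g1, g2 = cls[w1], cls[w2]
            return p_emit[(w2, g2)] * p_trans[(g1, g2)]
        return p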

Relation to n-grams
- A class-based bigram is an ordinary bigram model with tied parameters: p(w_i | w_{i-1}) = p(w_i | g_i) p(g_i | g_{i-1})

Defining Classes
- Manually: use part-of-speech labels assigned by linguistic experts or a tagger; use stem information
- Automatically: cluster words as part of an optimization method, e.g. maximize the log-likelihood of test text

Agglomerative Clustering
- Bottom-up clustering
- Start with a separate cluster for each word
- Merge the pair of clusters for which the loss in average mutual information is least (see the sketch below)
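
A naive sketch of the greedy procedure; it recomputes the average mutual information from scratch for every candidate merge, whereas practical implementations use incremental bookkeeping:

    import math
    from collections import Counter
    from itertools import combinations

    def avg_mutual_info(bigrams, assign):
        # I = sum over class pairs of p(g1, g2) * log2( p(g1, g2) / (p(g1) p(g2)) )
        N = sum(bigrams.values())
        joint, left, right = Counter(), Counter(), Counter()
        for (w1, w2), c in bigrams.items():
            g1, g2 = assign[w1], assign[w2]
            joint[(g1, g2)] += c
            left[g1] += c
            right[g2] += c
        return sum(c / N * math.log2(c * N / (left[g1] * right[g2]))
                   for (g1, g2), c in joint.items())

    def cluster(corpus, n_classes):
        # start with one class per word, then greedily merge the pair of classes
        # whose merge loses the least average mutual information
        bigrams = Counter(zip(corpus, corpus[1:]))
        assign = {w: w for w in set(corpus)}
        while len(set(assign.values())) > n_classes:
            merge = lambda a, b: {w: (a if g == b else g) for w, g in assign.items()}
            a, b = max(combinations(sorted(set(assign.values())), 2),
                       key=lambda pair: avg_mutual_info(bigrams, merge(*pair)))
            assign = merge(a, b)
        return assign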

Example
- Syntactic classes
  - verbs, past tense: άναψαν, επέλεξαν, κατέλαβαν, πλήρωσαν, πυροβόλησαν
  - nouns, neuter: άλογο, δόντι, δέντρο, έντομο, παιδί, ρολόι, σώμα
  - adjectives, masculine: δημοκρατικός, δημόσιος, ειδικός, εμπορικός, επίσημος
- Semantic classes
  - last names: βαρδινογιάννης, γεννηματάς, λοβέρδος, ράλλης
  - countries: βραζιλία, βρετανία, γαλλία, γερμανία, δανία
  - numerals: δέκατο, δεύτερο, έβδομο, εικοστό, έκτο, ένατο, όγδοο
- Some not so well defined classes
  - ανακριβής, αναμεταδίδει, διαφημίσουν, κομήτες, προμήθευε
  - εξίσωση, έτρωγαν, και, μαλαισία, νηπιαγωγών, φεβρουάριος

Stem-based Classes
- άγνωστ-: άγνωστος, άγνωστου, άγνωστο, άγνωστον, άγνωστοι, άγνωστους, άγνωστη, άγνωστης, άγνωστες, άγνωστα
- βλέπ-: βλέπω, βλέπεις, βλέπει, βλέπουμε, βλέπετε, βλέπουν
- εκτελ-: εκτελεί, εκτελούν, εκτελούσε, εκτελούσαν, εκτελείται, εκτελούνται
- εξοχικ-: εξοχικό, εξοχικά, εξοχική, εξοχικής, εξοχικές
- ιστορικ-: ιστορικός, ιστορικού, ιστορικό, ιστορικοί, ιστορικών, ιστορικούς, ιστορική, ιστορικής, ιστορικές, ιστορικά
- καθηγητ-: καθηγητής, καθηγητή, καθηγητές, καθηγητών
- μαχητικ-: μαχητικός, μαχητικού, μαχητικό, μαχητικών, μαχητική, μαχητικής, μαχητικά

Experimental Results
- Perplexity of class-based models with G classes (POS-based and stem-based) at 1M, 5M, and 35M training words [numeric values lost in transcription]

Example
- Interpolate class-based and word-based models: p(w_i | w_{i-1}) = λ p_word(w_i | w_{i-1}) + (1 − λ) p_class(w_i | w_{i-1})

Experimental Results
- PP and WER of the interpolated models for various G, including G = 133 (POS) and stem-based classes, at 1M, 5M, and 35M training words [numeric values lost in transcription]

Hit Rate
- Hit rates (%) of 1-gram, 2-gram, and 3-gram contexts at 1M, 5M, and 35M training words, for two model variants [numeric values lost in transcription]

Experimental Results
- PP and WER at 1M, 5M, and 35M training words for the ME 3gram and ME 3gram+stem models, and for the BO 3gram and interpolated 3gram+stem models [numeric values lost in transcription]

Where do we go from here?
- Use syntactic information: in "The dog on the hill barked", the dependency between "dog" and "barked" spans more than a trigram window
- Constraints