Building Prosodic Structures in a Concept-to-Speech System Gerasimos Xydas, Dimitris Spiliotopoulos & Georgios Kouroupetroglou Speech Group Dep. of Informatics.

Presentation transcript:

Building Prosodic Structures in a Concept-to-Speech System
Gerasimos Xydas, Dimitris Spiliotopoulos & Georgios Kouroupetroglou
Speech Group, Dep. of Informatics and Telecommunications, University of Athens
{gxydas, dspiliot,
1st Balkan Conference on Informatics, Thessaloniki, November 2003

Outline
– Prosody
– Concept-to-Speech system
– Concept-to-Speech system (SOLE-ML)
– Corpus
– Training the prosodic models
– Prosody prediction
– Conclusions

Prosody
Prosodic events:
– Position and type of:
  – Phrase breaks
  – Pitch accents
  – Phrase accents & boundary tones
– Prediction of the type and placement of the above.

Concept-to-Speech system: prosody generation
Traditional Text-to-Speech systems handle plain text.
Difficulties:
– Statistical components (POS tagging, etc.) fail in a percentage of cases
– Lack of underlying focus information
– Only a subset of intonation events is identified and used

Concept-to-Speech system
Concept-to-Speech systems handle an abundance of information:
Authoring component (MPIRO Authoring tool) → Natural language generator (EXPRIMO) → SOLE-ML (XML) → Speech synthesizer (DEMOSTHeNES)

Concept-to-Speech system: prosody generation
Advantages:
– Limited domain, leading to a concrete set of data
– The NLG produces linguistically enriched texts (as opposed to plain text)
– Error-free phrase and part-of-speech tagging
– Use: derive intonational focus points
– Most importantly: explore rhetorical relations in terms of prosody
But: NLG systems usually deal with written text and fail to represent spoken language.

C-t-S system (SOLE-ML)
EXPRIMO → SOLE-ML → DEMOSTHeNES
SOLE-ML represents:
– Enumerated word lists
– Syntactic structure
  – Phrase level (phrase type: sentence, NP, PP, etc.)
  – Word level (part-of-speech: determiner, noun, verb, preposition, etc.)
  – Punctuation, parentheses, etc.
– Canned text (portions of plain text, no extra information)

C-t-S system (SOLE-ML) cont.
Error-free syntax information helps identify intonational focus, but semantic and pragmatic factors also affect it.
The EXPRIMO NLG holds valuable features inside its language generation stages that the initial SOLE-ML specification does not support.
Hence the need for an extension.

C-t-S system (SOLE-ML extended)
New attributes that directly or indirectly imply emphasis.
New specification (noun phrases):
– Newness / given information: newness [new/old]
– Number of times mentioned before: mentioned-count [integer]
– Whether second argument to verb: arg2 [true/false]
– Whether there is deixis: genitive-deixis, accusative-deixis [true/false]
– Whether proper noun phrase: proper-group [true/false]

C-t-S system (SOLE-ML example)
… που δημιουργήθηκε κατά τη διάρκεια της αρχαϊκής περιόδου … … …
(English: "… which was created during the archaic period …")

Corpus (general)
Training: 516 utterances, 5380 words, syllables
Test: 1509 syllables
The test data contain a fair distribution of the features of interest.
Male and female professional speakers.

Corpus (focus)
Focus levels:
– Strong focus: [newness=new] AND [arg2=true] AND [proper-group=true] AND [(genitive-deixis) OR (accusative-deixis)]
– Normal focus: [newness=old] AND [arg2=true] AND [proper-group=true] AND [(genitive-deixis) OR (accusative-deixis)]
– Weak focus: [newness=old]
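The focus rules on this slide can be read as a small decision procedure. The Python sketch below is an illustration only, not code from the system: the names NounPhrase and focus_level are hypothetical, the attributes mirror the extended SOLE-ML noun-phrase attributes, and returning None for combinations the slide does not cover is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NounPhrase:
    """Hypothetical container for the extended SOLE-ML noun-phrase attributes."""
    newness: str                     # "new" or "old"
    arg2: bool                       # second argument to the verb
    proper_group: bool               # proper noun phrase
    genitive_deixis: bool = False
    accusative_deixis: bool = False


def focus_level(phrase: NounPhrase) -> Optional[str]:
    """Map a noun phrase to a focus level following the rules on the slide."""
    deixis = phrase.genitive_deixis or phrase.accusative_deixis
    emphatic = phrase.arg2 and phrase.proper_group and deixis
    if phrase.newness == "new" and emphatic:
        return "strong"
    if phrase.newness == "old" and emphatic:
        return "normal"
    if phrase.newness == "old":
        return "weak"
    # Assumption: the slide gives no rule for the remaining combinations.
    return None


if __name__ == "__main__":
    print(focus_level(NounPhrase("new", True, True, genitive_deixis=True)))
    print(focus_level(NounPhrase("old", False, False)))
```

The rules are checked from most to least specific, so an "old" phrase that satisfies the normal-focus conditions is not reported as weak.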

Corpus (procedure)
Text corpora annotated by the DEMOSTHeNES XML export component; visualization in RTF format.
Voice corpora segmented and hand-annotated using GRToBI by 3 expert linguists.
Post-processing (groupings):
– Pitch accents grouped (to eliminate errors from low-frequency occurrences)
– Phrase accents and boundary tones (they co-occur in GRToBI)
– Break indices (sandhi, mismatch, pause marks)
Result: (a) more robust results; (b) a huge reduction in mismatches in the evaluation of human annotations.

Corpus (pitch accents)
Main accents (accent 1 to accent 5): L*, H*, L*+H, L+H*, H*+L
Diacritics:
– downstep: !H*, L*+!H, L+!H*, !H*+L
– weak: wL*+H
– early: >L*+H
– late: <L*+H
– low point: wL*
Occurrences (%): values not preserved in this transcript
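One plausible reading of the pitch-accent grouping is that each diacritic variant is folded into its main accent before training. The Python sketch below illustrates that reading only: the function names are hypothetical, and the rule "strip the diacritic characters !, w, > and <" is an assumption inferred from the labels listed on this slide, not a documented procedure.

```python
from collections import Counter
from typing import Iterable

# The five main accent classes listed on the slide.
MAIN_ACCENTS = {"L*", "H*", "L*+H", "L+H*", "H*+L"}


def group_accent(label: str) -> str:
    """Fold a diacritic variant (e.g. '!H*+L', 'wL*+H') into its main accent."""
    # Assumption: 'w', '<' and '>' only occur as prefixes; '!' may occur inside.
    base = label.lstrip("w<>").replace("!", "")
    if base not in MAIN_ACCENTS:
        raise ValueError(f"unknown accent label: {label!r}")
    return base


def group_counts(labels: Iterable[str]) -> Counter:
    """Count occurrences per main accent after grouping."""
    return Counter(group_accent(label) for label in labels)


if __name__ == "__main__":
    annotated = ["L*+H", "wL*+H", ">L*+H", "!H*", "H*", "L+!H*"]
    print(group_counts(annotated))
```

Grouping this way trades tonal detail for more training examples per class, which fits the stated aim of removing low-frequency labels.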

Corpus (endtones)
Main tones (endtone 1 to endtone 8): L-, H-, L%, H%, L-L%, L-H%, H-L%, H-H%
Downstep diacritics: !H-, !H%, L-!H%, !H-L%, !H-H%, H-!H%, !H-!H%
Occurrences (%): values not preserved in this transcript

Corpus (break indices)
Table columns: Break index, Occurrences (%). Values not preserved in this transcript.

Training prosodic models
Used Classification and Regression Trees (CART), built with the wagon software.
Built 3 prosodic models:
– Phrase break model (break indices assigned to syllables and placed at word boundaries)
– Accent model (pitch accents assigned to stressed syllables)
– Endtone model (end tones assigned to syllables and placed at phrase boundaries)

Training (features) 1/2
For each item, plus two items before (p, pp) and two items after (n, nn), in the Syllable, Word, and Phrase relations (40 parameters):
Features (generic):
– R:SylStructure.parent.gpos (part-of-speech of word)
– stress (lexical stress)
– Syl_in (number of syllables since last phrase break)
– Syl_out (number of syllables until next phrase break)
– Ssyl_in (number of stressed syllables since last phrase break)
– Ssyl_out (number of stressed syllables until next phrase break)
– R:SylStructure.parent.R:Phrase.parent.punc (phrase punctuation)
Features (SOLE specific):
– R:SylStructure.parent.R:Phrase.parent.newness
– R:SylStructure.parent.R:Phrase.parent.arg2
– R:SylStructure.parent.R:Phrase.parent.deixis
Additional features:
– R:SylStructure.parent.bi (break index) [Accent & Endtone models only]
– accent [Endtone model only]
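The context window on this slide (the current item plus two items before and two after) can be illustrated with a short Python sketch. It is not the system's feature extractor: the function name, the dictionary representation of items, the short feature names, and the "0" padding for positions outside the utterance are all assumptions made for the example.

```python
from typing import Dict, List, Sequence

# Offsets for the window; the prefixes follow the slide's p, pp, n, nn.
OFFSETS = {"pp": -2, "p": -1, "": 0, "n": 1, "nn": 2}


def window_features(
    items: Sequence[Dict[str, object]],
    index: int,
    names: List[str],
    pad: object = "0",
) -> Dict[str, object]:
    """Collect the named features for items[index] and its four neighbours."""
    row: Dict[str, object] = {}
    for prefix, offset in OFFSETS.items():
        j = index + offset
        for name in names:
            key = f"{prefix}.{name}" if prefix else name
            if 0 <= j < len(items):
                row[key] = items[j].get(name, pad)
            else:
                # Assumption: positions outside the utterance are padded.
                row[key] = pad
    return row


if __name__ == "__main__":
    syllables = [
        {"stress": 0, "gpos": "Pn"},
        {"stress": 1, "gpos": "Pn"},
        {"stress": 0, "gpos": "At"},
    ]
    print(window_features(syllables, 1, ["stress", "gpos"]))
```

Each training vector then has five values per feature name, which is how a modest feature list grows into a few dozen parameters.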

Training (features) 2/2
R:SylStructure.parent.gpos tagset:
– Vb: Verb
– Aj: Adjective
– No: Noun
– At: Article
– Cj: Conjunction
– Pn: Pronoun
– Pp: Preposition
– Ad: Adverb
– Pt: Particle

Training (phrase break)
Results table (train / test) with columns 0, 1, 2, 3, Score, Cor.; values not preserved in this transcript.
Selected features: gpos, syl_in, syl_out, newness, deixis, stress, punc.
Overall (%): value not preserved.

Training (accent tone model)
Results table (train / test) with columns NONE, L+H*, L*+H, H*+L, H*, L*, Score, Cor.; values not preserved in this transcript.
Selected features: gpos, syl_in, syl_out, bi, newness, arg2, deixis.
Overall (%): value not preserved.

Training (endtone model)
Results table (train / test) with columns NONE, L-L%, L-H%, H-H%, H-, L-, Score, Cor.; values not preserved in this transcript.
Selected features: syl_in, bi, punc.
Overall (%): value not preserved.

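Prosody prediction - example
“Αυτό το έκθεμα είναι ένας στατήρας που δημιουργήθηκε κατά την διάρκεια της ελληνιστικής περιόδου.”
(English: "This exhibit is a stater that was created during the Hellenistic period.")
Each bracket holds one word split into syllables, with the predicted pitch accent after the accented syllable; the number after the bracket is the word's break index, and [H-] and [L-L%] are the predicted end tones.
[a - fto L+H* ] 1 – [to] 0 – [e H*+L - kTe – ma] 1 [i L*+H – ne] 1 – [e – nas] 0 – [sta - ti H*+L - ras] 2 - [H-]
[pu] 0 – [Di - mi - u - rji L*+H - Ti – ce] 1 – [ka – ta] 0 – [ti] 0 – [Dja H* - rci – a] 1 [tis] 0 – [e - li - ni - sti - cis L*+H ] 1 – [pe - ri - o H*+L – Du] 3 - [L-L%]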

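Sample 1: exhibit11-1.xml
Αυτό το έκθεμα είναι ένας στατήρας, που δημιουργήθηκε κατά τη διάρκεια της ελληνιστικής περιόδου. Χρονολογείται ανάμεσα στο 220 και το 189 π.Χ.. Στον εμπροσθότυπο του νομίσματος, κεφάλι Αθηνάς, μια δημοφιλής απεικόνιση στα νομίσματα του αρχαίου ελληνικού κόσμου, με κορινθιακό κράνος και στον οπισθότυπο, θηλυκή μορφή, προσωποποίηση της Αιτωλίας, καθισμένη σε μακεδονικές και γαλατικές ασπίδες. Η σκηνή αναφέρεται στη μάχη των Αιτωλών ενάντια στους Μακεδόνες και στους Γαλάτες. Αυτός ο στατήρας έχει φτιαχτεί από χρυσό και προέρχεται από την Αιτωλική Συμπολιτεία
(English: "This exhibit is a stater, which was created during the Hellenistic period. It dates between 220 and 189 BC. On the obverse of the coin, a head of Athena, a popular depiction on the coins of the ancient Greek world, with a Corinthian helmet, and on the reverse, a female figure, a personification of Aetolia, seated on Macedonian and Gallic shields. The scene refers to the battle of the Aetolians against the Macedonians and the Gauls. This stater is made of gold and comes from the Aetolian League.")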

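Sample 2: exhibit25-1.xml
Αυτό το έκθεμα είναι ένα ανάγλυφο, που δημιουργήθηκε κατά τη διάρκεια της ελληνιστικής περιόδου. Η ελληνιστική περίοδος καλύπτει το χρονικό διάστημα ανάμεσα στο 323 και το 31 π.Χ.. Αυτό το ανάγλυφο χρονολογείται ανάμεσα στο 313 και το 312 π.Χ.. Απεικονίζει το δαφνοστεφανωμένο Διόνυσο καθιστό και απέναντί του ένα σάτυρο να στέκεται κρατώντας οινοχόη. Στο επιστύλιο κρέμονται πέντε προσωπεία. Από αριστερά: το προσωπείο του δύστροπου πατέρα, της γριάς γυναίκας, του πονηρού δούλου, του αγένιου νέου και της νέας με την κοντή κόμη. Σήμερα αυτό το ανάγλυφο βρίσκεται στο Επιγραφικό Μουσείο Αθηνών.
(English: "This exhibit is a relief, which was created during the Hellenistic period. The Hellenistic period covers the time span between 323 and 31 BC. This relief dates between 313 and 312 BC. It depicts the laurel-crowned Dionysus seated and, opposite him, a satyr standing and holding an oinochoe. Five masks hang on the epistyle. From the left: the mask of the ill-tempered father, the old woman, the cunning slave, the beardless youth and the young woman with short hair. Today this relief is in the Epigraphical Museum of Athens.")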

Technology overview
Concept-to-Speech system:
– EXPRIMO natural language generator, based on ILEX
– Domain data entered and updated through the MPIRO Authoring Tool
– SOLE markup language, which carries the linguistically enriched texts
– DEMOSTHeNES Speech Composer system

Conclusion
Concept-to-Speech system:
– Enriched linguistic meta-information (XML, SOLE-ML)
– Evidence of stress, intonational focus
Corpus, CART training.
Prosody models:
– phrase breaks
– accent tone
– endtone
Results:
– A large set of features and parameters improves prediction.
– Results on the restricted-domain text suggest that specific features may be useful for other texts as well.
– The prosody models can be applied to plain text (large amounts of untagged data).

University of Athens Speech group