Prosody, Speech Synthesis and Cognitive Linguistics

Tomasz Kuczmarski, Adam Mickiewicz University, Poznań, Poland

Abstract

The current paper discusses a way to employ methods and utilities provided by computational cognitive linguistics in speech technology research. First, the currently predominant methodologies are briefly discussed. It is shown that some problems regarding the naturalness of synthetic voices, such as prosody, are intractable within the ruling framework. Therefore, a special case of Construction Grammar (CxG), Embodied Construction Grammar, is proposed as a method to investigate spoken language holistically. It is suggested that this approach might show some prosodic dependencies which can hardly be studied in another way. At the end of the article some intuitions about the anticipated results are covered.

1.   Introduction

Speech synthesis research began in 1779, when Christian Kratzenstein created bellows-operated models of vocal tracts that could produce one of the following long vowels: [aː], [eː], [iː], [oː] and [uː]. Since then, the problem of artificial speech has gained much attention and has been confronted from many viewpoints, conforming to current scientific trends. Many paradigms and the resulting synthesizer technologies arose on the grounds of numerous linguistic theories.

Linguistics has recently experienced yet another notable paradigm shift, a shift toward a holistic cognitive approach to language, started by Lakoff [1] [2], Fillmore [3] and Langacker [4]. Cognitive linguistics has proven to provide better explanations of problems which the predominant Chomskyan school considered an irrelevant part of performance [5].

The only field of linguistics that has stayed relatively indifferent is speech technology. In current speech synthesis research, most of the attention is given to the latter part of the abstract-to-physical scheme.
This so-called low-level speech synthesis is an orthodoxy started by the discovery of acoustic cues for the perception of phonetic segments and by the work of Klatt [6]. Most of the field was taken over by engineers who are on an everlasting quest for segmental "intelligibility" and utility of final systems.

Nowadays, most state-of-the-art synthesis systems are able to generate voice of near-human quality. However, there is still much to be done in terms of naturalness. This, in turn, requires solving problems which are intractable within the currently predominant paradigm. Prosody, the most important of these, is influenced by all aspects of language, including pragmatics. It also leans toward the abstract part of the abstract-to-physical rendering scheme. Therefore, prosody should not only be a subject of holistic research, but it also needs a high-level speech synthesis framework. The recently emerged computational cognitive linguistics provides the necessary methods and tools.

2.   Methods

Among the tools and methods offered by computational cognitive linguistics, frame semantics [7] and construction grammar (CxG) [8] [9] [10] [11] seem to be the most promising in terms of prosody research. Construction grammar is a family of theories and models of grammar. It constitutes a mature, holistic and usage-based framework that treats all types of expressions and dimensions of language (syntax, semantics, pragmatics, discourse, morphology and phonology) as equally central to capturing grammatical patterning.

Within this theory a grammatical construction is a pairing of form and content. This approach corresponds to the foundation of general semiotics. The form is any combination of syntactic, morphological, or prosodic patterns, whereas meaning is understood in terms of lexical semantics, pragmatics and discourse structure. Therefore, a grammar consists of intricate networks of overlapping and complementary patterns used for encoding and decoding of linguistic expressions.
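
The form-meaning pairing described above can be made concrete with a small data structure. The following Python fragment is only an illustrative sketch, not part of ECG or of any existing CxG toolkit: the class names, the field names, the helper `constructions_with_prosody` and the two toy constructions (including their prosodic labels) are all assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Form:
    """Form pole: any combination of syntactic, morphological, or prosodic patterns."""
    syntax: tuple = ()
    morphology: tuple = ()
    prosody: tuple = ()


@dataclass(frozen=True)
class Meaning:
    """Meaning pole: lexical semantics, pragmatics, and discourse structure."""
    semantics: str = ""
    pragmatics: str = ""
    discourse: str = ""


@dataclass(frozen=True)
class Construction:
    """A grammatical construction: a named pairing of form and meaning."""
    name: str
    form: Form
    meaning: Meaning


def constructions_with_prosody(grammar, label):
    """Return the names of constructions whose form pole contains a prosodic label."""
    return [c.name for c in grammar if label in c.form.prosody]


# Hypothetical toy grammar; the entries and labels are illustrative assumptions only.
GRAMMAR = [
    Construction(
        name="YesNoQuestion",
        form=Form(syntax=("Aux", "NP", "VP"), prosody=("final-rise",)),
        meaning=Meaning(semantics="polar question", pragmatics="request for confirmation"),
    ),
    Construction(
        name="Declarative",
        form=Form(syntax=("NP", "VP"), prosody=("final-fall",)),
        meaning=Meaning(semantics="proposition", pragmatics="assertion"),
    ),
]

print(constructions_with_prosody(GRAMMAR, "final-rise"))  # ['YesNoQuestion']
```

Placing prosody in the form pole next to syntax and morphology reflects the point made above: in this view a prosodic pattern is a property of the construction itself, so it can be queried together with the meaning it is paired with.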
XIII International PhD Workshop OWD 2011, 22–25 October 2011

Embodied Construction Grammar (ECG) is a special case of CxG, developed at ICSI, UC Berkeley, and the University of Hawaii within the Neural Theory of Language (NTL) project, an interdisciplinary research effort to answer the question: How does the brain compute the mind? In this formalism, designed specifically for integration into a simulation-based model of language understanding, conceptual representations are also constrained to be grounded in the body's perceptual and motor systems, and more precisely to parametrize mental simulations using those systems. Understanding an utterance thus involves at least two distinct processes: analysis to determine which constructions the utterance instantiates, and simulation according to the parameters specified by those constructions [12].

The so-called topic-comment articulation and its relation to prosody have been widely studied by the Prague school [13], and this work has shown some interesting dependencies. However, little attention has been given to a multi-level investigation of the coherence of prosodic structures and other dimensions of language, such as the predication structure of an utterance and the distribution of the predicate and its arguments within that utterance. It is suspected that this approach could yield promising results.

Such research requires a number of preparations. First of all, a dedicated speech corpus should be built. The problem here is that it was indirectly assumed that only "neutral" prosody is to be examined. There is an ongoing discussion whether such "neutral" prosody exists at all. The opponents argue that all speech is uttered in some context, which triggers unpredictable emotional responses and variable pragmatic decisions during utterance planning. All this affects the resulting prosody. Proponents turn to various means in search of arguments.
For example, results of a brain activation study showed that recognizing emotional intonations and discriminating their expressiveness leads to a predominant activation of the right hemisphere (RH) with right frontal preponderance only in the absence of linguistic task demands. Inner speech performed in addition to and concurrently with the identification/discrimination task gave rise to a balanced RH/LH activation pattern with left frontal preponderance [14]. This might suggest that "emotional" prosody is only superimposed on the "neutral" prosody and may be somehow "filtered" out.

One way of dealing with this problem would be to record the whole speech corpus in constant conditions, using a strictly defined scenario (reading a text aloud to some audience in the case of application in text-reader systems), in as few sessions as possible.

Once the speech corpus is ready, a well-defined ECG and the underlying lexical semantics should be built for the language in question, to serve as a parser for input utterances and their initial annotations.

The final step is to analyze the speech corpus in terms of its ECG structures and their correspondence to prosodic contours. This might be attained by employing Artificial Neural Networks or Hidden Markov Models. However, an in-depth expert analysis performed by a human is indispensable.

Fig. 1. Overview of the simulation-based language understanding model, consisting of two primary processes: analysis and simulation.

3.   Anticipated results and discussion

This unifying approach to prosody analysis seems very promising. However, at this point it is hard to expect any definite results except for some correlations on the level of the predicate structure of an utterance, which is somewhat derivative of the Prague school's research mentioned above.

Even if some results are attained, the gap between abstract and physical, phonological and phonetic, is still to be filled.
Since there is no linear relation between the constituents of prosody (rhythm, stress and intonation) and their physical or phonetic counterparts (amplitude, timing and fundamental frequency (f0) of the speech signal), even a good model of prosody might turn out to be useless until the transition between phonology and phonetics in the process of speech production is well defined. However, the cognitive framework might become useful also in this case, as was shown by some investigations in cognitive phonetics [15] [16].

Fig. 2. Overview of constructs in ECG.

Bibliography

[1] Lakoff, G., Johnson, M.: Metaphors We Live By, 1980, Chicago
[2] [11] Lakoff, G.: Women, Fire and Dangerous Things. What Categories Reveal about the Mind, 1987, Chicago, The Chicago University Press
[3] [9] Fillmore, C., Kay, P., O'Connor, C.: Regularity and idiomaticity: The case of 'let alone', Language, 64-3: 501-538, 1988
[4] Langacker, Ronald W.: A View of Linguistic Semantics, Topics in Cognitive Linguistics, 1988, New York/Amsterdam: John Benjamins, 49-90
[5] Chomsky, N.: Aspects of the Theory of Syntax, Cambridge, MA: MIT Press, 1965
[6] Allen, J., Hunnicutt, M., Klatt, D.: From Text to Speech: The MITalk system, Cambridge University Press, 1987
[7] Fillmore, Charles J.: The Case for Case, Universals in Linguistic Theory, 1968, New York: Holt, Rinehart, and Winston, 1-88
[8] Goldberg, A. E.: Constructions: A construction grammar approach to argument structure, 1995, Chicago: University of Chicago Press
[10] Kay, P., Fillmore, C. J.: Construction Grammar and Linguistic Generalizations: The What's X Doing Y? Construction, Language 75: 1-33, 1999
[12] Bergen, B. K., Chang, N. C.: Embodied Construction Grammar in Simulation-Based Language Understanding, ICSI Technical Report TR-02-004, February 2002
[13] Hajicova, E.: Topic, focus and negation, Proceedings of Focus and natural language processing, P. Bosch and R. van der Sandt (eds.), 1995, Heidelberg: IBM Deutschland
[14] Pihan, H., Altenmuller, E., Hertrick, I., Ackermann, H.: Cortical activation patterns of affective speech processing depend on concurrent demands on the subvocal rehearsal system: a DC-potential study, Brain, 2000
[15] [16] Tatham, M., Morton, K.: In Honor of Ilse Lehiste, eds. Robert Channon and Linda Shockey, 1987, Foris Publications, Dordrecht

Author:
MA Tomasz Kuczmarski
Adam Mickiewicz University
ul. Wieniawskiego 1
61-712 Poznań, Poland
tel. (048)509079785
e-mail: tkucz@amu.edu.pl