An Introduction to the Taiwanese Tone Group Parser

Text-to-speech is one of the applications of computational linguistics. A Taiwanese text-to-speech system has three major components: a tone group parser, a speech synthesizer, and a speech engine. Implementing the speech synthesizer or the speech engine is mainly a matter of programming technique, whereas the Taiwanese tone group parser relies more on converting linguistic expertise and on knowledge representation from artificial intelligence. Under the Indirect Reference Hypothesis, Elisabeth Selkirk suggests that phonological rules are not directly sensitive to syntax; a prosodic structure is required as a medium connecting phonology and syntax. Prosodic structure is particularly prominent in Taiwanese. During language acquisition, Taiwanese children learn useful information about syntactic structure through tone sandhi and tone groups. Although tone sandhi in Taiwanese sentences is arbitrary, it guides the formation of the tone group, a structure idiosyncratic to the language. It is reasonable to assume that an efficient tone sandhi mechanism is gradually built up as more and more linguistic knowledge accumulates in a child's brain. Over the past decade, we have designed a symbol system that combines a default tone form mark, a default POS mark, and a mode mark. The symbol system serves as a tool for converting linguistic expertise and heuristic knowledge into a knowledge base for implementing the Taiwanese corpus and the tone sandhi processor.

Tested on an ASUS S340 MC, the Taiwanese tone group parser needs 1 second to produce the tone sandhi output for 113 words of Romanized Taiwanese text, and 12 seconds for a 1050-word article. After the Taiwanese tone group parser was linked to Taiwanese Speech Notepad, an input article of 692 Taiwanese words yielded 338 seconds of output speech and took 10 seconds to process. An input article of 1711 Taiwanese words yielded 890 seconds of output speech and took 25 seconds to process. That is at least 33 seconds of speech synthesized per second of processing (33.8 and 35.6 for the two articles).
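The throughput figure follows from dividing the duration of the output speech by the processing time. The short Python check below uses only the numbers quoted above; the variable and function names are illustrative and are not part of the parser.

```python
# Measurements quoted in the text: (words, seconds of speech, seconds of processing).
RUNS = [
    {"words": 692, "speech_s": 338, "processing_s": 10},
    {"words": 1711, "speech_s": 890, "processing_s": 25},
]


def speech_per_second(run):
    """Seconds of synthesized speech produced per second of processing."""
    return run["speech_s"] / run["processing_s"]


# Ratio for each article, rounded to one decimal place.
ratios = [round(speech_per_second(run), 1) for run in RUNS]

# Ratio over both articles taken together.
overall = sum(r["speech_s"] for r in RUNS) / sum(r["processing_s"] for r in RUNS)

for run, ratio in zip(RUNS, ratios):
    print(f'{run["words"]} words: {ratio} s of speech per s of processing')
print(f"both articles together: {overall:.1f}")
```

Both ratios are far above 1, which means the system produces speech much faster than real time on these inputs.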

The idea of using a symbol system grew out of revising an earlier version of our tone sandhi processor. The symbol system converts Taiwanese linguistic knowledge and heuristics into a knowledge base, so that rules can apply that knowledge to the problem of Taiwanese allotone selection. It must also be precise enough to serve Taiwanese experts as a tool for tagging tokens in the corpus.

The current symbol system consists of a default tone form mark, a default POS mark, and a mode mark. Each record in the corpus is tagged with a symbol that carries these three attributes. The symbol connects the corpus to the tone sandhi processor, which uses it to pick the correct tone form from among the possible tone forms of each word in a sentence. Through the symbol system and rule inference, homonyms or multiple-POS words such as ti7 in (1) and be2 in (2) can be assigned the correct tone form during tone sandhi processing.
(1) Ti7 (chopstick, noun, lexical tone) khng3 ti7 (at/on/in, preposition, sandhi tone) ti7-lang7 (chopstick case, noun, sandhi tone) lai7. (The chopsticks are in the chopstick case.)
(2) Tsit tsiah be2 (horse, noun, lexical tone) be2 (buy, verb, sandhi tone) beh kah goo7-ban7. (The horse was bought for 50000 bucks.)
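The way a tagged record and a rule can work together on example (2) can be sketched in Python. This is only a toy illustration under stated assumptions, not the actual knowledge base or rule set: the `Reading` record, the `LEXICON` entries, the POS labels, and the two context rules in `select_reading` are all hypothetical, and the `mode` field is left empty because its semantics are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One hypothetical corpus record carrying the three attributes of a symbol."""
    gloss: str
    pos: str        # default POS mark
    tone: str       # default tone form mark: "lexical" or "sandhi"
    mode: str = ""  # mode mark (placeholder; not specified in the text)


# Hypothetical toy lexicon; be2 is a homonym with two readings, as in (2).
LEXICON = {
    "tsit": [Reading("this", "det", "sandhi")],
    "tsiah": [Reading("classifier for animals", "clf", "sandhi")],
    "be2": [
        Reading("horse", "noun", "lexical"),
        Reading("buy", "verb", "sandhi"),
    ],
}


def select_reading(prev_pos, candidates):
    """Toy rule (assumed): prefer a noun after a classifier, a verb after a noun."""
    preferred = {"clf": "noun", "noun": "verb"}.get(prev_pos)
    for reading in candidates:
        if reading.pos == preferred:
            return reading
    return candidates[0]  # fall back to the first listed reading


def tag(tokens):
    """Assign one reading to each token, left to right."""
    prev_pos = None
    tagged = []
    for token in tokens:
        reading = select_reading(prev_pos, LEXICON[token.lower()])
        tagged.append((token, reading.gloss, reading.pos, reading.tone))
        prev_pos = reading.pos
    return tagged


tagged = tag(["Tsit", "tsiah", "be2", "be2"])
for row in tagged:
    print(row)
```

With these assumed rules, the first be2 is read as the noun "horse" with its lexical tone and the second as the verb "buy" with its sandhi tone, matching the annotation in (2). A real rule base would need far richer context than the preceding POS alone.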

Sometimes the symbol system cannot resolve the allotone selection for certain words. For example, in (3) and (4), the word ke (chicken / more than) has a different part of speech (POS) and a different meaning depending on its tone form.
(3) Tsit tsiah (sandhi tone) ke (lexical tone, noun) tsit8 kong kin. (The chicken is one kg.)
(4) Tsit tsiah (lexical tone) ke (sandhi tone, verb) tsit8 kong kin. (The chicken is one kg more.)
This case reminds us that a human parser can carry out syntactic analysis and allotone selection with the help of the Chinese homonyms or the context. To the computer, however, the Romanized Taiwanese sentences (3) and (4) are identical. Syntactic analysis and allotone selection are impossible unless an autonomous semantic mapping is made, and that would require a strong AI machine.
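The point can be made concrete with a few lines of Python. The sketch groups the two annotated analyses from (3) and (4) by their Romanized surface string; the data structure and names are illustrative assumptions, not part of the parser.

```python
from collections import defaultdict

# (Romanized surface, tone forms of "tsiah" and "ke", translation), from (3) and (4).
ANALYSES = [
    ("Tsit tsiah ke tsit8 kong kin.", ("sandhi", "lexical"), "The chicken is one kg."),
    ("Tsit tsiah ke tsit8 kong kin.", ("lexical", "sandhi"), "The chicken is one kg more."),
]

# Collect every analysis that shares the same surface string.
by_surface = defaultdict(list)
for surface, tones, meaning in ANALYSES:
    by_surface[surface].append((tones, meaning))

# A surface string with more than one analysis cannot be resolved from the text alone.
ambiguous = {s: a for s, a in by_surface.items() if len(a) > 1}

for surface, analyses in ambiguous.items():
    print(surface)
    for tones, meaning in analyses:
        print("  tsiah/ke =", tones, "->", meaning)
```

Because both analyses map to one and the same input string, any deterministic procedure that sees only the Romanized text must return the same tone forms for both, so at least one of the two readings will be tagged incorrectly without extra semantic information.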

In practice, implementing the Taiwanese tone group parser on the basis of linguistic theory is a way of applying engineering techniques to build an experimental environment that simulates language acquisition. Our attempt to solve the allotone selection problem by integrating traditional knowledge representation techniques with attribute analysis of words suggests that AI development tools can help us understand the process of human language acquisition.

The tone group derived from Taiwanese tone sandhi is not only a unique prosodic unit but also a natural constituent of a syntactic unit; it can be said to be a gem of human language. Using Taiwanese as an intermediary language, with Taiwanese tone groups as the semantic and syntactic units, to build a multilingual corpus (e.g., Taiwanese, Japanese, Chinese, English) has been proven to improve the accuracy of translation between languages.

A Taiwanese text-to-speech system implemented with Tacotron2+Waveglow can produce natural, human-like speech output. However, its lower tone sandhi accuracy and lower synthesis speed are weaknesses that cannot be ignored. The latest version of the Taiwanese tone group parser, which outputs real tone text and is hooked up to Taiwanese Speech Notepad, has better tone sandhi accuracy and higher performance. For example, an input file of 692 Taiwanese words yields 338 seconds of output speech and takes 10 seconds to process. An input article of 1711 Taiwanese words yields 890 seconds of output speech and takes 25 seconds to process. That is at least 33 seconds of speech synthesized per second of processing. The average tone sandhi accuracy may reach 98%. In addition, using real tone text instead of Romanized Taiwanese for the input text and the training materials is an effective way to solve the tone sandhi problem when a Taiwanese text-to-speech system is implemented with deep learning methods. In practice, the Taiwanese tone group parser is essentially an automatic tagging program for Taiwanese tone forms that can be applied to Taiwanese text-to-speech systems.

We sincerely hope that more people will take part in the study of the Taiwanese tone group. A work-in-progress version of the artificial tone group parser, which includes a knowledge base and an executable program file for Microsoft Windows (XP/Win7), can be downloaded for evaluation. Download Taiwanese Tone Group Parser (If you cannot download it, please contact the author by e-mail.)

download IJCLCLP paper