An Introduction to the Taiwanese Tone Group Parser
A text-to-speech system is one application of computational linguistics. A Taiwanese text-to-speech system has three major components: a tone group parser, a speech synthesizer, and a speech engine. Implementing the speech synthesizer or the speech engine depends mainly on programming technique, whereas the Taiwanese tone group parser relies more on transforming linguistic expertise into a form that artificial intelligence techniques can represent. Under the Indirect Reference Hypothesis, Elisabeth Selkirk suggests that phonological rules are not directly sensitive to syntax; a prosodic structure is required as an intermediary that connects phonology and syntax. Prosodic structure is particularly prominent in Taiwanese. During language acquisition, Taiwanese children learn useful information about syntactic structure through tone sandhi and tone groups. Although tone sandhi in Taiwanese sentences appears arbitrary, it does govern the formation of the tone group, a structure idiosyncratic to the language. It is reasonable to assume that an efficient tone sandhi mechanism is gradually built up as more and more linguistic knowledge accumulates in the child's brain. Over the past decade, we have designed a symbol system that combines a default tone form mark, a default POS mark, and a mode mark. The symbol system serves as a tool for converting linguistic expertise and heuristic knowledge into a knowledge base for implementing the Taiwanese corpus and the tone sandhi processor.
The idea of using a symbol system came from the process of modifying an earlier version of our tone sandhi processor. The symbol system converts Taiwanese linguistic knowledge and heuristics into a knowledge base to which rules can be applied to solve the problem of Taiwanese allotone selection. It must be precise enough to give Taiwanese experts a tool for tagging tokens in the corpus.
The current symbol system consists of a default tone form mark, a default POS mark, and a mode mark. Each record in the corpus is tagged with a symbol that includes these three attributes. The symbol connects the corpus to the tone sandhi processor so that the processor can pick the correct tone form from the possible tone forms of each word in a sentence. Through the symbol system and rule inference, homonyms or multiple-POS words such as ti7 in (1) and be2 in (2) can be assigned the correct tone form during tone sandhi processing.
(1) Ti7 (chopstick, noun, lexical tone) khng3 ti7 (at/on/in, preposition, sandhi tone) ti7-lang7 (chopstick case, noun, sandhi tone) lai7. (The chopsticks are in the chopstick case.)
(2) Tsit tsiah be2 (horse, noun, lexical tone) be2 (buy, verb, sandhi tone) beh kah goo7-ban7. (The horse was bought for 50,000 bucks.)
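The tagging described above can be sketched roughly as follows. This is only an illustration in Python, not the parser's actual data format: the class name Symbol, the table CORPUS, the function select_tone_form and the placeholder value "default" for the mode mark are all assumptions, and the sketch further assumes that the POS of each token is already known. The POS and tone form values are taken from examples (1) and (2).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Symbol:
    """Hypothetical three-attribute symbol attached to a corpus record."""
    tone_form: str  # default tone form mark: "lexical" or "sandhi"
    pos: str        # default POS mark
    mode: str       # mode mark; its values are not given in the text, so a placeholder is used


# Hypothetical corpus fragment: one romanized token may carry several symbols.
CORPUS = {
    "ti7": [
        Symbol("lexical", "noun", "default"),        # chopstick
        Symbol("sandhi", "preposition", "default"),  # at/on/in
    ],
    "be2": [
        Symbol("lexical", "noun", "default"),  # horse
        Symbol("sandhi", "verb", "default"),   # buy
    ],
}


def select_tone_form(token: str, pos: str) -> Optional[str]:
    """Return the default tone form of the record whose POS mark matches."""
    for symbol in CORPUS.get(token, []):
        if symbol.pos == pos:
            return symbol.tone_form
    return None  # token or POS not covered by this toy corpus


# The two occurrences of be2 in example (2), with their POS assumed known.
forms_in_2 = [select_tone_form("be2", "noun"), select_tone_form("be2", "verb")]
```

In a real processor the default tone form would presumably be only a starting point that rules can override from context; the lookup here stands in for that inference step.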
Sometimes the symbol system cannot handle allotone selection for certain words. For example, in (3) and (4), the word "ke" (chicken / more than) takes a different part of speech (POS) and meaning depending on its tone form.
(3) Tsit tsiah (sandhi tone) ke (lexical tone, noun) tsit8 kong kin. (The chicken is one kg.)
(4) Tsit tsiah (lexical tone) ke (sandhi tone, verb) tsit8 kong kin. (The chicken is one kg more.)
This case reminds us that a human parser can perform the syntactic analysis and allotone selection by relying on the Chinese homonyms "雞/加" or on context. To the computer, however, the romanized Taiwanese sentences (3) and (4) are identical. Syntactic analysis and allotone selection are impossible unless an autonomous semantic mapping is made, and that would require a strong AI machine.
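The ambiguity in (3) and (4) can be illustrated with a small Python sketch. It assumes the simplified rule that the last syllable of a tone group keeps its lexical tone while every preceding syllable in the group takes its sandhi tone, and it ignores exceptions such as neutral tones. The function name tone_forms and the boundary positions are assumptions chosen to reproduce the tone forms marked on tsiah and ke in the examples; the labels on the remaining tokens simply follow from the simplified rule.

```python
def tone_forms(tokens, group_ends):
    """Label a token 'lexical' if it closes a tone group, otherwise 'sandhi'."""
    return [
        "lexical" if index in group_ends else "sandhi"
        for index in range(len(tokens))
    ]


# Sentences (3) and (4) share exactly the same romanized token sequence.
tokens = ["tsit", "tsiah", "ke", "tsit8", "kong", "kin"]

# Reading (3): a tone group closes on "ke" (noun, "chicken").
reading_3 = tone_forms(tokens, {2, 5})

# Reading (4): a tone group closes on "tsiah"; "ke" (verb, "more") stays inside the next group.
reading_4 = tone_forms(tokens, {1, 5})
```

The token list is the same for both readings; only the assumed boundary positions differ, and nothing in the romanized string itself says which set of boundaries applies.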
In practice, using linguistic theory to implement the Taiwanese tone group parser is a way of applying engineering technique to build an experimental environment that simulates language acquisition.
Our attempt to solve the allotone selection problem by integrating traditional knowledge representation techniques with an analysis of word attributes showed that AI development tools can help us understand the process of human language acquisition.
The tone group derived from Taiwanese tone sandhi is not only a unique prosodic unit but also a natural syntactic structure, one that can be called a gem of human language. We sincerely hope that more people will take part in the study of the Taiwanese tone group. A work-in-progress version of the artificial tone group parser, which includes a knowledge base and an executable program file for Microsoft Windows (XP/Win7), can be downloaded for evaluation.
Download Taiwanese Tone Group Parser
download IJCLCLP paper
This website is sponsored
by VIKON Corp.
Copyright July 1996.
All rights reserved.
Last update : 2013-01-05