ACM Transactions on Asian Language Information Processing (TALIP), Volume 1, Issue 3, September 2002

A word-based approach for modeling and discovering temporal relations embedded in Chinese sentences
Wenjie Li, Kam-Fai Wong
Pages: 173-206
DOI: 10.1145/772755.772756
Conventional information extraction systems cannot effectively mine temporal information. For example, users' queries about how one event is related to another in time cannot be handled effectively. For this reason, it is important to capture and...

Automatic corpus-based tone and break-index prediction using K-ToBI representation
Jin-Seok Lee, Byeongchang Kim, Gary Geunbae Lee
Pages: 207-224
DOI: 10.1145/772755.772757
In this article we present a prosody generation architecture based on K-ToBI (Korean Tone and Break Index) representation. ToBI is a multitier representation system based on linguistic knowledge that transcribes events in an utterance. The TTS...

A comparison of Chinese document indexing strategies and retrieval models
Robert W. P. Luk, K. L. Kwok
Pages: 225-268
DOI: 10.1145/772755.772758
With the advent of the Internet and intranets, substantial interest is being shown in Asian-language information retrieval, especially in Chinese, which is a good example of an Asian ideographic language (other examples include Japanese and Korean)....

A language and character set determination method based on N-gram statistics
Izumi Suzuki, Yoshiki Mikami, Ario Ohsato, Yoshihide Chubachi
Pages: 269-278
DOI: 10.1145/772755.772759
An N-gram-based method for detecting language, script, and encoding scheme is introduced in this article. The method detects the language, script, and encoding scheme of a computer-encoded target text document by checking how many byte sequences of...