ACM Transactions on Asian Language Information Processing (TALIP), Volume 6 Issue 3, November 2007

Comparison of performance of enhanced morpheme-based language model with different word-based language models for improving the performance of Tamil speech recognition system
S. Saraswathi, T. V. Geetha
Article No.: 9
DOI: 10.1145/1290002.1290003

This paper describes a new language modeling technique for Tamil, a highly inflectional Dravidian language. It aims to alleviate the main problems encountered in processing Tamil, such as the enormous vocabulary growth caused by the large...

Developing lexicographic sorting: An example for Urdu
Sarmad Hussain, Sana Gul, Afifah Waseem
Article No.: 10
DOI: 10.1145/1290002.1290004

Collation, or lexicographic sorting, is essential for developing multilingual computing. This paper presents the challenges faced in developing a collation sequence for a language. The paper discusses both theoretical linguistic and practical...

Topic tracking based on bilingual comparable corpora and semisupervised clustering
Fumiyo Fukumoto, Yoshimi Suzuki
Article No.: 11
DOI: 10.1145/1290002.1290005

In this paper, we address the problem of skewed data in topic tracking, namely the small number of stories labeled positive compared to the number labeled negative, and propose a method for estimating effective training stories for the topic-tracking task....