ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)

Latest Articles

Description of the Chinese-to-Spanish Rule-Based Machine Translation System Developed Using a Hybrid Combination of Human Annotation and Statistical Techniques

Two of the most popular Machine Translation (MT) paradigms are rule-based (RBMT) and corpus-based, the latter including statistical systems (SMT). When only a scarce parallel corpus is available, RBMT becomes...

A Hybrid Feature Extraction Algorithm for Devanagari Script

The efficiency of any character recognition technique is directly dependent on the accuracy of the generated feature set that could uniquely represent...

Improving Handwritten Arabic Character Recognition by Modeling Human Handwriting Distortions

Handwritten Arabic character recognition systems face several challenges, including the unlimited variation in human handwriting and the...

A Constraint Approach to Pivot-Based Bilingual Dictionary Induction

High-quality bilingual dictionaries are very useful, but such resources are rarely available for lower-density language pairs, especially for those...


New Name, Expanded Scope

This page provides information about the journal Transactions on Asian and Low-Resource Language Information Processing (TALLIP), a publication of the Association for Computing Machinery (ACM).

The journal was formerly known as the Transactions on Asian Language Information Processing (TALIP): see the editorial charter for information on the expanded scope of the journal.  

ACM Author Options
New options for ACM authors to manage rights and permissions for their work: ACM introduces a new publishing license agreement, an updated copyright transfer agreement, and a new author-pays option that allows perpetual open access through the ACM Digital Library. For more information, visit the ACM Author Rights page.


Forthcoming Articles
Bilingual Semantic Role Labeling Inference via Dual Decomposition

This paper proposes an approach to infer bilingual Semantic Role Labeling (SRL) efficiently. Because translated bi-texts convey the same meaning, they should share the same predicate semantic structure. However, it is very difficult for monolingual SRL systems to obtain consistent results on both sides of a bi-text. Moreover, the two sides of a bi-text usually contain complementary language cues that can be used to improve on monolingual SRL systems. Jointly inferring bilingual SRL is therefore preferable, but existing methods for joint bilingual SRL incur high inference costs. In this paper, we use a simple but efficient technique, Lagrange Dual Decomposition, to search for consistent results on both sides of bi-texts. Intuitively, the complementary bilingual cues can also guide argument identification. To this end, we propose a method called Bi-Directional Projection (BDP) to recover arguments discarded in the argument identification phase of monolingual SRL systems. We evaluate our method on a standard parallel benchmark, the OntoNotes dataset. The experimental results show that our method yields significant improvements over state-of-the-art monolingual systems. In addition, owing to Bi-Directional Projection and Lagrange Dual Decomposition, our approach is both better and faster than existing methods.
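As a rough illustration of the dual decomposition idea mentioned above (not the authors' SRL model), the Python sketch below forces two independent scorers, standing in for the source-side and target-side decoders, to agree on a single label. The label set, the scores, the decaying step size, and the function name `dual_decompose` are all hypothetical. Each side decodes on its own with Lagrange-multiplier-adjusted scores, and the multipliers are updated by a subgradient step until the two decisions coincide.

```python
from typing import Dict, Tuple

def dual_decompose(score_src: Dict[str, float],
                   score_tgt: Dict[str, float],
                   max_iter: int = 100,
                   step: float = 1.0) -> Tuple[str, str, int]:
    """Toy Lagrangian dual decomposition: maximize score_src[y] + score_tgt[z]
    subject to y == z, by solving each side separately (hypothetical example)."""
    labels = sorted(set(score_src) & set(score_tgt))
    lam = {y: 0.0 for y in labels}  # one multiplier per label for the y == z constraint
    y_src = y_tgt = labels[0]
    for t in range(1, max_iter + 1):
        # Each side decodes independently; the multipliers push them toward agreement.
        y_src = max(labels, key=lambda y: score_src[y] + lam[y])
        y_tgt = max(labels, key=lambda y: score_tgt[y] - lam[y])
        if y_src == y_tgt:
            # Agreement certifies that this label is optimal for the joint problem.
            return y_src, y_tgt, t
        # Subgradient step with a decaying rate: lower the multiplier of the
        # source choice and raise that of the target choice.
        rate = step / t
        lam[y_src] -= rate
        lam[y_tgt] += rate
    return y_src, y_tgt, max_iter  # no agreement within the iteration budget
```

For example, with `score_src = {"A0": 2.0, "A1": 1.5}` and `score_tgt = {"A0": 0.0, "A1": 1.0}`, the two sides first disagree (A0 versus A1); after a few multiplier updates both settle on A1, which has the higher joint score (2.5 versus 2.0). The appeal of the technique is that each subproblem keeps its own cheap decoder.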

Converting Continuous-Space Language Models into N-gram Language Models with Efficient Bilingual Pruning for Statistical Machine Translation

Collective Web-based Parenthetical Translation Extraction Using Markov Logic Networks

Printed Text Image Database for Sindhi OCR

Document Image Understanding (DIU) and Electronic Document Management are active fields of research involving image understanding, interpretation, efficient handling, and routing of documents, as well as their retrieval. Research on most non-cursive scripts (e.g., Latin) has matured, whereas research on cursive (connected) scripts is still progressing. Many researchers around the world are currently working on cursive scripts (Arabic and the scripts that adopt it) so that the difficulties and challenges in understanding and handling documents in these scripts can be overcome. Sindhi has the largest extension of the original Arabic alphabet among the languages that use Arabic script: to accommodate more sounds, it contains 52 characters, compared with 28 in the Arabic alphabet. There are 24 differentiating characters, some of which carry four dots. Sindhi OCR research and development requires a database of Sindhi text images for training and testing. We have developed a large database containing 4.057 billion words and 15.275 billion characters in 150 fonts, 4 font weights, and 4 styles. The database contents were collected from various sources, including websites, books, theses, and others. A custom-built application was also developed to create text images from text documents in various fonts and sizes. The database covers words, characters, characters with spaces, and lines. The database is freely available, in part or in full, on request by email to one of the authors.


This paper presents an elegant technique for extracting low-level stroke features, such as line segments, curve segments, end points, and junction points, from offline printed text using a template-matching approach. The proposed features are used to classify a subset of characters from the Gujarati character set. The dataset consists of approximately 16,000 middle-zone symbols from 42 character classes. The symbols were collected from three sources, namely machine-printed books, laser-printed documents, and newspapers, to add variety in size, font type, style, ink variation, and boundary deformation. Experiments show that the features are quite robust against these variations and that the results are comparable with other existing work.
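The abstract above does not specify its templates or matching score, so the following Python sketch only illustrates template matching in general: it slides a small binary stroke template (here a hypothetical 3x3 vertical line segment) over a binary glyph image and keeps the position with the highest fraction of agreeing pixels. The function name `match_template`, the toy glyph, and the pixel-agreement score are assumptions for illustration, not the paper's feature extractor.

```python
from typing import List, Tuple

Grid = List[List[int]]  # binary image: 1 = ink, 0 = background

def match_template(image: Grid, template: Grid) -> Tuple[int, int, float]:
    """Return (row, col, score) of the best placement of `template` in `image`,
    where score is the fraction of pixels on which the two agree."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = (-1, -1, -1.0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            agree = sum(image[r + y][c + x] == template[y][x]
                        for y in range(th) for x in range(tw))
            score = agree / (th * tw)
            if score > best[2]:  # keep the first best-scoring placement
                best = (r, c, score)
    return best

# Hypothetical 5x5 glyph fragment containing a short vertical stroke.
glyph = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
vertical_segment = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
print(match_template(glyph, vertical_segment))  # (1, 1, 1.0)
```

A real system would run a bank of such templates (curves, end points, junctions) and turn the match responses into a feature vector; pixel agreement is used here only because it is the simplest score to read.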

Inter-, Intra-, and Extra-chunk Pre-reordering for Statistical Japanese-to-English Machine Translation

A rule-based pre-reordering approach is proposed for statistical Japanese-to-English machine translation using the dependency structure of source-side sentences. A Japanese sentence is pre-reordered into an English-like order at the morpheme level for a statistical machine translation system during the training and decoding phases to resolve the reordering problem. In this paper, extra-chunk pre-reordering of morphemes is proposed, which allows Japanese functional morphemes to move across chunk boundaries. This contrasts with the intra-chunk reordering used in previous approaches, which restricts the reordering of morphemes to within a chunk. Linguistically oriented discussions show that correct pre-reordering cannot be realized without extra-chunk movement of morphemes. The proposed approach is compared with five rule-based pre-reordering approaches designed for Japanese-to-English translation and with a language-independent statistical pre-reordering approach, on a standard patent data set and on a news data set obtained by crawling Internet news sites. Two state-of-the-art statistical machine translation systems, one phrase-based and the other hierarchical phrase-based, are used in the experiments. Experimental results show that the proposed approach markedly outperforms the compared approaches on automatic reordering measures (Kendall's tau, Spearman's rho, fuzzy reordering score, and test set RIBES) and on the automatic translation precision measure of test set BLEU score.
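The abstract lists Kendall's tau among its automatic reordering measures. As a hedged sketch of how such a score can be computed, assume each pre-reordered source token has already been mapped (for example, through word alignment) to its position in the reference English sentence; Kendall's tau then compares that sequence of positions with the ideal monotone order. The function names are hypothetical, tied positions are not handled, and `normalized_kendall_tau` rescales tau to [0, 1] as (tau + 1) / 2, the normalization used inside RIBES.

```python
from itertools import combinations
from typing import Sequence

def kendall_tau(positions: Sequence[int]) -> float:
    """Kendall's tau between a token order and the monotone order.

    positions[i] is the reference-side position of the i-th token after
    reordering; positions are assumed to be distinct.
    """
    n = len(positions)
    if n < 2:
        return 1.0  # a single token is trivially in order
    total = n * (n - 1) // 2
    # A pair (i, j) with i < j is concordant if it keeps the reference order.
    concordant = sum(1 for i, j in combinations(range(n), 2)
                     if positions[i] < positions[j])
    discordant = total - concordant
    return (concordant - discordant) / total

def normalized_kendall_tau(positions: Sequence[int]) -> float:
    """Rescale tau from [-1, 1] to [0, 1]."""
    return (kendall_tau(positions) + 1) / 2

print(kendall_tau([0, 1, 2, 3]))             # 1.0 (perfect order)
print(kendall_tau([3, 2, 1, 0]))             # -1.0 (fully reversed)
print(normalized_kendall_tau([0, 2, 1, 3]))  # one swapped pair out of six
```

The quadratic pair loop is fine for sentence-length inputs; it keeps the definition (concordant minus discordant pairs over all pairs) visible.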

Word Segmentation for Burmese (Myanmar)

Experiments with various word segmentation approaches for the Burmese language are conducted and discussed in this note. Specifically, dictionary-based, statistical, and machine learning approaches are tested. Experimental results demonstrate that statistical and machine learning approaches perform significantly better than dictionary-based approaches. We believe this note is the first systematic comparison of word segmentation approaches for Burmese based on an annotated corpus of considerable size (approximately half a million words). This work aims to explore the properties of Burmese text and suitable approaches to processing it, and to promote further research on this understudied language.
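A classic member of the dictionary-based family is greedy forward maximum matching: at each position, take the longest dictionary word that matches, and fall back to a single character when nothing does. The Python sketch below is a generic version, not necessarily the variant evaluated in the note; the function name `forward_max_match` is hypothetical, and the toy English lexicon stands in for a Burmese dictionary.

```python
from typing import List, Set

def forward_max_match(text: str, dictionary: Set[str]) -> List[str]:
    """Greedy left-to-right longest-match segmentation."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking the window one character at a time.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit one character and move on.
            tokens.append(text[i])
            i += 1
    return tokens

lexicon = {"the", "cat", "cats", "sat", "at"}  # hypothetical toy lexicon
print(forward_max_match("thecatsat", lexicon))  # ['the', 'cats', 'at']
```

The example also hints at why such methods can lag behind statistical and machine learning segmenters: with no notion of context, the greedy matcher commits to the longest match ("cats") and then has to settle for "at" instead of "cat" + "sat".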

BenLem (a Bengali Lemmatizer) and its Role in WSD

A lemmatization algorithm for Bengali has been developed, and its effectiveness for word sense disambiguation (WSD) is investigated. One of the key challenges in the computer processing of agglutinative languages is dealing with the frequent morphological variations of root words as they appear in text. Therefore, designing a lemmatizer is essential for developing many natural language processing (NLP) tools for such languages. In this experiment, Bengali, the national language of Bangladesh and the second most popular language in the Indian subcontinent, has been taken as a reference. To design the lemmatizer (BenLem), the possible transformations through which surface words are formed from lemmas are studied, so that suitable reverse transformations (along with contextual knowledge) can be applied to a surface word to recover the corresponding lemma. BenLem is found to handle both inflectional and derivational morphology in Bengali. Evaluated on a set of 18 news articles from the FIRE Bengali News Corpus consisting of 3,338 surface words (excluding proper nouns), it is about 82.68% accurate. The role of the lemmatizer in Bengali WSD is then investigated. Fifty (50) news articles are randomly selected from the FIRE corpus, and the five most frequent polysemous Bengali words are considered for sense disambiguation. Several WSD systems are considered in this experiment, and BenLem is found to improve the performance of all of them; the improvement is statistically significant.
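The reverse-transformation idea described above can be sketched generically: apply candidate suffix rules to a surface word in reverse, and accept a candidate only if it appears in a lemma lexicon. The Python sketch below uses hypothetical English rules and a toy lexicon in place of Bengali morphology, ignores the contextual knowledge the abstract mentions, and is not BenLem's actual rule set; the name `lemmatize` is likewise an assumption.

```python
from typing import Iterable, Set, Tuple

Rule = Tuple[str, str]  # (suffix seen on the surface word, suffix to restore on the lemma)

def lemmatize(word: str, rules: Iterable[Rule], lexicon: Set[str]) -> str:
    """Undo suffix transformations; accept the first candidate found in the lexicon."""
    if word in lexicon:
        return word
    # Longer surface suffixes first, so more specific rules are tried before general ones.
    for surface, lemma_end in sorted(rules, key=lambda r: len(r[0]), reverse=True):
        if word.endswith(surface) and len(word) > len(surface):
            candidate = word[: len(word) - len(surface)] + lemma_end
            if candidate in lexicon:
                return candidate
    return word  # no rule produced a known lemma: keep the surface form

# Hypothetical toy rules and lexicon (English stand-ins for Bengali morphology).
rules = [("s", ""), ("ies", "y"), ("ing", ""), ("ing", "e")]
lexicon = {"city", "walk", "make"}
print([lemmatize(w, rules, lexicon) for w in ["cities", "walks", "making", "ran"]])
# ['city', 'walk', 'make', 'ran']
```

Checking candidates against a lexicon is what lets two rules share a surface suffix ("ing" with and without a restored "e") without producing non-words; irregular forms such as "ran" simply fall through unchanged.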

Fuzzy Hindi WordNet and Word Sense Disambiguation Using Fuzzy Graph Connectivity Measures


Publication Years: 2015
Publication Count: 23
Citation Count: 0
Available for Download: 23
Downloads (6 weeks): 316
Downloads (12 Months): 1,449
Downloads (cumulative): 1,449
Average downloads per article: 63
Average citations per article: 0
Awards
Baoli Li: ACM Senior Member (2012)
Limsoon Wong: ACM Fellow (2013)
Dong Zhou: ACM Senior Member (2012)

Paper Counts by Author
Chengqing Zong 2
Xiaodong Liu 2
Kevin Duh 2
Hideki Mima 1
Yuji Matsumoto 1
Chunghsien Wu 1
Jiajun Zhang 1
Nitin Ramrakhiyani 1
B Kumari 1
Kehjiann Chen 1
Yuming Hsieh 1
Hsinmin Wang 1
Fei Cheng 1
Mu Li 1
Hiroki Hanaoka 1
Chutamanee Onsuwan 1
Sadao Kurohashi 1
Ramisettyrajeshwara Rao 1
Isao Goto 1
Gina Levow 1
Shuling Huang 1
Seunghoon Na 1
Peishan Tsai 1
Suresh Sundaram 1
Angarai Ramakrishnan 1
Prasenjit Majumder 1
Thanaruk Theeramunkong 1
Kuanyu Chen 1
Arafat Awajan 1
Yusuke Miyao 1
Nongnuch Ketui 1
Shihhung Wu 1
Minghong Bai 1
Juifeng Yeh 1
Wenyi Chen 1
Maochuan Su 1
Ming Zhou 1
Sumire Uematsu 1
Takuya Matsuzaki 1
Eiichiro Sumita 1
Chaolin Liu 1
Xiaoqing Li 1
Masao Utiyama 1
Kehyih Su 1
Hanping Shen 1
Shujie Liu 1
Yūji Matsumoto 1
Lunghao Lee 1
Hsinhsi Chen 1

Paper Counts by Affiliation
Indian Institute of Technology 1
Princess Sumaya University 1
Academia Sinica Taiwan 1
Indian Institute of Science 1
Japan National Institute of Information and Communications Technology 2
Kyoto University 2
Microsoft Research Asia 3
Nara Institute of Science and Technology 3
National Cheng Kung University 3
University of Tokyo 3
Thammasat University 3
Chinese Academy of Sciences 4

ACM Transactions on Asian and Low-Resource Language Information Processing

Volume 15 Issue 1, November 2015  Issue-in-Progress
Volume 14 Issue 4, October 2015 Special Issue on Chinese Spell Checking
Volume 14 Issue 3, June 2015
Volume 14 Issue 2, March 2015
Volume 14 Issue 1, January 2015

Volume 13 Issue 4, December 2014
Volume 13 Issue 3, September 2014
Volume 13 Issue 2, June 2014
Volume 13 Issue 1, February 2014

Volume 12 Issue 4, October 2013
Volume 12 Issue 3, August 2013
Volume 12 Issue 2, June 2013
Volume 12 Issue 1, March 2013

Volume 11 Issue 4, December 2012 Special Issue on RITE
Volume 11 Issue 3, September 2012
Volume 11 Issue 2, June 2012
Volume 11 Issue 1, March 2012

Volume 10 Issue 4, December 2011
Volume 10 Issue 3, September 2011
Volume 10 Issue 2, June 2011
Volume 10 Issue 1, March 2011

Volume 9 Issue 4, December 2010
Volume 9 Issue 3, September 2010
Volume 9 Issue 2, June 2010
Volume 9 Issue 1, March 2010

Volume 8 Issue 4, December 2009
Volume 8 Issue 3, August 2009
Volume 8 Issue 2, May 2009
Volume 8 Issue 1, March 2009

Volume 7 Issue 4, November 2008
Volume 7 Issue 3, August 2008
Volume 7 Issue 2, June 2008
Volume 7 Issue 1, February 2008

Volume 6 Issue 4, December 2007
Volume 6 Issue 3, November 2007
Volume 6 Issue 2, September 2007
Volume 6 Issue 1, April 2007

Volume 5 Issue 4, December 2006
Volume 5 Issue 3, September 2006
Volume 5 Issue 2, June 2006
Volume 5 Issue 1, March 2006

Volume 4 Issue 4, December 2005
Volume 4 Issue 3, September 2005
Volume 4 Issue 2, June 2005
Volume 4 Issue 1, March 2005

Volume 3 Issue 4, December 2004
Volume 3 Issue 3, September 2004
Volume 3 Issue 2, June 2004
Volume 3 Issue 1, March 2004 Special Issue on Temporal Information Processing

Volume 2 Issue 4, December 2003
Volume 2 Issue 3, September 2003
Volume 2 Issue 2, June 2003
Volume 2 Issue 1, March 2003

Volume 1 Issue 4, December 2002
Volume 1 Issue 3, September 2002
Volume 1 Issue 2, June 2002
Volume 1 Issue 1, March 2002