ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 15 Issue 3, March 2016

Converting Continuous-Space Language Models into N-gram Language Models with Efficient Bilingual Pruning for Statistical Machine Translation
Rui Wang, Masao Utiyama, Isao Goto, Eiichiro Sumita, Hai Zhao, Bao-Liang Lu
Article No.: 11
DOI: 10.1145/2843942

The Language Model (LM) is an essential component of Statistical Machine Translation (SMT). In this article, we focus on developing efficient methods for LM construction. Our main contribution is that we propose a Natural N-grams based...

BenLem (A Bengali Lemmatizer) and Its Role in WSD
Abhisek Chakrabarty, Utpal Garain
Article No.: 12
DOI: 10.1145/2835494

A lemmatization algorithm for Bengali has been developed and evaluated. Its effectiveness for word sense disambiguation (WSD) is also investigated. One of the key challenges for computer processing of highly inflected languages is to deal with the...

Enhancing Shift-Reduce Constituent Parsing with Action N-Gram Model
Hao Zhou, Shujian Huang, Junsheng Zhou, Yue Zhang, Huadong Chen, Xinyu Dai, Chuan Cheng, Jiajun Chen
Article No.: 13
DOI: 10.1145/2820902

Current shift-reduce parsers “understand” the context by embodying a large number of binary indicator features with a discriminative model. In this article, we propose the action n-gram model, which utilizes the action sequence to help...

Extracting Arabic Causal Relations Using Linguistic Patterns
Jawad Sadek, Farid Meziane
Article No.: 14
DOI: 10.1145/2800786

Identifying semantic relations is a crucial step in discourse analysis and is useful for many applications in both language and speech technology. Automatic detection of causal relations has therefore gained popularity in the literature...

Bilingual Semantic Role Labeling Inference via Dual Decomposition
Haitong Yang, Yu Zhou, Chengqing Zong
Article No.: 15
DOI: 10.1145/2835493

This article focuses on bilingual Semantic Role Labeling (SRL); its goal is to annotate semantic roles on both sides of the parallel bilingual texts (bi-texts). Since rich bilingual information is encoded, bilingual SRL has been applied in many...

Modeling Monolingual Character Alignment for Automatic Evaluation of Chinese Translation
Maoxi Li, Mingwen Wang, Hanxi Li, Fan Xu
Article No.: 16
DOI: 10.1145/2815619

Automatic evaluation of machine translations is an important task. Most existing evaluation metrics rely on matching the same word or letter n-grams. This strategy leads to poor results on Chinese translations because one has to rely merely...

Using Bisect K-Means Clustering Technique in the Analysis of Arabic Documents
Diab Abuaiadah
Article No.: 17
DOI: 10.1145/2812809

In this article, I have investigated the performance of the bisect K-means clustering algorithm compared to the standard K-means algorithm in the analysis of Arabic documents. The experiments included five commonly used similarity and distance...

Arabic Cross-Language Information Retrieval: A Review
Bilel Elayeb, Ibrahim Bounhas
Article No.: 18
DOI: 10.1145/2789210

Cross-language information retrieval (CLIR) deals with retrieving relevant documents in one language using queries expressed in another language. As CLIR tools rely on translation techniques, they are challenged by the properties of highly...

Adaptation of Language Models for SMT Using Neural Networks with Topic Information
Yinggong Zhao, Shujian Huang, Xin-Yu Dai, Jiajun Chen
Article No.: 19
DOI: 10.1145/2816816

Neural network language models (LMs) are shown to be effective in improving the performance of statistical machine translation (SMT) systems. However, state-of-the-art neural network LMs usually use words before the current position as context and...

Inter-, Intra-, and Extra-Chunk Pre-Ordering for Statistical Japanese-to-English Machine Translation
Chenchen Ding, Keisuke Sakanushi, Hirona Touji, Mikio Yamamoto
Article No.: 20
DOI: 10.1145/2818381

A rule-based pre-ordering approach is proposed for statistical Japanese-to-English machine translation using the dependency structure of source-side sentences. A Japanese sentence is pre-ordered to an English-like order at the morpheme level for a...