ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) - TALLIP Notes and Regular Papers, Volume 16 Issue 2, December 2016

Section: TALLIP Notes

A Semisupervised Tag-Transition-Based Markovian Model for Uyghur Morphology Analysis
Eziz Tursun, Debasis Ganguly, Turghun Osman, Ya-Ting Yang, Ghalip Abdukerim, Jun-Lin Zhou, Qun Liu
Article No.: 8
DOI: 10.1145/2968410

Morphological analysis, which includes part-of-speech (POS) tagging, stemming, and morpheme segmentation, is one of the key components in natural language processing (NLP), particularly for agglutinative languages. In this article, we...

An Approach to Construct a Named Entity Annotated English-Vietnamese Bilingual Corpus
Long H. B. Nguyen, Dien Dinh, Phuoc Tran
Article No.: 9
DOI: 10.1145/2990191

Manually constructing a bilingual corpus annotated with Named Entities (NEs) is a time-consuming, labor-intensive, and expensive process, but it is necessary for natural language processing (NLP) tasks such as cross-lingual information retrieval,...

Boosted Web Named Entity Recognition via Tri-Training
Chien-Lung Chou, Chia-Hui Chang, Ya-Yun Huang
Article No.: 10
DOI: 10.1145/2963100

Named entity extraction is a fundamental task for many natural language processing applications on the web. Existing studies rely on annotated training data, which is quite expensive to obtain in large quantities, limiting the effectiveness of...

A Discourse-Based Approach for Arabic Question Answering
Jawad Sadek, Farid Meziane
Article No.: 11
DOI: 10.1145/2988238

The treatment of complex questions with explanatory answers involves searching for arguments in texts. Because of the prominent role that discourse relations play in reflecting text producers’ intentions, capturing the underlying structure...

Word Re-Segmentation in Chinese-Vietnamese Machine Translation
Phuoc Tran, Dien Dinh, Long H. B. Nguyen
Article No.: 12
DOI: 10.1145/2988237

In isolating languages, such as Chinese and Vietnamese, words are not separated by spaces, and a word may be formed by one or more syllables. Therefore, word segmentation (WS) is usually the first process that is implemented in the machine...

Minimally Supervised Chinese Event Extraction from Multiple Views
Peifeng Li, Guodong Zhou, Qiaoming Zhu
Article No.: 13
DOI: 10.1145/2994600

Although several semi-supervised learning models have been proposed for English event extraction, few have been successfully applied to Chinese because of its special characteristics. In this article, we propose a novel minimally supervised model for...

Query Expansion in Resource-Scarce Languages: A Multilingual Framework Utilizing Document Structure
Arjun Atreya V, Ashish Kankaria, Pushpak Bhattacharyya, Ganesh Ramakrishnan
Article No.: 14
DOI: 10.1145/2997643

Queries to search engines in resource-scarce languages often retrieve no results, which frustrates users. In such cases, at least partially relevant documents must be retrieved. We propose a novel multilingual framework,...