ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), Volume 16 Issue 3, March 2017

Inducing a Bilingual Lexicon from Short Parallel Multiword Sequences
Andrew Finch, Taisuke Harada, Kumiko Tanaka-Ishii, Eiichiro Sumita
Article No.: 15
DOI: 10.1145/3003726

This article proposes a technique for mining bilingual lexicons from pairs of parallel short word sequences. The technique builds a generative model from a corpus of training data consisting of such pairs. The model is a hierarchical nonparametric...

Comparison Study on Critical Components in Composition Model for Phrase Representation
Shaonan Wang, Chengqing Zong
Article No.: 16
DOI: 10.1145/3010088

Phrase representation, an important step in many NLP tasks, involves representing phrases as continuous-valued vectors. This article presents detailed comparisons concerning the effects of word vectors, training data, and the composition and...

Improving Transition-Based Dependency Parsing of Hindi and Urdu by Modeling Syntactically Relevant Phenomena
Riyaz Ahmad Bhat, Irshad Ahmad Bhat, Dipti Misra Sharma
Article No.: 17
DOI: 10.1145/3005447

In recent years, transition-based parsers have shown promise in terms of efficiency and accuracy. Though these parsers have been extensively explored for multiple Indian languages, there is still considerable scope for improvement by properly...

Named Entity Recognition with Word Embeddings and Wikipedia Categories for a Low-Resource Language
Arjun Das, Debasis Ganguly, Utpal Garain
Article No.: 18
DOI: 10.1145/3015467

In this article, we propose a word embedding-based named entity recognition (NER) approach. NER is commonly approached as a sequence labeling task with the application of methods such as conditional random field (CRF). However, for low-resource...

Implicit Discourse Relation Recognition for English and Chinese with Multiview Modeling and Effective Representation Learning
Haoran Li, Jiajun Zhang, Chengqing Zong
Article No.: 19
DOI: 10.1145/3028772

Discourse relations between two text segments play an important role in many Natural Language Processing (NLP) tasks. The connectives strongly indicate the sense of discourse relations, while in fact, there are no connectives in a large proportion...

Corpus-Based Translation Induction in Indian Languages Using Auxiliary Language Corpora from Wikipedia
Goutham Tholpadi, Chiranjib Bhattacharyya, Shirish Shevade
Article No.: 20
DOI: 10.1145/3038295

Identifying translations from comparable corpora is a well-known problem with several applications. Existing methods rely on linguistic tools or high-quality corpora. Absence of such resources, especially in Indian languages, makes this problem...