Inducing a Bilingual Lexicon from Short Parallel Multiword Sequences
Andrew Finch, Taisuke Harada, Kumiko Tanaka-Ishii, Eiichiro Sumita
Article No.: 15
This article proposes a technique for mining bilingual lexicons from pairs of parallel short word sequences. The technique builds a generative model from a corpus of training data consisting of such pairs. The model is a hierarchical nonparametric...
Comparison Study on Critical Components in Composition Model for Phrase Representation
Shaonan Wang, Chengqing Zong
Article No.: 16
Phrase representation, an important step in many NLP tasks, involves representing phrases as continuous-valued vectors. This article presents detailed comparisons concerning the effects of word vectors, training data, and the composition and...
Improving Transition-Based Dependency Parsing of Hindi and Urdu by Modeling Syntactically Relevant Phenomena
Riyaz Ahmad Bhat, Irshad Ahmad Bhat, Dipti Misra Sharma
Article No.: 17
In recent years, transition-based parsers have shown promise in terms of efficiency and accuracy. Though these parsers have been extensively explored for multiple Indian languages, there is still considerable scope for improvement by properly...
Named Entity Recognition with Word Embeddings and Wikipedia Categories for a Low-Resource Language
Arjun Das, Debasis Ganguly, Utpal Garain
Article No.: 18
In this article, we propose a word embedding-based named entity recognition (NER) approach. NER is commonly approached as a sequence labeling task using methods such as conditional random fields (CRFs). However, for low-resource...
Implicit Discourse Relation Recognition for English and Chinese with Multiview Modeling and Effective Representation Learning
Haoran Li, Jiajun Zhang, Chengqing Zong
Article No.: 19
Discourse relations between two text segments play an important role in many Natural Language Processing (NLP) tasks. The connectives strongly indicate the sense of discourse relations, while in fact, there are no connectives in a large proportion...
Corpus-Based Translation Induction in Indian Languages Using Auxiliary Language Corpora from Wikipedia
Goutham Tholpadi, Chiranjib Bhattacharyya, Shirish Shevade
Article No.: 20
Identifying translations from comparable corpora is a well-known problem with several applications. Existing methods rely on linguistic tools or high-quality corpora. Absence of such resources, especially in Indian languages, makes this problem...
A Hybrid Model for Chinese Spelling Check
Hai Zhao, Deng Cai, Yang Xin, Yuzhu Wang, Zhongye Jia
Article No.: 21
Spelling check for Chinese is more challenging than for other languages. This article presents a hybrid model for Chinese spelling check. The hybrid model consists of three components: one graph-based model for generic...