ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)

Latest Articles

Model Generation of Accented Speech using Model Transformation and Verification for Bilingual Speech Recognition

Bilingual and multilingual speech recognition in real-world applications must cope with accent-related problems caused by non-native speech. Accent modeling of non-native speech is challenging because the acoustic properties of highly accented speech produced by non-native speakers diverge widely. The aim of this study is to generate highly Mandarin-accented English models for speakers whose mother tongue is Mandarin. First, a two-stage, state-based verification method is proposed to extract the state-level, highly accented speech segments automatically. Acoustic features and articulatory features are successively used for robust verification of the extracted speech segments. Second, Gaussian components of the highly accented speech models...

Keyword Extraction from Arabic Documents using Term Equivalence Classes

The rapid growth of the Internet and other computing facilities in recent years has produced a large amount of text in electronic form, which has increased the interest in and importance of automatic text processing applications, including keyword extraction and term indexing. Although keywords are very useful for many applications, most documents available online are not provided with keywords. We describe a method for extracting keywords from Arabic documents. This method identifies the keywords by combining linguistic and statistical analysis of the text without using prior knowledge from its domain or information from any related corpus. The text is preprocessed to extract the main linguistic information, such as the roots and morphological patterns of...

Bigram Language Models and Reevaluation Strategy for Improved Recognition of Online Handwritten Tamil Words

This article describes a postprocessing strategy for online, handwritten, isolated Tamil words. Contributions have been made with regard to two issues hardly addressed in the online Indic word recognition literature, namely, use of (1) language models exploiting the idiosyncrasies of Indic scripts and (2) expert classifiers for the disambiguation of confused symbols.

The input word is first segmented into its individual symbols, which are recognized using a primary support vector machine (SVM) classifier. Thereafter, we enhance the recognition accuracy by utilizing (i) a bigram language model at the symbol or character level and (ii) expert classifiers for reevaluating and disambiguating the different sets of confused symbols. The symbol-level bigram model is used in a...
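
As a rough illustration of how a symbol-level bigram model can re-rank classifier output, the sketch below combines per-symbol classifier log-scores with bigram log-probabilities through a Viterbi-style search. It is not the article's procedure, which the truncated abstract does not spell out. The function name, the `lm_weight` and `unseen_logp` parameters, and the toy scores are all assumptions made for the example.

```python
import math


def rescore_with_bigrams(candidates, bigram_logp, lm_weight=1.0, unseen_logp=-10.0):
    """Pick the best symbol sequence for one segmented word.

    candidates  : one list per symbol position, each holding
                  (symbol, classifier_log_score) pairs (hypothetical format).
    bigram_logp : dict mapping (previous_symbol, symbol) to a log-probability.
    """
    # best[s] = (score, path) of the best partial sequence ending in symbol s
    best = {sym: (score, [sym]) for sym, score in candidates[0]}
    for position in candidates[1:]:
        new_best = {}
        for sym, score in position:
            # Choose the predecessor that maximizes path score plus bigram score.
            prev_sym, (prev_score, prev_path) = max(
                best.items(),
                key=lambda item: item[1][0]
                + lm_weight * bigram_logp.get((item[0], sym), unseen_logp),
            )
            lm = lm_weight * bigram_logp.get((prev_sym, sym), unseen_logp)
            new_best[sym] = (prev_score + lm + score, prev_path + [sym])
        best = new_best
    return max(best.values(), key=lambda entry: entry[0])[1]


# Toy data: the classifier alone prefers "x" in the second position,
# but the bigram model strongly favors "y" after "a".
toy_candidates = [
    [("a", math.log(0.6)), ("b", math.log(0.4))],
    [("x", math.log(0.55)), ("y", math.log(0.45))],
]
toy_bigrams = {
    ("a", "x"): math.log(0.1),
    ("a", "y"): math.log(0.9),
    ("b", "x"): math.log(0.5),
    ("b", "y"): math.log(0.5),
}
best_sequence = rescore_with_bigrams(toy_candidates, toy_bigrams)
```

In this toy case the language model overrides the second-position classifier preference, so the selected sequence is `["a", "y"]`.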

Towards Machine Translation in Semantic Vector Space

Measuring the quality of the translation rules and their composition is an essential issue in the conventional statistical machine translation (SMT) framework. To express translation quality, earlier lexical and phrasal probabilities are calculated only from the co-occurrence statistics in the bilingual corpus and may not be reliable because of data sparseness. To address this issue, we propose measuring the quality of the translation rules and their composition in the semantic vector embedding space (VES). We present a recursive neural network (RNN)-based translation framework, which includes two submodels. One is the bilingually-constrained recursive auto-encoder, which is proposed to convert the lexical translation rules into compact real-valued vectors in...

NEWS

New Name, Expanded Scope

This page provides information about the journal Transactions on Asian and Low-Resource Language Information Processing (TALLIP), a publication of the Association for Computing Machinery (ACM).

The journal was formerly known as the Transactions on Asian Language Information Processing (TALIP); see the editorial charter for information on the journal's expanded scope.

ACM Author Options
New options for ACM authors to manage rights and permissions for their work: ACM introduces a new publishing license agreement, an updated copyright transfer agreement, and a new author-pays option that allows for perpetual open access through the ACM Digital Library. For more information, visit ACM Author Rights.

Forthcoming Articles

Bigram Language Models and Reevaluation Strategy for Improved Recognition of Online Handwritten Tamil Words

Conditional Random Fields for Korean Morpheme Segmentation and POS Tagging

Statistical approaches to Korean morphological analysis have recently attracted interest. However, previous studies have been mostly based on generative models, including a hidden Markov model (HMM), without utilizing discriminative models such as a conditional random field (CRF). In this paper, we present a two-stage discriminative approach based on CRFs for Korean morphological analysis. Similar to methods used for Chinese, we perform two disambiguation procedures based on CRFs: 1) morpheme segmentation and 2) POS tagging. In morpheme segmentation, an input sentence is segmented into sequences of morphemes, where a morpheme unit is either atomic or compound. In the POS tagging procedure, each morpheme (atomic or compound) is assigned a POS tag. Once the POS tagging is complete, we post-process the compound morphemes: each compound morpheme is further decomposed into atomic morphemes, based on pre-analyzed patterns and generalized HMMs obtained from the given tagged corpus. Experimental results show the promise of our proposed method.

A Constraint Approach to Pivot-based Bilingual Dictionary Induction

High-quality bilingual dictionaries are very useful, but such resources are rarely available for lower-density language pairs, especially for those that are closely related. Using a third language to link two other languages is a well-known solution, and usually requires only two input bilingual dictionaries, A-B and B-C, to automatically induce the new one, A-C. This approach, however, has never been demonstrated to utilize the complete structures of the input bilingual dictionaries, and this is a key failing because the dropped meanings negatively influence the result. This paper proposes a constraint approach to pivot-based dictionary induction where languages A and C are closely related. We create constraints from language similarity and model the structures of the input dictionaries as a Boolean optimization problem, which is then formulated within the Weighted Partial Max-SAT framework, an extension of Boolean Satisfiability (SAT). All of the formulas, encoded in CNF (Conjunctive Normal Form), the predominant input language of modern SAT/MAX-SAT solvers, are evaluated by a solver to produce the target (output) bilingual dictionary. Moreover, we discuss alternative formalizations as a comparison study. We designed a tool that uses the Sat4j library as the default solver to implement our method, and conducted an experiment in which the induced bilingual dictionary achieved better quality than the baseline method.
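
To make the Weighted Partial Max-SAT formulation mentioned above concrete, here is a minimal brute-force sketch in Python. It is not the paper's encoding and does not use Sat4j. The variables, clauses, and weights are invented for illustration: each variable stands for one hypothetical candidate translation pair, hard clauses must hold, and soft clauses carry weights that the search tries to maximize.

```python
from itertools import product


def solve_weighted_partial_maxsat(num_vars, hard, soft):
    """Exhaustively solve a tiny Weighted Partial Max-SAT instance.

    hard : list of clauses that must all be satisfied.
    soft : list of (weight, clause) pairs; satisfied weight is maximized.
    A clause is a list of nonzero ints in DIMACS style:
    k means variable k is true, -k means variable k is false.
    """
    def satisfied(clause, assignment):
        return any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)

    best_weight, best_assignment = None, None
    for assignment in product([False, True], repeat=num_vars):
        if not all(satisfied(clause, assignment) for clause in hard):
            continue  # hard clauses are not negotiable
        weight = sum(w for w, clause in soft if satisfied(clause, assignment))
        if best_weight is None or weight > best_weight:
            best_weight, best_assignment = weight, assignment
    return best_weight, best_assignment


# Hypothetical candidate pairs: x1 = (a1, c1), x2 = (a1, c2), x3 = (a2, c2).
hard_clauses = [
    [1, 2],    # a1 must receive at least one translation
    [-2, -3],  # c2 may not be assigned to both a1 and a2
]
soft_clauses = [
    (3, [1]),  # illustrative similarity-based preference for accepting each pair
    (2, [2]),
    (4, [3]),
]
best_weight, best_assignment = solve_weighted_partial_maxsat(3, hard_clauses, soft_clauses)
```

On this toy instance the search keeps pairs x1 and x3 and drops x2, for a total satisfied weight of 7. Exhaustive enumeration only works for a handful of variables; a realistic instance would be handed to a dedicated solver.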

Multilingual Topic Models for Bilingual Dictionary Extraction

A Unified Model for Solving the OOV Problems of Chinese Word Segmentation

A Hybrid Feature Extraction Algorithm for Devanagari Script

Pre-ordering Using a Target Language Parser via Cross-Language Syntactic Projection for Statistical Machine Translation

Bibliometrics

Publication Years: 2015-2015
Publication Count: 9
Citation Count: 0
Available for Download: 9
Downloads (6 weeks): 231
Downloads (12 Months): 386
Downloads (cumulative): 386
Average downloads per article: 43
Average citations per article: 0

First Name Last Name Award
Baoli Li Senior Member (2012)
Limsoon Wong Fellows (2013)
Dong Zhou Senior Member (2012)

First Name Last Name Paper Counts
Hideki Mima 1
Nitin Ramrakhiyani 1
B Kumari 1
Hiroki Hanaoka 1
Chutamanee Onsuwan 1
Ramisettyrajeshwara Rao 1
Prasenjit Majumder 1
Thanaruk Theeramunkong 1
Yusuke Miyao 1
Nongnuch Ketui 1
Sumire Uematsu 1
Takuya Matsuzaki 1

Affiliation Paper Counts
University of Tokyo 3
Thammasat University 3

ACM Transactions on Asian and Low-Resource Language Information Processing
Archive


2015
Volume 14 Issue 2, March 2015
Volume 14 Issue 1, January 2015

2014
Volume 13 Issue 4, December 2014
Volume 13 Issue 2, June 2014

2013
Volume 12 Issue 3, August 2013
Volume 12 Issue 2, June 2013
Volume 12 Issue 1, March 2013

2012
Volume 11 Issue 4, December 2012 Special Issue on RITE
Volume 11 Issue 3, September 2012
Volume 11 Issue 2, June 2012
Volume 11 Issue 1, March 2012

2011
Volume 10 Issue 4, December 2011
Volume 10 Issue 3, September 2011
Volume 10 Issue 2, June 2011
Volume 10 Issue 1, March 2011

2010
Volume 9 Issue 4, December 2010
Volume 9 Issue 3, September 2010
Volume 9 Issue 2, June 2010
Volume 9 Issue 1, March 2010

2009
Volume 8 Issue 4, December 2009
Volume 8 Issue 3, August 2009
Volume 8 Issue 2, May 2009
Volume 8 Issue 1, March 2009

2008
Volume 7 Issue 4, November 2008
Volume 7 Issue 3, August 2008
Volume 7 Issue 2, June 2008
Volume 7 Issue 1, February 2008

2007
Volume 6 Issue 4, December 2007
Volume 6 Issue 3, November 2007
Volume 6 Issue 2, September 2007
Volume 6 Issue 1, April 2007

2006
Volume 5 Issue 4, December 2006
Volume 5 Issue 3, September 2006
Volume 5 Issue 2, June 2006
Volume 5 Issue 1, March 2006

2005
Volume 4 Issue 4, December 2005
Volume 4 Issue 3, September 2005
Volume 4 Issue 2, June 2005
Volume 4 Issue 1, March 2005

2004
Volume 3 Issue 4, December 2004
Volume 3 Issue 3, September 2004
Volume 3 Issue 2, June 2004
Volume 3 Issue 1, March 2004 Special Issue on Temporal Information Processing

2003
Volume 2 Issue 4, December 2003
Volume 2 Issue 3, September 2003
Volume 2 Issue 2, June 2003
Volume 2 Issue 1, March 2003

2002
Volume 1 Issue 4, December 2002
Volume 1 Issue 3, September 2002
Volume 1 Issue 2, June 2002
Volume 1 Issue 1, March 2002
 