Sign language is the primary communication medium of the deaf community. Often, a sign gesture is mapped to a word or phrase in a spoken language and is referred to as a conversational sign. A fingerspelling sign is a special sign that represents a single character in the alphabet of a given language. This enables the deaf community to express words that have no conversational sign, such as names, letter by letter. Sinhala Sign Language (SSL) decodes such words using a phonetic pronunciation mechanism, since a consonant may be followed by one or more modifiers. Numbers are expressed with a similar notation: a number is broken into parts before it is interpreted as sign gestures. This paper presents the variations implemented to make a 3D avatar-based interpreter system resemble actual SSL fingerspelling by a human interpreter. To accomplish this, a phonetic-English-based 3D avatar animation system was developed with the Blender animation software. A Visual Basic .NET (VB .NET) application converts Sinhala Unicode text to phonetic English, and numbers written in digits to sign gestures. The application covers 61 SSL fingerspelling signs and 41 number signs, and it can interpret any word written in the modern Sinhala alphabet that lacks a conversational sign, as well as numbers up to the billions. It is a helpful tool for teaching SSL fingerspelling and number signs to deaf children.
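The number-signing step described above relies on breaking a number into place-value parts before mapping each part to a sign. A minimal sketch of that decomposition (illustrative Python, not the paper's VB .NET code; `decompose_number` is a hypothetical name):

```python
def decompose_number(n: int) -> list[int]:
    """Break a number into place-value parts, e.g. 2345 -> [2000, 300, 40, 5].

    Each part could then be mapped to one of the system's number signs.
    """
    parts = []
    place = 1
    while n > 0:
        digit = n % 10
        if digit:  # skip zero digits; they contribute no part
            parts.append(digit * place)
        n //= 10
        place *= 10
    return list(reversed(parts))

print(decompose_number(2345))  # -> [2000, 300, 40, 5]
```

Working in place values keeps the part list short even for numbers in the billions, matching the abstract's stated range.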
Recently, quality estimation has attracted increasing interest from machine translation researchers, who aim to find a good estimator of machine translation output quality. The common approach is to treat the problem as a supervised classification task using a quality-annotated parallel corpus, called quality estimation data, as training data. However, the available quality estimation data for training remain small, owing to the exorbitant cost of creating such data. In addition, most conventional quality estimation approaches rely on manually designed features to model nonlinear relationships between feature vectors and the corresponding quality labels. To overcome these problems, this paper proposes a novel neural network architecture for quality estimation, called the predictor-estimator, that treats word prediction as an additional pre-task. The major component of the proposed architecture is a word prediction model based on a modified neural machine translation model: a probabilistic model that predicts a target word conditioned on all the other source and target contexts. Our proposed quality estimation method sequentially trains two types of neural models: 1) the predictor, a neural word prediction model trained on parallel corpora, and 2) the estimator, a neural quality estimation model trained on quality estimation data. To transfer the word prediction task to the quality estimation task, we generate quality estimation feature vectors from the word prediction model and feed them into the quality estimation model. Experiments on the WMT15 and WMT16 quality estimation datasets indicate that the proposed method is promising, achieving state-of-the-art performance on various sub-challenges.
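The two-stage handoff described above can be sketched in miniature: a stand-in predictor assigns probabilities to observed MT output words, those probabilities become per-word quality estimation features, and a stand-in estimator maps them to a quality score. Everything here is a hedged toy (the paper uses trained neural models, not random distributions or a fixed sigmoid):

```python
import math
import random

random.seed(0)
vocab_size, sent_len = 50, 7

# Stage 1 ("Predictor"): stand-in for the trained word prediction model.
# A random distribution plays the role of p(word | source, target context);
# the paper trains a modified NMT model on parallel corpora for this.
def fake_word_distribution() -> list[float]:
    weights = [random.random() for _ in range(vocab_size)]
    total = sum(weights)
    return [w / total for w in weights]

distributions = [fake_word_distribution() for _ in range(sent_len)]
target_ids = [random.randrange(vocab_size) for _ in range(sent_len)]

# QE features: the log-probability the predictor assigns to each observed
# MT output word (one plausible feature; the paper extracts richer vectors).
qe_features = [math.log(d[w]) for d, w in zip(distributions, target_ids)]

# Stage 2 ("Estimator"): stand-in scorer mapping features to a quality
# score in (0, 1); the paper trains a neural estimator on QE data instead.
quality = 1.0 / (1.0 + math.exp(-sum(qe_features) / len(qe_features)))
```

The key design point survives even in the toy: the estimator never sees raw text, only feature vectors produced by the separately trained predictor.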
Authorship attribution is a long-standing problem in natural language processing, and several statistical and computational methods have been applied to it. In this paper, we propose methods for authorship attribution in Bengali. More specifically, we propose a supervised framework based on lexical and shallow features, and we investigate the use of topic-modeling-inspired features, to classify documents by author. We created a corpus of 3,000 disjoint samples drawn from nearly all the literary works of three eminent Bengali authors. Our models outperform the state of the art, with more than 98% test accuracy using the shallow features and 100% test accuracy using the topic-based features.
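A shallow-feature pipeline of the general kind described above can be sketched with a few document statistics and a nearest-centroid classifier. This is a hedged illustration only: the feature set, the toy English data, and the names `shallow_features` and `classify` are assumptions, not the paper's actual Bengali features or model.

```python
import math

def shallow_features(text: str) -> list[float]:
    """A few shallow stylometric statistics for one document."""
    words = text.split()
    avg_word_len = sum(len(w) for w in words) / len(words)
    type_token_ratio = len(set(words)) / len(words)
    avg_sent_len = len(words) / max(text.count("."), 1)
    return [avg_word_len, type_token_ratio, avg_sent_len]

def centroid(vectors: list[list[float]]) -> list[float]:
    return [sum(col) / len(col) for col in zip(*vectors)]

def classify(doc: str, centroids: dict[str, list[float]]) -> str:
    """Assign the author whose feature centroid is nearest in Euclidean distance."""
    feats = shallow_features(doc)
    return min(centroids, key=lambda a: math.dist(feats, centroids[a]))

# Toy training data for two hypothetical authors with different styles.
train = {
    "author_A": ["short words here. tiny terse text.", "brief bits. small talk."],
    "author_B": ["considerably lengthier constructions appear throughout."],
}
centroids = {a: centroid([shallow_features(d) for d in docs])
             for a, docs in train.items()}
print(classify("terse text. short words.", centroids))  # -> author_A
```

A real system would use far richer lexical features and a trained classifier, but the supervised shape (features per document, labels per author) is the same.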
Extractive summarization, a process that automatically picks exemplary sentences from a text (or spoken) document with the goal of concisely conveying the key information therein, has recently seen a surge of attention from scholars and practitioners. Using a language modeling (LM) approach for sentence selection has proven effective for unsupervised extractive summarization. However, one of the major difficulties facing the LM approach is modeling sentences and estimating their parameters accurately for each text (or spoken) document. We extend this line of research and make the following contributions. First, we propose a position-aware language modeling framework that uses various granularities of position-specific information to better estimate the sentence models involved in the summarization process. Second, we explore disparate ways to integrate the positional cues into relevance models through a pseudo-relevance feedback procedure. Third, we extensively evaluate various models originating from our proposed framework alongside several well-established unsupervised methods. Empirical evaluation on a broadcast news summarization task further demonstrates the performance merits of the proposed summarization methods.
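The core idea of combining an LM score with positional cues can be shown in a deliberately minimal form: score each sentence by its average log-likelihood under the document's unigram LM, plus a simple decay prior that favors earlier sentences. This is an assumed toy formulation, not the paper's actual models or position granularities.

```python
import math
from collections import Counter

def summarize(sentences: list[str], decay: float = 0.5) -> str:
    """Pick the single best sentence under a position-weighted unigram LM score."""
    doc_words = [w for s in sentences for w in s.lower().split()]
    counts = Counter(doc_words)
    total = len(doc_words)

    def score(sent: str, pos: int) -> float:
        words = sent.lower().split()
        # Average log-likelihood under the document unigram LM ...
        lm = sum(math.log(counts[w] / total) for w in words) / len(words)
        # ... plus a linear position prior: earlier sentences are favored,
        # a reasonable bias for news-style documents.
        return lm - decay * pos

    best = max(enumerate(sentences), key=lambda p: score(p[1], p[0]))
    return best[1]

sents = ["the cat sat on the mat", "dogs bark", "the cat sat"]
print(summarize(sents))  # -> "the cat sat on the mat"
```

Replacing the fixed decay with learned, granularity-specific position weights is roughly where the proposed framework departs from this toy.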
Some natural languages belong to the same family or share syntactic and/or semantic regularities. This encourages researchers to share computational models across languages and to use high-quality models to boost existing low-performance alternatives. We follow a similar idea in our research. In this paper, we describe statistical and neural machine translation (MT) engines that are trained on one language pair but used to translate another language. First, we train a reliable model with a high-resource language; then we exploit cross-lingual similarities and adapt the model to work for a closely related language with almost zero resources. Our proposed solution can be applied to any pair of close languages; here we choose Turkish (Tr) and Azerbaijani (Az, also called Azeri). Azeri suffers from a lack of resources, as there is almost no bilingual corpus or MT system for the language. To the best of our knowledge, this is the first time an Azeri MT system has been developed and evaluated. Using our models, we are able to train an engine for the Az->English (En) direction with a BLEU score of 22.30.
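One simple trick this kind of cross-lingual transfer often relies on is surface normalization: mapping the few Azerbaijani-specific Latin letters toward their closest Turkish counterparts so that Azeri input better matches a Turkish-trained model's vocabulary. The sketch below is an assumption for illustration, not necessarily the paper's adaptation method, and the mapping is deliberately tiny.

```python
# Azerbaijani Latin letters absent from the Turkish alphabet, mapped to
# rough Turkish stand-ins (an illustrative, lossy normalization).
AZ_TO_TR = {
    "\u0259": "e", "\u018f": "E",  # schwa (ə) has no Turkish letter; 'e' is a common stand-in
    "x": "h", "X": "H",            # Azeri 'x' (/x/) is often approximated by Turkish 'h'
    "q": "g", "Q": "G",            # Azeri 'q' roughly corresponds to Turkish 'g'
}

def normalize_az(text: str) -> str:
    """Replace Azeri-specific letters character by character."""
    return "".join(AZ_TO_TR.get(ch, ch) for ch in text)

print(normalize_az("q\u0259hr\u0259man"))  # qəhrəman -> "gehreman"
```

Such normalization is lossy by design; the point is only to increase lexical overlap with the high-resource Turkish model before translation.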