Document Image Understanding (DIU) and Electronic Document Management are active fields of research involving image understanding, interpretation, efficient handling, and routing of documents as well as their retrieval. Research on most of the...
Experiments on various word segmentation approaches for the Burmese language are conducted and discussed in this note. Specifically, dictionary-based, statistical, and machine learning approaches are tested. Experimental results demonstrate that...
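Burmese is generally written without spaces between words, so a segmenter has to decide where words begin and end. The excerpt does not say which dictionary-based variant the note tests. The sketch below shows one common dictionary-based method, greedy forward maximum matching, which joins consecutive units into the longest dictionary entry it can find. The function name, the `max_len` parameter, and the toy dictionary are hypothetical. For Burmese the units would typically be syllables rather than the Latin characters used here.

```python
from typing import List, Sequence, Set

def forward_max_match(units: Sequence[str], dictionary: Set[str], max_len: int) -> List[str]:
    """Greedy left-to-right longest-match segmentation (hypothetical helper).

    `units` is the unsegmented input split into minimal units; runs of up to
    `max_len` consecutive units are joined and looked up in `dictionary`.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    words: List[str] = []
    i = 0
    while i < len(units):
        # Try the longest candidate first and shrink until a dictionary hit.
        for n in range(min(max_len, len(units) - i), 0, -1):
            candidate = "".join(units[i:i + n])
            # A single unit is always accepted so that unknown text still advances.
            if n == 1 or candidate in dictionary:
                words.append(candidate)
                i += n
                break
    return words
```

With the toy dictionary {"the", "cat", "cats", "sat", "at"} and `max_len=4`, the input "thecatsat" segments as ["the", "cats", "at"]. The greedy preference for the longer word "cats" produces a wrong split, which is a known weakness of purely greedy matching.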
From Image to Translation: Processing the Endangered Nyushu Script
Tongtao Zhang, Aritra Chowdhury, Nimit Dhulekar, Jinjing Xia, Kevin Knight, Heng Ji, Bülent Yener, Liming Zhao
Article No.: 23
The lack of computational support has significantly slowed down automatic understanding of endangered languages. In this paper, we take Nyushu (simplified Chinese: 女书; literally: “women’s writing”) as a case study...
Although query log analysis provides crucial insights about Web users’ search interests, conducting such analyses is almost impossible for some languages, as large-scale and public query logs are quite scarce. In this study, we first survey...
Classification of Printed Gujarati Characters Using Low-Level Stroke Features
Mukesh M. Goswami, Suman K. Mitra
Article No.: 25
This article presents an elegant technique for extracting the low-level stroke features, such as endpoints, junction points, line elements, and curve elements, from offline printed text using a template matching approach. The proposed features are...
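The abstract lists endpoints and junction points among the extracted stroke features. The sketch below finds such points on a thinned (one-pixel-wide) binary image using the crossing-number rule, a common 3x3 neighborhood test. It is a stand-in for illustration only. The authors' own template set is not described in this excerpt. The names `crossing_number` and `stroke_points`, and the list-of-lists image format, are assumptions.

```python
from typing import List, Tuple

Grid = List[List[int]]  # 1 = stroke (skeleton) pixel, 0 = background
Point = Tuple[int, int]

# 8-neighbour offsets in clockwise circular order, starting at north.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def crossing_number(img: Grid, r: int, c: int) -> int:
    """Half the number of 0/1 changes when walking once around the 3x3 ring."""
    rows, cols = len(img), len(img[0])

    def px(dr: int, dc: int) -> int:
        rr, cc = r + dr, c + dc
        # Pixels outside the image are treated as background.
        return img[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0

    ring = [px(dr, dc) for dr, dc in RING]
    # Around a closed ring the number of changes is always even.
    return sum(abs(ring[i] - ring[(i + 1) % 8]) for i in range(8)) // 2

def stroke_points(img: Grid) -> Tuple[List[Point], List[Point]]:
    """Return (endpoints, junctions) of a one-pixel-wide binary skeleton."""
    endpoints: List[Point] = []
    junctions: List[Point] = []
    for r, row in enumerate(img):
        for c, value in enumerate(row):
            if not value:
                continue
            cn = crossing_number(img, r, c)
            if cn == 1:
                endpoints.append((r, c))   # a stroke ends here
            elif cn >= 3:
                junctions.append((r, c))   # three or more branches meet
    return endpoints, junctions
```

For a small "T" skeleton [[1,1,1,1,1],[0,0,1,0,0],[0,0,1,0,0]], `stroke_points` returns endpoints [(0, 0), (0, 4), (2, 2)] and junctions [(0, 2)]. Counting raw 8-neighbors instead would also flag (0, 1) and (0, 3) as junctions. Counting transitions around the ring avoids these false junctions.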
A Four-Tier Annotated Urdu Handwritten Text Image Dataset for Multidisciplinary Research on Urdu Script
Prakash Choudhary, Neeta Nain
Article No.: 26
This article introduces CALAM (Cursive And Language Adaptive Methodologies), a large dataset of handwritten Urdu text document images. The database contains unconstrained handwritten sentences along with their structural...
A Fast and Compact Language Model Implementation Using Double-Array Structures
Jun-Ya Norimatsu, Makoto Yasuhara, Toru Tanaka, Mikio Yamamoto
Article No.: 27
The language model is a widely used component in fields such as natural language processing, automatic speech recognition, and optical character recognition. In particular, statistical machine translation uses language models, and the translation...
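The title names double-array structures, but the truncated abstract does not describe the layout the authors use. The sketch below stores n-grams of integer word IDs in a static double-array trie using the classic BASE/CHECK scheme. From state s, symbol c leads to t = BASE[s] + c, and the move is valid only if CHECK[t] == s. The following choices are assumptions made for illustration:

- the class name;
- the one-float-per-n-gram payload (for example a log-probability);
- the naive first-fit placement.

```python
from typing import Dict, List, Optional, Sequence

class DoubleArrayTrie:
    """Static double-array trie over integer symbol sequences (hypothetical sketch).

    From state s, symbol c moves to t = base[s] + c, valid only if check[t] == s.
    Symbols must be >= 1; state 0 is the root.
    """

    def __init__(self, entries: Dict[Sequence[int], float]) -> None:
        # Build an ordinary pointer-based trie first.
        root: dict = {}
        for key, value in entries.items():
            node = root
            for sym in key:
                if sym < 1:
                    raise ValueError("symbols must be >= 1")
                node = node.setdefault(sym, {})
            node[None] = value  # None marks "a key ends here"
        self.base: List[int] = [0]
        self.check: List[int] = [-1]          # -1 marks a free slot
        self.value: List[Optional[float]] = [None]
        self._place(root, 0)

    def _grow(self, size: int) -> None:
        while len(self.check) < size:
            self.base.append(0)
            self.check.append(-1)
            self.value.append(None)

    def _place(self, node: dict, state: int) -> None:
        self.value[state] = node.get(None)
        children = sorted(k for k in node if k is not None)
        if not children:
            return
        # First fit: the smallest base whose child slots are all free.
        b = 1
        while True:
            self._grow(b + children[-1] + 1)
            if all(self.check[b + c] == -1 for c in children):
                break
            b += 1
        self.base[state] = b
        for c in children:
            self.check[b + c] = state  # claim all slots before recursing
        for c in children:
            self._place(node[c], b + c)

    def lookup(self, key: Sequence[int]) -> Optional[float]:
        state = 0
        for c in key:
            t = self.base[state] + c
            if t >= len(self.check) or self.check[t] != state:
                return None  # no such transition
            state = t
        return self.value[state]
```

Building from {(1,): -1.2, (1, 2): -0.5, (1, 2, 3): -0.1, (2,): -2.0}, `lookup((1, 2))` returns -0.5 and `lookup((2, 3))` returns None. The first-fit search for each BASE value is quadratic in the worst case. Practical builders usually speed it up, for example with a list of free slots.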
Learning Generalized Features for Semantic Role Labeling
Haitong Yang, Chengqing Zong
Article No.: 28
This article aims to improve Semantic Role Labeling (SRL) by learning generalized features. The SRL task is usually treated as a supervised problem; therefore, a large feature set is crucial to the performance of SRL systems....
Bangla Handwritten Character Segmentation Using Structural Features: A Supervised and Bootstrapping Approach
Tapan Kumar Bhowmik, Swapan Kumar Parui, Utpal Roy, Lambert Schomaker
Article No.: 29
In this article, we propose a new framework for segmenting Bangla handwritten word images into meaningful individual symbols, or pseudo-characters. Existing segmentation algorithms do not usually treat segmentation as a classification problem. However,...