Collective Web-based Parenthetical Translation Extraction Using Markov Logic Networks
This special issue contains four papers that expand on systems that participated in the SIGHAN-7 Chinese Spelling Check Bakeoff. We provide an overview of the approaches and designs for Chinese spelling checkers presented in these articles. We conclude this introductory paper with a summary of possible future directions.
A rule-based pre-reordering approach is proposed for statistical Japanese-to-English machine translation using the dependency structure of source-side sentences. To resolve the reordering problem, a Japanese sentence is pre-reordered into an English-like order at the morpheme level before it is passed to a statistical machine translation system, in both the training and decoding phases. In this paper, extra-chunk pre-reordering of morphemes is proposed, which allows Japanese functional morphemes to move across chunk boundaries. This contrasts with the intra-chunk reordering used in previous approaches, which restricts the movement of morphemes to within a chunk. Linguistically oriented discussions show that correct pre-reordering cannot be realized without extra-chunk movement of morphemes. The proposed approach is compared with five rule-based pre-reordering approaches designed for Japanese-to-English translation and with a language-independent statistical pre-reordering approach, on a standard patent data set and on a news data set obtained by crawling Internet news sites. Two state-of-the-art statistical machine translation systems, one phrase-based and the other hierarchical phrase-based, are used in the experiments. Experimental results show that the proposed approach markedly outperforms the compared approaches on automatic reordering measures (Kendall's tau, Spearman's rho, fuzzy reordering score, and test-set RIBES) and on the automatic translation precision measure, test-set BLEU.
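As an illustration of two of the reordering measures named above, the following minimal sketch computes Kendall's tau and Spearman's rho for a pre-reordered sentence. It assumes the reordering is given as a permutation of 0..n-1, where `perm[i]` is the reference (English-like) position of the i-th pre-reordered morpheme, and it uses the textbook definitions of both coefficients. The abstract does not state which normalization the authors use, so the function names and the input representation here are assumptions for illustration only.

```python
from itertools import combinations


def kendall_tau(perm):
    """Textbook Kendall's tau of a permutation against the identity order.

    perm is assumed to be a permutation of 0..n-1 (no ties).
    Returns a value in [-1, 1]; 1 means the order is already monotone.
    """
    n = len(perm)
    if n < 2:
        return 1.0
    total = n * (n - 1) // 2
    # A pair (i, j) with i < j is concordant when its relative order is kept.
    concordant = sum(1 for i, j in combinations(range(n), 2) if perm[i] < perm[j])
    discordant = total - concordant
    return (concordant - discordant) / total


def spearman_rho(perm):
    """Textbook Spearman's rho of a permutation against the identity order.

    Uses rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid when there are no ties.
    """
    n = len(perm)
    if n < 2:
        return 1.0
    d2 = sum((p - i) ** 2 for i, p in enumerate(perm))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


if __name__ == "__main__":
    perfect = [0, 1, 2, 3]    # already in reference order
    reversed_ = [3, 2, 1, 0]  # fully inverted order
    print(kendall_tau(perfect), spearman_rho(perfect))
    print(kendall_tau(reversed_), spearman_rho(reversed_))
```

Under these definitions, a pre-reordering that reproduces the reference order scores 1 on both measures and a fully inverted order scores -1. Averaging such scores over a test set would give a corpus-level figure, though the paper's exact aggregation is not described in the abstract.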