A Constraint Approach to Pivot-based Bilingual Dictionary Induction
Collective Web-based Parenthetical Translation Extraction Using Markov Logic Networks
High quality bilingual dictionaries are very useful, but such resources are rarely available for lower-density language pairs, especially for those that are closely related. Using a third language to link two other languages is a well-known solution, and usually requires only two input bilingual dictionaries A-B and B-C to automatically induce the new one, A-C. This approach, however, has never been demonstrated to utilize the complete structures of the input bilingual dictionaries, and this is a key failing because the dropped meanings negatively influence the result. This paper proposes a constraint approach to pivot-based dictionary induction where language A and C are closely related. We create constraints from language similarity and model the structures of the input dictionaries as a Boolean optimization problem which is then formulated within the Weighted Partial Max-SAT framework, an extension of Boolean Satisfiability (SAT). All of the encoded CNF (Conjunctive Normal Form), the predominant input language of modern SAT/MAX-SAT solvers, formulas are evaluated by a solver to produce the target (output) bilingual dictionary. Moreover, we discuss alternative formalizations as a comparison study. We designed a tool that uses Sat4j library as the default solver to implement our method, and conducted an experiment in which the induced bilingual dictionary achieved better quality than the baseline method.