Left to right decoding for Hiero Statistical Machine Translation

msiahbani/lrhiero
lrhiero

This is a left-to-right decoder for hierarchical phrase-based (Hiero) SMT systems.

Usage

python decoder_cp.py --config <lrhiero.ini> --inputfile <test.in> --outputfile <test.out> --ttable-file <test.rule>

Requirements

  • Python 2.6 or higher
  • SRILM, for building the language model (LM)
  • KenLM, the library used to query the language model (a Python wrapper is included with the decoder; the source for the wrapper will be released soon)

Advanced Features of the Decoder

Grammar

LR-Hiero uses a specific form of SCFG (Synchronous Context-Free Grammar) rules that are prefix-lexicalized on the target side, that is, in so-called Greibach Normal Form (GNF).
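
As a rough illustration of the prefix-lexicalized constraint, the sketch below checks that the target side of a rule starts with a terminal. It assumes non-terminals are written as X__1, X__2, ... (the notation used in the rule file example in this README). The name is_prefix_lexicalized is hypothetical and is not part of the decoder.

```python
import re

# Assumption: non-terminals are written X__1, X__2, ... as in the rule file example.
NONTERMINAL = re.compile(r"^X__\d+$")

def is_prefix_lexicalized(target_side):
    """Return True if the target side starts with a terminal (a word)."""
    tokens = target_side.split()
    # An empty target side has no leading terminal, so it fails the check.
    return bool(tokens) and NONTERMINAL.match(tokens[0]) is None
```

For example, the target side "but say X__1 X__2" passes this check, while "X__1 but" would not.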

File Format

Each line of the rule file contains a source phrase, a target phrase, and the feature values, separated by |||.

aber X__1 sagen X__2 ||| but say X__1 X__2 ||| -0.558224 -0.000100005 -1.14023 -1.84913
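
A minimal sketch of reading one such line is shown below. It assumes exactly three |||-separated fields with whitespace-separated tokens and numeric feature values. The function parse_rule is a hypothetical helper for illustration, not the decoder's own loader.

```python
def parse_rule(line):
    """Split one rule-file line into (source tokens, target tokens, feature values)."""
    fields = [f.strip() for f in line.split("|||")]
    # Assumption: every line has exactly source, target, and features.
    if len(fields) != 3:
        raise ValueError("expected 3 '|||'-separated fields, got %d" % len(fields))
    src, tgt, feats = fields
    return src.split(), tgt.split(), [float(x) for x in feats.split()]

# Parse the example rule from this README.
src, tgt, feats = parse_rule(
    "aber X__1 sagen X__2 ||| but say X__1 X__2 ||| "
    "-0.558224 -0.000100005 -1.14023 -1.84913"
)
```

Here src and tgt hold the token lists for the two sides of the rule, and feats holds the four feature values as floats.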

Handling Unknown Words

Unknown words are copied verbatim to the output. For each unknown word or phrase, four glue rules are generated and added to the grammar, so unknown words may appear out of order in the output. Unknown words are also scored by the language model.

Citation

If you use this decoder in your research, consider citing:

  • Efficient Left-to-Right Hierarchical Phrase-based Translation with Improved Reordering. Maryam Siahbani, Baskaran Sankaran and Anoop Sarkar. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2013). Oct 18-21, 2013. Seattle, USA.

Contacts

Maryam Siahbani, Anoop Sarkar

{msiahban,anoop}@sfu.ca

NatLang Lab, School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
