Athabasca University

Section 4: Probabilistic language models and machine translation

Commentary

Section Goals

  • To introduce probabilistic language models of natural language processing.
  • To introduce machine translation, a typical application of NLP, and statistical machine translation methods.

Learning Objectives

Learning Objective 1

  • Outline the representation, smoothing, and evaluation of probabilistic language models.
  • Explain methods for learning the rule probabilities of probabilistic context-free grammars (PCFGs), and give a small example of their application.
  • Describe the general schematic structure of a machine translation system, and name tasks for which machine translation can be useful.
  • Explain the principle of statistical machine translation, and how its probabilities are learned.
  • Explain the following concepts or terms:
    • Corpus-based approach
    • Probabilistic language model
    • N-gram models
    • Unigram, bigram, and trigram models
    • Add-one smoothing
    • Linear interpolation smoothing
    • Probabilistic context-free grammar (PCFG)
    • Lexicalized PCFG
    • Memory-based, interlingua-based, and transfer-based machine translation systems
    • Language model
    • Translation model
    • Sentence alignment
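
Several of the terms above (n-gram models, add-one smoothing, linear interpolation) can be illustrated concretely. The sketch below, using an invented toy corpus, estimates bigram probabilities with add-one (Laplace) smoothing and combines them with unigram estimates by linear interpolation; the corpus, interpolation weight, and function names are illustrative assumptions, not from the course materials.

```python
from collections import Counter

# Toy corpus; <s> and </s> mark sentence boundaries (illustrative data).
corpus = [
    ["<s>", "the", "cat", "sat", "</s>"],
    ["<s>", "the", "dog", "sat", "</s>"],
    ["<s>", "the", "cat", "ran", "</s>"],
]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(
    (sent[i], sent[i + 1]) for sent in corpus for i in range(len(sent) - 1)
)
vocab_size = len(unigrams)

def bigram_prob(prev, word):
    """Add-one (Laplace) smoothed bigram estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def interpolated_prob(prev, word, lam=0.7):
    """Linear interpolation of the bigram and unigram estimates."""
    total = sum(unigrams.values())
    p_unigram = unigrams[word] / total
    return lam * bigram_prob(prev, word) + (1 - lam) * p_unigram
```

On this corpus, `bigram_prob("the", "cat")` is (2 + 1) / (3 + 7) = 0.3: "the cat" occurs twice, "the" occurs three times, and the vocabulary has seven types.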

Objective Readings

Required readings:

Reading topics:

Probabilistic Language Models, Machine Translation (see Sections 22.1 and 23.4 of AIMA3ed)

Supplemental Readings

Krieger, H.-U. (2007). From UBGs to CFGs: A practical corpus-driven approach. Natural Language Engineering, 13(4), 317-351.

Basili, R., Pazienza, M. T., and Velardi, P. (1996). An empirical symbolic approach to natural language processing. Artificial Intelligence, 85(1-2), 59-99.

Manning, C. D., and Schütze, H. (2000). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.

Objective Questions

  • What are the desirable properties and difficulties of rule-based methods for uncertain reasoning?
  • How can we learn different probabilistic models (such as a language model, or a fertility model) for statistical machine translation from a bilingual corpus?
  • How can the EM algorithm be used to improve the estimated probabilistic models?
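
As a pointer toward the last two questions, the sketch below runs the EM loop of a simple word-translation model (in the style of IBM Model 1) on a tiny sentence-aligned bilingual corpus. The word pairs, iteration count, and variable names are illustrative assumptions; a real system would use a large aligned corpus and further models (fertility, distortion) on top of this.

```python
from collections import defaultdict

# Tiny sentence-aligned English/French pairs (invented for illustration).
pairs = [
    (["the", "house"], ["la", "maison"]),
    (["the", "book"], ["le", "livre"]),
    (["a", "book"], ["un", "livre"]),
]

# Initialize the translation probabilities t(f | e) uniformly.
f_vocab = {f for _, fs in pairs for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))

for _ in range(10):  # EM iterations
    count = defaultdict(float)  # expected counts of (f, e) alignments
    total = defaultdict(float)  # expected counts of e
    for es, fs in pairs:
        for f in fs:
            z = sum(t[(f, e)] for e in es)  # E-step: normalizer for f
            for e in es:
                c = t[(f, e)] / z           # expected alignment count
                count[(f, e)] += c
                total[e] += c
    for (f, e), c in count.items():         # M-step: re-estimate t(f | e)
        t[(f, e)] = c / total[e]
```

Each iteration uses the current probabilities to compute expected word-alignment counts (E-step), then re-estimates the probabilities from those counts (M-step); on this data, t("livre" | "book") grows toward 1 because "livre" and "book" always co-occur.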

Objective Activities

  • Explore research work or systems incorporating both probabilistic and logic representation and reasoning in language processing. Report your findings in the course conference.
  • Test some machine translation systems, such as Google Translate, to see how satisfactory a general translation system is at this time. To test, you can translate English into another language you know (e.g., French), and vice versa.
  • Explore the following probabilistic statistical language processing algorithms that are related to this section of the textbook.
    • CYK-Parse
  • Complete Exercises 22.1 and 23.10 of AIMA3ed.
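
As a starting point for exploring CYK-Parse, the sketch below is a minimal probabilistic CYK parser over a toy PCFG in Chomsky normal form. The grammar, its probabilities, and the example sentence are invented for illustration and are not the textbook's pseudocode.

```python
# Toy PCFG in Chomsky normal form (all rules and probabilities invented).
lexical = {  # word -> list of (nonterminal, probability)
    "she": [("NP", 1.0)],
    "eats": [("V", 1.0)],
    "fish": [("NP", 1.0)],
}
binary = {  # (B, C) -> list of (A, probability) for rules A -> B C
    ("V", "NP"): [("VP", 1.0)],
    ("NP", "VP"): [("S", 1.0)],
}

def cyk_parse(words):
    """Return the best probability of each nonterminal spanning the input."""
    n = len(words)
    # chart[i][j] maps a nonterminal to its best probability over words[i:j].
    chart = [[{} for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):                 # fill length-1 spans
        for a, p in lexical.get(w, []):
            chart[i][i + 1][a] = p
    for span in range(2, n + 1):                  # combine shorter spans
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):             # split point
                for b, pb in chart[i][k].items():
                    for c, pc in chart[k][j].items():
                        for a, p in binary.get((b, c), []):
                            prob = p * pb * pc
                            if prob > chart[i][j].get(a, 0.0):
                                chart[i][j][a] = prob
    return chart[0][n]

# cyk_parse(["she", "eats", "fish"]) yields {"S": 1.0} under this grammar.
```

The chart is filled bottom-up: length-1 spans come from lexical rules, and longer spans combine adjacent sub-spans through the binary rules, keeping the highest-probability analysis for each nonterminal.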

Updated November 17 2015 by FST Course Production Staff