Athabasca University

Unit 12: Natural Language Understanding and Statistical Language Processing

Commentary

This unit discusses natural language processing (NLP), also called computational linguistics (CL). NLP is one of the most important subfields of AI and among the first problems the field took up, and it has been explored continuously ever since. Despite its fairly long history, NLP remains interesting and challenging. Because huge amounts of information are now accessible through the Web and other resources, much of it as natural language speech, text, or semi-structured text, NLP is regarded as one of the key techniques for processing, sharing, accessing, transferring, and producing information and knowledge. This unit introduces several main aspects of NLP, covering both natural language understanding (NLU) and statistical language processing: syntactic analysis, semantic interpretation, disambiguation, language generation, discourse understanding, probabilistic language models, information retrieval, information extraction, and machine translation.

Unit Purpose

When you complete this unit, you will be able to

  • Explain the concepts, principles, and techniques of natural language processing, natural language understanding, and statistical language processing.
  • Describe the methods and algorithms associated with language processing, such as parsing, segmentation, grammar induction, PCFG learning, and indexing.
  • Discuss several application systems related to NLP techniques, such as search engines, question answering, speech recognition, machine translation, and text mining.

Section 1: Natural language, grammar, and parsing
Section 2: Semantic interpretation and disambiguation
Section 3: Discourse understanding and grammar induction
Section 4: Probabilistic language models and machine translation
Section 5: Information retrieval and information extraction

Readings

Supplemental Unit Readings

Books:

Jurafsky, D., and Martin, J. H. (2008). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition (2nd ed.). Upper Saddle River, NJ: Prentice Hall.

Manning, C. D., and Schütze, H. (2000). Foundations of statistical natural language processing. Cambridge, MA: MIT Press.

Manning, C. D., Raghavan, P., and Schütze, H. (2008). Introduction to information retrieval. New York, NY: Cambridge University Press.

Activities

  • Explore several language processing and information retrieval software tools, such as GATE (https://gate.ac.uk/), OpenNLP (http://opennlp.apache.org/), NLTK (https://www.nltk.org/), and Lucene (http://lucene.apache.org/), to learn about their functions and components.
  • Discuss how the Google search engine can be improved, and describe what the next generation of search engines might look like (or even give it a better name).
  • Explore other potential or current applications related to NLP in business and education.

Updated December 16, 2021 by FST Course Production Staff