In the past decade, the widening use of computers has had a profound influence on the way ordinary people communicate, search for and store information. For the overwhelming majority of people and situations, the natural vehicle for such information is natural language. Text and, to a lesser extent, speech are crucial encoding formats for the information revolution.

In this course, you will gain insight into the fundamentals of how computers are used to represent, process and organize textual and spoken information. We will cover the theory and practice of human language technology, going behind the scenes of internet search engines, automatic translators, spam filters, spell-checkers, dialogue systems and more.


The course satisfies the GEC category 2B (Mathematical and Logical Analysis). It does so by using natural language systems to motivate students to exercise and develop a range of basic skills in formal and computational analysis. The course philosophy is to ground abstract concepts in real-world examples. We introduce strings, regular expressions, finite-state and context-free grammars, as well as algorithms defined over these structures, and techniques for probing and evaluating systems that rely on these algorithms. The course goes beyond merely subjective evaluation of systems, emphasizing analysis and reasoning to draw and argue for valid conclusions about the design, capabilities and behavior of natural language systems.


The basic requirement is regular attendance in class and active participation. There will be roughly one online quiz per topic, to ensure that you have mastered the material covered in class and in the readings. There will also be roughly one homework assignment per topic, which will give you the opportunity to explore new aspects of the topics discussed in class.


