Linguistics 5801: Computational Linguistics I

This course for graduate students and advanced undergraduates provides an introduction to theory-driven computational linguistics (CL), focusing on syntax and parsing. The course includes some formal background and emphasizes linking theoretical discussion to practical experience implementing algorithms and small grammars.

This course is the first half of a two-course introduction to CL. The second half, 5802, focuses on data-intensive, statistical CL and is offered in Spring.

Instructor: William Schuler

Meeting time: Tuesday and Thursday 9:35am-10:55am in Oxley 103

Prereqs: 3802 (Linguist 384), 5000 (601), CSE 3321, 3521, or 5052; or permission of instructor. Not open to students with credit for Linguist 684.01.

Web site: The updated syllabus, assignments, slides, etc. will be posted here, so check it regularly.

Network account: If you do not have one already, you will need a linguistics network account to obtain some of the resources required for this course. You can set this up with Jim Harmon in Oxley 118 during normal working hours.

Computer lab facilities: With your linguistics network account, you can use the linguistics computer lab in Oxley 218. The computers in this lab have all the software required for this course installed. If any software does not appear to be working, contact Jim Harmon (Oxley 118) during normal working hours.

Textbook: (optional) Natural Language Toolkit Textbook -- a nice introduction to text processing in Python.

Course Content:

Wk Due Monday 11:59PM Lecture: Tuesday Lecture: Thursday
1 8/21
welcome, set notation, finite state automata
8/23 --- PS1 handout, sample Makefile
regular expressions, regular languages, tools: grep, sed, perl
2 8/28
from regular expressions to unix scripts, tools: make
programming concepts, implementing FSAs, tools: python
3 9/4
9/6 --- PS2 handout
data files and projects
4 9/10 PS1 due 9/11
program correctness, complexity, generalization
5 9/18
context-free grammars, context-free languages
6 9/24 PS2 due 9/25
string recursion, pushdown automata
9/27 --- PS3 handout
7 10/2
recursion in functions and data structures, tools:,
8 10/9
dynamic programming, implementing CFG recognizers
(autumn break)
9 10/15 PS3 due (postponed to 10/17) 10/16 --- PS4 handout
from recognition to parsing, semiring abstraction
from probabilistic CFGs to probability models
10 10/23
11 10/29 PS4 due 10/30 --- PS5 handout
sequence model inference, incremental parsing
12 11/6
13 11/12 PS5 due (postponed to 11/14) 11/13 --- PS6 handout simplewiki.gcg15.linetrees
14 11/20
15 11/27 step-through
16 12/3 PS6 due 12/4
information theory
(end of term)

Successful course participation involves:

Students with Disabilities:

Students who need an accommodation based on the impact of a disability should contact me as soon as possible to arrange an appointment to discuss the course format, anticipate needs, and explore potential accommodations. I rely on the Office for Disability Services for assistance in verifying the need for accommodations and developing accommodation strategies. Students who have not previously contacted the Office for Disability Services are encouraged to do so (292-3307).

Academic Misconduct:

You must do your homework, programming assignments, and examinations yourself, ON YOUR OWN. Copying another's work, allowing others (even negligently) to copy your work, or possessing electronic computing devices in the testing area is cheating and grounds for penalties.

Academic dishonesty is not allowed and will be reported to the University Committee on Academic Misconduct.