Topics in Language and Vision

Semester: Fall 2015
Meeting time: Tuesday/Thursday, 11:10-12:25
Room: Jennings 50
Instructor: Micha Elsner (melsner@ling.osu.edu)
Office hours: by appointment

Language and vision is a rapidly expanding area of study in which many exciting contributions, both technical and theoretical, remain to be made. In this seminar, we will work together to understand key questions about how language interacts with scene perception and understanding:

  • How do computer vision and language work together in automatic caption generation systems?
  • How do listeners use information about visual contrasts (like "tall") to find the right object quickly?
  • How do speakers select which information to include in referring expressions, and how should this inform our theories of pragmatics?
  • How do groups or pairs of people collaboratively establish names for landmarks or abstract shapes in a shared scene?

After covering basic material in these areas, we will select more advanced topics to cover based on participants' interests. Students will be expected to read and comment on the selected research papers, and to complete a term project which could later serve as the basis for a qualifying paper or conference submission.

The course will cover both experimental and computational research. No programming is required. A one-credit option without a project is available with instructor permission.

Students with graduate coursework such as Ling 5801/2 (Comp Ling), 5451 (Pragmatics) or 5701 (Psycholinguistics), or courses of an equivalent level in Artificial Intelligence, Computer Vision or Perception should feel welcome to sign up for the course. Other interested students are encouraged to consult the instructor before signing up.

Students will be required to post a question on each reading to Carmen the day before class; these questions will be organized and raised in discussion by the leader for that day.

Grades will be based 50% on posted questions and class discussion, and 50% on the term project.

Schedule

Introduction (.5 weeks)

T Aug 25

  • lecture

The human visual system (1 week)

R 27: Treisman, "Preattentive processing in vision"

T Sept 1: Rosenholtz et al, "A summary statistic representation in peripheral vision explains visual search"

"Visual world" experiments (1 week)

R 3: Sedivy et al, "Achieving incremental semantic interpretation through contextual representation"

T 8: Mitchell et al, "On the use of size modifiers when referring to visual objects", "Two approaches for generating size modifiers"

The computer visual system (1 week)

R 10: Heitz and Koller, "Learning spatial context: using stuff to find things"; we may also begin Kiapour et al, "Hipster wars: discovering elements of fashion styles"

T 15: Kiapour et al, "Hipster wars: discovering elements of fashion styles" and Berg et al, "Understanding and predicting importance in images"

Caption generation (1.5 weeks)

R 17: Hodosh et al "Framing image description as a ranking task: Data, models and evaluation metrics (ex. abstract)" and Mason and Charniak "Domain-specific image captioning"

T 22: Elliott and Keller "Image description using visual dependency relations"

R 24: Fang et al, "From captions to visual concepts and back", Karpathy and Fei-Fei "Deep visual-semantic alignments for generating image descriptions"

Referring expressions

T 29: Krahmer and van Deemter, "Computational REG: a survey"

R Oct 1: Mitchell et al "Generating expressions that refer to visible objects" (change this?), Viethen and Dale "Algorithms for generating referring expressions: do they do what people do?"

T 6: Paraboni et al "Overspecified reference in hierarchical domains: measuring the benefits" and Arts et al "Overspecification facilitates object identification"

R 8: Pechmann "Incremental speech production and referential overspecification"

T 13: Elsner et al, "A model of real-time processing of visual context during referring expression generation"

(R 15: Fall break)

Collaborative reference (1 week)

T 20: Garrod and Doherty, "Conversation, co-ordination and convention: an empirical investigation of how groups establish linguistic conventions"

R 22: Viethen, Dale and Guhe, "Referring in dialogue: alignment or construction?"

Class-proposed papers

T 27, R 29, T Nov 3, R 5, T 10, R 12, T 17, R 19, T 24, (R 26: Thanksgiving), T Dec 1, R 3, T 8

Collaboration policy

This is a seminar, so go ahead and discuss the papers and your projects outside of class; however, please write the questions you post to Carmen yourself. If you want to collaborate on a project, please see me. Otherwise, your project writeup is expected to be your own work. You are encouraged to use software, datasets and articles written by others, as long as you give credit where due using citations. For details, see the COAM site: http://oaa.osu.edu/coam.html

Disability policy

Any student who feels they may need an accommodation based on the impact of a disability should contact me privately to discuss their specific needs, and should also contact the Office for Disability Services (614-292-3307, room 150 Pomerene Hall) to coordinate reasonable accommodations.