Talking to Data

Getting to insight through conversation

At the core of data-driven journalism is the ability to pull information, insight and understanding out of the data that surrounds us. Unfortunately, the mechanisms for extracting information from data make it difficult for new users to find what they need in the data sets we have. In the absence of a skilled practitioner, users have to ramp up on techniques that will allow them to find complex information in data sets.

The problem is simply that not everyone has these skills, and the number of people who understand both the questions that need to be asked and how to get to the answers is small. We want to overcome this problem by developing a model that links the questions we ask with the processes used to answer them, in a way that does not require expert knowledge of analytics.

The aim of this project is to provide users with a conversational interface to data sets that allows them first to describe what the data is about and where the various elements they can ask about can be found, and then to ask questions about the data. Users would not need to know the details of the relationship between the questions they ask and the analytics that must be run to answer them. Instead, the system itself would hold that knowledge and apply it to produce both visualizations and text in response to questions.
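The split described above, where the user supplies a description of the data and the system supplies the analytic knowledge, can be sketched in a few lines of Python. Everything here is hypothetical: the `DataDescription` class, the sample rows, the concept names and the two question types are illustrative assumptions, and the question is taken as already parsed into a type and two concepts. The user's description says only what the data is about and which column holds each concept; choosing the analytic and phrasing the answer is left to the system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DataDescription:
    """What the user tells the system about a data set (hypothetical structure)."""
    topic: str
    concepts: dict[str, str]  # the user's word for a concept -> column name


def totals_by(rows, column, group_column):
    """Analytic: sum one numeric column within each value of another column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_column]] += row[column]
    return dict(totals)


def answer(desc, rows, question_type, measure, group):
    """System-side knowledge: pick the analytic and the sentence for a question type."""
    totals = totals_by(rows, desc.concepts[measure], desc.concepts[group])
    if question_type == "highest":
        name = max(totals, key=totals.get)
        return f"In the {desc.topic}, {name} has the most {measure} ({totals[name]:g})."
    if question_type == "lowest":
        name = min(totals, key=totals.get)
        return f"In the {desc.topic}, {name} has the fewest {measure} ({totals[name]:g})."
    raise ValueError(f"unsupported question type: {question_type!r}")


# Hypothetical example data and user-supplied description.
rows = [
    {"area": "North", "incidents": 12},
    {"area": "South", "incidents": 30},
    {"area": "North", "incidents": 5},
]
desc = DataDescription(topic="city incident reports",
                       concepts={"incidents": "incidents", "neighborhood": "area"})
print(answer(desc, rows, "highest", "incidents", "neighborhood"))
# In the city incident reports, South has the most incidents (30).
```

Note that the user never names `totals_by` or the sentence template; adding a new question type means extending the system, not retraining the user.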

We see this work as a partnership between Computer Science (CS) and Journalism, in which the design and dynamics of the system will be guided by the journalism students and the development will be carried out by the CS students. That is, Journalism will frame the questions and CS will build the machine that provides the answers.

Technical Approach:

  • Set up a core database as the platform

  • Select a visualization platform

  • Select a conversational platform (or not)

  • Develop a simple language generator to express core ideas

  • Develop a model of context to be used to better interpret series of questions.
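The final step, a model of context, is what allows a follow-up such as "What about 2017?" to be read as a variation on the question before it. A minimal Python sketch, assuming questions have already been parsed into named slots (the `Question` fields and the `ConversationContext` class are hypothetical), lets each new question inherit whatever slots it leaves unspecified from the previous one:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Question:
    """A parsed question (hypothetical slots); None means "not mentioned"."""
    intent: Optional[str] = None   # e.g. "highest"
    measure: Optional[str] = None  # e.g. "incidents"
    group: Optional[str] = None    # e.g. "neighborhood"
    year: Optional[int] = None


@dataclass
class ConversationContext:
    """Remembers earlier questions so that follow-ups can be completed."""
    history: list[Question] = field(default_factory=list)

    def interpret(self, q: Question) -> Question:
        # Fill each unmentioned slot from the most recent interpreted question.
        if self.history:
            prev = self.history[-1]
            q = Question(
                intent=q.intent if q.intent is not None else prev.intent,
                measure=q.measure if q.measure is not None else prev.measure,
                group=q.group if q.group is not None else prev.group,
                year=q.year if q.year is not None else prev.year,
            )
        self.history.append(q)
        return q


ctx = ConversationContext()
ctx.interpret(Question(intent="highest", measure="incidents",
                       group="neighborhood", year=2018))
# "What about 2017?" mentions only the year.
print(ctx.interpret(Question(year=2017)))
# Question(intent='highest', measure='incidents', group='neighborhood', year=2017)
```

Inheriting slots is only the simplest policy; it does not decide when a question starts a new topic and the history should be set aside, which a fuller context model would also have to handle.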

Faculty and Staff Leads

Kris Hammond

Professor of Electrical Engineering and Computer Science

Prior to joining the faculty at Northwestern, Kris founded the University of Chicago’s Artificial Intelligence Laboratory. His research has been primarily focused on artificial intelligence, machine-generated content and context-driven information systems. Kris currently sits on a United Nations policy committee run by the United Nations Institute for Disarmament Research (UNIDIR). He received his PhD from Yale.

Project Details

2018 Winter

Important Questions

  • What are the questions (what do people need to know)?
  • How do we present data in a way that allows someone to connect it to the elements that interest them?
  • Who is going to use this?

Sample Milestones

  • Decide on data platform
  • Decide on visualization platform
  • Develop an approach to mapping the things a user is interested in to data elements
  • Select a first-order model of the questions that people want to ask
  • Develop a parser to identify questions and their parameters
  • Build out analytics strategy mapped to specific questions
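The parser and analytics milestones can be pictured together: the parser turns a question into a question type and its parameters, and each question type is mapped to one analytic. The Python sketch below assumes a tiny hand-written question model built from regular expressions and a user-supplied vocabulary from words to column names; `VOCAB`, `PATTERNS`, the sample rows and the two question types are hypothetical illustrations, not the project's design.

```python
import re

# Hypothetical vocabulary supplied by the user: word in a question -> column name.
VOCAB = {"incidents": "incidents", "neighborhood": "area"}

# A first-order question model: each pattern names a question type and
# captures its parameters in named groups.
PATTERNS = [
    ("most", re.compile(r"which (?P<group>\w+) has the most (?P<measure>\w+)", re.I)),
    ("total", re.compile(r"how many (?P<measure>\w+) per (?P<group>\w+)", re.I)),
]


def parse(question):
    """Return (question type, {slot: column}), or None if nothing matches."""
    for qtype, pattern in PATTERNS:
        match = pattern.search(question)
        if match is None:
            continue
        params = {}
        for slot, word in match.groupdict().items():
            column = VOCAB.get(word.lower())
            if column is None:
                return None  # the question mentions something the data lacks
            params[slot] = column
        return qtype, params
    return None


def totals_by(rows, measure, group):
    """Sum the measure column within each value of the group column."""
    totals = {}
    for row in rows:
        totals[row[group]] = totals.get(row[group], 0) + row[measure]
    return totals


def most(rows, measure, group):
    """Return the group value with the largest total."""
    totals = totals_by(rows, measure, group)
    return max(totals, key=totals.get)


# Each question type is mapped to the analytic that answers it.
ANALYTICS = {"most": most, "total": totals_by}


def run(question, rows):
    """Parse a question and run the analytic mapped to its type."""
    parsed = parse(question)
    if parsed is None:
        return None
    qtype, params = parsed
    return ANALYTICS[qtype](rows, params["measure"], params["group"])


SAMPLE_ROWS = [
    {"area": "North", "incidents": 12},
    {"area": "South", "incidents": 30},
    {"area": "North", "incidents": 5},
]
print(run("Which neighborhood has the most incidents?", SAMPLE_ROWS))  # South
print(run("How many incidents per neighborhood?", SAMPLE_ROWS))  # {'North': 17, 'South': 30}
```

Pattern matching like this only covers questions phrased the way the patterns expect, so it is a starting point for a first-order model of questions rather than a general parser.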

Outcome

Students will learn about the relationship between data, analytics and meaning. The resulting system would be of real interest to journalists and to anyone who has data and wants to understand it better.

Students

Daniel Fernandez

Brandon Fujii

Crystal Gong

Alexander Morikado