The aim of this project is to provide users with a conversational interface to data sets that allows them to first describe what the data is about and where the various elements they can ask about can be found, and then to ask questions about the data.
Getting to insight through conversation
At the core of data-driven journalism is the ability to pull information, insight and understanding out of the data that surrounds us. Unfortunately, the mechanisms for extracting information from data make it difficult for new users to find the information they need in the data sets we have. In the absence of a skilled practitioner, users have to ramp up on techniques that will allow them to find complex information in data sets.
The problem is simply that not everyone has these skills, and the number of people who understand both the questions that need to be asked and how to get to the answers is small. We want to overcome this problem by developing a model that links the questions we ask with the processes used to answer them, in a way that does not require expert knowledge of analytics.
The aim of this project is to provide users with a conversational interface to data sets that allows them to first describe what the data is about and where the various elements they can ask about can be found, and then to ask questions about the data. Users would not need to know the details of the relationship between the questions they ask and the analytics that need to be run to answer them. Instead, the system itself would have that knowledge and apply it to produce both visualizations and text in response to questions.
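As a rough illustration of the idea, the sketch below shows one way a system could hold the link between question words and analytics so that the user does not have to. Everything in it is hypothetical and chosen only for illustration: the toy rows, the DESCRIPTION and ANALYTICS tables, and the answer function are assumptions, and simple keyword matching stands in for whatever conversational platform is eventually selected.

```python
from statistics import mean

# Hypothetical toy data set; a real system would query the core database.
ROWS = [
    {"city": "Chicago", "year": 2015, "incidents": 120},
    {"city": "Chicago", "year": 2016, "incidents": 150},
    {"city": "Evanston", "year": 2015, "incidents": 30},
    {"city": "Evanston", "year": 2016, "incidents": 20},
]

# Step 1: the user describes the data (assumed format):
# which words refer to which numeric column, and which columns can filter.
DESCRIPTION = {
    "measures": {"incidents": "incidents", "crimes": "incidents"},
    "dimensions": ["city", "year"],
}

# The system's own knowledge: which question words trigger which analytic.
ANALYTICS = {"average": mean, "total": sum, "highest": max, "lowest": min}


def answer(question, rows=ROWS):
    """Map a plain-language question to an analytic and return a sentence."""
    words = question.lower().replace("?", "").split()
    op_word = next((w for w in words if w in ANALYTICS), None)
    measure = next(
        (DESCRIPTION["measures"][w] for w in words if w in DESCRIPTION["measures"]),
        None,
    )
    if op_word is None or measure is None:
        return "Sorry, I could not map that question to an analytic."

    # Filter on any dimension value that is mentioned in the question.
    selected = rows
    for dim in DESCRIPTION["dimensions"]:
        mentioned = {str(r[dim]).lower() for r in rows} & set(words)
        if mentioned:
            selected = [r for r in selected if str(r[dim]).lower() in mentioned]
    if not selected:
        return "No rows match that question."

    value = ANALYTICS[op_word]([r[measure] for r in selected])
    return f"The {op_word} {measure} is {value}."


print(answer("What is the total number of incidents in Chicago?"))
print(answer("What is the average number of crimes in 2016?"))
```

The user only supplies the description and the question; the choice of analytic and the filtering stay inside the system, which is the division of knowledge the project is aiming for.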
We see this work as a partnership between Computer Science (CS) and Journalism in which the design and dynamics of the system will be guided by the journalism students and the development will be carried out by the CS students. That is, Journalism will frame the questions and CS will build the machine to provide the answers.
Technical Approach:
Set up a core database as the platform
Select a visualization platform
Select a conversational platform (or not)
Develop a simple language generator to express core ideas
Develop a model of context to be used to better interpret a series of questions
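The last two steps can be sketched in a few lines. The example below is only a minimal illustration under stated assumptions, not a proposed design: the Context class, the generate function, the slot names and the toy rows are all hypothetical. It shows how remembering what earlier questions were about lets a follow-up such as "And in Evanston?" be interpreted, and how a template can serve as a very simple language generator.

```python
from dataclasses import dataclass, field

# Hypothetical toy data, standing in for the core database.
ROWS = [
    {"city": "Chicago", "incidents": 270},
    {"city": "Evanston", "incidents": 50},
]


@dataclass
class Context:
    """Remembers the slots filled by earlier questions in the conversation."""

    slots: dict = field(default_factory=dict)

    def interpret(self, question):
        words = question.lower().replace("?", "").split()
        new = {}
        if "incidents" in words:
            new["measure"] = "incidents"
        for row in ROWS:
            if row["city"].lower() in words:
                new["city"] = row["city"]
        # Slots missing from this question are carried over from earlier ones.
        self.slots = {**self.slots, **new}
        return self.slots


def generate(slots):
    """Tiny template-based language generator (assumed, for illustration)."""
    if "measure" not in slots or "city" not in slots:
        return "Could you tell me which measure and city you mean?"
    value = next(r[slots["measure"]] for r in ROWS if r["city"] == slots["city"])
    return f"{slots['city']} had {value} {slots['measure']}."


ctx = Context()
print(generate(ctx.interpret("How many incidents were there in Chicago?")))
print(generate(ctx.interpret("And in Evanston?")))  # measure comes from context
```

The second question names no measure at all; it is answerable only because the context carries "incidents" forward from the first one.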
Students will learn about the relationship between data, analytics and meaning. The resulting system would be of interest to journalists and to anyone who has data and wants to better understand it.