Automation and Crowdsourcing to Support Factchecking
News sites like PolitiFact and FactCheck.org have defined a whole genre of journalism that pursues fact-checking as a public activity, its own form of coverage and content. There is large audience demand for fact-checking: NPR's live fact-check of the first 2016 presidential debate, for instance, drove record site traffic. But journalists' attention is limited, and given the endless sea of things people say, how should they identify and rank the most newsworthy and important claims to fact-check?
This project will seek to automate monitoring of the Congressional Record, a long and dense daily publication that records everything that goes on in Congress, for fact-checkable claims. Using the ClaimBuster API, sentences that appear fact-checkable will be automatically extracted. The project will then use crowdsourcing platforms, such as Amazon's Mechanical Turk, to rate the extracted claims on a set of factors that help rank which are the most important and newsworthy for journalists to investigate. Finally, these leads will be compiled into a daily email that could be sent to interested journalists to help them decide what to fact-check that day.
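The extraction stage described above can be sketched as a small pipeline: split raw Record text into sentences, score each sentence for check-worthiness, and keep the high-scoring ones. The sketch below is illustrative only; the function names are hypothetical, the splitter is deliberately naive (a real pipeline would use an NLP library), and the scorer is injected as a plain function so the ClaimBuster API call itself can be swapped in per its current documentation.

```python
import re

# Naive sentence splitter; a production pipeline would use an NLP library
# better suited to the Congressional Record's formatting.
def split_sentences(text):
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

# Score each sentence with a check-worthiness scorer. In the project this
# would wrap the ClaimBuster API; here the scorer is injected so the
# pipeline can run without network access.
def score_claims(sentences, scorer):
    return [(s, scorer(s)) for s in sentences]

# Keep only sentences the scorer deems worth checking; the 0.5 threshold
# is an assumption, not a value from the project.
def filter_checkworthy(scored, threshold=0.5):
    return [(s, score) for s, score in scored if score >= threshold]
```

In use, `scorer` would be a thin wrapper around an HTTP request to the ClaimBuster scoring endpoint, returning the claim score from the JSON response.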
Assistant Professor, Director of the Computational Journalism Lab
Northwestern University Asst Professor of Communication & Tow Center fellow. Computational journalism, algorithmic accountability, social computing.
How can we monitor the Congressional Record for fact-checkable statements from politicians?
How can we use crowdsourcing to help rate and rank those statements so journalists can pick the most newsworthy and important to actually check?
How can the top fact check leads be designed into an attractive display sent to journalists every day?
Weeks 1-3: Develop scripts to scrape the Congressional Record, parse sentences using natural language processing, and score those sentences for check-worthiness using ClaimBuster.
Weeks 4-6: Set up and test Mechanical Turk (MTurk) for rating claims. Then develop scripts for ranking automatically extracted claims, submitting them to MTurk for rating, and aggregating those ratings.
Weeks 7-10: Re-rank claims based on the automated and crowdsourced ratings. Design and develop an HTML email that can be sent once per day with the top claims.
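The aggregation and re-ranking step in Weeks 4-10 could look like the sketch below: average the individual MTurk judgments per claim and factor, then blend those crowd means with the automated ClaimBuster score. All names, factor labels, and weights here are illustrative assumptions, not the project's actual rating scheme.

```python
from collections import defaultdict
from statistics import mean

# Aggregate raw MTurk judgments. Each judgment is (claim_id, factor, value),
# e.g. ("c1", "newsworthiness", 4). Returns the mean per factor per claim.
def aggregate_ratings(judgments):
    by_claim = defaultdict(lambda: defaultdict(list))
    for claim_id, factor, value in judgments:
        by_claim[claim_id][factor].append(value)
    return {cid: {f: mean(vals) for f, vals in factors.items()}
            for cid, factors in by_claim.items()}

# Blend the automated score with crowd factor means into a final ranking.
# The weighting scheme is a placeholder assumption.
def rank_claims(auto_scores, crowd_means, weights):
    def combined(cid):
        total = weights.get("auto", 0.0) * auto_scores.get(cid, 0.0)
        for factor, w in weights.items():
            if factor != "auto":
                total += w * crowd_means.get(cid, {}).get(factor, 0.0)
        return total
    return sorted(auto_scores, key=combined, reverse=True)
```

Separating aggregation from ranking keeps the crowd data reusable: the same factor means can be re-weighted as the team learns which factors best predict what journalists actually choose to check.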
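For the daily email itself, one minimal approach is to render the top-ranked claims into an HTML list and wrap it in a standard multipart message with a plain-text fallback, using Python's standard library. The layout and function names below are placeholder assumptions; a real version would likely use a proper template engine and an email delivery service.

```python
from email.message import EmailMessage

# Render the top-ranked claims into a simple HTML digest body.
def render_digest(claims):
    items = "\n".join(f"<li>({score:.2f}) {text}</li>" for text, score in claims)
    return f"<h2>Top claims to check today</h2>\n<ol>\n{items}\n</ol>"

# Wrap the HTML in a multipart email with a plain-text alternative,
# ready to hand to an SMTP client or delivery service.
def build_email(claims, sender, recipient):
    msg = EmailMessage()
    msg["Subject"] = "Daily fact-check leads"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(f"({s:.2f}) {t}" for t, s in claims))
    msg.add_alternative(render_digest(claims), subtype="html")
    return msg
```

Including the numeric score beside each claim lets receiving journalists calibrate how much to trust the ranking while the weighting scheme is still being tuned.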
By the end of the quarter students will have built a functioning prototype of a monitoring system that can scan for fact-checkable claims, rank them in a meaningful way to help identify the most interesting, and then present these to journalists in a compelling interface. Students can expect to learn about text processing, crowdsourcing, and user interface design in support of journalistic tasks.