Turning Computers into Fact-Checkers
October 24, 2014

This article summarizes a research paper presented at the 2014 Computation + Journalism Symposium. Jun Yang is a computer science professor at Duke University. Bill Adair is a professor of journalism and public policy at Duke University. Yang and Adair are both part of the uClaim/iCheck team, which developed the fact-checking systems described below.

By Bill Adair, Jun Yang and the uClaim/iCheck team

The fact-checkers at PolitiFact and FactCheck.org spend hours each day combing through news articles, transcripts of talk shows and campaign advertisements to find factual claims to check.

Each claim requires hours of additional research to check: clarifying vague statements, examining data in different ways and crafting counterarguments. It is laborious, and sometimes mundane.

Our new iCheck/uClaim project seeks to use computers to help with that work. Our goal is not to replace humans, but to free them from some of the drudgery so they can do more complex (and fulfilling!) journalism.

There is a great need for this. The digital age has bombarded news consumers with a sometimes overwhelming amount of information. But much of that information is dubious, which prompted the rise of fact-checkers.

By automating some of the fact-checking, we hope to expand its use in politics and demonstrate how it could be applied to coverage of other topics like sports and weather. We also want to automate the process of monitoring data to find interesting claims.

Our results so far have been surprising. For many claims based on structured data, we have found that we can capture the human intuition about their qualities (robustness, fairness, and uniqueness) in mathematical terms. This allows us to use computers to determine not only whether a claim is correct, but also whether it is misleading. Knowing what makes a claim high-quality also allows us to use computers to find such claims in data automatically.

In essence, we’re using computational horsepower to do the same things that journalists do when they fact-check. Consider, for example, a claim about a legislator’s voting record: “She voted with the Democratic majority 90 percent of the time.”

A fact-checker would attempt to discover the time period in which the votes in question occurred (all the terms she has served? or just the most recent one?). A fact-checker would also examine other legislators’ voting records. If a well-known Republican voted with the Democratic majority nearly as often, perhaps the original claim isn’t really significant.

There could be millions of ways to tweak the claim and see how its conclusion changes. It would be difficult for a journalist to try them all by hand, but that is an ideal task for computers.
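To make the idea concrete, here is a minimal Python sketch of one kind of tweak: recomputing the 90 percent figure over every contiguous span of years. It is an illustration only, not the iCheck implementation; the yearly vote counts and all function names are invented for the example.

```python
# Hypothetical toy data: (year, votes cast with the Democratic majority, total votes).
# These numbers are invented for illustration only.
VOTES_BY_YEAR = [
    (2009, 80, 100),
    (2010, 85, 100),
    (2011, 95, 100),
    (2012, 100, 100),
]


def agreement_rate(records):
    """Share of votes cast with the majority across the given yearly records."""
    agreed = sum(record[1] for record in records)
    total = sum(record[2] for record in records)
    return agreed / total


def perturb_time_windows(records):
    """Return {(first_year, last_year): rate} for every contiguous window of years."""
    results = {}
    for start in range(len(records)):
        for end in range(start, len(records)):
            window = records[start:end + 1]
            results[(window[0][0], window[-1][0])] = agreement_rate(window)
    return results


def share_supporting(results, threshold):
    """Fraction of perturbed windows in which the rate still meets the threshold."""
    supporting = [rate for rate in results.values() if rate >= threshold]
    return len(supporting) / len(results)


windows = perturb_time_windows(VOTES_BY_YEAR)
for (first, last), rate in sorted(windows.items()):
    print(f"{first}-{last}: {rate:.1%}")
print(f"Windows at or above 90%: {share_supporting(windows, 0.90):.0%}")
```

If the claim holds in only a small share of the windows, that is a hint that the time period may have been chosen to flatter the number. A fuller version would also vary the comparison group, for example by running the same calculation for other legislators.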

Our project, demonstrated at the Computation + Journalism Symposium 2014, has two systems. iCheck is a computerized fact-checker for claims based on structured data. Our demo uses public databases of congressional voting records and Major League Baseball statistics. It checks claims such as: “In a single season, only six players have ever beaten Miguel Cabrera’s season of 197 hits, 30 home runs, and a .340 batting average,” and “Sen. Kay Hagan votes with the Democratic party 90 percent of the time.” Even if the claims are technically correct, iCheck can reveal if they present only partial or biased views of the data.
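A claim like the Cabrera one can be checked by counting how many player-seasons match or exceed the benchmark on every statistic named, then watching how that count moves when statistics are dropped. The sketch below assumes “beaten” means “matched or exceeded on all three,” uses invented player-seasons rather than real Major League Baseball data, and is not the iCheck code; only the benchmark values come from the claim itself.

```python
from itertools import combinations
from typing import NamedTuple


class Season(NamedTuple):
    player: str
    year: int
    hits: int
    home_runs: int
    batting_avg: float


# Invented player-seasons for illustration; not real Major League Baseball data.
SEASONS = [
    Season("Player A", 1990, 210, 35, 0.350),
    Season("Player B", 1995, 200, 28, 0.345),
    Season("Player C", 2001, 190, 40, 0.342),
    Season("Player D", 2005, 205, 31, 0.338),
    Season("Player E", 2010, 199, 30, 0.341),
]

# Benchmark values taken from the claim in the text.
BENCHMARK = {"hits": 197, "home_runs": 30, "batting_avg": 0.340}


def dominators(seasons, benchmark, stats):
    """Seasons that match or exceed the benchmark on every statistic in `stats`."""
    return [
        season for season in seasons
        if all(getattr(season, stat) >= benchmark[stat] for stat in stats)
    ]


def counts_by_stat_subset(seasons, benchmark):
    """Map each non-empty subset of benchmark statistics to its dominator count."""
    names = list(benchmark)
    return {
        subset: len(dominators(seasons, benchmark, subset))
        for size in range(1, len(names) + 1)
        for subset in combinations(names, size)
    }


counts = counts_by_stat_subset(SEASONS, BENCHMARK)
for subset, count in counts.items():
    print(", ".join(subset), "->", count)
```

Comparing the count for all three statistics with the counts for each pair or single statistic shows how much the “only six players” framing depends on the particular combination chosen.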

uClaim is like a tireless staffer who monitors data sources and looks for interesting claims. It uses computer algorithms to search for common claim templates. Examples include the Miguel Cabrera claim above, and ones involving streaks: “In the Detroit Pistons’ victory against the Dallas Mavericks on Jan. 13, 1993, Dennis Rodman became the first player in NBA history to have at least 16 rebounds for 24 consecutive games.” uClaim detects interesting claims as they emerge, providing rapid responses that would be impossible or enormously expensive to produce by hand.
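The streak template can be illustrated with a short Python sketch: find each player’s longest run of games at or above a threshold, then see who has reached a given streak length. This is a simplified illustration, not the uClaim algorithm; the player names, rebound totals, and function names are invented.

```python
def longest_streak(values, threshold):
    """Length of the longest run of consecutive values at or above the threshold."""
    best = current = 0
    for value in values:
        current = current + 1 if value >= threshold else 0
        best = max(best, current)
    return best


def players_reaching(histories, threshold, length):
    """Players whose game logs contain a qualifying streak of at least `length` games."""
    return sorted(
        player
        for player, values in histories.items()
        if longest_streak(values, threshold) >= length
    )


# Invented rebound totals per game, in chronological order; not real NBA data.
REBOUNDS = {
    "Player X": [18, 16, 20, 12, 17, 16, 19, 21, 15],
    "Player Y": [16, 17, 10, 16, 16, 16, 9],
}

print(longest_streak(REBOUNDS["Player X"], 16))
print(players_reaching(REBOUNDS, 16, 4))
```

A “first in history” claim holds only if exactly one player appears in the result. The claim quoted above uses 16 rebounds and 24 games; the toy data uses a much shorter streak to keep the example small.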

We’re very encouraged by our progress. Data-driven fact-checking and discovery are important new forms of journalism that can benefit from the advances of the digital age.

With the power of computers to carry out rapid, complex calculations over data, we’ve developed new tools that should make it easier for journalists to check more claims, find more facts, and better serve their readers.