How Hackers Can Help Journalists Do Their Jobs
October 24, 2014

This article summarizes a research paper presented at the 2014 Computation + Journalism Symposium. Basile Simon is a “hacker-journalist” at BBC News Labs. See the full collection of research summaries.

By Basile Simon

More news organizations are creating labs to pursue innovation in news, and these labs are often populated with more engineers and developers than journos.

OpenNews, a partnership between Mozilla and the Knight Foundation, even organizes a 10-month fellowship to place “news nerds” in newsrooms around the world to trigger innovative projects.

At BBC News Labs, we also want to help journalists do their jobs. In our day-to-day work, we have the freedom to experiment and quickly prototype. We pass the best ideas and projects to production teams and to our newsroom to test. We have a fluid collaboration between world-renowned experts in journalism and experts in building things for the web.

We presented two papers at the 2014 Computation + Journalism Symposium, summarized here. The first describes a new, more modern way to reach our audience. The second offers a new way for our journalists to find the story.

Reaching the audience better with linked data

In 2012, a team of data architects, engineers and web-savvy people got together to bet on the future. They thought that linked data held real promise for news organizations in the foreseeable future. The Juicer project was born.

Linked data aims to build a more sensible web, by promoting links and associations between concepts and content. Some markup can be added by people who write for the web, to tell Google or the reader: “This article is about David Cameron, the Prime Minister, not some guy named David Cameron.”
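
One common way to write this kind of markup is JSON-LD with the schema.org vocabulary. The sketch below is only an illustration: this article does not say which vocabulary or identifiers the BBC uses, so the schema.org types, the DBpedia URI and the `build_article_markup` helper are all assumptions.

```python
import json


def build_article_markup(headline, person_name, person_uri):
    """Return JSON-LD stating that an article is about one specific person.

    Hypothetical helper: the vocabulary (schema.org) and the identifier
    scheme are assumptions, not the BBC's documented markup.
    """
    return {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "about": {
            "@type": "Person",
            "name": person_name,
            # sameAs points at a shared identifier, which is what separates
            # the Prime Minister from any other David Cameron.
            "sameAs": person_uri,
        },
    }


markup = build_article_markup(
    headline="Example headline",  # placeholder text
    person_name="David Cameron",
    person_uri="http://dbpedia.org/resource/David_Cameron",
)
print(json.dumps(markup, indent=2))
```

The name alone is ambiguous; the shared identifier is what lets a machine tell which David Cameron is meant.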

Databases of facts can be linked together, compared and combined. A user may ask: “Show me all news articles published about Conservative MPs about topic x, and then show me these MPs’ voting records on the same topic, so I can compare.”
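
To make that kind of question concrete, here is a minimal sketch of joining two datasets through a shared identifier. Everything in it is made up for illustration: the records, the `mp_id` key and the `coverage_and_votes` function are hypothetical, and a real linked data platform would more likely answer this with a query against a graph store.

```python
# Toy, made-up records standing in for two linked datasets.
# All names, topics and votes are hypothetical.
articles = [
    {"title": "Article A", "mp_id": "mp-1", "topic": "housing"},
    {"title": "Article B", "mp_id": "mp-2", "topic": "housing"},
    {"title": "Article C", "mp_id": "mp-1", "topic": "transport"},
]
mps = {
    "mp-1": {"name": "MP One", "party": "Conservative"},
    "mp-2": {"name": "MP Two", "party": "Labour"},
}
votes = [
    {"mp_id": "mp-1", "topic": "housing", "vote": "aye"},
    {"mp_id": "mp-2", "topic": "housing", "vote": "no"},
]


def coverage_and_votes(party, topic):
    """Pair each matching article with the MP's votes on the same topic."""
    results = []
    for article in articles:
        mp = mps[article["mp_id"]]
        if mp["party"] != party or article["topic"] != topic:
            continue
        # The shared mp_id is the "link" that lets the two datasets meet.
        mp_votes = [
            v["vote"]
            for v in votes
            if v["mp_id"] == article["mp_id"] and v["topic"] == topic
        ]
        results.append(
            {"article": article["title"], "mp": mp["name"], "votes": mp_votes}
        )
    return results


print(coverage_and_votes("Conservative", "housing"))
```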

Linked data is a clear example of using a computer to accomplish something that would be out of reach if we relied on humans alone. As explained by the BBC’s Jem Rayfield, in the early days of BBC linked data projects, this enabled the publishing of news aggregation pages ‘per athlete,’ ‘per sport’ and ‘per event’ for the 2012 Olympics. This was possible because tags were attached to pieces of content automatically, by a relatively smart semantic engine extracting concepts out of our articles.
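
Once content carries tags, building a page per athlete or per sport comes down to grouping. The sketch below assumes the semantic engine has already produced the tags; the data layout and the `build_pages` function are hypothetical and are not the BBC's actual pipeline.

```python
from collections import defaultdict

# Hypothetical output of a semantic engine: each item carries extracted tags.
tagged_content = [
    {"title": "Story 1", "tags": {"athlete": "Athlete A", "sport": "Cycling"}},
    {"title": "Story 2", "tags": {"athlete": "Athlete B", "sport": "Cycling"}},
    {"title": "Story 3", "tags": {"athlete": "Athlete A", "sport": "Cycling"}},
]


def build_pages(items, tag_kind):
    """Group item titles by the value of one kind of tag."""
    pages = defaultdict(list)
    for item in items:
        value = item["tags"].get(tag_kind)
        if value is not None:
            pages[value].append(item["title"])
    return dict(pages)


per_athlete = build_pages(tagged_content, "athlete")
per_sport = build_pages(tagged_content, "sport")
print(per_athlete)
print(per_sport)
```

No one has to curate these pages by hand: a new story shows up on every relevant page as soon as it is tagged.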

We’ve since expanded our semantic platform. It now ingests content coming from more than 50 UK publishers – several hundred by the end of the year – in addition to BBC News and BBC Parliament TV channels. We are rolling out new features almost every week.

A common criticism of linked data is that it remains an academic exercise, a nerds’ stunt. We are trying to push it beyond the realm of academia and explore ways it can be used in practice.

In a recent example, we created a prototype offering 100 percent linked-data coverage of the 2014 European elections in the UK. It surfaced interesting insights into the way we associate things, as well as surprising results regarding the misrepresentation of certain political parties.

At BBC News, linked data is being taken to production and to audience-facing products. The next generation of our mobile apps, to be released around Christmas, will rely heavily on linked data to create sets of thematic pages for the user, to deliver local news better and automatically, and to let the user subscribe to a set of topics.

Our coverage of the 2015 General Election will also be powered by linked data: it would be quite difficult, even for the BBC, to keep updated 350 pages for each and every one of the constituencies. With linked data, we can aggregate the content dynamically.

Tailored dataset monitoring to get the story – and only the story

Computer-assisted reporting can be a very powerful tool for journalists, provided they can use the technology. We believe the solution is not only to train journalists to code, but also to provide tools created by our research and development departments.

Datastringer is our latest project in this direction.

It aims to make it easier to monitor datasets with a journalistic mindset.

We are currently testing Datastringer in close association with BBC journalists from a BBC London local station, as well as with other local partners in northern England. The question we ask them is “What’s your story? What would your headline be?”

The answer to this question provides everything we need to set up the stringer: a dataset or a set of datasets from which the information can be extracted, the way to process the data to achieve the desired result, and finally, the parameters which will define whether a story is “newsworthy” or not.

As an example, our journalists from BBC London were interested in how crime is evolving in the capital. “I would say that a 25 percent rise or fall compared to a year-long average is good enough for me to write about it,” one of them said.

The stringer we created for them gathers crime statistics for all of London from an open government API and mashes up the numbers to calculate averages. Then, every day, it checks whether new numbers have been published. If so, it compares the most recent numbers with its averages. If they differ from the average by more than 25 percent in either direction, the parameter chosen by our journalists, it sends them an email alert.
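
The comparison at the heart of this stringer can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Datastringer's actual code: the `is_newsworthy` function and the sample figures are hypothetical, and fetching from the API and sending the email are left out.

```python
from statistics import mean


def is_newsworthy(history, latest, threshold=0.25):
    """Return (alert, relative_change) for the latest figure.

    history: past figures, e.g. twelve monthly counts (assumed layout).
    threshold: 0.25 is the 25 percent figure quoted by the journalists.
    """
    baseline = mean(history)
    if baseline == 0:
        # A relative change is undefined against a zero average.
        return False, None
    change = (latest - baseline) / baseline
    # abs() covers both a rise and a fall beyond the threshold.
    return abs(change) > threshold, change


# Made-up monthly counts for one area; their average is 100.
past_year = [100, 110, 90, 105, 95, 100, 100, 110, 90, 105, 95, 100]
alert, change = is_newsworthy(past_year, latest=130)
if alert:
    print(f"Alert: {change:+.0%} against the year-long average")
```

Keeping the threshold as a parameter matters here: it is the journalist's own definition of "newsworthy", so a different desk can reuse the same stringer with a different value.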


These projects need to be seen as experiments. But there is a real value in these technologies and, more importantly, in the fruitful association of journalists and hackers. Could leveraging expert skills to create more efficient reporting and new ways to reach the audiences be the solution to the problems digital journalism is facing today?

