Ky Harlin, BuzzFeed’s director of data science, runs simple tests on news stories, with extraordinary results.
For “The 35 Best Places to Visit Over the Summer,” one reader might see a picture of Paris. Another sees a picture of Tokyo.
Harlin runs complex permutations of these A/B tests and analyzes the results using pioneering algorithms he developed to predict when and why stories go viral. The work that he and his team of data scientists do at BuzzFeed has fueled substantial traffic growth for the five-year-old news site as readers widely share its humorous lists and animated GIFs through social networks.
The man behind the science of shareability at BuzzFeed came from an unlikely place—a medical imaging startup, where he was working in September 2010 when he was recruited by Jonah Peretti, BuzzFeed’s founder and CEO.
Peretti said he hired Harlin to work on improving the visibility of the startup’s content. In terms of data analysis, Peretti did not consider Harlin’s previous work to be as different from publishing as one might think.
“There are actually lots of similarities between medical imaging and content publishing on a purely [mathematical] level,” Peretti said in an e-mail interview. “Both fields are looking for patterns in vast data sets. And during the interview, [Harlin] was clearly more interested in understanding how content spreads than medical imaging, so I knew he would be good.”
Harlin said he relished the jump to news and found it exciting to once again be in a startup atmosphere – this time working with both data and social media.
He created special mathematical formulas to analyze characteristics of the site’s content in an attempt to predict how viral each piece of content will be. Given the site’s remarkable success with social media and traffic growth, his viral-detection formulas are ones other organizations would like to get their hands on.
“The work on triggering algorithms was very important and something we use every day,” Peretti said. “And of course all the top secret projects, but I can’t tell you about those,” he joked.
While details of his algorithms are secret, Harlin offered a general description of his approach to data analysis at BuzzFeed:
“There are many variables we look at, both quantitative and descriptive,” he said. “Quantitative factors are things like the amount of times something’s been shared on Facebook, while descriptive factors are things like what’s contained in the text of the article. We employ machine learning algorithms that help us map out the relationship between those variables and shareability.”
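Harlin’s actual models are not public. Purely as an illustration of what mapping variables to shareability can look like, the sketch below fits a tiny logistic regression in plain Python. The article records, the two features (a log-scaled Facebook share count as the quantitative factor, and whether the title starts with a number as the descriptive factor), the labels, and every function name are assumptions invented for this example, not anything BuzzFeed has described.

```python
import math

# Made-up toy records and labels (1 = "went viral"), for illustration only.
TOY_ARTICLES = [
    {"title": "35 Best Places to Visit", "facebook_shares": 5000},
    {"title": "Quarterly budget report", "facebook_shares": 10},
    {"title": "10 cats who gave up", "facebook_shares": 3000},
    {"title": "City council minutes", "facebook_shares": 5},
]
TOY_LABELS = [1, 0, 1, 0]


def featurize(article):
    """Turn a hypothetical article record into a numeric feature vector."""
    title = article["title"]
    return [
        1.0,                                     # bias term
        math.log1p(article["facebook_shares"]),  # quantitative factor
        1.0 if title[:1].isdigit() else 0.0,     # descriptive factor: list-style title
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, features):
    """Estimated probability that an article is 'shareable'."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def train(rows, labels, lr=0.1, epochs=500):
    """Fit logistic regression weights with plain stochastic gradient ascent."""
    weights = [0.0] * len(rows[0])
    for _ in range(epochs):
        for features, label in zip(rows, labels):
            error = label - predict(weights, features)
            for i, x in enumerate(features):
                weights[i] += lr * error * x
    return weights


weights = train([featurize(a) for a in TOY_ARTICLES], TOY_LABELS)
candidate = {"title": "12 reasons to love summer", "facebook_shares": 4000}
score = predict(weights, featurize(candidate))  # a probability between 0 and 1
```

A production system would presumably use far more variables and data than this toy does; the point is only that both kinds of factors end up as numbers a model can weigh.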
Peretti said he was looking for a way to capitalize on A/B testing when he found Harlin. “I knew first-hand the power of A/B testing but also the limits of only testing one variable at a time,” Peretti said. “And I was also familiar with the complexity of social networks, the many confounding variables that influence how content spreads, and the inherent noise in complex systems. It just felt like more firepower was needed to understand what was actually happening and I was very lucky to find [Harlin].”
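To make the idea of testing more than one variable at a time concrete, here is a minimal sketch of a test that crosses two variables, a thumbnail and a headline, and tallies a share rate for each combination. It is not BuzzFeed’s system: the variable names, the hash-based assignment, and the event format are all assumptions chosen for illustration.

```python
import hashlib
from itertools import product

# Hypothetical variables and their options, for illustration only.
THUMBNAILS = ["paris", "tokyo"]
HEADLINES = ["headline_a", "headline_b"]
VARIANTS = list(product(THUMBNAILS, HEADLINES))  # every thumbnail/headline pair


def assign_variant(reader_id):
    """Deterministically map a reader to one combination via a hash."""
    digest = hashlib.sha256(reader_id.encode("utf-8")).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]


def share_rates(events):
    """events: iterable of (reader_id, shared) pairs, where shared is a bool.

    Returns {variant: shares / views} for every variant that had views.
    """
    views = {variant: 0 for variant in VARIANTS}
    shares = {variant: 0 for variant in VARIANTS}
    for reader_id, shared in events:
        variant = assign_variant(reader_id)
        views[variant] += 1
        shares[variant] += int(shared)
    return {v: shares[v] / views[v] for v in VARIANTS if views[v] > 0}
```

Hashing the reader ID keeps each reader on the same combination across visits. A real analysis would also need to account for the noise and confounding variables Peretti mentions before declaring any combination the winner.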
Dao Nguyen, director of growth at BuzzFeed, said she believes Harlin had a revolutionary impact on the company’s model.
“[Harlin] thinks about how we can better use data to make BuzzFeed a better experience for readers and advertisers,” she said. “He understands when the data makes sense, what data is noise, and when not to be using data at all. He has a really good overall sense of both the editorial and advertising side. That’s pretty new.”
BuzzFeed produces approximately 400 to 500 articles each day, according to Harlin. The content comes from editors, contributors, and community members, which requires considerable coordination to manage the content and analyze the data it generates.
“We have a pretty large editorial staff,” Harlin said. “We also have the community aspect at BuzzFeed…and community moderators who look for the best stuff, and then that stuff can actually get published to the front page as well. It is pretty high-volume. It is definitely a challenge in terms of trying to figure out what the good stuff is and what duds are. It’s fun to be able to work on those kinds of problems.”
On a day-to-day basis, Harlin said his job does not change too much. He works closely with the advertising staff as well as the editorial team. He spends a lot of time trying to solve challenges faced by each department at BuzzFeed and working “to connect all the different parts of the company with the data that we have.”
“That’s a pretty big part of my day-to-day job, addressing the needs of various parts of the company by using data,” he said. “The other big part of it is working on larger research problems – just trying to understand how content goes viral, which [is] usually longer term.”