The Fifth Elephant 2015

A conference on data, machine learning, and distributed and parallel computing

Anand S


Automating news discovery in real-time

Submitted Jun 15, 2015

The breaking news segment is an intensely competitive market with players from the TV, radio, online, mobile and print space competing for attention. The ability to discover trends early and “break” them is an edge.

This session walks through some of the techniques used in an ongoing media engagement to automatically source news in real time, cluster the items, filter the relevant ones, and build a storyline around them.


The session will cover:

  • How to source news in real time from social media (Twitter, Facebook, Google), online news media, financial markets and other sources
  • How to filter these based on their relative level of news-worthiness
  • How to cluster them based on similarity
  • How to identify related news to build a story around the topic
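As a rough illustration of the "cluster them based on similarity" step, here is a minimal sketch. It is not the code from the engagement. It assumes headlines are compared by TF-IDF cosine similarity and grouped in a single greedy pass. The function names, the 0.3 threshold and the sample headlines are all hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF keeps weights positive even for very common terms.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(docs, threshold=0.3):
    """Greedy single-pass clustering.

    Each document joins the first cluster whose seed (first member) is at
    least `threshold` similar to it; otherwise it starts a new cluster.
    The threshold value is an assumption, not a tuned setting.
    """
    vectors = tfidf_vectors(docs)
    clusters = []  # lists of document indices; index 0 of each list is the seed
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical headlines, made up for this example.
headlines = [
    "Sensex falls 300 points as rupee weakens",
    "Rupee weakens, Sensex falls sharply in early trade",
    "Monsoon arrives in Kerala two days early",
    "Kerala monsoon arrives early, says weather office",
]

print(cluster(headlines))  # [[0, 1], [2, 3]]
```

The same cosine score could also serve the "related news" step: given one item, rank the others by similarity to it. A single greedy pass is order-dependent, so it is best read as a starting point rather than a finished method.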


The session will focus more on technique than technology. While I will be sharing code, you can inspect it later. The talk itself will be layman-friendly.

But to get the most out of the code, you’d need:

  • a working knowledge of REST APIs
  • enough Python knowledge to build a scraper
  • enough HTML/CSS/JS knowledge to build a Chrome plugin
  • enough stats to understand k-means clustering
  • whatever natural language processing you’ve learnt from writing a few NLTK programs
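To gauge the k-means prerequisite, a minimal sketch of the algorithm on plain 2-D points may help. This is an illustration only, not the session's code. The sample points and the choice to seed centroids with the first k points are assumptions made to keep the run deterministic.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    # Assumption: seed with the first k points so the result is repeatable.
    # Real implementations usually seed randomly or with k-means++.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical points forming two visibly separate groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

For news items, the points would be document vectors rather than 2-D coordinates, but the assignment and update steps are the same.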

Speaker bio

Anand is the Chief Data Scientist at Gramener. He has advised and designed IT systems for media organizations such as the Times Group, the India Today Group, The Guardian, CNN-IBN, etc.

Anand and his team explore insights from data and communicate these as visual stories. Anand also builds the Gramener Visualisation Server, Gramener’s flagship product.

Anand has an MBA from IIM Bangalore and a B.Tech from IIT Madras. He has worked at IBM, Lehman Brothers, The Boston Consulting Group and Infosys Consulting. He blogs at


