Automating news discovery in real-time

Jul 2015

16 Thu 08:30 AM – 06:35 PM IST

17 Fri 08:30 AM – 06:30 PM IST

18 Sat 09:00 AM – 06:30 PM IST

NIMHANS Convention center

Submitted Jun 15, 2015

Section: Full Talk

Technical level: Beginner

The breaking news segment is an intensely competitive market with players from the TV, radio, online, mobile and print space competing for attention. The ability to discover trends early and “break” them is an edge.

This session talks through some of the techniques used in an ongoing media engagement to source real-time news automatically, cluster the items, filter the relevant ones, and build a storyline around them.

Outline

The session will cover:

How to source news in real time from social media (Twitter, Facebook, Google), online news media, financial markets and other sources
How to filter these based on their relative level of news-worthiness
How to cluster them based on similarity
How to identify related news to build a story around the topic
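
As a rough illustration of the clustering and related-news steps (this is not the code from the talk), the sketch below groups headlines by cosine similarity over bags of words. The stop-word list, the `threshold` value, the function names and the sample headlines are all assumptions made for the example, and the greedy single-pass grouping is a simpler stand-in for a proper clustering algorithm.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "in", "on", "of", "to", "for", "and", "as", "after", "by"}

def tokenize(text):
    """Lowercase the text and return a bag of words without stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster(headlines, threshold=0.3):
    """Greedy single pass: a headline joins the first cluster whose seed
    headline is similar enough, otherwise it starts a new cluster."""
    clusters = []  # each entry is (seed_vector, [member headlines])
    for headline in headlines:
        vec = tokenize(headline)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(headline)
                break
        else:
            clusters.append((vec, [headline]))
    return [members for _, members in clusters]

# Made-up sample headlines, for illustration only.
headlines = [
    "Monsoon floods hit Mumbai suburbs",
    "Sensex falls 300 points on global cues",
    "Mumbai floods disrupt local trains",
    "Sensex recovers 300 points by close",
]
groups = cluster(headlines)
for group in groups:
    print(group)
```

The same `cosine` score can also rank older items against a new headline to surface related news for a storyline. A production system would likely need stemming, TF-IDF weighting and a tuned threshold.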

Requirements

The session will focus more on technique than technology. While I will be sharing code, you can inspect that later. The talk itself will be layman-friendly.

But to get the most out of the code, you’d need:

a working knowledge of REST APIs
enough Python knowledge to build a scraper
enough HTML/CSS/JS knowledge to build a Chrome plugin
enough stats to understand k-means clustering
whatever natural language processing you’ve learnt from writing a few NLTK programs
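
For a sense of the level of k-means assumed, here is a minimal pure-Python sketch of the algorithm (Lloyd's iterations) on 2-D points. It is not the code from the talk: the `kmeans` function, the naive "first k points" initialisation and the sample points are assumptions made for illustration. It needs Python 3.8 or later for `math.dist`.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain k-means on tuples of numbers. Returns (centroids, labels)."""
    # Naive initialisation (an assumption for this sketch): the first k points.
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    # Final labels: index of the nearest centroid for each point.
    labels = [
        min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points
    ]
    return centroids, labels

# Made-up sample points forming two obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

For news items, the points would be numeric vectors derived from the text (word counts, for example) instead of 2-D coordinates.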

Speaker bio

Anand is the Chief Data Scientist at Gramener.com. He has advised and designed IT systems for media organizations such as the Times Group, the India Today Group, The Guardian and CNN-IBN.

Anand and his team explore insights from data and communicate these as visual stories. Anand also builds the Gramener Visualisation Server -- Gramener’s flagship product.

Anand has an MBA from IIM Bangalore and a B.Tech from IIT Madras. He has worked at IBM, Lehman Brothers, The Boston Consulting Group and Infosys Consulting. He blogs at s-anand.net.

The Fifth Elephant 2015