Automating news discovery in real time
The breaking news segment is an intensely competitive market with players from the TV, radio, online, mobile and print space competing for attention. The ability to discover trends early and “break” them is an edge.
This session walks through some of the techniques used in an ongoing media engagement to automatically source news in real time, cluster similar items, filter for the relevant ones, and build a storyline around them.
The session will cover:
- How to source news in real time from social media (Twitter, Facebook, Google), online news media, financial markets and other sources
- How to filter these based on their relative level of news-worthiness
- How to cluster them based on similarity
- How to identify related news to build a story around the topic
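To make the clustering step concrete, here is a minimal sketch that groups headlines by textual similarity. It is not the code from the engagement: it assumes headlines are already collected as plain strings, and it swaps in a simple greedy threshold scheme over TF-IDF cosine similarity rather than k-means, so that it runs on the standard library alone. The function names and the 0.3 threshold are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a headline and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many headlines does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed inverse document frequency, so no weight is zero.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_headlines(headlines, threshold=0.3):
    """Greedily group headlines; returns clusters as lists of indices.

    Each headline joins the existing cluster whose first member it most
    resembles, provided the similarity reaches the (assumed) threshold.
    Otherwise it starts a new cluster.
    """
    vectors = tfidf_vectors(headlines)
    clusters = []
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, vectors[cluster[0]])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters
```

Calling cluster_headlines on a list of headline strings returns groups of indices into that list. The same cosine function could also serve the "related news" step, by ranking older items against a new headline. A greedy pass suits a live feed because items can be assigned as they arrive, whereas k-means needs the number of clusters fixed up front.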
The session will focus more on technique than technology. While I will be sharing code, you can inspect that later. The talk itself will be layman-friendly.
But to get the most out of the code, you’d need:
- a working knowledge of REST APIs
- enough Python knowledge to build a scraper
- enough HTML/CSS/JS knowledge to build a Chrome plugin
- enough stats to understand k-means clustering
- whatever natural language processing you’ve learnt from writing a few NLTK programs
Anand is the Chief Data Scientist at Gramener.com. He has advised and designed IT systems for media organizations such as the Times Group, the India Today Group, The Guardian, CNN-IBN, etc.
Anand and his team explore insights from data and communicate these as visual stories. Anand also builds the Gramener Visualisation Server, Gramener’s flagship product.
Anand has an MBA from IIM Bangalore and a B.Tech from IIT Madras. He has worked at IBM, Lehman Brothers, The Boston Consulting Group and Infosys Consulting. He blogs at s-anand.net.