In 2014, infrastructure components such as Hadoop, Berkeley Data Stack and other commercial tools have stabilized and are thriving. The challenges have moved higher up the stack, from data collection and storage to data analysis and its presentation to users. The focus for this year’s conference is on analytics – the infrastructure that powers analytics and how analytics is done.
Talks will cover various forms of analytics including real-time and opportunity analytics, and technologies and models used for analyzing data.
Proposals will be reviewed using the following criteria:
Domain diversity – proposals will be selected from different domains – medical, insurance, banking, online transactions, retail. If there is more than one proposal from a domain, the one that best meets the editorial criteria will be chosen.
Novelty – what has been done beyond the obvious.
Insights – what insights does the proposal share with the audience that they did not know earlier.
Practical versus theoretical – we are looking for applied knowledge. If the proposal covers material that can be looked up online, it will not be considered.
Conceptual versus tools-centric – tell us why, not how. Tell the audience what was the philosophy underlying your use of an application, not how an application was used.
Presentation skills – proposer’s presentation skills will be reviewed carefully and assistance provided to ensure that the material is communicated in the most precise and effective manner to the audience.
For queries about proposals / submissions, write to email@example.com
Data Collection and Transport – e.g., Opendatatoolkit, Scribe, Kafka, RabbitMQ, etc.
Data Storage, Caching and Management – distributed storage (such as Gluster, HDFS), hardware-specific storage (such as SSD or memory), databases (PostgreSQL, MySQL, Infobright) or caching/storage (Memcache, Cassandra, Redis, etc.).
Data Processing, Querying and Analysis – Oozie, Azkaban, scikit-learn, Mahout, Impala, Hive, Tez, etc.
Big data and security
Big data and internet of things
Data Usage and BI (Business Intelligence) in different sectors.
Please note: the technology stacks mentioned above indicate the latest technologies that will be of interest to the community. Talks should not be on the technologies per se, but on how these have been used and implemented in various sectors, enterprises and contexts.
Serving user intent: Facebook-style notifications using HBase and Event streams
This talk is about building a low-latency, near real-time notifications platform for serving user intent using an event-based architecture, Complex Event Processing and a data store like HBase. It will also cover how millisecond response times are achieved when accessing data from 100 million rows by interpreting change from immutable events and organizing data as LSM trees.
Relevant and personalized notifications in near real-time are a great way of serving user intent. The intent may vary - say, liking a Facebook update as compared to a price drop for a browsed product on an e-commerce website. The system characteristics and solution patterns in both these instances may nevertheless be very similar.
This talk will cover the design of the Flipkart Notifications platform. The techniques and technologies used to serve product related intent can be easily applied to a different domain. This talk will also introduce projects that were Open Sourced while building the platform.
Architecture, Design patterns and technologies used in this system include:
- Pre-creating data that matches user intent - so as to significantly reduce data serving latencies
- Storing immutable events and interpreting change
- Event-driven architectures (EDA) and their variant, Staged EDA (SEDA), using technologies like RabbitMQ and Mule
- Complex Event Processing (CEP) using technologies like Esper
- Data stores like HBase that organize data between memory and disk as Log-Structured Merge (LSM) trees - favoring sequential disk transfer over disk seeks
- A data serving API that is resilient to failures and latencies - using Hystrix and Netty
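As a rough illustration of the LSM idea in the list above, here is a minimal toy sketch in Python. It is not HBase's implementation: the class name `TinyLSM`, the `memtable_limit` parameter and the in-memory "segments" standing in for on-disk files are all assumptions made for illustration. Writes land in an in-memory table, which is flushed as one sorted run (a sequential write on a real disk), and reads check the newest data first.

```python
import bisect


class TinyLSM:
    """Toy LSM store (hypothetical): a memtable plus sorted, immutable segments."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}              # most recent writes, held in memory
        self.segments = []              # flushed sorted runs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Emit the memtable as one sorted run; on disk this would be a
        # single sequential write rather than many random seeks.
        self.segments.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newest segment wins; binary search inside each sorted run.
        for segment in reversed(self.segments):
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return None

    def compact(self):
        # Merge all runs into one, letting newer values overwrite older ones.
        merged = {}
        for segment in self.segments:
            merged.update(segment)
        self.segments = [sorted(merged.items())] if merged else []
```

A real store would add a write-ahead log, on-disk files, deletion markers and indexes or Bloom filters to skip segments. The sketch only shows why writes stay sequential and why reads consult the newest data first.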
The talk uses a typical e-commerce experience where user intent is either implicit or interpreted from actions - for example, a user browsing a product of interest, adding an item to a shopping cart or saving it for future reference via a wish-list. In a dynamic e-commerce marketplace, product data (such as price and stock quantity) is constantly changing across millions of listed products even as user intent is being expressed on the website. User intent may be seen as one Event stream, while Product attribute changes are another. An intersection of these two streams is the Notification data. An efficient data store that can store and serve tens of millions of such notifications with very low latencies is the Notification service.
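The intersection of the two event streams described above could be sketched as follows. This is a simplified, in-memory Python illustration under stated assumptions, not the Flipkart platform's code: the event types `IntentEvent` and `PriceEvent`, the `NotificationBuilder` class and the message format are all hypothetical. It interprets a change (a price drop) from successive immutable price events and pre-creates the notification for every interested user, so that serving becomes a simple lookup.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IntentEvent:
    """Immutable record of a user showing interest in a product (hypothetical)."""
    user_id: str
    product_id: str
    action: str  # e.g. "browse", "cart", "wishlist"


@dataclass(frozen=True)
class PriceEvent:
    """Immutable record of a product's price at some point in time (hypothetical)."""
    product_id: str
    price: float


class NotificationBuilder:
    def __init__(self):
        self.interested = defaultdict(set)   # product_id -> user_ids
        self.last_price = {}                 # product_id -> latest price seen
        self.inbox = defaultdict(list)       # user_id -> pre-created notifications

    def on_intent(self, event):
        self.interested[event.product_id].add(event.user_id)

    def on_price(self, event):
        previous = self.last_price.get(event.product_id)
        self.last_price[event.product_id] = event.price
        # The change is interpreted by comparing successive immutable events.
        if previous is None or event.price >= previous:
            return
        message = f"Price drop on {event.product_id}: {previous} -> {event.price}"
        for user_id in sorted(self.interested.get(event.product_id, ())):
            self.inbox[user_id].append(message)

    def notifications_for(self, user_id):
        # Serving is a plain lookup because the work was done at write time.
        return list(self.inbox.get(user_id, []))
```

For example, feeding a wish-list `IntentEvent` for a product followed by a lower `PriceEvent` for the same product leaves one ready-made notification in that user's inbox, while a later price increase adds nothing.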
The following projects were open sourced before or when building the Notifications Platform :
The talk will also feature a live view of the data serving metrics with millisecond response times.
Just technical curiosity about how those notifications on Facebook or Flipkart are delivered at scale. An appreciation of data stores (SQL and NoSQL) and their characteristics will also help. A big plus if you have spent time trying to solve similar problems.
Architect and Open source committer. My areas of interest are Distributed Systems, Big Data, Text Mining and Data Stores.
My experience as Architect includes:
- Building the world’s largest biometric identity platform in Aadhaar
- Customer facing Mobile and Web platforms at India’s leading e-Commerce company - Flipkart
Most of my work in recent years has been around OSS - using it to build large scale systems and in contributing projects back to the community. Some of my OSS work is downloaded and used worldwide:
Active projects on GitHub: https://github.com/regunathb