Using Elasticsearch for Analytics
Submitted by Vaidik Kapoor (@vaidik) on Sunday, 18 May 2014
Section: Full talk
Technical level: Intermediate
At Wingify, we have built a system that processes and stores analytics data for our customers, who slice and dice it into more meaningful reports. This talk covers how we used Elasticsearch to solve this problem quickly at our scale. The audience will take away some of the data problems they can solve quickly with Elasticsearch.
At Wingify, we collect data from the website and mobile A/B testing campaigns our customers create, then store, process and crunch it into the reports they use for Conversion Rate Optimization. Our customers rely on this data to make engineering, design and business decisions that improve their conversions and help them reach their business goals. Generating custom reports to each customer's requirements, and letting customers slice and dice the data into more targeted and meaningful reports, is a core feature of our application. The enormous volume of data produced by campaigns from thousands of customers makes this harder: we have to process the data with both the current and the future needs of the application in mind, store it so that users can explore it as flexibly as possible, and serve reports built from it as fast as possible.
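To make "slice and dice" concrete, here is a minimal sketch of the kind of Query DSL body such a report could send to Elasticsearch's `_search` endpoint. The helper `build_report_query` and the field names (`campaign_id`, `visited_at`, `variation_id`, `converted`) are hypothetical, not Wingify's actual schema. The body uses the `bool`/`filter` syntax of current Elasticsearch releases; the 1.x releases current in 2014 expressed the same filter with the `filtered` query, which has since been removed.

```python
import json


def build_report_query(campaign_id, start, end, split_by):
    """Build a Query DSL body that slices one campaign's visitor documents
    by variation and then by an arbitrary dimension (e.g. "browser").

    All field names here are hypothetical, chosen only for illustration.
    """
    return {
        "size": 0,  # return only aggregation results, no matching hits
        "query": {
            "bool": {
                # filter clauses narrow the data set without computing scores
                "filter": [
                    {"term": {"campaign_id": campaign_id}},
                    {"range": {"visited_at": {"gte": start, "lt": end}}},
                ]
            }
        },
        "aggs": {
            # one bucket per A/B test variation
            "by_variation": {
                "terms": {"field": "variation_id"},
                "aggs": {
                    # inside each variation, one bucket per dimension value
                    "by_dimension": {
                        "terms": {"field": split_by},
                        "aggs": {
                            # assumes 'converted' is stored as 0 or 1
                            "conversions": {"sum": {"field": "converted"}}
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    body = build_report_query(42, "2014-05-01", "2014-05-18", "browser")
    print(json.dumps(body, indent=2))
```

In the response, each bucket's `doc_count` is the number of visitor documents in that slice and the nested `conversions` sum is its conversion count, so the conversion rate is their ratio. A `terms` aggregation returns only the top 10 buckets by default, so a dimension with more distinct values needs an explicit `size`.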
This talk will focus on how we discovered Elasticsearch, used it to quickly prototype a solution to this problem, and took that implementation from prototype to production, along with the challenges we faced along the way:
- A brief introduction to Elasticsearch and the features that make it well suited to quick prototyping, so you can find out whether it solves your problem.
- Architecting the data pipeline for fast writes and accurate updates with the ability to control what to process and write.
- Planning for fast reads and aggregation of large data sets.
- Designing the system for accuracy and maintaining data consistency.
- Elasticsearch scales, but only with planning: planning early for scaling needs.
- Preparing for problems at scale, and firefighting when they happen.
- What we learned about Elasticsearch and what helped us the most.
- Best practices and tips from our experience of using Elasticsearch for all this.
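For the write side of the pipeline, one common pattern (an assumption here, not necessarily the pipeline described in the talk) is to batch counter updates through the `_bulk` API as scripted upserts. The sketch below only builds the newline-delimited JSON request body; the helper `bulk_upsert_lines`, the index name, the document IDs and the `visits`/`conversions` fields are hypothetical, and the `script.source` key with a Painless script assumes a recent Elasticsearch release.

```python
import json


def bulk_upsert_lines(index, events):
    """Turn (doc_id, visits, conversions) tuples into an NDJSON body for the
    Elasticsearch _bulk endpoint. Existing documents have their counters
    incremented by a script; missing documents are created from "upsert".

    The index name and field names are hypothetical.
    """
    lines = []
    for doc_id, visits, conversions in events:
        # action/metadata line: which index and document to update
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        # payload line: a Painless script increments the stored counters;
        # if the document does not exist, "upsert" is indexed instead
        lines.append(json.dumps({
            "script": {
                "source": "ctx._source.visits += params.v; "
                          "ctx._source.conversions += params.c",
                "params": {"v": visits, "c": conversions},
            },
            "upsert": {"visits": visits, "conversions": conversions},
        }))
    if not lines:
        return ""  # nothing to send; an empty _bulk request is rejected
    # the _bulk API requires every line, including the last, to end in "\n"
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(bulk_upsert_lines("campaign-stats", [("42:1", 10, 2), ("42:2", 7, 1)]))
```

One caveat relevant to accurate updates: incrementing is not idempotent, so if a batch is retried after a partial failure, the updates that already succeeded are applied twice. A pipeline with at-least-once delivery needs deduplication, or documents keyed so that rewriting them is safe.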
I am a software engineer based in New Delhi, working for Wingify, a bootstrapped Delhi-based startup that develops the A/B testing tool Visual Website Optimizer (VWO). At Wingify, I focus primarily on services, scalability and infrastructure engineering, which also involves working with data and analytics and all the problems that come with them. I am an open-source enthusiast. In my free time, I evangelize open source, try to organize local meetups, and watch movies.