Generating Data Analytics Reports using Scalable Config Driven Framework

Hosted by

The Fifth Elephant

The Fifth Elephant, known as one of the best data science and machine learning conferences in Asia, has transitioned into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices.

Submitted Sep 4, 2018

Technical level: Intermediate

Generating a large number of analytics reports across hundreds of dimensions and metrics, for customers and internal stakeholders, is a critical responsibility of the Big Data Analytics team at PubMatic.
Writing a custom job for each analytics report leads to repeated effort and to the same business logic being duplicated across many jobs.
Another challenge is scaling the platform, which already processes 500 billion transactions (50 terabytes of data) per day on a 900-node cluster, with volume still growing.
Therefore, we built a platform for creating configuration-driven data processing pipelines from highly reusable business functions. It is extensible, so it can adopt new technologies as the big data ecosystem changes. The platform lets our development teams build robust batch data processing pipelines that power analytics dashboards. It also lets novice users supply a configuration of facts and dimensions and generate ad hoc reports in a single data processing job. The framework identifies and reuses existing business functions based on user input, and an abstraction layer keeps core business logic unaffected by technology changes. The framework currently runs on Spark but can be configured to use other technologies.
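
To make the idea concrete, here is a minimal sketch of a config-driven report in Python. It is not PubMatic's implementation. Every name in it is a hypothetical chosen for illustration: `METRICS`, `metric`, `LocalEngine`, `run_report`, the config keys, and the sample rows. It assumes a report is described by a fact name, a list of dimensions, and a list of metrics. Each metric maps to a registered business function, and a small engine class stands in for the execution backend (Spark in the real framework).

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]

# Hypothetical registry of reusable business functions, keyed by metric name.
METRICS: Dict[str, Callable[[Row], float]] = {}


def metric(name: str):
    """Register a business function under a metric name."""
    def register(fn: Callable[[Row], float]) -> Callable[[Row], float]:
        METRICS[name] = fn
        return fn
    return register


@metric("impressions")
def impressions(row: Row) -> float:
    return row["impressions"]


@metric("revenue")
def revenue(row: Row) -> float:
    # Assumed formula for illustration: CPM is the price per 1000 impressions.
    return row["impressions"] * row["cpm"] / 1000.0


class LocalEngine:
    """In-memory stand-in for an execution backend such as Spark (assumption)."""

    def group_sum(self, rows: List[Row], dimensions: List[str],
                  metric_fns: Dict[str, Callable[[Row], float]]
                  ) -> Dict[Tuple, Dict[str, float]]:
        totals = defaultdict(lambda: defaultdict(float))
        for row in rows:
            key = tuple(row[d] for d in dimensions)
            for name, fn in metric_fns.items():
                totals[key][name] += fn(row)
        return {key: dict(vals) for key, vals in totals.items()}


def run_report(config: dict, facts: Dict[str, List[Row]], engine) -> dict:
    """Resolve the config against the registry and run one aggregation job."""
    unknown = [m for m in config["metrics"] if m not in METRICS]
    if unknown:
        raise KeyError(f"no business function registered for: {unknown}")
    metric_fns = {m: METRICS[m] for m in config["metrics"]}
    rows = facts[config["fact"]]
    return engine.group_sum(rows, config["dimensions"], metric_fns)


# A user-supplied report definition: no new job code is written.
config = {
    "fact": "ad_events",
    "dimensions": ["publisher", "country"],
    "metrics": ["impressions", "revenue"],
}
facts = {
    "ad_events": [
        {"publisher": "p1", "country": "IN", "impressions": 1000, "cpm": 2.0},
        {"publisher": "p1", "country": "IN", "impressions": 500, "cpm": 4.0},
        {"publisher": "p2", "country": "US", "impressions": 2000, "cpm": 1.0},
    ]
}
report = run_report(config, facts, LocalEngine())
print(report)
```

In this sketch, adding a report means adding a config, and adding a metric means registering one function. Moving to a different backend would mean supplying another class with the same `group_sum` method, which leaves the business functions untouched.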

Outline

Overview of Data Pipelines @ PubMatic
Scale and its issues
Data Framework Details
Uses of the framework and future use cases

Speaker bio

Satish Gopalani
A Machine Learning/AI and Distributed Systems engineer who enjoys solving complex problems and designing applications and systems that work at scale. Has engineered a variety of complex projects, including building a predictive ML project for online advertising, deriving interesting insights on the IPL (Indian Premier League), building connectors to offload data to Hadoop, and even modifying the Hadoop HDFS source code to make the NameNode more scalable. Holds a B.Tech in Computer Science from VIT, Pune and a specialization in “Big Data Analytics” from IIM Bangalore.

Akshay Habbu
A Big Data Engineer with ample experience working at scale with Spark, MapReduce and HDFS. Has handled more than 60 TB of data streaming every day into a 900-node cluster with 45 PB under management. Deeply interested in designing and implementing complex, scalable data processing pipelines. Has varied interests ranging from big data, analytics and software engineering to food blogging.