
Generating Data Analytics Reports using Scalable Config Driven Framework

Submitted by Satish Gopalani (@satishg) on Tuesday, 4 September 2018


Technical level

Intermediate

Section

Full talk

Status

Submitted

Total votes: +8

Abstract

Generating a large number of analytics reports across hundreds of dimensions and metrics for customers and internal stakeholders is critical work for the Big Data Analytics team at PubMatic. Writing a custom job for each report leads to repetitive effort and business logic duplicated across many jobs. A further challenge is scaling a platform that already processes 500 billion transactions (50 terabytes of data) per day on a 900-node cluster, with volumes still growing.

We therefore built a platform for creating configuration-driven data processing pipelines out of highly reusable business functions, designed to be extensible so it can adopt cutting-edge technologies in the ever-changing big data ecosystem. The platform enables our development teams to build robust batch data processing pipelines that power analytics dashboards, and it empowers novice users to generate ad-hoc reports in a single data processing job simply by supplying a configuration of facts and dimensions. The framework intelligently identifies and reuses existing business functions based on user input, and an abstraction layer keeps core business logic unaffected by technology changes. The framework is currently powered by Spark, but it can easily be configured to run on other engines.
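To make the configuration-driven idea concrete, here is a minimal sketch in Spark (Scala) of how a report configuration could drive a single generic job. This is an illustration for this abstract, not PubMatic's actual framework; all names here (`ReportConfig`, `MetricSpec`, the paths and column names) are assumptions.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col, sum}

// A report is described by data, not code: which fact table to read,
// which dimensions to group by, and which metrics to aggregate.
final case class MetricSpec(column: String, agg: String)
final case class ReportConfig(
    factPath: String,
    dimensions: Seq[String],
    metrics: Seq[MetricSpec],
    outputPath: String
)

object ConfigDrivenReport {

  // Map a metric spec onto a reusable aggregate expression, so business
  // logic lives in one place instead of being copied into every job.
  private def toAggExpr(m: MetricSpec): Column = m.agg match {
    case "sum" => sum(col(m.column)).alias(s"sum_${m.column}")
    case "avg" => avg(col(m.column)).alias(s"avg_${m.column}")
    case other => throw new IllegalArgumentException(s"Unsupported aggregation: $other")
  }

  // One generic batch job serves every report; only the config changes.
  def run(spark: SparkSession, cfg: ReportConfig): Unit = {
    require(cfg.dimensions.nonEmpty && cfg.metrics.nonEmpty, "need dimensions and metrics")
    val fact: DataFrame = spark.read.parquet(cfg.factPath)
    val aggs = cfg.metrics.map(toAggExpr)
    fact
      .groupBy(cfg.dimensions.map(col(_)): _*)
      .agg(aggs.head, aggs.tail: _*)
      .write
      .mode("overwrite")
      .parquet(cfg.outputPath)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("config-driven-report").getOrCreate()
    // Example: an ad-hoc publisher/country report over a hypothetical
    // impressions fact table, expressed entirely as configuration.
    val cfg = ReportConfig(
      factPath   = "hdfs:///data/facts/impressions",
      dimensions = Seq("publisher_id", "country"),
      metrics    = Seq(MetricSpec("impressions", "sum"), MetricSpec("revenue", "sum")),
      outputPath = "hdfs:///data/reports/publisher_country"
    )
    run(spark, cfg)
    spark.stop()
  }
}
```

In a sketch like this, a function such as `toAggExpr` is where the abstraction-layer claim would live: the configuration stays stable while the function bodies could be re-targeted from Spark to another engine.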

Outline

  • Overview of Data Pipelines @ PubMatic
  • Scale and its issues
  • Data Framework Details
  • Current uses of the framework and future use cases

Speaker bio

Satish Gopalani
A Machine Learning/AI and Distributed Systems engineer who enjoys solving complex problems and designing applications and systems that work at scale. Has engineered a variety of complex projects, including building a predictive ML system for online advertising, deriving interesting insights from IPL (Indian Premier League) data, building connectors to offload data to Hadoop, and even modifying the Hadoop HDFS source code to make the NameNode more scalable. Holds a B.Tech in Computer Science from VIT, Pune, and a specialization in “Big Data Analytics” from IIM Bangalore.

Akshay Habbu
A Big Data engineer with ample experience working at scale with Spark, MapReduce and HDFS. Has handled more than 60 TB of data streaming every day into a 900-node cluster with 45 PB under management. Deeply interested in designing and implementing complex, scalable data processing pipelines. Has varied interests ranging from big data, analytics and software engineering to food blogging.

Comments

  • Zainab Bawa (@zainabbawa) Reviewer 2 months ago (edited 2 months ago)

    We only accept one speaker per session. Let us know who the primary contact is and who will present if this proposal is selected. Also, submit draft slides and a preview video for this proposal no later than 5 October.

  • Satish Gopalani (@satishg) Proposer 2 months ago

    Hi Zainab Bawa,

    Will upload the draft slides and preview video soon.

    Regarding speakers: my colleague and I presented together at the ML Mini Conference 2017 in Pune.
    You can refer to this proposal: https://anthillinside.talkfunnel.com/2017-miniconf-pune/11-applying-ml-in-adtech-and-lifecycle-of-an-ml-proje
    Akshay and I worked on this together and prepared the slides and other material jointly, so we wanted to present together.

    Thanks,
    Satish Gopalani

    • Zainab Bawa (@zainabbawa) Reviewer 14 days ago

      Our policy is one speaker per session. If you can’t comply with it, please withdraw your proposal.
