Make a submission

Accepting submissions till 15 Jun 2019, 01:00 PM

NIMHANS Convention Centre, Bengaluru

##The eighth edition of The Fifth Elephant will be held in Bangalore on 25 and 26 July. A thousand data scientists, ML engineers, data engineers and analysts will gather at the NIMHANS Convention Centre to discuss:

  1. Model management, including data cleaning, instrumentation and productionizing data science.
  2. Bad data and case studies of failure in building data products.
  3. Identifying and handling fraud, and data security at scale.
  4. Applications of data science in agriculture, media and marketing, supply chain, geo-location, SaaS and e-commerce.
  5. Feature engineering and ML platforms.
  6. What it takes to create data-driven cultures in organizations of different scales.

##Highlights:

1. Meet Peter Wang, co-founder of Anaconda Inc, and learn about why data privacy is the first step towards robust data management; the journey of building Anaconda; and Anaconda in enterprise.
2. Talk to the Fulfillment and Supply Group (FSG) team from Flipkart, and learn about their work with platform engineering where ground truths are the source of data.
3. Attend tutorials on Deep Learning with RedisAI, and on TransmogrifAI, Salesforce’s open source AutoML library.
4. Discuss interesting problems to solve with data science in agriculture, a SaaS perspective on multi-tenancy in Machine Learning (with the Freshworks team), and bias in intent classification and recommendations.
5. Meet data science, data engineering and product teams from sponsoring companies to understand how they are handling data and leveraging intelligence from data to solve interesting problems.

##Why should you attend?

  1. Network with peers and practitioners from the data ecosystem
  2. Share approaches to solving expensive problems such as cleanliness of training data, model management and versioning data
  3. Demo your ideas in the demo session
  4. Join Birds of a Feather (BOF) sessions to have productive discussions on focussed topics, or start your own BOF session.

##Full schedule published here: https://hasgeek.com/fifthelephant/2019/schedule

##Contact details:
For more information about The Fifth Elephant, sponsorships, or anything else, call +91-7676332020 or email info@hasgeek.com

#Sponsors:

Sponsorship Deck.
Email sales@hasgeek.com for bulk ticket purchases, and for sponsoring the 2019 edition of JSFoo:VueDay.

JSFoo:VueDay 2019 sponsors:

#Platinum Sponsor

Anatta

#Community Sponsors

Salesforce Ericsson freshworks
databricks

#Exhibition Sponsors

Sapient Atlassian GO-JEK
Bayer

#Bronze Sponsor

Sumologic Walmart Labs Atlan
Simpl Great Learning

#Community Sponsors

Elastic Anaconda Aruba Networks

Hosted by

The Fifth Elephant - known as one of the best data science and Machine Learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering; data science in production; data security and privacy practices.

Pruthvi Raj

@pruthvirajeranti

Diksuchi: Data quality Monitoring platform for @scale batch data pipelines at Walmart

Submitted Jun 14, 2019

We, the Customer Backbone team at Walmart, are building a customer identity and activity graph with more than 20 billion nodes and 30 billion edges. It serves as the lifeline of customer data for multiple pillars such as marketing, targeting, personalization and data science. While building the graph using Spark and Hive pipelines, we generate many intermediate tables/states and output tables.
To provide high-quality data to our teams, we have built a data quality monitoring platform, Diksuchi (meaning compass), that provides metrics, audit, monitoring and alerting on our data pipelines for quick and easy debugging. The monitoring platform runs alongside the processing-heavy pipelines and does the heavy lifting of calculating metrics, checking the correctness of data, detecting anomalies in input and output data, and raising alarms. Diksuchi also provides dashboards for easy navigation and debugging.
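
To make the idea of metric-based anomaly detection concrete, here is a minimal sketch in Python. It is an illustration only and not Diksuchi's actual method, which this proposal does not describe: the function name `is_anomalous`, the z-score rule and the threshold of 3 are all assumptions chosen for the example. It flags the latest value of a metric (for instance, a table's daily row count) when it lies far from the recent history.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it is more than `threshold` sample standard
    deviations away from the mean of `history` (a simple z-score rule)."""
    if len(history) < 2:
        return False  # not enough history to estimate the spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is unusual.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Hypothetical daily row counts for one output table.
row_counts = [1000, 1020, 990, 1010, 1005]
print(is_anomalous(row_counts, 1008))  # close to the recent mean
print(is_anomalous(row_counts, 400))   # a sharp drop in volume
```

A rule like this is cheap to evaluate after every pipeline run, which is why it is a common first step before more elaborate seasonal or model-based detectors.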

The platform is designed so that any developer, data scientist or analyst can write a simple configuration to onboard a new data processing pipeline anywhere in Walmart and monitor its data quality and correctness.
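
As a rough sketch of what configuration-driven onboarding could look like, the Python below declares checks for one table and evaluates them against a batch of rows. Everything here is hypothetical: the `CONFIG` layout, the metric names (`row_count`, `null_fraction`, `distinct_fraction`) and the helper functions are assumptions made for illustration, not Diksuchi's real configuration format. A plain list of dictionaries stands in for a Hive table.

```python
# Hypothetical onboarding config: pipeline name, metrics and bounds are illustrative.
CONFIG = {
    "pipeline": "customer_graph_nodes",
    "checks": [
        {"metric": "row_count", "min": 1},
        {"metric": "null_fraction", "column": "customer_id", "max": 0.0},
        {"metric": "distinct_fraction", "column": "customer_id", "min": 1.0},
    ],
}


def compute_metric(rows, check):
    """Compute a single metric named in `check` over `rows`."""
    metric = check["metric"]
    if metric == "row_count":
        return len(rows)
    values = [row.get(check["column"]) for row in rows]
    if not values:
        return 0.0
    if metric == "null_fraction":
        return sum(v is None for v in values) / len(values)
    if metric == "distinct_fraction":
        return len(set(values)) / len(values)
    raise ValueError(f"unknown metric: {metric}")


def run_checks(rows, config):
    """Return a list of (metric, value) pairs for every violated bound."""
    alerts = []
    for check in config["checks"]:
        value = compute_metric(rows, check)
        too_low = "min" in check and value < check["min"]
        too_high = "max" in check and value > check["max"]
        if too_low or too_high:
            alerts.append((check["metric"], value))
    return alerts


# A small batch with one missing and one duplicated customer_id.
batch = [{"customer_id": "a"}, {"customer_id": None}, {"customer_id": "a"}]
for metric, value in run_checks(batch, CONFIG):
    print(f"ALERT {CONFIG['pipeline']}: {metric} = {value:.2f}")
```

Keeping the checks in data rather than in code means a new pipeline can be monitored by adding a configuration entry, without touching the monitoring logic itself.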

Outline

This talk will cover the following topics:

  1. Introduction to problem statement
  2. Why every company needs this kind of platform
  3. Journey of metrics platform
  4. Demo of the platform
  5. Setting alert rules & Anomaly detection
  6. Takeaways from platform

Requirements

Basic understanding of Spark, Airflow, Elasticsearch, Hive and Grafana

Speaker bio

Pruthvi is a senior data engineer at WalmartLabs and has been working on the Customer Backbone team for more than a year. During this time, he and his team members have developed a customer identity and activity graph platform and a data quality monitoring platform.
