The eighth edition of The Fifth Elephant will be held in Bangalore on 25 and 26 July. A thousand data scientists, ML engineers, data engineers and analysts will gather at the NIMHANS Convention Centre to discuss:
- Model management, including data cleaning, instrumentation and productionizing data science.
- Bad data and case studies of failure in building data products.
- Identifying and handling fraud, and data security at scale.
- Applications of data science in agriculture, media and marketing, supply chain, geo-location, SaaS and e-commerce.
- Feature engineering and ML platforms.
- What it takes to create data-driven cultures in organizations of different scales.
1. Meet Peter Wang, co-founder of Anaconda Inc, and learn about why data privacy is the first step towards robust data management; the journey of building Anaconda; and Anaconda in enterprise.
2. Talk to the Fulfillment and Supply Group (FSG) team from Flipkart, and learn about their work with platform engineering where ground truths are the source of data.
3. Attend tutorials on Deep Learning with RedisAI and on TransmogrifAI, Salesforce’s open source AutoML library.
4. Discuss interesting problems to solve with data science in agriculture, a SaaS perspective on multi-tenancy in Machine Learning (with the Freshworks team), and bias in intent classification and recommendations.
5. Meet data science, data engineering and product teams from sponsoring companies to understand how they are handling data and leveraging intelligence from data to solve interesting problems.
Why should you attend?
- Network with peers and practitioners from the data ecosystem
- Share approaches to solving expensive problems such as cleanliness of training data, model management and versioning data
- Demo your ideas in the demo session
- Join Birds of a Feather (BOF) sessions to have productive discussions on focussed topics, or start your own BOF session.
Full schedule published here: https://hasgeek.com/fifthelephant/2019/schedule
For more information about The Fifth Elephant or sponsorships, call +91-7676332020 or email email@example.com
Metadata Catalogue - Making sense of all your data, whether stream or store, the self serve way
This talk presents the case for a central metadata catalogue: a single service for metadata discovery, cataloguing and control. This is another step towards enabling self service from your streams. We did this by forking Apache Atlas and establishing a central metadata repository that captures metadata across datasets and surfaces it through a single platform, simplifying data discovery and lineage tracing irrespective of formats, locations and tools.
Why should you care though?
Because the community is starting to care. Multiple companies are building their own solutions (Twitter, LinkedIn and Netflix, among others), and there is Apache Atlas, which made its first GA release, version 1.0, roughly six months ago. We adopted it when the project was in incubation, and we are happy we did!
What does this cover?
Here is a brief overview of what the platform allows its users to do:
1. Discover data and data-related artifacts: Data Sources, Events, Databases, Tables, Attributes, ETL Processes, Workflows etc.
2. Trace the origin and owner of data
3. Understand data definitions, semantics, and constraints as intended by the producers
4. Trace data flow, evolutions, transformations and dependencies
5. Enable automatic, programmatic checks for metadata consistency and dependencies through an API
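To make these capabilities concrete, here is a minimal in-memory sketch in Python. It is not Apache Atlas and not the platform described in this talk: the `Entity` and `Catalogue` classes, their methods, and the sample dataset names are all hypothetical, chosen only to illustrate discovery, ownership lookup, lineage tracing and a programmatic consistency check.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    """One catalogued artifact (hypothetical model): a table, event stream, ETL process, etc."""
    name: str
    kind: str
    owner: str
    inputs: List[str] = field(default_factory=list)  # names of direct upstream entities


class Catalogue:
    """Toy central catalogue; a real one would persist entities and expose an API."""

    def __init__(self) -> None:
        self._entities: Dict[str, Entity] = {}

    def register(self, entity: Entity) -> None:
        self._entities[entity.name] = entity

    def find(self, kind: str) -> List[str]:
        """Discovery: names of all entities of a given kind, sorted."""
        return sorted(e.name for e in self._entities.values() if e.kind == kind)

    def owner_of(self, name: str) -> str:
        """Ownership: who is responsible for this entity."""
        return self._entities[name].owner

    def lineage(self, name: str) -> List[str]:
        """Lineage: all upstream entities of `name`, nearest first (breadth-first)."""
        seen: List[str] = []
        queue = list(self._entities[name].inputs)
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.append(current)
            # Unregistered inputs are still reported, but cannot be traced further.
            if current in self._entities:
                queue.extend(self._entities[current].inputs)
        return seen

    def dangling_inputs(self) -> Dict[str, List[str]]:
        """Consistency check: inputs that point at entities nobody has registered."""
        problems: Dict[str, List[str]] = {}
        for entity in self._entities.values():
            missing = [i for i in entity.inputs if i not in self._entities]
            if missing:
                problems[entity.name] = missing
        return problems


# Sample data (invented for illustration only).
catalogue = Catalogue()
catalogue.register(Entity("orders_stream", "event", "checkout-team"))
catalogue.register(Entity("orders_raw", "table", "data-platform", inputs=["orders_stream"]))
catalogue.register(Entity("orders_daily", "table", "analytics", inputs=["orders_raw", "fx_rates"]))

print(catalogue.find("table"))
print(catalogue.owner_of("orders_raw"))
print(catalogue.lineage("orders_daily"))
print(catalogue.dangling_inputs())
```

In this sketch the consistency check flags `fx_rates`, which `orders_daily` depends on but which was never registered. That is the kind of gap a central catalogue is meant to surface before it breaks a downstream consumer.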
- Why you need a central metadata catalogue too.
- Why your schema registry is not enough.
- A quick brief of the open source solutions: by Twitter, LinkedIn and Netflix, and the Apache offering too.
- Why we based our solution on Apache Atlas, and why we maintain a fork of it internally.
- How it helped us make sense of each and every piece of data / message in flight or at rest.
Shiv is a passionate engineer who loves building scalable, fault-tolerant and highly available platforms. Shiv has contributed to multiple open source projects including Apache Pulsar, MySQL and Apache Atlas. Shiv has worked on a variety of products ranging from backend platforms to infra to web applications, and loves collaborating with people, sharing and gathering knowledge through the open source community. Shiv has previously been a speaker at multiple open source conferences including FOSSASIA and Open Source India.