##The eighth edition of The Fifth Elephant will be held in Bangalore on 25 and 26 July. A thousand data scientists, ML engineers, data engineers and analysts will gather at the NIMHANS Convention Centre to discuss:
- Model management, including data cleaning, instrumentation and productionizing data science.
- Bad data and case studies of failure in building data products.
- Identifying and handling fraud, and data security at scale.
- Applications of data science in agriculture, media and marketing, supply chain, geo-location, SaaS and e-commerce.
- Feature engineering and ML platforms.
- What it takes to create data-driven cultures in organizations of different scales.
1. Meet Peter Wang, co-founder of Anaconda Inc, and learn about why data privacy is the first step towards robust data management; the journey of building Anaconda; and Anaconda in enterprise.
2. Talk to the Fulfillment and Supply Group (FSG) team from Flipkart, and learn about their work with platform engineering where ground truths are the source of data.
3. Attend tutorials on Deep Learning with RedisAI, and on TransmogrifAI, Salesforce’s open source AutoML library.
4. Discuss interesting problems to solve with data science in agriculture, a SaaS perspective on multi-tenancy in Machine Learning (with the Freshworks team), and bias in intent classification and recommendations.
5. Meet data science, data engineering and product teams from sponsoring companies to understand how they are handling data and leveraging intelligence from data to solve interesting problems.
##Why should you attend?
- Network with peers and practitioners from the data ecosystem
- Share approaches to solving expensive problems such as cleanliness of training data, model management and versioning data
- Demo your ideas in the demo session
- Join Birds of a Feather (BOF) sessions to have productive discussions on focussed topics. Or, start your own BOF session.
##Full schedule published here: https://hasgeek.com/fifthelephant/2019/schedule
For more information about The Fifth Elephant, sponsorships, or anything else, call +91-7676332020 or email email@example.com
Managing Infrastructure for Machine Learning Platform at Walmart scale - Using Kubernetes as the backbone
Session type: Full talk of 40 mins
One of the most critical challenges in bringing Machine Learning to practice is avoiding the various technical debt traps that data science teams run into in their day-to-day jobs. The Machine Learning Platform at Walmart has a single agenda: to make it easy for data scientists to use the company’s data to train and build new ML models at scale, and to make the “single click” deployment experience seamless. However, this experience is possible only with a robust infrastructure back-end for the platform.
I would like to share the learnings from setting up the infrastructure back-end for the Machine Learning Platform at Walmart. I will elaborate primarily on how we have used Kubernetes as the container management solution for the platform. Key topics to be discussed include dynamic scaling in Kubernetes, going hybrid-cloud with Kubernetes, managing very large Kubernetes clusters, managing security, resource scheduling and priority, the CI/CD deployment pipeline for Kubernetes, supporting heterogeneous VMs (both CPU and GPU) in a single Kubernetes cluster, and Kubernetes monitoring.
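As a rough illustration of what supporting CPU and GPU workloads with scheduling priority in a single cluster can look like, here is a minimal Python sketch that builds a Kubernetes Pod manifest. It is not taken from the Walmart platform. The function name `build_training_pod`, the node label `accelerator: gpu`, the image name and the priority class name `ml-high` are all hypothetical, and the `nvidia.com/gpu` resource assumes the NVIDIA device plugin is installed on the cluster.

```python
import json


def build_training_pod(name, image, gpus=0, cpu="2", memory="4Gi", priority_class=None):
    """Return a Kubernetes Pod manifest (as a dict) for one training job.

    Hypothetical helper for illustration only.
    """
    resources = {
        "requests": {"cpu": cpu, "memory": memory},
        "limits": {"cpu": cpu, "memory": memory},
    }
    spec = {
        "restartPolicy": "Never",
        "containers": [{"name": "trainer", "image": image, "resources": resources}],
    }
    if gpus > 0:
        # Extended resources such as GPUs are declared under limits.
        # Assumes the NVIDIA device plugin exposes "nvidia.com/gpu".
        resources["limits"]["nvidia.com/gpu"] = gpus
        # Hypothetical node label used to steer the pod to GPU nodes.
        spec["nodeSelector"] = {"accelerator": "gpu"}
    if priority_class:
        # Must match a PriorityClass object that already exists in the cluster.
        spec["priorityClassName"] = priority_class
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    pod = build_training_pod("train-job", "example/trainer:latest", gpus=1, priority_class="ml-high")
    print(json.dumps(pod, indent=2))
```

The printed JSON can be saved to a file and submitted with `kubectl apply -f`. A CPU-only job simply omits the `gpus` argument, so the same cluster can schedule both kinds of workload from one template.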
I will also compare our container management solution with Amazon EKS, Google GKE and Azure AKS, and list the challenges.
This talk reflects our journey over the past 14 months, starting from a small infrastructure setup on private cloud and going hybrid with 4000+ cores in use for ML workloads, while keeping the various DevOps and infrastructure aspects abstracted from data scientists and our platform users.
Brief Content Flow
Introduction [3 mins]
A short introduction to the topics to be covered. Beginning with an introduction to the Machine Learning Platform, this section sets the base for discussing the infrastructure needs in later slides.
Infrastructure needs for Machine Learning Platform - Challenges [5 mins]
We will elaborate on the infrastructure needs to support such a platform
Deep-dive into platform infrastructure layers [8 mins]
We will go through a deep-dive into the infrastructure layers of the platform. We will look at the various tools we used, and the choices we had.
Infrastructure Tech stack for container management
Learnings at Walmart scale [12 mins]
Here, we will extend the previous discussion to cover in detail how the entire infrastructure back-end works on Kubernetes. We will look at the challenges and requirements below and how we addressed them.
Topics covered: Important aspects of the learnings from our Kubernetes setup will be discussed:
managing very large Kubernetes clusters,
billing our tenants/users,
supporting heterogeneous workloads (both cpu/gpu), and
Lessons Learnt [5 mins]
We discuss some interesting issues we worked through, and how leaving them unresolved can be trouble in waiting.
Conclusion [3 mins]
A summary of the points covered, followed by a few concluding remarks.
Overall Q&A [5 mins]
Reserved for Q&A.
Knowledge of the Kubernetes and Docker technical stack is a must.
Aimed at machine learning and DevOps enthusiasts who wish to get started with understanding infrastructure for building platforms at scale with Kubernetes.
RaviShankar is a member of the Machine Learning Platform team at Walmart. He has ~14 years of experience in IT. He completed his master's at BITS Pilani, and the Executive Management Programme (EGMP) course at IIM, Bangalore. Before joining Walmart, he worked with IBM Labs and Yahoo. He has rich experience working at various levels of the application stack. In his current role, Ravi manages the end-to-end infrastructure of the Machine Learning platform.