## The eighth edition of The Fifth Elephant will be held in Bangalore on 25 and 26 July. A thousand data scientists, ML engineers, data engineers and analysts will gather at the NIMHANS Convention Centre to discuss:
- Model management, including data cleaning, instrumentation and productionizing data science.
- Bad data and case studies of failure in building data products.
- Identifying and handling fraud, and data security at scale.
- Applications of data science in agriculture, media and marketing, supply chain, geo-location, SaaS and e-commerce.
- Feature engineering and ML platforms.
- What it takes to create data-driven cultures in organizations of different scales.
1. Meet Peter Wang, co-founder of Anaconda Inc, and learn why data privacy is the first step towards robust data management; the journey of building Anaconda; and Anaconda in the enterprise.
2. Talk to the Fulfillment and Supply Group (FSG) team from Flipkart, and learn about their work with platform engineering where ground truths are the source of data.
3. Attend tutorials on deep learning with RedisAI, and on TransmogrifAI, Salesforce's open source AutoML library.
4. Discuss interesting problems to solve with data science in agriculture, SaaS perspective on multi-tenancy in Machine Learning (with the Freshworks team), bias in intent classification and recommendations.
5. Meet data science, data engineering and product teams from sponsoring companies to understand how they are handling data and leveraging intelligence from data to solve interesting problems.
## Why you should attend
- Network with peers and practitioners from the data ecosystem
- Share approaches to solving expensive problems such as cleanliness of training data, model management and data versioning
- Demo your ideas in the demo session
- Join Birds of a Feather (BOF) sessions for focused discussions on specific topics, or start your own BOF session.
## Full schedule published here: https://hasgeek.com/fifthelephant/2019/schedule
For more information about The Fifth Elephant, sponsorships, or anything else, call +91-7676332020 or email email@example.com.
How we built a highly scalable, multi-tenant orchestration service using Apache Airflow on Kubernetes
Session type: Short talk (20 mins)
We have several use cases that require some form of workflow management and scheduling: generating scheduled reports, authoring and managing multi-step ML workflows, running ETL jobs, and more.
Today, teams run their own schedulers, such as cron or a workflow manager, to meet these needs. Some teams have also set up Apache Airflow for their requirements.
Our team set out to provide a fully managed, highly scalable, multi-tenant orchestration service that different teams can use to meet these requirements.
In this talk I will cover how we solved the challenges we faced while building a managed, scalable, multi-tenant service on Apache Airflow.
Some of the high-level requirements we considered while building the orchestration service:
- Abstract Apache Airflow away from users for ease of use.
- Support both dynamic on-demand and static workflows.
- Provide RESTful APIs to author and manage workflows.
- Support different scheduling requirements, such as run-once and daily runs.
- Run thousands of concurrent workflows.
- Offer an Airflow operator store that different teams can share.
- Allow customizing Airflow config parameters, such as parallelism and the DAG directory refresh interval, for different use cases.
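For the last point, the knobs involved are standard Airflow settings. A minimal sketch of the kind of per-tenant overrides meant here (values are illustrative, not the actual production settings):

```ini
# airflow.cfg fragment -- tune per cluster / use case
[core]
# maximum number of task instances that can run concurrently on this cluster
parallelism = 512

[scheduler]
# how often (seconds) the scheduler rescans the DAG directory for new
# generated DAG files; lower values pick up newly authored workflows faster
dag_dir_list_interval = 30
```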
We defined a JSON-based DSL to simplify authoring multi-step workflows. Users create a new workflow simply by invoking the orchestration service's REST API. Behind the scenes, the service converts the JSON workflow definition into an Airflow-compatible Python DAG, performing all validations while generating it. The service also provides APIs to read, update and delete workflows.
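The JSON-to-DAG conversion can be sketched as follows. This is a hypothetical illustration, not the actual Adobe DSL: the field names (`name`, `schedule`, `tasks`, `depends_on`) and the generated operator are assumptions, but the shape of the pipeline (validate the definition, then emit Airflow-compatible Python source) follows the description above.

```python
import json

# Illustrative workflow definition in a JSON DSL (field names are assumed).
WORKFLOW_JSON = """
{
  "name": "daily_report",
  "schedule": "@daily",
  "tasks": [
    {"id": "extract", "command": "python extract.py", "depends_on": []},
    {"id": "report",  "command": "python report.py",  "depends_on": ["extract"]}
  ]
}
"""

def validate(wf: dict) -> dict:
    """Reject definitions that would not produce a valid DAG."""
    ids = {t["id"] for t in wf["tasks"]}
    for t in wf["tasks"]:
        missing = set(t["depends_on"]) - ids
        if missing:
            raise ValueError(f"task {t['id']} depends on unknown tasks: {missing}")
    return wf

def to_dag_source(wf: dict) -> str:
    """Emit Python source for an Airflow DAG from a validated definition."""
    lines = [
        "from airflow import DAG",
        "from airflow.operators.bash_operator import BashOperator",
        "",
        f"dag = DAG('{wf['name']}', schedule_interval='{wf['schedule']}')",
        "",
    ]
    for t in wf["tasks"]:
        lines.append(
            f"{t['id']} = BashOperator(task_id='{t['id']}', "
            f"bash_command='{t['command']}', dag=dag)"
        )
    # Wire up dependencies using Airflow's >> operator.
    for t in wf["tasks"]:
        for dep in t["depends_on"]:
            lines.append(f"{dep} >> {t['id']}")
    return "\n".join(lines)

dag_source = to_dag_source(validate(json.loads(WORKFLOW_JSON)))
print(dag_source)
```

The generated file is what gets dropped into the Airflow DAG directory; users only ever see the JSON.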
To meet our scale requirements we ran Apache Airflow on Kubernetes, but we observed that a single Airflow cluster would not be sufficient. In this talk I will cover how we solved the scale problem by setting up multiple Airflow clusters and building the right abstraction on top of them, so that users remain agnostic of the topology.
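One way such an abstraction can work, sketched here as an assumption rather than the actual design: the service routes each workflow deterministically to one of several Airflow clusters, so callers never need to know which cluster hosts their DAG. The cluster URLs and the hashing scheme below are illustrative.

```python
import hashlib

# Hypothetical fleet of Airflow clusters behind the orchestration service.
CLUSTERS = [
    "http://airflow-cluster-0.internal",
    "http://airflow-cluster-1.internal",
    "http://airflow-cluster-2.internal",
]

def cluster_for(workflow_name: str) -> str:
    """Pick a cluster deterministically, so repeat calls for the same
    workflow always land on the cluster that owns its DAG."""
    digest = hashlib.sha256(workflow_name.encode()).hexdigest()
    return CLUSTERS[int(digest, 16) % len(CLUSTERS)]

# The same workflow always maps to the same cluster:
assert cluster_for("daily_report") == cluster_for("daily_report")
```

Adding capacity then means adding clusters behind the router; a production version would also need to handle rebalancing and cluster failure, which the talk presumably covers.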
Along the way we have also contributed multiple fixes to Apache Airflow in areas such as error handling, the Kubernetes Executor, REST APIs and resiliency.
I work as a Senior Computer Scientist at Adobe and have been working on the orchestration service from the very beginning. I contributed to its design and played a key role in making it highly scalable on Apache Airflow. Along the way I have also made contributions to Apache Airflow to improve its performance and resiliency.