Re-imagining data infrastructures as event-based architectures
Submitted by Raj Rohit (@jalemrajrohit) on Monday, 13 November 2017
This talk covers how we built a distributed, serverless batch data architecture at Episource. It includes the end-to-end ETL pipeline that handles distributed machine learning, as well as how we automated ML deployment using the event-based (serverless) paradigm.
- Whirlwind intro on what the serverless paradigm is
- How we built a batch architecture instead of a real-time one
- How we got around Lambda's 5-minute time limit to build a completely serverless, end-to-end distributed deep learning pipeline
- How load balancing and monitoring can be done for such huge, complex systems
- How the serverless paradigm helps re-imagine data architectures for data engineers
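
As a rough illustration of the time-limit item in the outline, here is a minimal local sketch of one general pattern for fitting a long batch job into short-lived function invocations: each invocation works until its time budget is spent, then hands the unprocessed remainder to a fresh invocation. This is an assumption for illustration only, not necessarily the approach used at Episource. The names `process_with_budget` and `run_chain` are hypothetical, and the loop in `run_chain` merely stands in for whatever mechanism (a queue, an asynchronous re-invocation, a workflow service) would trigger the next invocation in a real deployment.

```python
import time
from typing import Callable, List, Tuple


def process_with_budget(
    items: List[int],
    handler: Callable[[int], int],
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[int], List[int]]:
    """Process items until the time budget is spent.

    Returns (results, remaining), where `remaining` holds the items
    that were not processed before the deadline.
    """
    deadline = clock() + budget_seconds
    results: List[int] = []
    for index, item in enumerate(items):
        # Check the budget before starting each item so that we stop
        # cleanly instead of being cut off by the platform mid-item.
        if clock() >= deadline:
            return results, items[index:]
        results.append(handler(item))
    return results, []


def run_chain(
    items: List[int],
    handler: Callable[[int], int],
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[int], int]:
    """Simulate chained invocations locally (hypothetical helper).

    Each loop iteration stands in for one fresh function invocation
    that receives the remainder left over by the previous one.
    """
    all_results: List[int] = []
    invocations = 0
    remaining = list(items)
    while remaining:
        results, next_remaining = process_with_budget(
            remaining, handler, budget_seconds, clock
        )
        if not results and next_remaining:
            # No progress was made, so chaining would never finish.
            raise RuntimeError("budget too small to process a single item")
        all_results.extend(results)
        remaining = next_remaining
        invocations += 1
    return all_results, invocations
```

The design choice here is that the budget check happens between items, so the unit of work must be small enough to finish well inside one invocation's limit; a single item that takes longer than the budget cannot be rescued by chaining alone.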
Raj Rohit is a senior data scientist at Episource, where he builds ML algorithms, architects data pipelines, stares at endless Linux logs, and is building the company’s DevOps team. Raj is the author of the Julia Cookbook and a moderator of Stack Exchange’s DevOps and Data Science sites.