Democratizing ML at Freshworks
Submitted by Arvind Aravamudhan (@arvindaravamudhan) on Monday, 15 April 2019
Session type: Short talk of 20 mins
The data journey usually begins with raw data, advances to data analytics, and then matures into data science. The key to reaching data science maturity is organizing and storing data for large-scale crunching. ML/AI is one of the key growth drivers for Freshworks. In this presentation I will walk through how we solved the data organization and access problem for ML/AI use cases by building our own data lake.
1. Purpose - Support multiproduct, time, cost, Security & Compliance
2. Driving Value from Data and adoption at Freshworks
3. Patterns of Data Flow
4. Ingestion, Self Service Portal and ML pipelines - version 1 & 2 of pipelines
5. Security - Kerberos, OAuth and Sentry
a) CDC - Faster Data
b) Spot Instances - Cheaper (Cost)
Arvind heads the Data Engineering team at Freshworks, which, among other things, helps democratize ML and works closely with analysts to help generate insights. The team is responsible for building and managing the Data Lake and its pipelines. He has been working on moving some of the company's core datasets from a once-a-day batch ETL to near-real-time processing. Previously, Arvind was an Architect at Banca Sella, where he helped architect and build fintech solutions and focused on identifying the technologies, tools and methodologies most appropriate to the banking domain and to highly transactional distributed systems.
He has also published blog posts explaining how he went about building the data lake at Freshworks.