Managed Feature Store: Improving data reusability & Providing a means for low latency real-time prediction at Udaan

Conference schedule (Jul 2021, IST): Fri 23, 12:00 PM – 06:15 PM; Sat 24, 12:00 PM – 05:10 PM; Tue 27, 02:00 PM – 05:10 PM


This submission has been added to the schedule


Submitted Jul 11, 2021

A brief talk on a Managed Feature Store built on top of open source Feast. We will start with a walkthrough of the open source Feast feature store, including its architecture and core capabilities, and call out some of its challenges and limitations. We will then describe the enhancements that enable a more robust, secure and scalable deployment: a) using managed resources on cloud platforms, e.g. Azure Event Hubs in place of Kafka and Databricks in place of open source Spark; b) integrating RBAC and table-level access control to keep usage controlled; c) scalable batch ingestion using Spark instead of Pandas, along with new capabilities that increase data reusability.

Speakers:
Dr Mohit Kumar (Head - Data Science, Product Analytics and Data Platform)
Sai Sharan Tangeda (Data Scientist)
Time: 30 mins

Agenda

Introduction
1. Introduction
2. Motivation for maintaining a Managed Feature Store
Feast (Open Source): Constructs, Core Capabilities & Limitations
1. Constructs & Architecture of Feast
2. Point-in-Time Join Capabilities with Batch Retrieval
3. Batch Ingestion into Historical Store & Scale Limitations
4. Streaming Capabilities with Apache Kafka & Redis
5. Reliability issues with self-deployed resources like Kafka, Redis, PostgreSQL
Managed Feature Store as a fork of Feast
1. Overview of Core Architecture
2. Integration of Azure Event Hubs as a replacement for Apache Kafka
3. Introducing Databricks as Spark Backend
4. Ensuring Scalability for large data sizes via Spark
5. RBAC & Table Level Access Control for controlled usage
6. End-to-End flow for real-time model serving
Closing Arguments
1. Increase in Productivity with ready-to-use Features
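
The point-in-time join listed in the agenda can be illustrated with a small sketch. This is not Feast's implementation or the one used in the talk; it is a minimal pandas example, assuming hypothetical column names (`customer_id`, `event_timestamp`, `orders_7d`), of the idea that each training row should only see feature values recorded at or before its own timestamp.

```python
import pandas as pd


def point_in_time_join(entity_df, feature_df, key="customer_id", ts="event_timestamp"):
    """Attach to each entity row the latest feature row at or before its timestamp.

    Column names are hypothetical and chosen only for this illustration.
    """
    # merge_asof requires both frames to be sorted by the "on" column.
    left = entity_df.sort_values(ts)
    right = feature_df.sort_values(ts)
    # direction="backward" picks the most recent feature row that is not in
    # the future relative to the entity row, which avoids label leakage.
    return pd.merge_asof(left, right, on=ts, by=key, direction="backward")


# Entity rows: the (customer, time) pairs we want features for.
entity_df = pd.DataFrame(
    {
        "customer_id": [1, 1, 2],
        "event_timestamp": pd.to_datetime(["2021-07-10", "2021-07-20", "2021-07-15"]),
    }
)

# Feature rows: values of a feature as recorded over time.
feature_df = pd.DataFrame(
    {
        "customer_id": [1, 1, 2],
        "event_timestamp": pd.to_datetime(["2021-07-05", "2021-07-15", "2021-07-16"]),
        "orders_7d": [3, 5, 9],
    }
)

joined = point_in_time_join(entity_df, feature_df)
print(joined)
```

In this toy data, customer 2 gets no feature value because its only feature row was recorded after the requested timestamp. A feature store typically performs the same kind of join at scale in its batch retrieval path, usually with an additional limit on how old a feature value may be.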

Link to slides: https://drive.google.com/file/d/1ocJNDbEUxXVJqyBVD35k-Vvr1y8hjN5k/view?usp=sharing


Hosted by

The Fifth Elephant

Jumpstart better data engineering and AI futures


MLOps Conference
