The Fifth Elephant 2024 Annual Conference (12th & 13th July)
Maximising the Potential of Data — Discussions around data science, machine learning & AI
Sat, 13 Jul 2024, 09:00 AM – 06:05 PM IST
Manob Chakraborty
The NEST Framework automates the handling of dynamic and nested schemas, making it easier for developers to manage schema changes and maintain accurate, deduplicated tables in Spark. We are excited to present this innovative solution at the Data Engineering Conference.
This session is designed for professionals who work with streaming data and manage evolving schema versions at the database level. It is aimed in particular at those who work with deeply nested event tables and need to manage schema evolution and deduplication across versions in cases that Delta/Iceberg could not handle.
When consuming streaming data, the schema changes over time as new fields are added and old fields are removed or altered. When these fields are nested, for example a struct containing an array of structs of (string, int, map), taking the union of data from different schema versions becomes cumbersome, and masking or hashing PII in the nested fields adds to the difficulty. The data engineer has to write queries by hand to transform nested fields such as arrays and structs.
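To make the problem concrete, here is a minimal sketch in plain Python of unioning two nested schema versions and reshaping records to the unified schema while hashing a nested PII field. It is not the NEST Framework's API: the schema encoding (dicts for structs, one-element lists for arrays, strings for leaf types), the function names `merge_schemas` and `conform`, and the sample field names are all assumptions made for illustration.

```python
import hashlib

def merge_schemas(a, b):
    """Union two nested schemas (hypothetical encoding: dict = struct, [t] = array of t)."""
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for name, b_type in b.items():
            # Recurse into fields present in both versions; add fields new in b.
            merged[name] = merge_schemas(a[name], b_type) if name in a else b_type
        return merged
    if isinstance(a, list) and isinstance(b, list):
        return [merge_schemas(a[0], b[0])]
    if a == b:
        return a
    raise TypeError(f"incompatible types: {a!r} vs {b!r}")

def conform(value, schema, pii_paths=frozenset(), path=""):
    """Reshape a record to the schema: missing fields become None, PII leaves are hashed."""
    if value is None:
        return None
    if isinstance(schema, dict):
        return {
            name: conform(value.get(name), sub, pii_paths,
                          f"{path}.{name}" if path else name)
            for name, sub in schema.items()
        }
    if isinstance(schema, list):
        # Array elements share the path of the array itself.
        return [conform(item, schema[0], pii_paths, path) for item in value]
    if path in pii_paths:
        return hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return value

# Two hypothetical versions of the same event schema.
v1 = {"event_id": "string", "items": [{"sku": "string", "qty": "int"}]}
v2 = {"event_id": "string",
      "items": [{"sku": "string", "buyer": {"email": "string"}}],
      "attrs": "map<string,string>"}

unified = merge_schemas(v1, v2)

old_row = {"event_id": "e1", "items": [{"sku": "A", "qty": 2}]}
new_row = {"event_id": "e2",
           "items": [{"sku": "B", "buyer": {"email": "x@example.com"}}],
           "attrs": {"src": "app"}}

# Both rows now share one shape, with the nested email hashed.
rows = [conform(r, unified, {"items.buyer.email"}) for r in (old_row, new_row)]
```

In Spark the same idea would be expressed over `StructType`, `ArrayType` and `MapType` columns rather than Python dicts; the sketch only shows why the recursion is needed once structs are nested inside arrays.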
Participants will gain practical knowledge through slides and a hands-on session on using the framework. They will learn how to dynamically manage schema evolution and deduplication, and will see the framework in action.