The NEST Framework automates the handling of dynamic and nested schemas, making it easier for developers to manage schema changes and maintain accurate, deduplicated tables in Spark. We are excited to present this innovative solution at the Data Engineering Conference.
This session is designed for professionals who work with streaming data and manage evolving schema versions at the database level. It is especially relevant if you work with deeply nested event tables and need to handle schema evolution and deduplication across versions in cases that Delta/Iceberg cannot manage.
When consuming streaming data, the schema changes over time as new fields are added and old fields are removed or altered. When these fields are nested, for example a struct containing an array of structs of (string, int, map), taking the union of data from different schema versions becomes cumbersome, and masking or hashing PII in the nested fields adds to the difficulty. The data engineer has to write queries to transform nested fields such as arrays and structs.
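To make the nested-union problem concrete, here is a minimal plain-Python sketch of merging two schema versions recursively. It is not the NEST Framework's API and does not use Spark: the dictionary-based schema representation, the `merge_types` function, and the widening order are all assumptions chosen for illustration.

```python
# Hypothetical widening order: when two versions disagree on a primitive type,
# pick the wider one. Anything outside this list falls back to "string".
_WIDENING = ["int", "long", "double", "string"]


def merge_types(a, b):
    """Merge two schema nodes (primitive name or struct/array/map dict)."""
    if a == b:
        return a
    if isinstance(a, str) and isinstance(b, str):
        if a in _WIDENING and b in _WIDENING:
            return _WIDENING[max(_WIDENING.index(a), _WIDENING.index(b))]
        return "string"
    if isinstance(a, dict) and isinstance(b, dict) and a["type"] == b["type"]:
        kind = a["type"]
        if kind == "struct":
            # Union of field names; shared fields are merged recursively.
            merged = dict(a["fields"])
            for name, t in b["fields"].items():
                merged[name] = merge_types(merged[name], t) if name in merged else t
            return {"type": "struct", "fields": merged}
        if kind == "array":
            return {"type": "array",
                    "element": merge_types(a["element"], b["element"])}
        if kind == "map":
            return {"type": "map",
                    "key": merge_types(a["key"], b["key"]),
                    "value": merge_types(a["value"], b["value"])}
    # Incompatible shapes (e.g. struct vs. primitive): fall back to string.
    return "string"


# Two illustrative versions of the same event schema.
v1 = {"type": "struct", "fields": {
    "id": "int",
    "items": {"type": "array", "element": {"type": "struct", "fields": {
        "sku": "string", "qty": "int"}}},
}}
v2 = {"type": "struct", "fields": {
    "id": "long",  # data type changed between versions
    "items": {"type": "array", "element": {"type": "struct", "fields": {
        "sku": "string", "qty": "int",
        "attrs": {"type": "map", "key": "string", "value": "string"}}}},
}}

merged = merge_types(v1, v2)
```

In this sketch the merged schema keeps every field seen in either version and widens `id` to `long`; a real implementation would also have to cast the data itself to the merged schema.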
- Even though Delta/Iceberg tables support schema evolution, they break if the data type of a column changes or if there are map type fields.
- Maintaining complex SQL is also an overhead on top of regular tasks, and can result in failures if not handled immediately.
- Schema evolution management using Spark.
- Incremental deduplication of table versions using Delta.
- Masking and small transformations during the deduplication process.
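The combination of incremental deduplication and masking can be sketched in plain Python as follows. This is an illustration of the idea only, not the framework's implementation: the function names, the `event_id` and `updated_at` column names, the dotted-path convention for PII fields, and the choice of SHA-256 are all assumptions. In Spark with Delta, the upsert step would typically be expressed as a `MERGE` into the target table rather than a dictionary update.

```python
import hashlib


def mask_path(value, path):
    """Return a copy of `value` with the field at `path` replaced by its
    SHA-256 hex digest. Lists are traversed element by element."""
    if not path:
        if value is None:
            return None
        return hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    if isinstance(value, list):
        return [mask_path(v, path) for v in value]
    if isinstance(value, dict) and path[0] in value:
        out = dict(value)  # shallow copy so the input record is not mutated
        out[path[0]] = mask_path(value[path[0]], path[1:])
        return out
    return value  # path not present in this record: leave it unchanged


def dedup_increment(state, batch, key="event_id", order="updated_at",
                    pii_paths=()):
    """Fold one batch into `state`, keeping only the latest record per key.
    PII fields are masked before the record is stored."""
    for rec in batch:
        for p in pii_paths:
            rec = mask_path(rec, p.split("."))
        current = state.get(rec[key])
        if current is None or rec[order] > current[order]:
            state[rec[key]] = rec
    return state


# Illustrative data: the second batch holds a stale duplicate and a new key.
state = {}
batch1 = [{"event_id": 1, "updated_at": 10, "users": [{"email": "a@x.io"}]}]
batch2 = [
    {"event_id": 1, "updated_at": 5, "users": [{"email": "old@x.io"}]},
    {"event_id": 2, "updated_at": 7, "users": [{"email": "b@x.io"}]},
]
dedup_increment(state, batch1, pii_paths=["users.email"])
dedup_increment(state, batch2, pii_paths=["users.email"])
```

Only each new batch is compared against the existing state, so earlier data is never rescanned, and the stale version of event 1 in the second batch is discarded.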
Participants will gain practical knowledge through slides and a hands-on session on using the framework. They will learn how to dynamically manage schema evolution and deduplication processes, and see the framework in action.