What database? - a practical guide to selection from NoSQL, SQL and Polyglot data stores

This submission has been added to the schedule

Submitted May 22, 2017

Section: Full talk for data engineering track
Technical level: Intermediate

In system building, data store choices affect system scalability more often than the choice of language platform. The data store is also frequently the single most constrained resource in the application stack. Most database vendors will want you to believe their solution is the panacea for database scalability problems, which only leaves a developer confused among the plethora of SQL and NoSQL databases. This talk will guide you through key information, trends, innovations and challenges in choosing your next database or data store.
A well-informed choice looks at a number of factors, such as: schema, query language, data distribution, tools for operations, the ability to efficiently leverage available resources (CPU, memory, disk/SSD), moving data to analytics and/or across data centres, query engine, transaction, durability and consistency guarantees, and the ability to leverage non-volatile memory (NVM).
In this talk I will cover many of these factors that influence data store selection for different applications, catering to varied types of workloads. I will provide examples from my work in building the Aadhaar database, the Flipkart catalog systems and, more recently, at HealthFace, where we are building data-driven decision systems such as chatbots, automated symptom analysis and personal health records.

Outline

Introduction: how a developer is spoilt for choice, and the landscape of databases and data stores available.
Database types: relational, columnar, key-value (KV) etc.
Storage choices: append-only, in-place updates
Different guarantees: durability, CAP properties, replication
Gotchas and ways to validate DB vendor claims, e.g. Jepsen tests
Polyglot persistence: how to build large databases like Aadhaar and the Flipkart catalog that scale to a billion+ data records and can serve millions of requests per second
Challenges in using polyglot persistence
Case studies and examples from Aadhaar, Flipkart and HealthFace/CureFit at appropriate points in the discussion
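
As a rough illustration of the "Storage choices" item in the outline, the sketch below contrasts an in-place update store with an append-only one. It is a toy in-memory model in Python, not taken from the talk or from any of the systems named above; the class and method names (InPlaceStore, AppendOnlyStore, history, log_size) are hypothetical and chosen only for this example.

```python
class InPlaceStore:
    """Update-in-place: each key owns one slot, and a write overwrites it."""

    def __init__(self):
        self._slots = {}

    def put(self, key, value):
        # The previous value for this key is lost once overwritten.
        self._slots[key] = value

    def get(self, key):
        return self._slots.get(key)


class AppendOnlyStore:
    """Append-only: every write is added to the end of a log.

    An index maps each key to the log offset of its latest record,
    so reads do not need to scan the whole log.
    """

    def __init__(self):
        self._log = []    # list of (key, value) records, never modified
        self._index = {}  # key -> offset of the newest record for that key

    def put(self, key, value):
        self._log.append((key, value))
        self._index[key] = len(self._log) - 1

    def get(self, key):
        offset = self._index.get(key)
        return None if offset is None else self._log[offset][1]

    def history(self, key):
        # Older versions remain in the log until something compacts it.
        return [v for k, v in self._log if k == key]

    def log_size(self):
        return len(self._log)


if __name__ == "__main__":
    in_place = InPlaceStore()
    in_place.put("user:1", "v1")
    in_place.put("user:1", "v2")

    append_only = AppendOnlyStore()
    append_only.put("user:1", "v1")
    append_only.put("user:1", "v2")

    print(in_place.get("user:1"), append_only.get("user:1"))
    print(append_only.history("user:1"), append_only.log_size())
```

The trade-off this is meant to show: the append-only variant keeps older versions and turns every write into a sequential append, at the cost of a log that grows until it is compacted, while the in-place variant keeps storage bounded but discards previous values.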

Speaker bio

Regunath is an open source developer and engineer who built Aadhaar and was later responsible for Flipkart platform services. He is currently at HealthFace building data-driven decision systems for healthcare and personal health records.

Links

Links to my previous talks at Strata Hadoop Conference and Fifth Elephant:
https://www.youtube.com/watch?v=08sq0y8V1sE
https://conferences.oreilly.com/strata/big-data-conference-sg-2015/public/schedule/detail/45043
https://www.youtube.com/watch?v=n4KTj9qwasA

Slides

https://www.slideshare.net/regunathbalasubramanian/what-database

The Fifth Elephant 2017
