Plumbing data science pipelines
What database? - a practical guide to selection from NoSQL, SQL and Polyglot data stores
Submitted by Regunath Balasubramanian (@regunathb) on Monday, 22 May 2017
Section: Full talk for data engineering track
Technical level: Intermediate
In system building, the choice of data store affects system scalability more often than the choice of language platform. The data store is also frequently the single most constrained resource in the application stack. Most database vendors will want you to believe that their solution is the panacea for database scalability problems, which only leaves a developer confused among the plethora of SQL and NoSQL databases. This talk will guide you through key information, trends, innovations and challenges in choosing your next database/store.
A well-informed choice looks at a number of factors, such as: schema, query language, data distribution, tools for operations, ability to efficiently leverage available resources (CPU, memory, disk/SSD), moving data to analytics and/or across data centres, query engine, transaction/durability and consistency guarantees, and ability to leverage non-volatile memory (NVM).
In this talk I will cover many of the factors that influence datastore selection for different applications and types of workloads, and will provide examples from my work in building the Aadhaar database, the Flipkart catalog systems and, more recently, at HealthFace, where we are building data-driven decision systems like chatbots, automated symptom analysis and personal health records.
- Introduction : how a developer is spoilt for choice, and the landscape of databases/datastores available.
- Database types : Relational, Columnar, KV etc.
- Storage choices : Append-only, In-place updates
- Different guarantees : Durability, CAP properties, Replication
- Gotchas and ways to validate DB vendor claims. E.g. Jepsen tests
- Polyglot persistence : How to build large databases like Aadhaar and the Flipkart catalog that scale to a billion+ data records and can serve millions of requests per second
- Challenges in using polyglot persistence
- Case studies and examples from Aadhaar, Flipkart and HealthFace/CureFit at appropriate points in the discussion
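To make the "Storage choices" item in the outline above concrete, here is a minimal in-memory sketch of the two update strategies. It is only an illustration under stated assumptions: the class and method names (InPlaceStore, AppendOnlyStore, compact, history) are hypothetical and are not taken from the talk or from any particular database. Real engines add durability, concurrency control and on-disk layouts that are omitted here.

```python
class InPlaceStore:
    """Hypothetical store that overwrites the existing value on update."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The previous value is lost as soon as it is overwritten.
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class AppendOnlyStore:
    """Hypothetical store that never modifies a record once written."""

    def __init__(self):
        self._log = []    # sequence of (key, value) records, oldest first
        self._index = {}  # key -> position of its latest record in the log

    def put(self, key, value):
        # Writes only ever append; the index points at the newest record.
        self._log.append((key, value))
        self._index[key] = len(self._log) - 1

    def get(self, key):
        pos = self._index.get(key)
        return None if pos is None else self._log[pos][1]

    def history(self, key):
        # Older versions remain readable until compaction removes them.
        return [v for k, v in self._log if k == key]

    def log_size(self):
        return len(self._log)

    def compact(self):
        # Keep only the latest record per key, preserving log order.
        latest = sorted(self._index.items(), key=lambda item: item[1])
        live = [(key, self._log[pos][1]) for key, pos in latest]
        self._log = live
        self._index = {key: i for i, (key, _) in enumerate(live)}


if __name__ == "__main__":
    store = AppendOnlyStore()
    store.put("user:1", "v1")
    store.put("user:1", "v2")
    print(store.get("user:1"), store.history("user:1"), store.log_size())
    store.compact()
    print(store.get("user:1"), store.log_size())
```

The trade-off the sketch shows: the append-only variant keeps writes sequential and retains old versions, at the cost of space that must be reclaimed by a compaction step, while the in-place variant uses no extra space but discards history on every update.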
Regunath is an open source developer and engineer who built Aadhaar and was later responsible for Flipkart platform services. He is currently at HealthFace building data-driven decision systems for healthcare and personal health records.