The Fifth Elephant 2017

On data engineering and application of ML in diverse domains

What database? - a practical guide to selection from NoSQL, SQL and Polyglot data stores

Submitted by Regunath Balasubramanian (@regunathb) on Monday, 22 May 2017

Technical level

Intermediate

Section

Full talk for data engineering track

Status

Confirmed & Scheduled

Abstract

In system building, data store choices affect system scalability more often than language platforms do. Frequently the data store is also the single most constrained resource in the application stack. While most database vendors would have you believe that their solution is the panacea for database scalability problems, such claims only leave a developer confused among the plethora of SQL and NoSQL databases. This talk will guide you through key information, trends, innovations and challenges in choosing your next database or data store.
A well-informed choice looks at a number of factors, such as: schema, query language, data distribution, tools for operations, ability to efficiently leverage available resources (CPU, memory, disk/SSD), moving data to analytics and/or across data centres, query engine, transaction/durability and consistency guarantees, and ability to leverage non-volatile memory (NVM).
In this talk I will cover many of these factors that influence data store selection for different applications catering to varied types of workloads. I will provide examples from my work in building the Aadhaar database and the Flipkart catalog systems, and more recently at HealthFace, where we are building data-driven decision systems such as chatbots, automated symptom analysis and personal health records.

Outline

  • Introduction : how a developer is spoilt for choice; the landscape of available databases/data stores.
  • Database types : Relational, Columnar, KV etc.
  • Storage choices : Append-only, In-place updates
  • Different guarantees : Durability, CAP properties, Replication
  • Gotchas and ways to validate DB vendor claims, e.g. Jepsen tests
  • Polyglot persistence : How to build large databases, such as Aadhaar and the Flipkart catalog, that scale to a billion+ data records and can serve millions of requests per second
  • Challenges in using polyglot persistence
  • Case studies and examples from Aadhaar, Flipkart and HealthFace/CureFit at appropriate points in the discussion

Speaker bio

Regunath is an open source developer and engineer who built Aadhaar and was later responsible for Flipkart platform services. He is currently at HealthFace, building data-driven decision systems for healthcare and personal health records.

Links

Slides

https://www.slideshare.net/regunathbalasubramanian/what-database

Preview video

https://hasgeek.tv/fifthelephant/2013-2/626-latency-and-fault-tolerance-in-oltp-1-5-billion-day-service-calls

Comments

  • 1
    Abhishek Balaji (@booleanbalaji) Reviewer a year ago

    Hi Regunath,

    Please upload draft slides outlining what you intend to cover in your talk and a two-min preview video explaining what the talk is about and what the key takeaway is for participants. We need this information by 29 May to evaluate your proposal.

  • 1
    Abhishek Balaji (@booleanbalaji) Reviewer a year ago

    Hello Regunath,

    Please upload the draft slides for your talk, as requested earlier. You need to upload them before 12 June for the editorial team to begin reviewing your proposal.
