Choosing datastores

Guide on how to select datastores to solve different problems


Accepting submissions till 15 Aug 2021, 11:59 PM


How do you select datastores, and how do you stay aware of their limitations when applying them to the problem at hand? Are there misconceptions you wish someone had cleared up for you as you started your journey of scaling with datastores?

This conference on choosing datastores for your use cases will help you understand:

  • Running datastores at scale: tuning, debugging and operations.
  • Solving specific use cases with a certain datastore.
  • Data modelling and developer experience with datastores.

Senior infrastructure and software engineers from Farfetch, Aerospike, Zeotap, eightfold.ai, LinkedIn and Tesco engineering will share war stories and lessons learned with practitioners in the audience.

View schedule at https://hasgeek.com/rootconf/choosing-datastores/schedule

Contact information: Join the Rootconf Telegram group at https://t.me/rootconf or follow @rootconf on Twitter.
For inquiries, contact Rootconf at rootconf.editorial@hasgeek.com or call 7676332020.

Hosted by Rootconf. We care about site reliability, cloud costs, security and data privacy.

Sai Geetha M N

@saigeethamn

A Big Data Store – Performance optimised for writes, reads or both?

Submitted Aug 9, 2021

At Tesco, the third largest retailer in the world, data volumes are huge, and so is the urgency of getting the latest data into operations and decision making.

We have modernized our demand forecasting system and moved it to the Hadoop platform, giving us the power and flexibility of a distributed platform to improve accuracy with more data and better algorithms. We can also manage the forecast at the most granular level, which in turn produces very large volumes of data.

Each forecast run generates 1 to 1.2 billion records (about 140 GB of data), and we forecast three times a day. This output has to be saved in a data store, and the total data queried at any point is about 3 TB in a single table/entity store.
We needed a data store that provides fast reads, with response times under 200 ms across the 3 TB of data, while also absorbing a bulk write of 1 billion records in 15 to 20 minutes without disrupting read performance. This meant sustaining a write speed of 800K to 1.1 million records/sec without impacting reads.
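A quick back-of-the-envelope check of those figures, as a minimal Python sketch (the record count and load window come straight from the numbers above):

```python
# Required write throughput implied by the figures above:
# ~1 billion records bulk-loaded within a 15 to 20 minute window.
records_per_run = 1_000_000_000   # lower bound of 1 to 1.2 billion records

for minutes in (15, 20):
    rate = records_per_run / (minutes * 60)
    print(f"{minutes} min window -> {rate:,.0f} records/sec")

# 15 min -> ~1,111,111 records/sec; 20 min -> ~833,333 records/sec,
# i.e. roughly the 800K to 1.1 million records/sec quoted above.
```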
Most data store architectures let you tune for faster reads or faster writes, but not both. We evaluated many data stores and finally had to come up with a different architectural pattern to achieve this. The same pattern can be applied in a SQL database like Postgres or a NoSQL database like HBase, and we did so successfully in both.
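The abstract does not disclose the actual pattern, so the following is only an illustrative sketch of one well-known approach that fits the stated requirement in Postgres: bulk-load each run into a shadow table off the read path, then atomically switch readers over. All names here (the connection string, `forecast_shadow`, the `forecast` view, the CSV export) are hypothetical, not Tesco's schema.

```python
import psycopg2

# Purely illustrative connection; in practice credentials and host would differ.
conn = psycopg2.connect("dbname=forecasts")

with conn:  # commits the transaction on success
    with conn.cursor() as cur:
        # 1. Bulk-load the new forecast run into a shadow table while
        #    readers keep querying current data through the `forecast` view.
        cur.execute("TRUNCATE forecast_shadow")
        with open("forecast_run.csv") as f:  # hypothetical export of one run
            cur.copy_expert("COPY forecast_shadow FROM STDIN WITH (FORMAT csv)", f)

        # 2. Atomically repoint the read path at the freshly loaded table.
        #    In practice the live/shadow roles would alternate between runs.
        cur.execute("CREATE OR REPLACE VIEW forecast AS SELECT * FROM forecast_shadow")
```

The point of such a pattern is that the heavy write never touches the table being read, so read latency stays flat during the load; whether this resembles the talk's approach is something the talk itself would have to confirm.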

In this talk I would like to share how we achieved this, while continuing to support smaller streaming updates as well. In the process we also discovered a few lesser-known nuances about tuning HBase for fast reads, which I would like to share too, if time permits. Finally, I would like to touch upon: is there a trade-off? Can we have it all?

About Me
You can view here: https://www.linkedin.com/in/saigeethamn/
You can view a few of my articles at https://www.saigeetha.in/blog

