When, why and what database to choose for time-series data analytics?

Submitted Aug 24, 2021

Category: 15 min talk

Introduction
A time series database (TSDB) is optimized for storing and serving data as associated pairs of time and value. Unlike many other datastores, a TSDB tracks changes to the overall system as INSERTs, not UPDATEs. TSDBs are especially helpful for forecasting and anomaly detection, because they make it easy to apply techniques such as moving averages, exponential smoothing, stationarity checks, autocorrelation, and SARIMA. Hence, knowing where and how to store time-series data can have a dramatic impact on future success.
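To make the "INSERTs, not UPDATEs" model and two of the smoothing techniques concrete, here is a minimal in-memory sketch. It is not how any particular TSDB works internally; the `Series` class and the `moving_average` and `exponential_smoothing` helpers are hypothetical names chosen for illustration, and timestamps are assumed to be integer seconds.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Series:
    """Hypothetical append-only series: every change is a new (timestamp, value) pair."""
    points: List[Tuple[int, float]] = field(default_factory=list)

    def insert(self, ts: int, value: float) -> None:
        # Record a new observation instead of overwriting the old one (INSERT, not UPDATE).
        if self.points and ts < self.points[-1][0]:
            raise ValueError("timestamps must be non-decreasing")
        self.points.append((ts, value))

    def values(self) -> List[float]:
        return [v for _, v in self.points]


def moving_average(values: List[float], window: int) -> List[float]:
    """Simple moving average over a trailing window of `window` points."""
    if window <= 0:
        raise ValueError("window must be positive")
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the point that left the window
        if i >= window - 1:
            out.append(running / window)
    return out


def exponential_smoothing(values: List[float], alpha: float) -> List[float]:
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[float] = []
    for v in values:
        smoothed.append(v if not smoothed else alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed


series = Series()
for ts, value in [(0, 10.0), (60, 12.0), (120, 11.0), (180, 15.0)]:
    series.insert(ts, value)  # each reading is appended, never overwritten
print(moving_average(series.values(), window=2))          # [11.0, 11.5, 13.0]
print(exponential_smoothing(series.values(), alpha=0.5))  # [10.0, 11.0, 11.0, 13.0]
```

Keeping every reading, rather than only the latest value, is what makes these windowed computations possible in the first place.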
With so many TSDBs available, and with TSDBs being the fastest-growing segment of the database market, it becomes difficult to choose the one that fits and solves the data challenge efficiently. In my presentation I’ll compare the top 3 TSDBs in the context of the following features:

Comparison methodology

Scale - We are looking for a TSDB that can capture change efficiently and later forecast it accurately, all at scale. With the ever-increasing need to store anything and everything, scalability becomes a top priority for any data system.

Performance - TSDBs must be designed to handle exceptionally large data volumes, which above all means performant ingestion (WRITE operations).
The ability to handle large time-dependent scans over many records, and to answer time-series-aware queries, is second on my list.

Precision - Data systems these days capture time series at second, millisecond, microsecond, or nanosecond precision. Being able to accurately capture change at such a fine level should be a key feature of any TSDB.
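A quick way to see why precision matters is to store timestamps as integer nanoseconds and then truncate them to coarser units, as a store with lower precision effectively would. The event times below are made up for illustration, and `truncate` and `distinct_timestamps` are hypothetical helpers. Integer nanoseconds are used because a 64-bit float holding seconds since the Unix epoch does not have enough significant digits to keep nanosecond resolution.

```python
# Hypothetical example timestamps, stored as integer nanoseconds since the Unix epoch.
# Python ints have arbitrary precision, so no sub-second detail is lost.
EVENTS_NS = [
    1_629_763_200_000_000_000,  # t0
    1_629_763_200_000_250_000,  # t0 + 250 microseconds
    1_629_763_200_000_900_000,  # t0 + 900 microseconds
    1_629_763_200_001_100_000,  # t0 + 1.1 milliseconds
]

# Size of one tick at each precision level, in nanoseconds.
UNIT_NS = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}


def truncate(ts_ns: int, unit: str) -> int:
    """Floor a nanosecond timestamp to the given precision, as a coarser store would."""
    step = UNIT_NS[unit]
    return ts_ns // step * step


def distinct_timestamps(events_ns, unit: str) -> int:
    """Count how many events keep a distinct timestamp at the given precision."""
    return len({truncate(ts, unit) for ts in events_ns})


for unit in ("ns", "us", "ms", "s"):
    print(unit, distinct_timestamps(EVENTS_NS, unit))
# ns 4
# us 4
# ms 2
# s 1
```

At millisecond precision three of the four events collapse onto the same timestamp, so the changes between them can no longer be told apart.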

Compression - The achievable compression varies with the level of precision the user needs, but it also depends on how the database stores the data.
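One storage technique used by several TSDBs for timestamps is delta-of-delta encoding: regularly spaced samples turn into long runs of zeros that a later bit-packing step can store in very few bits. The sketch below is a simplified illustration under that assumption; it stops before the bit-packing step, and `encode_delta_of_delta` and `decode_delta_of_delta` are hypothetical names, not any specific database's API.

```python
from typing import List


def encode_delta_of_delta(timestamps: List[int]) -> List[int]:
    """Encode integer timestamps as [first, first_delta, dod_2, dod_3, ...].

    Each dod_i is (t_i - t_(i-1)) - (t_(i-1) - t_(i-2)), which is zero for perfectly
    regular sampling and small when there is only a little jitter.
    """
    if len(timestamps) < 2:
        return list(timestamps)
    first_delta = timestamps[1] - timestamps[0]
    encoded = [timestamps[0], first_delta]
    prev_delta = first_delta
    for prev, cur in zip(timestamps[1:], timestamps[2:]):
        delta = cur - prev
        encoded.append(delta - prev_delta)
        prev_delta = delta
    return encoded


def decode_delta_of_delta(encoded: List[int]) -> List[int]:
    """Invert encode_delta_of_delta by re-accumulating the deltas."""
    if len(encoded) < 2:
        return list(encoded)
    timestamps = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dod in encoded[2:]:
        delta += dod
        timestamps.append(timestamps[-1] + delta)
    return timestamps


# Samples every 10 s, stored in milliseconds, with 1 ms of jitter on the last one.
ts_ms = [1_000, 11_000, 21_000, 31_000, 41_001]
encoded = encode_delta_of_delta(ts_ms)
print(encoded)  # [1000, 10000, 0, 0, 1]
assert decode_delta_of_delta(encoded) == ts_ms
```

This is where precision and compression meet: storing the same readings at nanosecond precision tends to expose more clock jitter, so the residuals become larger numbers that need more bits.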

Storage - All this data has to be stored in a way that is easy to write, retrieve, analyse, summarize, and aggregate. Database architectural design plays a major role here, and we will see how a purpose-built TSDB performs better than a re-engineered one.

Optimization - Data cleaning, lifecycle management, and indexing are important too and should be considered when selecting a database.
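As a rough illustration of cleaning and lifecycle management, the sketch below drops NaN readings, resolves duplicate timestamps, and then applies a retention window using a binary search over a sorted timestamp column. The cleaning rules (skip NaN, last non-NaN write wins) and the helper names `clean` and `apply_retention` are assumptions for this example; real TSDBs typically offer retention policies as a built-in feature.

```python
import bisect
import math
from typing import List, Tuple


def clean(points: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Drop NaN readings, resolve duplicate timestamps, and sort by time.

    Assumed rule: for a duplicated timestamp, the last non-NaN value written wins.
    """
    latest = {}
    for ts, value in points:
        if not math.isnan(value):
            latest[ts] = value
    return sorted(latest.items())


def apply_retention(timestamps: List[int], values: List[float], now: int, retention: int):
    """Keep only points with timestamp >= now - retention.

    `timestamps` must be sorted ascending so that a binary search (bisect) can find
    the cutoff without comparing against every point.
    """
    start = bisect.bisect_left(timestamps, now - retention)
    return timestamps[start:], values[start:]


raw = [(30, 3.0), (10, 1.0), (20, float("nan")), (30, 3.5), (40, 4.0)]
cleaned = clean(raw)
print(cleaned)  # [(10, 1.0), (30, 3.5), (40, 4.0)]

timestamps = [ts for ts, _ in cleaned]
values = [v for _, v in cleaned]
print(apply_retention(timestamps, values, now=45, retention=20))  # ([30, 40], [3.5, 4.0])
```

The sorted timestamp column plays the role of a simple time index here: it is what lets the retention cutoff be found by binary search.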

Trade-offs - No system is perfect, and one has to weigh many trade-offs when choosing the best one for the problem at hand.
I’ll discuss how downsampling, interval search, and partitioning help in overcoming the common trade-offs.
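The three techniques named above can be illustrated on a small in-memory series. This is a toy sketch, not how any of the compared TSDBs implement them: `partition_by`, `interval_search`, `query`, and `downsample` are hypothetical helpers, timestamps are integer seconds, partitions are fixed-width time buckets, input points are assumed sorted by time, and downsampling uses the mean (min, max, or last value are equally common choices).

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[int, float]  # (timestamp in seconds, value)


def partition_by(points: List[Point], width: int) -> Dict[int, List[Point]]:
    """Partition sorted points into fixed-width time buckets keyed by bucket start."""
    partitions: Dict[int, List[Point]] = defaultdict(list)
    for ts, value in points:
        partitions[ts - ts % width].append((ts, value))
    return dict(partitions)


def interval_search(points: List[Point], start: int, end: int) -> List[Point]:
    """Binary-search a sorted partition for points with start <= ts < end.

    A 1-tuple (t,) sorts before any (t, value), so bisect_left finds the first ts >= t.
    """
    lo = bisect.bisect_left(points, (start,))
    hi = bisect.bisect_left(points, (end,))
    return points[lo:hi]


def query(partitions: Dict[int, List[Point]], width: int, start: int, end: int) -> List[Point]:
    """Interval search that only visits partitions overlapping [start, end)."""
    result: List[Point] = []
    bucket = start - start % width
    while bucket < end:
        result.extend(interval_search(partitions.get(bucket, []), start, end))
        bucket += width
    return result


def downsample(points: List[Point], width: int) -> List[Point]:
    """Replace each fixed-width bucket with one point: (bucket start, mean value)."""
    return [
        (bucket, sum(v for _, v in pts) / len(pts))
        for bucket, pts in sorted(partition_by(points, width).items())
    ]


points = [(0, 1.0), (30, 3.0), (60, 2.0), (90, 4.0), (120, 6.0)]
parts = partition_by(points, width=60)
print(query(parts, 60, start=30, end=100))  # [(30, 3.0), (60, 2.0), (90, 4.0)]
print(downsample(points, width=60))         # [(0, 2.0), (60, 3.0), (120, 6.0)]
```

The trade-off shows up even in this toy: downsampling shrinks the data and speeds up long-range queries but discards the per-point detail inside each bucket, while partitioning lets `query` skip every bucket outside the requested interval.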

Takeaway
After this talk, attendees should be able to independently compare and select a time series database for their own data systems and analytics.

Submitted by: Amandeep Singh (asingh26@paypal.com)
Company: PayPal India
Designation: Software Developer 3

Submissions for Data Stores track
