Sep 2021
30 Mon
31 Tue
1 Wed
2 Thu
3 Fri 12:00 PM – 03:30 PM IST
4 Sat
5 Sun
Kalyanasundaram Somasundaram
Abstract
“In its early days, the LinkedIn data ecosystem was quite simple. A single RDBMS contained a handful of tables for user data such as profiles, connections, etc. This RDBMS was augmented with two specialized systems: one provided full text search of the corpus of user profile data, the other provided efficient traversal of the relationship graph. These latter two systems were kept up-to-date by Databus, a change capture stream that propagates writes to the RDBMS primary data store, in commit order, to the search and graph clusters. Over the years, as LinkedIn evolved, so did its data needs.”
The above is an excerpt from LinkedIn’s Espresso paper, published in 2013. At that time LinkedIn had 200 million users worldwide. After the growth phase that followed, the user base today is roughly 4x that number, on top of ever-increasing user engagement and new feature rollouts. During this growth, LinkedIn’s data systems evolved to serve each of our use cases. In this talk, we will attempt to give a glimpse of our online storage ecosystem and its evolution.
Online data systems such as Oracle and MySQL evolved from single-datacenter to multi-datacenter deployments.
In addition to the above relational systems, the online storage fleet today houses:
Custom NoSQL cluster(s)
Derived Data Store(s)
BLOB Storage
Couchbase
OLAP system
Cluster Manager/State Machine
Provisioner
All these components form the online storage stack for LinkedIn. Each one has a unique use case, and we strongly believe that “one size fits all” isn’t true in the data realm!