The Fifth Elephant 2019

The eighth edition of India's best data conference


A journey through Cosmos to understand users.

Submitted by Avinash Ramakanth (@savinashr) on Sunday, 14 April 2019

Session type: Full talk of 40 mins


Abstract

This talk covers the journey of building a cloud-native user feedback system for the Inmobi DSP (demand-side platform). Why these learnings are worth sharing becomes clear from the scale involved: a typical DSP processes anywhere from 250,000 to 1,000,000 queries per second, with an average response time under 50 milliseconds. To make intelligent decisions in such a high-throughput, low-latency system, the supporting user store needs to be highly scalable, extremely cost-conscious and reliable. Building such a system in a cloud-native setting yields many learnings, both empirical and theoretical, which will be the focus of this talk. The major headlines of the talk will be:
1. Understanding the factors that drive the cost of such a system, how to minimize operational costs, and how to build in the intelligence needed for auto-scaling.
2. How to do multi-version concurrency control.
3. The need for in-flight abbreviated compression, and how to achieve it with minimal overhead.
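The proposal does not spell out what in-flight abbreviated compression involves. One plausible reading, assumed here, is that verbose field names are swapped for short aliases just before a document is written and restored just after it is read, so stored documents shrink (in Cosmos DB, request-unit charges and storage both grow with item size) while application code keeps readable names. The Python sketch below illustrates only that idea: the alias map and field names are hypothetical, and it is not the Avro-based library the talk describes.

```python
import json
from typing import Any, Dict

# Hypothetical alias map (readable name -> short stored key). A real system
# would derive this from a schema rather than hard-code it.
ALIASES: Dict[str, str] = {
    "user_id": "u",
    "segments": "s",
    "click_count": "c",
    "last_seen_epoch": "l",
}
# Reverse map, used when reading documents back.
EXPANSIONS: Dict[str, str] = {short: full for full, short in ALIASES.items()}


def _rename_keys(obj: Any, mapping: Dict[str, str]) -> Any:
    """Recursively rename dict keys via `mapping`; keys not in it pass through."""
    if isinstance(obj, dict):
        return {mapping.get(k, k): _rename_keys(v, mapping) for k, v in obj.items()}
    if isinstance(obj, list):
        return [_rename_keys(v, mapping) for v in obj]
    return obj


def abbreviate(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Shorten field names just before the document is written."""
    return _rename_keys(doc, ALIASES)


def expand(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Restore readable field names just after the document is read."""
    return _rename_keys(doc, EXPANSIONS)


if __name__ == "__main__":
    doc = {
        "user_id": "abc123",
        "segments": [{"name": "sports", "click_count": 3}],
        "last_seen_epoch": 1555200000,
    }
    stored = abbreviate(doc)
    print(json.dumps(stored))
    print(len(json.dumps(doc)), "chars ->", len(json.dumps(stored)), "chars")
    assert expand(stored) == doc  # the round trip is lossless
```

Key aliasing must be collision-free: an unaliased field that happens to share a short alias (say, a literal `u` key) would be wrongly expanded on read, which is one reason to derive the map from a schema instead of maintaining it by hand.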

Outline

The topics we will be covering in this talk:
1. Introduction: brief business context, to show why this problem needed solving and what challenges it involved.
2. The factors driving the decision to choose Cosmos DB as our backend store.
3. Key insights into what drives the cost of the store, and the various gotchas involved in designing such a system.
4. How to optimize cost and build in the intelligence needed for auto-scaling.
5. Why we needed multi-version concurrency control, and how to achieve it so that the same record can receive parallel writes under multiple schema versions.
6. The trade-off between readability and storage cost, and how to get the best of both worlds by building an Avro library that enables in-flight abbreviated compression.
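Outline item 5 mentions parallel writes with multiple schema versions for the same record, but not how they are coordinated. A minimal sketch, assuming optimistic concurrency control (each write is conditional on the version it read, much like Cosmos DB's ETag-based `If-Match` check) plus one sub-document per schema version, might look like the Python below. Strictly speaking this is optimistic concurrency control rather than textbook MVCC, which keeps several committed versions of a record readable at once. `InMemoryStore`, `upsert` and the record layout are hypothetical stand-ins, not the author's design or the Cosmos DB SDK.

```python
import threading
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

SchemaSlices = Dict[int, Dict[str, Any]]  # schema version -> fields written under it


class ConflictError(Exception):
    """Raised when a write is based on a stale version (cf. an ETag mismatch)."""


@dataclass
class Record:
    version: int = 0
    slices: SchemaSlices = field(default_factory=dict)


class InMemoryStore:
    """Hypothetical stand-in for a key-value store that supports conditional writes."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._records: Dict[str, Record] = {}

    def read(self, key: str) -> Tuple[int, SchemaSlices]:
        """Return (version, copy of the per-schema slices); version 0 if absent."""
        with self._lock:
            rec = self._records.get(key, Record())
            return rec.version, {v: dict(s) for v, s in rec.slices.items()}

    def write_if_match(self, key: str, expected: int, slices: SchemaSlices) -> int:
        """Replace the record only if its version is still `expected`."""
        with self._lock:
            rec = self._records.setdefault(key, Record())
            if rec.version != expected:
                raise ConflictError(key)
            rec.version += 1
            rec.slices = slices
            return rec.version


def upsert(store: InMemoryStore, key: str, schema_version: int,
           updates: Dict[str, Any], max_retries: int = 100) -> int:
    """Read-modify-write that touches only this writer's schema slice.

    On a version conflict, re-read and re-apply instead of overwriting
    another writer's changes.
    """
    for _ in range(max_retries):
        version, slices = store.read(key)
        slices.setdefault(schema_version, {}).update(updates)
        try:
            return store.write_if_match(key, version, slices)
        except ConflictError:
            continue  # someone else wrote first: re-read and retry
    raise ConflictError(f"{key}: gave up after {max_retries} retries")


if __name__ == "__main__":
    store = InMemoryStore()
    writers = [threading.Thread(target=upsert, args=(store, "user-42", 1, {f"seg{i}": i}))
               for i in range(10)]
    writers.append(threading.Thread(target=upsert, args=(store, "user-42", 2, {"score": 0.7})))
    for t in writers:
        t.start()
    for t in writers:
        t.join()
    version, slices = store.read("user-42")
    print(version, slices)  # version 11: every write landed, none was lost
```

Keeping each schema version in its own slice is what lets a v1 writer and a v2 writer update the same record concurrently without one erasing fields it does not know about; the version check then ensures neither overwrites a copy it never saw.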

Speaker bio

Avinash Ramakanth:
Tech lead at Inmobi; MSc in Computer Systems, Indian Institute of Science.
I was part of the group that experimented with and conceptualized the design of the user inference systems for the Inmobi DSP. For the past four years, my work has involved understanding user data at Inmobi and building large-scale systems that provide inferences for intelligent ad serving. This work spans large-scale stream processing systems, ML pipelines for making inferences, and various big data applications.

Slides

https://www.slideshare.net/AvinashRamakanth/a-journey-through-cosmos-5th-el-147413401

Comments

  • Venkata Pingali (@pingali) a month ago

    Hi! Avinash,

    A couple of thoughts:

    1. Can you add slides?
    2. I believe that people outside adtech won’t know what “a cloud native user feedback system for Inmobi DSP” is. You may want to add a one-liner describing the application.

    -Venkata

  • Abhishek Balaji (@booleanbalaji) Reviewer a month ago

    Feedback from rehearsal 1:

    • Time taken: 30 mins
    • Add an introduction slide at the beginning with details about yourself. How can people reach you if they have questions about your talk?
    • Add block diagrams, state transitions, infographics and flow charts to your slides. Don’t use paragraphs as points to explain things (auto-scaling, factors to consider, etc.).
    • Rework the graphs; they’re too grainy to be legible in the auditorium.
    • Add content on the differences between having data stores in the cloud vs. on premises.
    • Reconfirm whether multi-version concurrency control is the right term to use.
    • Explain the story behind the choices. What led to exploring CosmosDB as a solution, and what other solutions did you consider?
    • Rearrange the content so that the talk is about your experience. Don’t switch concepts too fast.
    • It’s not convincing enough as a solution; try to explain how it helped your situation, not what it can do.
    • Use examples to describe how JSON documents are split into multiple parts.
    • Use the Azure documentation as a reference for diagrams and charts.
    • Add a conclusion and takeaway slide.

    As next steps, incorporate the feedback into your slides and upload the revised slides to your proposal within 10 days, by 24 May. We’ll evaluate your revised content and then let you know whether your talk can be confirmed for The Fifth Elephant.

  • Zainab Bawa (@zainabbawa) Reviewer a month ago (edited a month ago)

    Avinash, upload the work-in-progress slides here so that we can continue the conversation and the review on this page.

  • Zainab Bawa (@zainabbawa) Reviewer a month ago

    Hello Avinash,

    I couldn’t join the rehearsal, but looked through the slides. The following struck me:

    1. The context needs to be stated more clearly, with quick explanations of the terms. As was discussed in the rehearsal, those who are not from the Ad Tech domain will not understand the terms.
    2. What led to choosing CosmosDB as a solution, what other solutions did you consider? This has to be explained in the presentation. Participants who come to The Fifth Elephant want to understand why speakers made tooling, architecture and approach choices and how they can abstract the learnings, irrespective of the domain, to evaluate these decisions for themselves.
    3. Rearrange the content so that the talk is about your experience. Don’t switch concepts too fast. I felt the same way about the slides: concepts were moving too fast, and information was simply lumped together. Instead of depersonalizing the talk by presenting information as a set of facts, weave the facts into a narrative about your experience.
    4. Share before-and-after scenarios – what was the situation before you switched to CosmosDB and what was the situation after the move? Did you have to make any compromises after the move?

    Look forward to the revised slides.

  • Venkata Pingali (@pingali) 29 days ago

    Hi! Avinash,

    To add to the previous reviewers’ comments: I can see that the problem might be interesting, but both the problem and the solution are getting ‘lost’ in the flow, either because of verbosity or because of how they are articulated. A few thoughts:

    1. Although Cosmos is a key building block, this talk is about a solution built on top of Cosmos. Should the title be something like “Performance optimization for a customer data management platform”?
    2. Can you label the RU chart axes? It is unclear what the chart is showing.
    3. You should explicitly articulate the problem (minimize RU across reads and writes?).
    4. The performance relationships also need expanding: what is the distribution of reads and writes? At what pace is each growing? How frequently are records updated?
    5. You need one slide where you are discussing the ‘levers’ to improve performance (splitting documents, indexing, rate control) and what each one does to the performance, and what are their costs.
    6. What was the end state?
    7. Do you have any final thoughts around managing high-performance key-value stores?

    -Venkata

  • Avinash Ramakanth (@savinashr) Proposer 26 days ago (edited 26 days ago)

    Hi Venkata, Zainab, Abhishek and Reviewers,

    I’ve updated the slides. Please take a look; I would love to hear any further suggestions.

  • Venkata Pingali (@pingali) 19 days ago

    Hi! Avinash,

    Had a chance to look through. Coming together well. A few minor issues on presentation:

    1. Slide 8 - you want to show it as 3 columns instead of two (concern, option A, option B)
    2. Slide 15 - you want to title/annotate the slide with what the audience should take away, e.g., “write cost grows almost linearly, read cost sub-linearly with size”.
    3. How is slide 15 different from 16? And why don’t they agree?
    4. Slide 17 and others - break long sentences into two short phrases.
