Rootconf Hyderabad edition

On SRE, systems engineering and distributed systems


The art of troubleshooting distributed systems

Submitted by Damini Satya (@daminisatya) on Sunday, 7 April 2019

Section: Crisp talk
Section: Full talk (40 mins)
Technical level: Intermediate
Category: Distributed systems


Abstract

Implementing and running a distributed system poses unique challenges for systems and DevOps engineers. Troubleshooting and debugging issues in distributed systems is a tedious and complex process, and the standard approach of gaining insight into system activity by analysing system logs alone is not enough. In this talk, we demystify this process by presenting approaches and best practices for taming the beast. These are drawn from our experience running distributed systems at internet scale.

Outline

Speaker bio

Damini Satya is a software engineer at Salesforce, where she builds internal tools for infrastructure automation. Previously, she spoke at GHC 2018 with a talk titled “Elsa, A conversational agent aimed at improving women’s mental health”, which drew an enthusiastic response from attendees, both at the conference and on social media. She also spoke at GHC 2017 and GHC India 2016 on a wide variety of technical topics. Beyond GHC, she has presented tech talks at conferences such as ReactConf and FOSSASIA. A passionate developer who enjoys mentoring students, she took part as a student in Google Summer of Code (GSoC) 2016 with the FOSSASIA organization, working on Loklak, a peer-to-peer scraper system, and returned as a mentor for the organization during GSoC 2017. She is an active open source contributor (Kubernetes) and part of various open source communities, and she continually works to bring more women into contributing to open source software.

Links

Comments

  • Zainab Bawa (@zainabbawa) Reviewer 13 days ago

    Thanks for the detailed outline, Damini. The core aspects of the proposal to focus on are:

    1. The case studies – because participants will be interested in hearing your war stories.
    2. What you learned from each case study.

    The part on the evolution of distributed systems can be cut down. You can assume that most of the audience coming to Rootconf Hyderabad will already be familiar with distributed systems, so this information is redundant.

    You may also want to watch Talina Shrotiya’s talk at Rootconf 2019, where she similarly spoke about SRE war stories. Given how similar the overall approach of your two talks is, it may give you cues on how to structure yours:

    https://hasgeek.tv/rootconf/2019/1711-sre-culture-and-strategy-trustingsocial and https://hasgeek.com/rootconf/2019/proposals/sre-culture-strategy-9uuj5MTuwijgQuZP3aehwM
