Rootconf 2019

On infrastructure security, DevOps and distributed systems.

About Rootconf 2019:

The seventh edition of Rootconf is a two-track conference with:

  1. Security talks and tutorials in Audi 1 and Audi 2 on 21 June.
  2. Talks on DevOps, distributed systems and SRE in Audi 1 and Audi 2 on 22 June.

Topics and schedule:

View full schedule here: https://hasgeek.com/rootconf/2019/schedule

Rootconf 2019 includes talks and Birds of a Feather (BOF) sessions on:

  1. OSINT and its applications
  2. Key management, encryption and its costs
  3. Running a bug bounty programme in your organization
  4. PolarDB, a cloud-native database architecture developed by Alibaba Cloud
  5. Vitess
  6. SRE and running distributed teams
  7. Routing security
  8. Log analytics
  9. Enabling SRE via automated feedback loops
  10. Tor for DevOps

Who should attend Rootconf?

  1. DevOps programmers
  2. DevOps leads
  3. Systems engineers
  4. Infrastructure security professionals and experts
  5. DevSecOps teams
  6. Cloud service providers
  7. Companies with heavy cloud usage
  8. Providers of the pieces on which an organization’s IT infrastructure runs – monitoring, log management, alerting, etc.
  9. Organizations dealing with large network systems where data must be protected
  10. VPs of engineering
  11. Engineering managers looking to optimize infrastructure and teams

For information about Rootconf and bulk ticket purchases, contact info@hasgeek.com or call 7676332020. Only community sponsorships are available.

Rootconf 2019 sponsors:

Platinum Sponsor

CRED

Gold Sponsors

Atlassian, Endurance, Trusting Social

Silver Sponsors

DigitalOcean, GO-JEK, Paytm

Bronze Sponsors

MySQL, Sumo Logic, UpCloud, Platform.sh, nilenso, CloudSEK

Exhibition Sponsor

FreeBSD Foundation

Community Sponsors

Ansible, PlanetScale

Hosted by

Rootconf is a forum for discussions about DevOps, infrastructure management, IT operations, systems engineering, SRE and security (from an infrastructure defence perspective).

Talina Shrotriya

@talina06

SRE: Culture & Strategy

Submitted May 31, 2019

The work of a Site Reliability Engineer is often misunderstood, or rarely portrayed, in the tech community. Most SRE stories get lost in daily on-call schedules. This talk opens up a gold mine by discussing the issues an SRE team faces and the solutions built around them. The intent is to give the audience a set of case studies that deal with distributed environments and scale.

Outline

An SRE team serves a single purpose: shipping code in a fast, reliable and economical manner. The four key principles of SRE are measuring risk factors, automation, visibility and simplicity.

Infrastructure Management:

Infrastructure is the entry point for deploying code to production. While cloud providers do make this task easy, there are deeper risk factors we had to measure, such as versioning, locking resources against concurrent updates, and enabling webhooks. We developed a tool called Tessellate to do just this.
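
The talk does not show Tessellate's interface, so the following is only a minimal Python sketch of the three risk factors named above, under assumed names: `ResourceStore`, `VersionConflict` and the webhook callback signature are hypothetical and are not Tessellate's API. Each update must state the version it was based on (optimistic concurrency), a lock serialises writers, previous specs are kept for rollback, and registered webhooks are called after every change.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical webhook signature: (resource name, new version, new spec).
Webhook = Callable[[str, int, dict], None]


class VersionConflict(Exception):
    """Raised when an update is based on a stale version of a resource."""


@dataclass
class Resource:
    name: str
    spec: dict
    version: int = 1
    history: List[dict] = field(default_factory=list)  # previous specs, oldest first


class ResourceStore:
    """Hypothetical versioned resource store (not Tessellate's actual API)."""

    def __init__(self) -> None:
        self._resources: Dict[str, Resource] = {}
        self._lock = threading.Lock()  # serialises all writes
        self._webhooks: List[Webhook] = []

    def register_webhook(self, hook: Webhook) -> None:
        self._webhooks.append(hook)

    def get(self, name: str) -> Resource:
        return self._resources[name]

    def create(self, name: str, spec: dict) -> int:
        with self._lock:
            if name in self._resources:
                raise ValueError(f"{name} already exists")
            self._resources[name] = Resource(name, dict(spec))
        self._notify(name, 1, spec)
        return 1

    def update(self, name: str, spec: dict, expected_version: int) -> int:
        with self._lock:
            res = self._resources[name]
            # Optimistic concurrency: reject writes based on an outdated version.
            if res.version != expected_version:
                raise VersionConflict(
                    f"{name}: expected v{expected_version}, found v{res.version}"
                )
            res.history.append(res.spec)  # keep the old spec for rollback
            res.spec = dict(spec)
            res.version += 1
            new_version = res.version
        # Webhooks run outside the lock so a slow subscriber cannot block writers.
        self._notify(name, new_version, spec)
        return new_version

    def _notify(self, name: str, version: int, spec: dict) -> None:
        for hook in self._webhooks:
            hook(name, version, spec)


if __name__ == "__main__":
    store = ResourceStore()
    store.register_webhook(lambda n, v, s: print(f"webhook: {n} -> v{v}"))
    store.create("web-cluster", {"replicas": 2})
    store.update("web-cluster", {"replicas": 4}, expected_version=1)
    try:
        # A second writer that still holds version 1 is rejected.
        store.update("web-cluster", {"replicas": 3}, expected_version=1)
    except VersionConflict as exc:
        print(f"rejected: {exc}")
```

Rejecting a stale `expected_version` instead of silently overwriting is what stops two concurrent updates from clobbering each other; the caller re-reads the resource and retries.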

Scheduling:

Choosing the right scheduler for the workload and the types of services helps maintain simplicity across all deployments.
We discuss the risk factors involved in our initial scheduler design, and how we built a service to circumvent those risks.

Network:

We discuss network-related administrative tasks to understand why automation is an essential principle of SRE. We walk through two scenarios where we automated processes and workflows by building lightweight services.
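
The two scenarios from the talk are not described here, so the following is a hypothetical example of the kind of network administrative task that lends itself to automation: picking the next free subnet inside a VPC range and checking a proposed CIDR block against existing allocations. It is a minimal Python sketch using only the standard `ipaddress` module; the function names and sample ranges are assumptions.

```python
import ipaddress
from typing import Iterable, List, Optional


def find_overlaps(candidate: str, allocated: Iterable[str]) -> List[str]:
    """Return every already-allocated CIDR block that overlaps `candidate`."""
    new_net = ipaddress.ip_network(candidate)  # raises ValueError if host bits are set
    return [cidr for cidr in allocated
            if new_net.overlaps(ipaddress.ip_network(cidr))]


def next_free_subnet(vpc: str, prefixlen: int,
                     allocated: Iterable[str]) -> Optional[str]:
    """Return the first /prefixlen subnet of `vpc` that overlaps no allocation."""
    taken = [ipaddress.ip_network(cidr) for cidr in allocated]
    for subnet in ipaddress.ip_network(vpc).subnets(new_prefix=prefixlen):
        if not any(subnet.overlaps(net) for net in taken):
            return str(subnet)
    return None  # the VPC range is exhausted at this prefix length


if __name__ == "__main__":
    in_use = ["10.0.0.0/24", "10.0.1.0/24"]
    print(next_free_subnet("10.0.0.0/16", 24, in_use))  # 10.0.2.0/24
    print(find_overlaps("10.1.0.0/24", in_use))         # []
```

Wrapped in a small HTTP endpoint or CLI, a check like this could replace a manual lookup before each new subnet is provisioned, which is the kind of repeatable task the talk argues should be automated.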

Observability:

We discuss how observability is much more than merely gathering metrics. We look at what visibility means, and see how a good monitoring solution helps us gain the right amount of visibility into production systems.
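
As one illustration of the gap between collecting metrics and gaining visibility, here is a minimal Python sketch (not the speaker's system) that turns a raw stream of request outcomes into a single actionable signal: the error rate over a sliding time window, alerting only when enough requests have been seen and the rate exceeds a threshold. The class name, window length, threshold and minimum request count are all hypothetical parameters.

```python
from collections import deque
from typing import Deque, Tuple


class ErrorRateMonitor:
    """Hypothetical sliding-window error-rate alert over raw request outcomes."""

    def __init__(self, window_seconds: float, threshold: float,
                 min_requests: int = 20) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold          # e.g. 0.05 means alert above 5% errors
        self.min_requests = min_requests    # avoid alerting on tiny samples
        self._events: Deque[Tuple[float, bool]] = deque()  # (timestamp, ok)
        self._errors = 0

    def record(self, timestamp: float, ok: bool) -> None:
        """Record one request outcome; timestamps are assumed non-decreasing."""
        self._events.append((timestamp, ok))
        if not ok:
            self._errors += 1
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop events that have fallen out of the window.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            _, ok = self._events.popleft()
            if not ok:
                self._errors -= 1

    def error_rate(self, now: float) -> float:
        self._evict(now)
        return self._errors / len(self._events) if self._events else 0.0

    def should_alert(self, now: float) -> bool:
        rate = self.error_rate(now)  # also evicts stale events
        return len(self._events) >= self.min_requests and rate > self.threshold


if __name__ == "__main__":
    monitor = ErrorRateMonitor(window_seconds=60, threshold=0.05, min_requests=10)
    for t in range(20):
        monitor.record(float(t), ok=True)
    for t in range(20, 25):
        monitor.record(float(t), ok=False)
    print(monitor.error_rate(24.0), monitor.should_alert(24.0))  # 0.2 True
```

The `min_requests` guard is a deliberate design choice: a 100% error rate over a single request is rarely worth paging anyone for.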

Conclusion:

The core belief of an SRE team is to solve problems for the larger good rather than restricting ourselves to the problem at hand. Every tool we use was built with this intent in mind. Each solution was a step towards better visibility of and access to production systems, and each of these solutions made our on-call shifts manageable. The key takeaway from this talk is to follow the same approach: before doing something manually, take a step back and think twice about whether the problem is repeatable, whether the solution is reusable, and whether it can be automated in a simple manner.

Requirements

N/A

Speaker bio

Talina is a software engineer at Trusting Social.
She has worked on data-intensive projects, primarily written in Java using Spark.
She was recently introduced to the world of Site Reliability Engineering, where she worked on designing and implementing monitoring and alerting systems for a large-scale infrastructure.

Links

Slides

https://docs.google.com/presentation/d/1SZb33H2x5Y9lGisQb-lNFgHi3z1P0GaDqLCh9nTGC8k/edit?usp=sharing
