The Urban Myth Of Full Uptime

Jan 2020

18 Sat 09:00 AM – 05:40 PM IST


Accepting submissions till 30 Nov 2019, 11:59 PM

India International Centre (Kamaladevi Complex), Seminar Hall 1, 2 & 3, New Delhi

Rootconf is a platform to discuss real-world problems around Site Reliability Engineering (SRE), DevOps for data engineering platforms, evaluating and adopting technologies such as Kubernetes and containers, and DevSecOps.

Rootconf Delhi edition will be held on 18 January 2020 at the India International Centre (IIC).

Speakers from Flipkart, Hotstar, MindTickle, Red Hat and Naukri.com will discuss the following topics:

Scaling and engineering challenges from Hotstar’s and Flipkart’s experiences.
Data store choices.
Kubernetes and K8s -- when to choose what and why?
DevSecOps

##Who should attend Rootconf:

Operations engineers
DevOps programmers
Software developers
SRE
Tech leads

To know more about Rootconf, check the following resources:

hasgeek.com/rootconf
https://www.youtube.com/channel/UCDHao9FxNRHw1VyLuGXI_rA

#Sponsors:

Email sales@hasgeek.com for bulk ticket purchases and to sponsor the Rootconf series.

For information about the event, tickets (bulk discounts automatically apply on 5+ and 10+ tickets) and speaking, call Rootconf on 7676332020 or write to info@hasgeek.com.

Hosted by

Rootconf

Rootconf is a community-funded platform for activities and discussions on topics such as Site Reliability Engineering (SRE); infrastructure costs, including cloud costs and their optimization; and security, including cloud security.

The Urban Myth Of Full Uptime

Submitted Nov 19, 2019

Section: Full talk (40 mins) Category: SRE

This talk presents strategies for achieving high uptime at scale. It covers:

A real-life case study
Cloud Architecture
Immutable infrastructure
Infrastructure as code
Secrets Management
Service Discovery
Container management and scheduling
Blue Green Deployment
Observability
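
As a rough illustration of why "full uptime" is hard to promise, the sketch below converts an availability target into the downtime it allows per year. The targets shown (99%, 99.9%, 99.99%, 100%) and the function name `downtime_budget_minutes` are assumptions chosen for illustration and are not taken from the talk.

```python
# Illustrative only: the availability targets below are assumptions,
# not figures from the talk.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def downtime_budget_minutes(availability_percent: float) -> float:
    """Return the yearly downtime (in minutes) allowed by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 100.0):
        budget = downtime_budget_minutes(target)
        print(f"{target}% availability allows {budget:.1f} minutes of downtime per year")
```

A 100% target leaves a budget of zero minutes, so any single incident, deployment mistake, or failed node breaks it.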

Outline

About Me, My Company and My Situation
- Set the context for the rest of the talk
- Touch upon the legacy setup and infrastructure so that the audience can contrast it with the points that follow
Causes of our downtime
- Architecture
- Poor Provisioning Practices
  - Hardware
  - Configuration
- Lack of Monitoring
- Missing Backups, DR and BC
- Poor Technical Choices
  - Storing data on a single node
  - Scaling storage with LVM
  - Node local cache for distributed apps
  - Cyclic API calls
- Security
  - Checked in secrets
  - Publicly accessible resources
  - Outdated and vulnerable versions of tools
- Lack of Documentation and Testing
- Takeaway - Typical problems faced in a poorly architected infrastructure
Architecture
- What’s wrong with it
- Designing immutable infrastructure
Poor Provisioning Practices
- What’s wrong with it
- Provisioning immutable resources with Terraform
- Deploying and Configuring services in immutable fashion
Monitoring
- What’s wrong with it
- Implementing Observability
Backups, DR and BC
- What’s wrong with it
- Automated backups with redundant copies
Poor Technical Choices
- What’s wrong with it
- Fixing the mistakes made so far
Lack of Documentation and Testing
Summary
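
One of the causes listed in the outline, cyclic API calls, can be checked for mechanically. The following is a minimal sketch, assuming the calls between services are available as a simple mapping from each service to the services it calls. The service names and the helper `find_call_cycle` are hypothetical and do not come from the talk.

```python
from typing import Dict, List, Optional


def find_call_cycle(calls: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of service calls (first item == last item), or None.

    Uses a depth-first search that tracks which services are on the
    current call path.
    """
    UNSEEN, ON_PATH, DONE = 0, 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(service: str) -> Optional[List[str]]:
        state[service] = ON_PATH
        path.append(service)
        for callee in calls.get(service, []):
            callee_state = state.get(callee, UNSEEN)
            if callee_state == ON_PATH:
                # The callee is already on the current path: a cycle.
                return path[path.index(callee):] + [callee]
            if callee_state == UNSEEN:
                cycle = visit(callee)
                if cycle:
                    return cycle
        path.pop()
        state[service] = DONE
        return None

    for service in calls:
        if state.get(service, UNSEEN) == UNSEEN:
            cycle = visit(service)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    # Hypothetical call graph: orders -> payments -> users -> orders
    example = {"orders": ["payments"], "payments": ["users"], "users": ["orders"]}
    print(find_call_cycle(example))
```

In a cycle like this, an outage or slowdown in any one service can propagate back to every other service in the loop.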