Rootconf 2018

On scaling infrastructure and operations

## About Rootconf 2018 and who should attend:

Rootconf is India’s best conference on DevOps, SRE and IT infrastructure. Rootconf attracts systems and operations engineers to share real-world knowledge about building reliable systems.

The 2018 edition is a single track conference. Day 1 – 10 May – features talks on security. Colin Charles (chief evangelist at Percona Foundation), Pukhraj Singh (former national cybersecurity manager at UIDAI), Shamim Reza (open source enthusiast), Alisha Gurung (network engineer at Bhutan Telecom) and Derick Thomas (former network engineer at VSNL and Airtel Bharti) will touch on important aspects of infrastructure, database, network and enterprise security.

Day 2 – 11 May – is filled with case studies and stories about legacy code, immutable infrastructure, root-cause analysis, handling dependencies and monitoring. Talks from Exotel, Kayako, Intuit, Helpshift and DigitalOcean, among others, will help you evaluate DevOps tools and architecture patterns.

If you are a:

  1. DevOps programmer
  2. Systems engineer
  3. Architect
  4. VP of engineering
  5. IT manager

you should attend Rootconf.

Birds of a Feather (BOF) sessions at Rootconf 2018 will cover the following topics:

  1. DevSecOps
  2. Microservices - tooling, architecture, costs and culture
  3. Mistakes that startups make when planning infrastructure
  4. Handling technical debt
  5. How to plan a container strategy for your organization
  6. Evaluating AWS for scale
  7. Future of DevOps

Rootconf is a conference for practitioners, by practitioners.

The call for proposals is closed. If you are interested in speaking at Rootconf events in 2018, submit a proposal here: rootconf.talkfunnel.com/rootconf-round-the-year-2018/

## Venue:

NIMHANS Convention Centre, Lakkasandra, Hombegowda Nagar, Bengaluru, Karnataka 560029.

Schedule, event details and tickets: https://rootconf.in/2018

For more information about Rootconf, sponsorships or outstation events, contact support@hasgeek.com or call 7676332020.

Hosted by

Rootconf is a community-funded platform for activities and discussions on topics including Site Reliability Engineering (SRE); infrastructure costs, including cloud costs and their optimization; and security, including cloud security.

Sarika Mohapatra

@sarikamm

How we scaled devops while we scaled 20x at SumoLogic

Submitted Mar 10, 2018

SumoLogic is a unified logs and metrics platform for monitoring, troubleshooting, and operational and security analytics. Ours is a cloud-based, multitenant microservices architecture that analyzes 100+ petabytes, ingests 100+ terabytes and serves 20+ million queries per day.

As devops engineers at SumoLogic, we design, build, deploy and monitor our services, all at the same time. Building for cloud scale in a multitenant microservices architecture that is constantly changing and evolving, and managing an ever-growing infrastructure while staying agile, brings unique challenges. Since our customers use SumoLogic for business-critical needs, it is paramount that our service is always up and running.

This talk is about the journey of scaling more than 20x, the engineering and operational challenges we faced, and how we run Sumo Search today: successfully, cost-effectively and reliably, while still being agile.

We will cover real case studies of what worked and what didn’t, go over our solutions and key lessons to build for scale. We will also cover how we do continuous testing and how we set up monitoring and alerting to run our services reliably.

Outline

  • Sumo - What we do and architecture overview
  • Biggest challenges in running Sumo Search reliably in the face of failing systems and unpredictability, with concrete examples
    • Failures and unpredictability
    • Multitenancy
    • Scale
    • Being reliable while being agile
    • Upkeep of existing services with minimal manual effort
    • Operational knowledge transfer (KT)
  • Solutions and lessons around the various dimensions of scale
    • Auto monitoring
    • Auto alerting
    • Auto remediation
    • Handling spikes
    • Blast radius control
    • Resource management
    • Configuration management
  • Continuous Testing
    • Continuous integration tests (ITs)
    • Performance & Reliability Testing
    • Shadow Testing
    • Dogfooding
  • How to set up your monitoring and troubleshooting system to meet your uptime goals and reduce your MTTR
    • Logs and metrics collection
    • Setting up monitoring and observability for everything!
    • Alerting, troubleshooting and remediation
    • Feedback loop: outages, postmortems and how they influence our infrastructure and system design
  • Key takeaways:
    • Lessons for designing resilient and scalable services and running them reliably in production

Requirements

N/A

Speaker bio

Sarika is a Senior Software Engineer at SumoLogic, where she is part of the search team that builds and runs a petabyte-scale, multitenant, cloud-based log search and analytics service.

Prior to Sumo, Sarika graduated from IIT Kanpur and worked on the Microsoft Apps Experience team, Microsoft Bing Search and Microsoft Research. In her spare time, she works on her IoT projects and Android apps.

Her passion lies in building high quality enterprise products that are indispensable to customers’ business. Besides building products, Sarika’s interests include sports, teaching, travel and pets!

Slides

https://docs.google.com/presentation/d/1ZsG_B14_5fCO7Qi9CYbKgNlDhbKs3MOBfICapQOkVnw/
