Rootconf 2017

On service reliability

##Submit proposals for flash talks
Rootconf is on 11-12 May. If you have:

  1. Tips and tricks for simplifying infrastructure management and maintenance;
  2. Experiences with new tools to share;
  3. Cool demos;

then propose a flash talk here, or on the spot, at the venue.

The flash talk session is on 11 May, from 17:20-18:20. We have room for about 12 flash talks. Each presentation should be no more than 5 minutes.

A final note of caution when presenting at flash talks: we have a code of conduct at the conference. You must refrain from making remarks that may be perceived as sexist or derogatory. If you want to double check your presentation, contact Sandhya Ramesh, Karthik B. or Zainab Bawa at the venue.

##Theme
The theme for the 2017 edition is service reliability. The conference will feature talks on state of the art deployment strategies and appropriate monitoring technologies at different scales. Rootconf this year will broadly cover topics like toil, on-call, outage handling, and post-mortem analysis. We are inviting presentation proposals from academics and practitioners on these topics.

Rootconf aims to appeal to the widest possible range of DevOps practitioners: from embryonic startups to the largest established enterprises. We are keen to schedule presentations that appeal both to attendees’ current needs as well as their future aspirations.

##About the Conference
Rootconf is India’s principal conference where systems and operations engineers share real world knowledge about building reliable systems. We are now accepting submissions for our next edition which will take place in Bangalore on 11-12 May 2017.

Topics for Round 2 of the CfP were:

  1. Capacity planning.
  2. Deploying microservices, and issues concerning monitoring and reliability of microservices.
  3. Deployment and orchestration of container based infrastructures.
  4. Open tracing.

Topics for Round 1 of the CfP were:

  1. Monitoring strategies
  2. Deployment strategies
  3. Capacity planning
  4. Automation beyond deployment and monitoring
  5. Eliminating toil
  6. On-call outage handling
  7. Postmortem / root cause analysis
  8. Incident response

##Format
Rootconf is a three-track conference.

We are inviting proposals for:

  • Full-length 40-minute talks – which cover conceptual topics and include case studies.
  • Crisp 15-minute how-to talks or introduction to a new technology.
  • Sponsored sessions, of 15 minutes and 40 minutes duration (limited slots available; subject to editorial scrutiny and approval).
  • Hands-on workshop sessions of 3 and 6 hour duration where participants follow the instructors on their laptops.

##Selection Process
Proposals will be filtered and shortlisted by an Editorial Panel. Please make sure to add links to videos / slide decks when submitting proposals. This will help us understand your speaking experience and delivery style. Blurbs or blog posts covering the relevance of a particular problem statement and how it is tackled will help the Editorial Panel better judge your proposals. We might contact you to ask if you’d like to repost your content on the official conference blog.

We expect you to submit an outline of your proposed talk, in the form of a mind map, a text document or draft slides, within two weeks of submitting your proposal.

[Figure: Selection process flowchart]

You can check back on this page for the status of your proposal. We will notify you if we either move your proposal to the next round or if we reject it. Selected speakers must participate in one or two rounds of rehearsals before the conference. This is mandatory and helps you to prepare well for the conference.

A speaker is NOT confirmed a slot unless we explicitly mention so in an email or over any other medium of communication.

There is only one speaker per session. Entry is free for selected speakers.

##Travel Grants
As our budget is limited, we prefer speakers from locations closer home, but will do our best to cover for anyone exceptional. HasGeek provides these limited grants where applicable:

  • Two grants covering travel and accommodation for international speakers.
  • Three grants covering travel and accommodation for domestic speakers.

Grants will be made available to speakers delivering full sessions (40 minutes or longer).
*Speaker travel grants will be given in the order of preference to students, women, persons of non-binary genders, and speakers from Asia and Africa.

##Commitment to Open Source
HasGeek believes in open source as the binding force of our community. If you are describing a codebase for developers to work with, we’d like for it to be available under a permissive open source licence. If your software is commercially licensed or available under a combination of commercial and restrictive open source licences (such as the various forms of the GPL), please consider picking up a sponsorship. We recognise that there are valid reasons for commercial licensing, but ask that you support us in return for giving you an audience. Your session will be marked on the schedule as a “sponsored session”.

##Important Dates:

  • Deadline for submitting proposals: 10 April, 2017
  • Final conference schedule: 15 April 2017
  • Conference dates: 11-12 May, 2017

##Contact
For more information about speaking proposals, tickets and sponsorships, contact info@hasgeek.com or call +91-7676332020.

Hosted by

Rootconf is a community-funded platform for activities and discussions on the following topics: Site Reliability Engineering (SRE); infrastructure costs, including cloud costs and their optimization; and security, including cloud security.

Anshu Prateek

@anshprat

Life @ Freecharge on November 9th - A SRE perspective.

Submitted Jan 20, 2017

In this talk we will see the various challenges faced by SRE @ Freecharge in view of the demonetization campaigns. Scaling, Monitoring, Release, and above all, trying to work towards making SRE itself redundant!

Outline

November 8th will be a date every Indian will remember, at least for this decade. Payment companies seem to have been at the forefront of the race to take India to digital money, and the pace at which changes have happened can be compared to that of an F1 race! In an F1 race, the pit crew must work at their best for the driver to win. As SREs, we often play the role of pit crew, mechanic, R&D and a lot more.


Capacity planning

Review of the existing capacity for various key components (login, wallet, other backends), adding capacity where required. Login service traffic and utilization jumped 2x overnight; some other backends saw up to a 3-4x traffic increase. We will see how the various backends were scaled, and how we decided between horizontal and vertical scaling. (5 minutes)
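As a rough illustration of the horizontal-scaling arithmetic behind such a review, here is a minimal sketch. The function name `nodes_needed`, the 30% headroom and all request rates are hypothetical; only the 2x growth factor comes from the outline above, and this is not Freecharge's actual method.

```python
import math

def nodes_needed(current_peak_rps, growth_factor, per_node_rps, headroom=0.3):
    """Nodes required to serve the grown peak while keeping spare headroom.

    All inputs are illustrative assumptions, not measured values.
    """
    target_rps = current_peak_rps * growth_factor
    # Only plan to use a fraction of each node, leaving room for spikes.
    usable_per_node = per_node_rps * (1 - headroom)
    return math.ceil(target_rps / usable_per_node)

# Hypothetical figures: 1000 rps peak today, 2x growth, 250 rps per node.
print(nodes_needed(current_peak_rps=1000, growth_factor=2, per_node_rps=250))
```

If the resulting node count is impractical, or the component cannot be sharded, that is one signal to consider vertical scaling instead.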


Load testing

The numbers that we used to see only during heavy campaigns became a thing of every afternoon (organically!). The numbers were getting capped at a certain xxxx requests at the topmost layer. Calls at this topmost layer are further amplified at various backend layers. We needed to find and fix these chokepoints. (3 mins)
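One way to reason about where such a cap comes from is to divide each layer's capacity by its amplification factor and see which layer saturates first. The sketch below assumes a simple linear model; the layer names, amplification factors and capacities are all hypothetical and are not figures from the talk.

```python
def find_chokepoint(layers):
    """Return (layer_name, max_top_layer_rps) for the layer that saturates first.

    layers: list of (name, amplification, capacity_rps), where amplification
    is the number of calls this layer receives per top-layer request.
    """
    limits = [(name, capacity / amp) for name, amp, capacity in layers]
    # The smallest sustainable top-layer rate identifies the chokepoint.
    return min(limits, key=lambda item: item[1])

# Hypothetical layers, purely for illustration.
layers = [
    ("frontend", 1, 5000),
    ("wallet", 2, 8000),
    ("shared-datastore", 4, 12000),
]
print(find_chokepoint(layers))
```

In this made-up example the shared datastore has the highest raw capacity but is still the chokepoint, because every top-layer request reaches it four times.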


Architectural changes

We found (or rather were already aware of) the first bottleneck at the last layer, which runs on top of a MongoDB replica. This setup is used by various services across the company; as a result, it sees 4-5x amplification of the frontend traffic. We looked at various ways to resolve the issue. Code changes would take time. We will discuss the options that we reviewed, the one that we zeroed in on, and how we got it up and running within 18 hours. Also, how an effort to save 2 hours ended up adding 6 more hours to the operation! (15 mins)


Oncall/outages/response/COE/Postmortems

We took a hard look at the combined results of the load testing and the first campaign after that. It led to a company-wide exercise of capacity review and more architectural optimizations. (5 mins)

Monitoring - the above efforts of load testing and outages highlighted already-known gaps in monitoring. We will discuss what these gaps were, how they impacted us, and how we are working on resolving them. (2 mins)

Security - With increased visibility, attacks on other fintech companies increased as well. We reviewed and strengthened our security setups.

Tuning - One specific example where we reduced the latency from 200ms to 1.77ms!

What next... Docker! We will see why we are working towards using Docker containers.

And the end goal - to make SRE redundant! (How and why?!)
(10 mins)

Speaker bio

XY!, Ex-Aerospike, Ex-Reliance Jio and now at Freecharge. I've seen and worked at scales of all levels: from thousands of machines, to millions of TPS at sub-millisecond latency, to the world’s largest startup targeting a billion+ people! These experiences help me ensure Freecharge remains the fastest wallet out there!

Slides

https://docs.google.com/presentation/d/1wdpSS4jAc4crKDPq0LKnGhRp093cffhv23ciYSoDs8o/edit?usp=sharing

