BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//HasGeek//NONSGML Funnel//EN
DESCRIPTION:Availability and reliability 24/7 - the SRE life
X-WR-CALDESC:Availability and reliability 24/7 - the SRE life
NAME:SRE Conf 2023
X-WR-CALNAME:SRE Conf 2023
REFRESH-INTERVAL;VALUE=DURATION:PT12H
SUMMARY:SRE Conf 2023
TIMEZONE-ID:Asia/Kolkata
X-PUBLISHED-TTL:PT12H
X-WR-TIMEZONE:Asia/Kolkata
BEGIN:VEVENT
SUMMARY:Rootconf SRE rehearsal - No downtime of stateful servers
DTSTART:20231117T153000Z
DTEND:20231117T161000Z
DTSTAMP:20260421T102045Z
UID:session/5wKCUssuWLFUHLYF9iQqxM@hasgeek.com
SEQUENCE:5
CREATED:20231115T113010Z
DESCRIPTION:Anush Arvind will do a rehearsal of their [talk](https://hasge
 ek.com/rootconf/sreconf-2023/sub/no-downtime-migration-of-stateful-servers
 -Pcdj9aaHEA89AJ6rZa7VoB) with Rootconf SRE editors and reviewers.\n\nRootc
 onf members can participate in the rehearsals\, and give their feedback to
  the speakers.  
LAST-MODIFIED:20231115T114843Z
LOCATION:Bangalore International Centre (BIC)
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
BEGIN:VALARM
ACTION:display
DESCRIPTION:Rootconf SRE rehearsal - No downtime of stateful servers in 5 
 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Rootconf SRE rehearsal - Lessons learned while self-managing Terab
 yte scale database for three nines of up-time
DTSTART:20231117T162000Z
DTEND:20231117T165500Z
DTSTAMP:20260421T102045Z
UID:session/EcKZ9pHPWpKnvGL4fzrGHa@hasgeek.com
SEQUENCE:2
CREATED:20231115T113308Z
DESCRIPTION:Chinmay Naik will do a rehearsal of their [talk](https://hasge
 ek.com/rootconf/sreconf-2023/sub/lessons-learned-while-managing-a-terabyte
 -scale-da-X6ieYGMJM4Epzd4hf3vnbm) with Rootconf SRE editors and reviewers.
 \n\nRootconf members can participate in the rehearsals\, and give their fe
 edback to the speakers.  
LAST-MODIFIED:20231115T113359Z
LOCATION:Bangalore International Centre (BIC)
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
BEGIN:VALARM
ACTION:display
DESCRIPTION:Rootconf SRE rehearsal - Lessons learned while self-managing T
 erabyte scale database for three nines of up-time in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Rootconf SRE rehearsal - Deep dive into analyzing high cardinalit
 y metrics
DTSTART:20231118T162000Z
DTEND:20231118T165500Z
DTSTAMP:20260421T102045Z
UID:session/NHjBY6M3adNUAG22KGcmoz@hasgeek.com
SEQUENCE:3
CREATED:20231115T114942Z
DESCRIPTION:Preeti Dewani will do a rehearsal of their [talk](https://hasg
 eek.com/rootconf/sreconf-2023/sub/deep-dive-into-analyzing-high-cardinalit
 y-metrics-7iUEoUTU1JxVhRXWdBRG2m) with Rootconf SRE editors and reviewers.
 \n\nRootconf members can participate in the rehearsals\, and give their fe
 edback to the speakers.  
LAST-MODIFIED:20231115T115221Z
LOCATION:Bangalore International Centre (BIC)
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
BEGIN:VALARM
ACTION:display
DESCRIPTION:Rootconf SRE rehearsal - Deep dive into analyzing high cardin
 ality metrics in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Rootconf SRE rehearsal - Hacking organizations - from SRE to SRE m
 anager
DTSTART:20231119T153000Z
DTEND:20231119T161000Z
DTSTAMP:20260421T102045Z
UID:session/McBPoACEc3ocbpvvCMS3Tc@hasgeek.com
SEQUENCE:4
CREATED:20231115T114837Z
DESCRIPTION:Biju Chacko will do a rehearsal of their [talk](https://hasgee
 k.com/rootconf/sreconf-2023/sub/hacking-organisations-from-sre-to-sre-mana
 ger-rkQWuAbYn3GPb3J6nCmBF) with Rootconf SRE editors and reviewers.\n\nRoo
 tconf members can participate in the rehearsals\, and give their feedback 
 to the speakers.  
LAST-MODIFIED:20231118T140221Z
LOCATION:Bangalore International Centre (BIC)
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
BEGIN:VALARM
ACTION:display
DESCRIPTION:Rootconf SRE rehearsal - Hacking organizations - from SRE to S
 RE manager in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Rootconf SRE rehearsal - Cost optimization: not my infrastructure
 \, but my architecture is the culprit.
DTSTART:20231120T153000Z
DTEND:20231120T161000Z
DTSTAMP:20260421T102045Z
UID:session/Lt5WTZB1eMwg8n9GktjkFW@hasgeek.com
SEQUENCE:4
CREATED:20231115T115134Z
DESCRIPTION:Jaideep Khandelwal will do a rehearsal of their [talk](https:/
 /hasgeek.com/rootconf/sreconf-2023/sub/cost-optimization-not-my-infrastruc
 ture-but-my-arc-42WiiveLUZbswY9m14uTVW) with Rootconf SRE editors and revi
 ewers.\n\nRootconf members can participate in the rehearsals\, and give th
 eir feedback to the speakers.  \n\n
LAST-MODIFIED:20231115T122231Z
LOCATION:Bangalore International Centre (BIC)
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
BEGIN:VALARM
ACTION:display
DESCRIPTION:Rootconf SRE rehearsal - Cost optimization: not my infrastruc
 ture\, but my architecture is the culprit. in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Rootconf SRE rehearsal - SRE: "toil reduction" through communicat
 ion/collaboration/coordination!
DTSTART:20231120T162000Z
DTEND:20231120T164000Z
DTSTAMP:20260421T102045Z
UID:session/KkBQnv9CjhuD7RMSySDuDU@hasgeek.com
SEQUENCE:7
CREATED:20231115T115708Z
DESCRIPTION:Ravindra Harish will do a rehearsal of their [talk](https://ha
 sgeek.com/rootconf/sreconf-2023/sub/sre-toil-reduction-through-communicati
 on-collabora-ApheQfZKvGuBhgLrRJWUMN) with Rootconf SRE editors and reviewe
 rs.\n\nRootconf members can participate in the rehearsals\, and give their
  feedback to the speakers.  \n\n
LAST-MODIFIED:20231115T122247Z
LOCATION:Bangalore International Centre (BIC)
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
BEGIN:VALARM
ACTION:display
DESCRIPTION:Rootconf SRE rehearsal - SRE: "toil reduction" through commun
 ication/collaboration/coordination! in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Rootconf SRE rehearsal - Shell based architecture for maintaining 
 high availability
DTSTART:20231121T153000Z
DTEND:20231121T160000Z
DTSTAMP:20260421T102045Z
UID:session/SiEX9ohmJYX3SFF4dN5Gkr@hasgeek.com
SEQUENCE:4
CREATED:20231116T061726Z
DESCRIPTION:Raadhikaa Srinivasan will do a rehearsal of their [talk](https
 ://hasgeek.com/rootconf/sreconf-2023/sub/shell-based-architecture-for-main
 taining-high-avai-PAfCRfuc4Puey7DiTUHsVs) with Rootconf SRE editors and re
 viewers.\n\nRootconf members can participate in the rehearsals\, and give 
 their feedback to the speakers.  
LAST-MODIFIED:20231121T052931Z
LOCATION:Bangalore International Centre (BIC)
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
BEGIN:VALARM
ACTION:display
DESCRIPTION:Rootconf SRE rehearsal - Shell based architecture for maintain
 ing high availability in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Rootconf SRE rehearsal - Scaling neetoDeploy from zero to producti
 on - building\, maintaining and optimizing our cloud deployment platform.
DTSTART:20231121T160000Z
DTEND:20231121T161500Z
DTSTAMP:20260421T102045Z
UID:session/9zxQGwHxaZ9jZAWF9svf8B@hasgeek.com
SEQUENCE:7
CREATED:20231115T121554Z
DESCRIPTION:Sreeram Venkitesh will do a rehearsal of their [talk](https://
 hasgeek.com/rootconf/sreconf-2023/sub/scaling-neetodeploy-from-zero-to-pro
 duction-buildi-RzAuZKcmqZunJSoHnH4Ufy) with Rootconf SRE editors and revie
 wers.\n\nRootconf members can participate in the rehearsals\, and give the
 ir feedback to the speakers.  \n\n
LAST-MODIFIED:20231121T052955Z
LOCATION:Bangalore International Centre (BIC)
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
BEGIN:VALARM
ACTION:display
DESCRIPTION:Rootconf SRE rehearsal - Scaling neetoDeploy from zero to prod
 uction - building\, maintaining and optimizing our cloud deployment platfo
 rm. in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Rootconf SRE rehearsal - The power of adaptation: lessons from cra
 fting malleable systems
DTSTART:20231122T153000Z
DTEND:20231122T161000Z
DTSTAMP:20260421T102045Z
UID:session/2M9SzWmtdtMqBckCjKqi43@hasgeek.com
SEQUENCE:3
CREATED:20231115T115829Z
DESCRIPTION:Harsh Mittal will do a rehearsal of their [talk](https://hasge
 ek.com/rootconf/sreconf-2023/sub/the-power-of-adaptation-lessons-from-craf
 ting-mall-27186brynxnw8jgAv5vHRY) with Rootconf SRE editors and reviewers.
 \n\nRootconf members can participate in the rehearsals\, and give their fe
 edback to the speakers.  \n\n
LAST-MODIFIED:20231115T115909Z
LOCATION:Bangalore International Centre (BIC)
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
BEGIN:VALARM
ACTION:display
DESCRIPTION:Rootconf SRE rehearsal - The power of adaptation: lessons from
  crafting malleable systems in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Check-in\; onsite registrations
DTSTART:20231124T043000Z
DTEND:20231124T044500Z
DTSTAMP:20260421T102045Z
UID:session/KEeRMMdJ9CohAeWHB6Q8UZ@hasgeek.com
SEQUENCE:5
CREATED:20231110T062549Z
LAST-MODIFIED:20231126T113445Z
LOCATION:Seminar hall 2 (1st floor) - Bangalore International Centre (BIC)
 \nBengaluru\,\nIN
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
BEGIN:VALARM
ACTION:display
DESCRIPTION:Check-in\; onsite registrations in Seminar hall 2 (1st floor) 
 in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Introduction to Rootconf SRE\; editors\; house rules and announcem
 ents
DTSTART:20231124T044500Z
DTEND:20231124T050000Z
DTSTAMP:20260421T102045Z
UID:session/BLoNxbLF8akopbHkPd6khz@hasgeek.com
SEQUENCE:6
CREATED:20231110T062611Z
LAST-MODIFIED:20231126T113441Z
LOCATION:Seminar hall 2 (1st floor) - Bangalore International Centre (BIC)
 \nBengaluru\,\nIN
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
BEGIN:VALARM
ACTION:display
DESCRIPTION:Introduction to Rootconf SRE\; editors\; house rules and annou
 ncements in Seminar hall 2 (1st floor) in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Cost optimization: where the architecture - not the infrastructure
  - is the culprit.
DTSTART:20231124T050000Z
DTEND:20231124T054000Z
DTSTAMP:20260421T102045Z
UID:session/TgFfNBEj7MpWGZBtRKapT6@hasgeek.com
SEQUENCE:11
CATEGORIES:Leadership,Accept
CREATED:20231110T062704Z
DESCRIPTION:Come economic winter and as Infrastructure engineer your calen
 dar is booked for multiple meetings/calls titled "Optimising Cloud Cost". 
 I am sure it sounds familiar. Everyone in the engineering teams prioritise
 s cutting the cloud cost. But this is often a reactive and partial approac
 h.\nWhy?\nAs an observation\, we only optimise what is visible to us and p
 luck low-hanging fruits. What we often need to address are the issues with
  our architecture. Look into the architecture to consider the cost of othe
 r factors like security as first class\, cost of scaling\, and cost of ove
 r-engineering. If we focus on fixing them\, the infrastructure cost reduct
 ion becomes a by-product. Eventually\, it also leads to more predictable c
 osts for your infrastructure.\n\nThis talk focuses on why architecture sho
 uld not be made from Ivory Towers but more realistic to your business to k
 eep the infrastructure cost in check. During this talk\, I will touch upon
  the hidden costs often overlooked and try to explain them with examples a
 nd stories. We divide the cost into two categories: direct cost and indire
 ct cost.\n\n#### Direct cost:\n\n- **Cost of optimisation for scale\, used
  by none**: As engineers\, everyone wants to solve for scale. We built and
  optimised it for scale\, with zero paying customers. Add more components 
 to the fantastic architecture\, which is sadly used by *none*.\n\n- **Cost
  of not understanding the workload**: Without understanding the workload\,
  over-provisioning\, auto-scaling horizontally or vertically without data 
 points. \n \n- **Cost of no signals\, but all noise**: Just because we hav
 e metrics\, traces\, and logs does not mean we will always use them. Examp
 le:\n	   - Sending metrics with high cardinality does not improve your obs
 ervability but increases your cost.\n	 - No guard rails at your central lo
 gging infrastructure\, which increases your storage\, computing\, and netw
 ork. \n\n#### Indirect cost:\n\n- **Cost of no collaboration**: When produ
 ct engineering teams and infrastructure teams do not collaborate and build
  architectures in silos.\n\n- **Cost of shiny tool syndrome** - Introducin
 g a "shiny new database" excites you because a cool company has solved whe
 n they reached a **specific** scale. The cost of your infrastructure will 
 undoubtedly increase\, but the engineering team effort required will be ma
 ssive for a minimal gain.\n\n- **Cost of overlooking security and complian
 ce**: After all the engineering effort\, the product that runs on a partic
 ular infrastructure does not follow good practices. Example:\n	   - Runnin
 g components in the public network.\n	  - No VPN for the internal tools li
 ke logging infrastructure or self-hosted CI/CD. \n	  - Secret keys spread 
 all across the application.\n  \n  The cost is your reputation which trick
 les to your sales team and the inability to convert leads. Also\, the cost
  to plan and move your stateless/stateful components.      \n\n- **Cost of
  heterogeneity**: Multiple ways of doing one thing can exist. Some of the 
 costs we should consider are maintenance and vendor lock-ins\, which can b
 e hard to quantify at times. Example:\n	  - Running a similar workload on 
 Kubernetes and running server-less functions on the cloud.\n\n### Why shou
 ld you attend this talk?\n\nIf you are a product engineer or work as infra
 structure/platform engineer this talk should help\, some of the key takeaw
 ays are:\n\n- Understand the hidden factors for cloud cost optimization. H
 ow to treat it as continuous activity instead of one time effort.\n- Cloud
  cost optimization cannot be done in isolation. It is a joint effort betwe
 en product engineers and infrastructure engineers. More empathy across tea
 ms :-).\n- Guidelines that can help make decisions between self-managed or
  hosted solutions.\n
LAST-MODIFIED:20240124T043746Z
LOCATION:Seminar hall 2 (1st floor) - Bangalore International Centre (BIC)
 \nBengaluru\,\nIN
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
URL:https://hasgeek.com/rootconf/sreconf-2023/schedule/cost-optimization-n
 ot-my-infrastructure-but-my-architecture-is-the-culprit-TgFfNBEj7MpWGZBtRK
 apT6
BEGIN:VALARM
ACTION:display
DESCRIPTION:Cost optimization: where the architecture - not the infrastruc
 ture - is the culprit. in Seminar hall 2 (1st floor) in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:MySQL war stories - Lessons learned while managing a Terabyte scal
 e database for three nines of uptime
DTSTART:20231124T054000Z
DTEND:20231124T061500Z
DTSTAMP:20260421T102045Z
UID:session/LoLhy6RkSYfvAuTPzHKDLo@hasgeek.com
SEQUENCE:9
CATEGORIES:Engineering,Accept
CREATED:20231110T062920Z
DESCRIPTION:### Title\nLessons learned while self-managing Terabyte scale 
 database for three nines of uptime\n\n### Abstract\nManaging uptime for st
 ateless systems is relatively easy\, you can scale (horizontally and verti
 cally) by throwing more hardware. However\, the uptime and reliability of 
 stateful systems (such as databases) is hard. This talk covers some lesson
 s learned managing production databases with Terabytes of data to achieve 
 three nines of uptime.\n\nI had to manage the uptime and scalability of th
 e self-managed production MySQL cluster for a Fintech company (Flip.id). T
 he transactional MySQL database cluster was 1.5TB in size and grew 6 GB pe
 r day. There were six database nodes\, each with 32vCPU\, 128GB RAM\, and 
 2TB disks.\n\nSome of the challenges I had to solve:\n- Observability and 
 uptime monitoring\n- Scalability (disks\, compute\, etc.)\n- Read-write tr
 affic routing across various DB nodes\n- Schema migrations\n- Managing dat
 abase security\n- Controlling Replication lag\n- Making data available for
  analytics use case\n- Numerous prod incidents related to database uptime 
 and performance\n\nEach of these bullet points above is worthy of a talk i
 n itself. However\, I’ll cover all these challenges and how we solved th
 em in our case. We started with very limited observability and gradually t
 ransitioned to three nines of uptime for the database. We eventually moved
  from a self-managed cluster to a cloud SaaS service (GCP’s CloudSQL)\, 
 but that story is for another time. 😀\n\n### What's in it for you?\nYou
 ’ll learn tools and patterns that you can apply in your own work if you
 ’re managing any stateful system (database\, queues\, etc). \n\nYou’ll
  learn the importance of: \n- System decoupling (when I cover ProxySQL and
  how it helped us decouple components)\n- Power of operationally simple to
 ols (gh-ost and how it simplified schema migrations for us)\n- Dev-prod pa
 rity and testing your approaches with prod scale in your staging environme
 nt\n\n**There will be a lot of diagrams and storytelling instead of just b
 ullet point slides for you to read. I recently spoke at RubyConf India abo
 ut lessons for managing trade-offs between over-engineering and the Big-ba
 ll-of-mud when building software systems. I am attaching the video of that
  talk to give the review committee an idea about how I speak.**
LAST-MODIFIED:20240124T043819Z
LOCATION:Seminar hall 2 (1st floor) - Bangalore International Centre (BIC)
 \nBengaluru\,\nIN
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
URL:https://hasgeek.com/rootconf/sreconf-2023/schedule/lessons-learned-whi
 le-managing-a-terabyte-scale-database-for-three-nines-of-uptime-LoLhy6RkSY
 fvAuTPzHKDLo
BEGIN:VALARM
ACTION:display
DESCRIPTION:MySQL war stories - Lessons learned while managing a Terabyte
  scale database for three nines of uptime in Seminar hall 2 (1st floor)
  in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Break
DTSTART:20231124T061500Z
DTEND:20231124T064500Z
DTSTAMP:20260421T102045Z
UID:session/ViJ8NJqxDqCCCsQs7Baopj@hasgeek.com
SEQUENCE:4
CREATED:20231110T065402Z
LAST-MODIFIED:20231115T105715Z
LOCATION:Bangalore International Centre (BIC)
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
BEGIN:VALARM
ACTION:display
DESCRIPTION:Break in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Deep dive into analyzing high cardinality metrics
DTSTART:20231124T064500Z
DTEND:20231124T071000Z
DTSTAMP:20260421T102045Z
UID:session/BjZyDL9rEaFRSgLX4ykDRc@hasgeek.com
SEQUENCE:17
CATEGORIES:Engineering,Accept
CREATED:20231110T065122Z
DESCRIPTION:### Introduction\n\nMetrics are the fundamental unit of time s
 eries databases (TSDB). They consist of labels which denote dimensions e.g
 . http_status_code\, url\, etc. Critical insights require metrics to have 
 both breadth (e.g. large number of labels) and depth (e.g. each label havi
 ng a large number of unique values). This makes up metric cardinality. Hig
 her cardinality == deeper insights. This talk will assume Prometheus-like 
 systems for reference.\n\n### Problem\n\nIn today's world of microservices
 \, distributed services having dimensions like tenant\, region\, and servi
 ce invariably lead to high cardinality. Unchecked cardinality growth can l
 ead to adverse effects like higher resource consumption\, slow load up of 
 dashboards\, alerting queries failing\, observability systems going blank\
 , reduced retention time\,  etc. \n\nAvoiding these situations at the ente
 rprise scale requires solutions to understand what is causing high cardina
 lity and how to manage it. This is often an afterthought which ignores the
  tooling to answer these questions:\n\n1. How to find your TSDB cardinalit
 y limits?\n2. How do you know when your system is approaching these limits
 ?\n3. When it does approach them\, how to find out which metrics are contr
 ibuting to it?\n4. How to dissect these metrics to find the labels which l
 ed to cardinality explosion?\n5. What actions need to be taken to fix this
 ?\n\n### Solution\n\nThe default solutions hover around finding labels wit
 h high cardinality and dropping them. But I have seen that in customer pro
 duction environments\, blindly dropping labels gives a false sense of beli
 ef of fixing the problem without actually solving anything. I wrote a tool
  to analyse high cardinality systems\, get insights out of them to answer 
 these questions:\n\n1. What is the overall state of my system - are there 
 any metrics approaching limits?\n2. Which labels of which metrics have the
  probability of causing cardinality explosion?\n3. If you choose to drop/a
 ggregate these labels - what will the end state look like?\n4. How not to 
 think about dropping labels as the default solution e.g. there are corner 
 cases where dropping the label with highest cardinality has zero impact
  on reduction.\n\n### Benefits\n\nThe audience of this talk will have
  the foll
 owing takeaways:\n1. Fundamental knowledge to question assumptions on app
 roaching systems with high cardinality metrics.\n2. A handy open source ca
 rdinality debugger/explorer well tested in customer production environment
 s to analyze your Prometheus like TSDB systems and have the right numbers 
 upfront which will help you choose where to invest your time - updating yo
 ur instrumentation code\, changing your metric agent configuration\, etc. 
 \n
LAST-MODIFIED:20231229T063136Z
LOCATION:Seminar hall 2 (1st floor) - Bangalore International Centre (BIC)
 \nBengaluru\,\nIN
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
URL:https://hasgeek.com/rootconf/sreconf-2023/schedule/deep-dive-into-anal
 yzing-high-cardinality-metrics-BjZyDL9rEaFRSgLX4ykDRc
BEGIN:VALARM
ACTION:display
DESCRIPTION:Deep dive into analyzing high cardinality metrics in Seminar h
 all 2 (1st floor) in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Toil reduction - building collaborative and communicative SRE team
 s
DTSTART:20231124T071000Z
DTEND:20231124T072000Z
DTSTAMP:20260421T102045Z
UID:session/3Uc4nePSYpGKTVDpiyYkaf@hasgeek.com
SEQUENCE:13
CATEGORIES:Leadership,Lightning talk
CREATED:20231110T065523Z
DESCRIPTION:Among the fundamental pillars of the SRE practice and framewor
 k\, "Toil Reduction" is one that gives SREs a kick for rightly serving bot
 h the left (engineering) and right (operations) parties of the product. It
  reflects the principles for which Google introduced SRE in the first plac
 e. As they famously said\, "Put a software engineer in front of an operati
 onal problem and see how the paradigm changes in solving the problem".   \
 n\nAs the conference theme includes communication\, collaboration and coor
 dination for operational challenges\, I would like to bring together the b
 est of the SRE pillar called "Toil Reduction"\, the best of the collaborat
 ion tool "Slack" and how together we can work to maintain the importance o
 f the SRE mindset to reduce workload\, reduce frustration between microser
 vice silo teams\, increase the speed of incident identification and resolu
 tion. Blending ChatOps\, AIOps and virtual bots with real humans is a perf
 ect example of a future where humans work alongside bots sitting right nex
 t to each other (virtually 😊 )!\n\nHello everyone. I am Ravindra Harish
 . I am the Director of SRE at Nike. I recently moved back to India and am 
 here to establish and drive the SRE practices at our India office serving 
 our global technology units of Nike. I have been leading the SRE function 
 at Nike for the last 6 years. We are a company that strongly believes that
  a perfect SRE can be a combination of great software engineers and a good
  talent of domain experts from the operations side. We run an IDENTIFICATI
 ON (proactive) model of incidents through tools like Splunk\, SignalFx\, C
 atchpoint and NewRelic to deliver the value of Distributed Tracing\, Chaos
  Engineering with a clear focus on reducing MTTD\, MTTR\, PAV\, improving 
 Fault Budget and hence define CUJ through SLI/O with derivation at SLA!\n\
 nI would like to present a real-life case study as part of an event. We ha
 ve a product that we call D.O.E.S (DevOps Enablement Systems)\, which is v
 ery focused on toil reduction using Slack and AWS. We have had a lot of su
 ccess with ChatOps and are now very focused on bringing AIOps into the mix
 . If we get the chance\, we'd love to show you a demo and expose you to th
 e possibilities of the future in the world of collaborative automation.\n
LAST-MODIFIED:20231229T063145Z
LOCATION:Seminar hall 2 (1st floor) - Bangalore International Centre (BIC)
 \nBengaluru\,\nIN
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
URL:https://hasgeek.com/rootconf/sreconf-2023/schedule/sre-toil-reduction-
 through-communication-collaboration-coordination-3Uc4nePSYpGKTVDpiyYkaf
BEGIN:VALARM
ACTION:display
DESCRIPTION:Toil reduction - building collaborative and communicative SRE 
 teams in Seminar hall 2 (1st floor) in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Scaling neetoDeploy from zero to production - Building\, maintaini
 ng and optimizing our cloud deployment platform
DTSTART:20231124T072000Z
DTEND:20231124T072500Z
DTSTAMP:20260421T102045Z
UID:session/85qK3A3X8WnRShSqyU5QUg@hasgeek.com
SEQUENCE:12
CATEGORIES:Engineering,Lightning talk
CREATED:20231110T065547Z
DESCRIPTION:At [BigBinary](https://bigbinary.com)\, we've been building [n
 eetoDeploy](https://neetodeploy.com) for the past one year. \n\nWe were ru
 nning all our PR review apps across around 25-30 projects on Heroku. Last 
 year once Heroku announced that they're getting rid of their free plans\, 
 we started off by building a platform to deploy PR review apps. We kept th
 e date Heroku was planning to remove their free plans as a deadline and qu
 ickly put together our platform on top of Kubernetes so that we could migr
 ate all our apps from Heroku. We completed this way before the deadline an
 d spent the rest of the time fixing bugs and stabilizing the platform. We 
 architected an entire idle mechanism for the apps\, based on the network r
 equests each service receives. If an app is not accessed for 5 minutes\, i
 t will get scaled down and will only be brought back up when it's accessed
  again. We were able to bring down our costs substantially with this.\n\nO
 nce we had nailed PR review apps\, we started experimenting with staging a
 nd production app deployments. Since the basic functionality was there\, w
 e were able to bring it together easily. With this\, we moved all of BigBi
 nary's internal staging deployments to neetoDeploy. One of the major uses 
 of staging deployments was to run Cypress tests against them every day.
  \n\
 nAfter we started using our platform to deploy staging apps\, we started f
 acing a lot of stability issues with existing features. We had to rebuild 
 and re-architect several features that we had already implemented\, kin
 d of like building the Ship of Theseus. We went back to the drawing board 
 and designed a new efficient way of streaming logs faster. We set up
  cluste
 r autoscaler to handle load\, and overprovisioned the cluster ever so ligh
 tly based on the existing deployments\, so that new deployments never have
  to wait for the cluster to be up\, resulting in seamless and fast deploym
 ents. We moved from an external docker registry to our own registry hosted
  inside our Kubernetes cluster to bring down network costs and latency and
  so on.\n\nThe last one year has been a rollercoaster ride in terms of lea
 rning and experimenting. Working on and maintaining neetoDeploy over the p
 ast year taught me a lot of lessons the hard way and I've understood what 
 SRE means in a project of this scale. We wrote a [bunch of blog posts](htt
 ps://www.bigbinary.com/blog/categories/neetodeploy) about it too. \n\nThis
  is the neetoDeploy story - how we built a cloud deployment platform as a 
 service from scratch and took it to production in a year.\n
LAST-MODIFIED:20231126T113412Z
LOCATION:Seminar hall 2 (1st floor) - Bangalore International Centre (BIC)
 \nBengaluru\,\nIN
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
URL:https://hasgeek.com/rootconf/sreconf-2023/schedule/scaling-neetodeploy
 -from-zero-to-production-building-maintaining-and-optimizing-our-cloud-dep
 loyment-platform-85qK3A3X8WnRShSqyU5QUg
BEGIN:VALARM
ACTION:display
DESCRIPTION:Scaling neetoDeploy from zero to production - Building\, maint
 aining and optimizing our cloud deployment platform in Seminar hall 2 (1st
  floor) in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Shell based architecture for maintaining high availability
DTSTART:20231124T072500Z
DTEND:20231124T075000Z
DTSTAMP:20260421T102045Z
UID:session/Y6T5mxr5w1hitnn2c6mHLY@hasgeek.com
SEQUENCE:9
CATEGORIES:Engineering,Accept
CREATED:20231116T060640Z
DESCRIPTION:High availability is critical for tech companies to meet user 
 expectations\, maintain business operations\, honor SLAs\, gain a competit
 ive edge and scale operations in their services or products.\nMaintaining 
 high availability requires implementing a robust architecture.  It is a ch
 allenge faced by every multi-tenant tech company that has grown larger ove
 r the past years\, in terms of number of customers\, size\, and features. 
 Multi-tenancy means that multiple customers of a cloud vendor are using th
 e same compute resources. In a multi-tenant architecture\, an issue for a 
 single customer can easily cascade and impact all its neighbors. As a matt
 er of fact\, there have been incidents in the past where a simple failure 
 for one customer at the database level has threatened to take the entire w
 eb cluster down.\n\nShell architecture\n\nFor Freshdesk to be a highly ava
 ilable product\, blast radius had to be minimized. Blast radius is the max
 imum impact that might be sustained in the event of a failure. To minimize
  the blast radius and increase availability\, shell architecture was intro
 duced in Freshdesk. Shell architecture is a logical grouping of compute re
 sources. Each shell is a stack\, but serving only a smaller bucket of cust
 omers categorized by shards that they belong to. A shard is a horizontal p
 artition of data in the database. Hence a set of customers  are mapped to 
 a particular shard. Set of shards are mapped to a shell. For observability
  of each stack and tenant\, shell information is exposed as metric and she
 ll specific alerts are configured.\n\nKey Benefits of shell architecture \
 n\nFlexibility to increase compute capacity for specific use cases. \nFlex
 ibility  to isolate noisy tenants and help reduce blast radius.\n\n\nAny c
 ompany can adopt this architecture to maintain high availability as the co
 mpany scales. This will give the company feasibility to isolate noisy tena
 nts and help reduce blast radius.\nTo move to this architecture\, companie
 s need to come up with an approach to group customers into different stack
 s based on a particular set of criteria. For example\, the nature of requ
 ests (light/heavy impact to DB) \, rate of requests (rpm)\,  business prio
 rities of requests etc. Companies must have infra setup to support on dema
 nd scaling of stacks and adopt autoscaling solutions to scale within stack
 s.\n\nThis talk will focus on primary motivations\, architecture\,  challe
 nges companies should prepare for if they decide to move to shell architec
 ture and its benefits.\n
LAST-MODIFIED:20240124T043829Z
LOCATION:Seminar hall 2 (1st floor) - Bangalore International Centre (BIC)
 \nBengaluru\,\nIN
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
URL:https://hasgeek.com/rootconf/sreconf-2023/schedule/shell-based-archite
 cture-for-maintaining-high-availability-Y6T5mxr5w1hitnn2c6mHLY
BEGIN:VALARM
ACTION:display
DESCRIPTION:Shell based architecture for maintaining high availability in 
 Seminar hall 2 (1st floor) in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Lunch
DTSTART:20231124T075000Z
DTEND:20231124T085000Z
DTSTAMP:20260421T102045Z
UID:session/DjrVnjRQmJSVUocfpX1uSb@hasgeek.com
SEQUENCE:8
CREATED:20231110T072219Z
LAST-MODIFIED:20231116T060649Z
LOCATION:Bangalore International Centre (BIC)
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
BEGIN:VALARM
ACTION:display
DESCRIPTION:Lunch in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Flash talk - major upgrades are hard
DTSTART:20231124T085000Z
DTEND:20231124T085500Z
DTSTAMP:20260421T102045Z
UID:session/6QTpejokRfMX5UecG8GBQE@hasgeek.com
SEQUENCE:1
CREATED:20231125T040107Z
LAST-MODIFIED:20231125T040120Z
LOCATION:Seminar hall 2 (1st floor) - Bangalore International Centre (BIC)
 \nBengaluru\,\nIN
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
BEGIN:VALARM
ACTION:display
DESCRIPTION:Flash talk - major upgrades are hard in Seminar hall 2 (1st fl
 oor) in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Flash talk - Business journeys with engineering
DTSTART:20231124T085500Z
DTEND:20231124T090500Z
DTSTAMP:20260421T102045Z
UID:session/G5nbMXGcNDzpqJ6cxYLxGj@hasgeek.com
SEQUENCE:1
CREATED:20231125T040713Z
LAST-MODIFIED:20231125T040716Z
LOCATION:Seminar hall 2 (1st floor) - Bangalore International Centre (BIC)
 \nBengaluru\,\nIN
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
BEGIN:VALARM
ACTION:display
DESCRIPTION:Flash talk - Business journeys with engineering in Seminar hal
 l 2 (1st floor) in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:No downtime migration of stateful servers
DTSTART:20231124T090500Z
DTEND:20231124T094000Z
DTSTAMP:20260421T102045Z
UID:session/5DCJXuQMe1S52uauqMm455@hasgeek.com
SEQUENCE:18
CATEGORIES:Engineering,Accept
CREATED:20231110T065648Z
DESCRIPTION:In an era defined by continuous innovation and relentless user
  expectations\, the significance of seamless transitions cannot be oversta
 ted. Today\, we delve into the intricate landscape of stateful server migr
 ation and its pivotal role in ensuring uninterrupted service delivery.\n\n
 The Challenge of Stateful Servers:\n\nStateful servers lie at the heart of
  SaaS products\, holding precious user data\, session information\, and co
 nfigurations. Migrating such servers while maintaining business continuity
  poses a formidable challenge. Unlike stateless components\, stateful serv
 ers are not merely binaries\; they encapsulate users' interactions\, prefe
 rences\, and experiences. Traditional downtime-laden migrations are no l
 onger sustainable because they run counter to the seamless experience u
 sers demand.\n\nT
 he Value of Zero-Downtime Migration:\n\nZero-downtime migration embodies t
 he core principle of user-centricity. It's not merely about preserving upt
 ime\; it's about safeguarding user satisfaction\, trust\, and loyalty. By 
 ensuring no disruptions\, SaaS providers demonstrate their commitment to u
 sers\, reinforcing the belief that their data is secure and their experien
 ces uninterrupted. The intrinsic alignment between zero-downtime migration
  and business sustainability underscores the paramount importance of adopt
 ing this approach.\n\nBenefits Amplified:\n\nEnhanced User Experience: Mai
 ntaining service availability during migration fosters positive user perce
 ptions and loyalty\, and reduces the risk of churn.\nBusiness Continuity: Ze
 ro-downtime migration averts revenue losses\, safeguards reputation\, and 
 reinforces the SaaS provider's reliability.\nMinimal Impact on Workflows: 
 Users can continue their tasks without disruption\, boosting productivity 
 and operational efficiency.\nRegulatory Compliance: Data-sensitive industr
 ies benefit from migration methods that minimize compliance risks.\nInnova
 tion Acceleration: A reliable migration strategy empowers teams to focus o
 n innovation rather than firefighting.\n\nAs architects of modern SaaS pro
 ducts\, we're entrusted with the task of orchestrating technology transiti
 ons with a human touch. Embracing zero-downtime migration for stateful ser
 vers isn't just a choice\; it's an imperative. It's an acknowledgment that
  technology serves humans\, not the other way around. It's a declaration t
 hat no user should ever experience service disruption due to our technolog
 ical advancements. It's a commitment to the promise that innovation won't 
 come at the cost of user satisfaction. This talk will focus on how Freshse
 rvice’s stateful servers were migrated from a third party to in-house
  with zero downtime.\n
LAST-MODIFIED:20231229T063150Z
LOCATION:Seminar hall 2 (1st floor) - Bangalore International Centre (BIC)
 \nBengaluru\,\nIN
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
URL:https://hasgeek.com/rootconf/sreconf-2023/schedule/no-downtime-migrati
 on-of-stateful-servers-5DCJXuQMe1S52uauqMm455
BEGIN:VALARM
ACTION:display
DESCRIPTION:No downtime migration of stateful servers in Seminar hall 2 (1
 st floor) in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Hacking organizations - from SRE to SRE manager
DTSTART:20231124T094000Z
DTEND:20231124T102000Z
DTSTAMP:20260421T102045Z
UID:session/PEvEUgLoqMdPAKB2tHiY4W@hasgeek.com
SEQUENCE:11
CATEGORIES:Leadership,Accept
CREATED:20231110T071921Z
DESCRIPTION:# Short Abstract\nMost SREs consider the manager track at some
  point. But while it's easy to read about the technical and project manage
 ment aspects of being an SRE Manager\, there's very little about the rea
 lity of the day-to-day job.\n\nIn the past 25 years I've learned that ma
 nagem
 ent is the art of hacking organisations. Much like hacking infrastructure 
 or code it requires many skills. In this talk\, we discuss what makes you 
 a good candidate\, the key skills you'll need and what will have to change
  in how you think about the job.\n\n# Abstract\nAt some point in their car
 eers most SREs consider pursuing the manager track. Unfortunately\, while 
 it's easy to read up about the technical\, operational or project managem
 ent aspects of being an SRE Manager\, that is often less than a third of
  the
  job. \n\nWhat do SRE Managers really do? Many people are filled with vagu
 e\, mistaken ideas of the job gleaned from observation of the limited part
  of their manager's job that is visible to them. Without a clear idea of w
 hat the job entails\, how do you decide if it is something that you want t
 o do?\n\nIn many tech companies there is very little management training -
 - new managers are often dropped head first into the job and are expected 
 to just figure it out. They spend years being ineffective and may end up l
 osing a hard-won opportunity.\n\nIn the past 25 years I've been a develope
 r\, an SRE and a manager. I've led all sorts of teams --from ones with 2 p
 eople to some with 250 people. I initially saw it as an unpleasant necessi
 ty\, but as I learned more it became fascinating. In the same way that i
 t's fun to hack technology to get the result you need\, the fun of manag
 ement i
 s hacking organisations to achieve a desired outcome. In this talk\, I dis
 till what I've learned about this kind of hacking\, that is\, being a mana
 ger.\n\nWe start by discussing what characteristics set you out as a good 
 candidate for management. We then go on to what SRE managers do. What sk
 ills will you need to learn to succeed as a manager? These could range f
 rom basic communication skills to reading a balance sheet. However\, the
  most
  interesting skills may be "meta-skills"\, that is\, knowing when to apply
  other skills and what their limits are.\n\nAn old aphorism says that "90%
  of technology problems are actually people problems" so it stands to reas
 on that managers spend most of their time dealing with people. We'll look 
 at some of the problems you'll face with your team\, your peers\, your
  custom
 ers or your boss.\n\nFinally\, you'll find that many of your fundamental a
 ssumptions don't apply anymore when you become a manager. The faster you u
 nderstand this\, the faster you'll become effective. We'll close by going 
 through some lessons I learned the hard way.\n\nManaging SREs can be a
  reward
 ing career but it is quite different from being an SRE. Understanding what
  being a manager is like can help you decide whether it interests you but 
 more importantly it will help you work more productively with your manager
 s.\n
LAST-MODIFIED:20231229T063154Z
LOCATION:Seminar hall 2 (1st floor) - Bangalore International Centre (BIC)
 \nBengaluru\,\nIN
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
URL:https://hasgeek.com/rootconf/sreconf-2023/schedule/hacking-organisatio
 ns-from-sre-to-sre-manager-PEvEUgLoqMdPAKB2tHiY4W
BEGIN:VALARM
ACTION:display
DESCRIPTION:Hacking organizations - from SRE to SRE manager in Seminar hal
 l 2 (1st floor) in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Break
DTSTART:20231124T102000Z
DTEND:20231124T105500Z
DTSTAMP:20260421T102045Z
UID:session/CicjgMAWDDzkoVfvZHARMT@hasgeek.com
SEQUENCE:5
CREATED:20231115T105816Z
LAST-MODIFIED:20231119T165802Z
LOCATION:Bangalore International Centre (BIC)
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
BEGIN:VALARM
ACTION:display
DESCRIPTION:Break in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Building highly scalable systems with resilience.
DTSTART:20231124T105500Z
DTEND:20231124T114000Z
DTSTAMP:20260421T102045Z
UID:session/UtWgcw6VqwAsbWZFSuRoA9@hasgeek.com
SEQUENCE:12
CATEGORIES:Engineering,Accept
CREATED:20231115T105517Z
DESCRIPTION:Failures are part of the tech landscape\, inevitable and unsto
 ppable. No matter how much you prepare\, failures will happen. Sure\, a we
 ll-organized SOP can speed up recovery time\, but what you really need is 
 the ability to bounce back when things go haywire.\n\nEnter malleability\,
  or as we commonly know it\, resiliency. It's the backbone of dealing with
  the unexpected. Think about being one of India's biggest OTT players\, wh
 ere millions are counting on you for a flawless cricket experience at each
  ball. Resilience isn't an option\; it's a necessity.\n\nIn this talk\, we
 're laying out the nitty-gritty of what we've learned and fine-tuned on ou
 r journey to breaking world records.\n\nIf you're knee-deep into running c
 omplex systems that are crucial to your business\, expect some practical t
 akeaways:\n\n1. Beyond the Basics: Get the lowdown on the basics of resili
 ency and shift your focus to goodput instead of just throughput.\n2. Survi
 ving Stress: Learn about patterns that help your systems bounce back smoot
 hly under stress. Skip the theoretical stuff\; we're sharing real examples
  from the trenches\, showing how we applied these principles bit by bit.\n
 3. Inside the Engine Room: If you're into distributed systems and geek out
  on designing for failure\, we've got the real deal. Find out how India's 
 biggest OTT doesn't just talk the talk but walks the walk when it comes to
  building robust\, adaptable tech.
LAST-MODIFIED:20231229T063159Z
LOCATION:Seminar hall 2 (1st floor) - Bangalore International Centre (BIC)
 \nBengaluru\,\nIN
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
URL:https://hasgeek.com/rootconf/sreconf-2023/schedule/the-power-of-adapta
 tion-lessons-from-crafting-malleable-systems-UtWgcw6VqwAsbWZFSuRoA9
BEGIN:VALARM
ACTION:display
DESCRIPTION:Building highly scalable systems with resilience. in Seminar h
 all 2 (1st floor) in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Wrap-up\; feedback
DTSTART:20231124T114000Z
DTEND:20231124T115000Z
DTSTAMP:20260421T102045Z
UID:session/CTgCtgZvrLYAzVTz1tApYU@hasgeek.com
SEQUENCE:4
CREATED:20231115T105838Z
LAST-MODIFIED:20231117T161226Z
LOCATION:Bangalore International Centre (BIC)
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
BEGIN:VALARM
ACTION:display
DESCRIPTION:Wrap-up\; feedback in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
END:VCALENDAR
