About Rootconf 2019:
The seventh edition of Rootconf is a two-track conference with:
- Security talks and tutorials in Audi 1 and Audi 2 on 21 June.
- Talks on DevOps, distributed systems and SRE in Audi 1 and Audi 2 on 22 June.
Topics and schedule:
View full schedule here: https://hasgeek.com/rootconf/2019/schedule
Rootconf 2019 includes talks and Birds of a Feather (BOF) sessions on:
- OSINT and its applications
- Key management, encryption and its costs
- Running a bug bounty programme in your organization
- PolarDB architecture as Cloud Native Architecture, developed by Alibaba Cloud
- SRE and running distributed teams
- Routing security
- Log analytics
- Enabling SRE via automated feedback loops
- Tor for DevOps
Who should attend Rootconf?
- DevOps programmers
- DevOps leads
- Systems engineers
- Infrastructure security professionals and experts
- DevSecOps teams
- Cloud service providers
- Companies with heavy cloud usage
- Providers of the pieces on which an organization’s IT infrastructure runs: monitoring, log management, alerting, etc.
- Organizations dealing with large network systems where data must be protected
- VPs of engineering
- Engineering managers looking to optimize infrastructure and teams
For information about Rootconf and bulk ticket purchases, contact firstname.lastname@example.org or call 7676332020. Only community sponsorships available.
Abhay Rana (Nemo), software engineer at Razorpay
Learnings from running my Home Server (and why you should run one too)
This talk will cover the important bits:
- A brief overview of my homeserver setup
i. What hardware is involved
ii. What services are running
iii. Networking, and how to route traffic to your home
- Infrastructure Setup
i. Terraform Configuration
ii. Service Configuration
iii. Running a Kubernetes Cluster
Shakthi Kannan, senior DevOps engineer at Aerospike
Tutorial: Shooting the trouble down to the Wireshark Lua plugin
- Lua and Wireshark Lua
- Usage and Example
- Debugging and Linting
- Literate Programming
- Markdown Structure
- Protocol Dissection Pattern
- Dissector Table
- Wireshark User Interface
- Info, Message and Heartbeat Protocol
- Hot key Report
- Future Work
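The plugin itself is written in Lua against Wireshark's dissector API. Purely to illustrate the "protocol dissection pattern" and "dissector table" ideas from the outline, here is a minimal Python sketch. The header layout (1-byte version, 1-byte type, 6-byte big-endian length) and the type code are assumptions for illustration, not the Aerospike specification.

```python
# Sketch of the "dissector table" pattern: parse a common header, then hand
# the payload to a sub-dissector chosen by message type.
# ASSUMPTION: the header layout and the type code below are illustrative only.
from typing import Callable, Dict

HEADER_LEN = 8


def parse_header(frame: bytes):
    """Split a raw frame into (version, msg_type, payload)."""
    if len(frame) < HEADER_LEN:
        raise ValueError("frame shorter than header")
    version, msg_type = frame[0], frame[1]
    length = int.from_bytes(frame[2:HEADER_LEN], "big")
    payload = frame[HEADER_LEN:HEADER_LEN + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return version, msg_type, payload


def dissect_info(payload: bytes) -> dict:
    # Info payloads are treated here as newline-separated text commands.
    return {"proto": "info", "commands": payload.decode("ascii").splitlines()}


def dissect_unknown(payload: bytes) -> dict:
    return {"proto": "unknown", "length": len(payload)}


# Message type -> sub-dissector, analogous to registering a dissector in a
# Wireshark DissectorTable. Type code 1 is a hypothetical choice.
DISSECTORS: Dict[int, Callable[[bytes], dict]] = {1: dissect_info}


def dissect(frame: bytes) -> dict:
    version, msg_type, payload = parse_header(frame)
    result = DISSECTORS.get(msg_type, dissect_unknown)(payload)
    result["version"] = version
    return result


body = b"build\nstatus\n"
frame = bytes([2, 1]) + len(body).to_bytes(6, "big") + body
print(dissect(frame))  # {'proto': 'info', 'commands': ['build', 'status'], 'version': 2}
```

Unknown message types fall through to a generic dissector instead of failing, which mirrors how a dissector table leaves unregistered keys to a default handler.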
Kushal Das, public interest technologist at the Freedom of the Press Foundation
Building reproducible Python applications for secured environments
- Introduction - 1 minute
- Why all of these painful steps? - 2 minutes
- SecureDrop client desktop tools and their dependency on other upstream projects (or think about an application structure and standard deployment strategy) - 3 minutes
- Updating dependencies, or do we read all updates? - 2 minutes
- Development environment, and using pipenv + tools to create requirements.txt with hashes only for source - 3 minutes
- Structure of a static HTML-based private package index - 4 minutes
- GPG-signed list of already built wheels + syncing them locally - 2 minutes
- Running python3 setup.py sdist to create the release tarball, plus a step before it to produce a requirements.txt with only binary hashes from our list of wheels - 5 minutes
- Final Debian packaging script (for automation), which double-verifies the wheel hashes - 3 minutes
- Reproducible Debian package as the end product - 2 minutes
- Possibility in the RPM land - 1 minute
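The hash-pinned requirements.txt and the later double verification both rest on one primitive: a SHA-256 digest per wheel, written in pip's `--hash=sha256:` format. A minimal sketch follows; the package and file names are hypothetical, and this does not reproduce the SecureDrop tooling.

```python
# Minimal sketch: pin wheels by SHA-256 in pip's hash-checking format and
# re-verify them later. File and package names below are hypothetical.
import hashlib
import tempfile
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Stream the file so large wheels are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def requirement_line(name: str, version: str, wheels: List[Path]) -> str:
    """Return e.g. 'demo==1.0 --hash=sha256:<hex>' for pip's hash-checking mode."""
    hashes = " ".join(f"--hash=sha256:{sha256_of(w)}" for w in sorted(wheels))
    return f"{name}=={version} {hashes}"


def verify(path: Path, expected_hex: str) -> bool:
    """Independent second check, e.g. repeated at packaging time."""
    return sha256_of(path) == expected_hex


with tempfile.TemporaryDirectory() as tmp:
    wheel = Path(tmp) / "demo-1.0-py3-none-any.whl"
    wheel.write_bytes(b"not really a wheel")
    line = requirement_line("demo", "1.0", [wheel])
    ok = verify(wheel, hashlib.sha256(b"not really a wheel").hexdigest())
    print(line)
```

Once any requirement carries a `--hash`, pip refuses to install a file whose digest does not match, so a tampered wheel in the private index fails loudly.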
Anurag Bhatia, network researcher at Hurricane Electric
Let’s talk about routing security
This talk covers how (in)secure routing is at global scale and covers the IRR (Internet Routing Registry) in detail. It includes examples, tools and challenges with IRR-based BGP filtering. It also gives a brief introduction to RPKI, as well as the latest developments in this domain (AT&T doing RPKI-based filtering, Google about to do IRR-based filtering, etc.).
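To make the RPKI part concrete, here is a minimal sketch of route origin validation with the valid / invalid / not-found outcomes defined in RFC 6811. It is not IRR filtering, and the prefixes and AS numbers are documentation values, not real ROAs.

```python
# Minimal sketch of RPKI route origin validation (RFC 6811 semantics):
# "valid" if some covering ROA authorizes the origin AS and prefix length,
# "invalid" if ROAs cover the route but none match, otherwise "not-found".
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ROA:
    prefix: str
    max_length: int
    asn: int


def validate(route_prefix: str, origin_asn: int, roas: List[ROA]) -> str:
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # A ROA only applies to routes of the same address family that it covers.
        if roa_net.version != route.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    return "invalid" if covered else "not-found"


roas = [ROA("192.0.2.0/24", 24, 64496)]
print(validate("192.0.2.0/24", 64496, roas))     # valid
print(validate("192.0.2.128/25", 64496, roas))   # invalid: longer than maxLength
print(validate("192.0.2.0/24", 64511, roas))     # invalid: wrong origin AS
print(validate("198.51.100.0/24", 64496, roas))  # not-found: no covering ROA
```

The second case shows why maxLength matters: a more-specific hijack announced from the legitimate AS is still rejected.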
Lavakumar Kuppan, co-founder at Ironwasp Security
Deploying and managing CSP: the browser-side firewall
- Introduction to client-side Data Exfiltration attacks
- Introduction to Content Security Policy (CSP)
- Content Security Policy to prevent Data Exfiltration attacks
○ What is possible
○ What are the limitations
- How to design and deploy CSP to detect/prevent Data Exfiltration attacks
- How to monitor policy violations and alerts
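As a rough illustration of designing and monitoring a policy, the sketch below builds a restrictive CSP header value that narrows common exfiltration channels and summarises a violation report in the legacy `report-uri` JSON format. The directive choices and the `/csp-reports` endpoint are assumptions. Such a policy is often first sent as `Content-Security-Policy-Report-Only`, so violations are reported but not blocked.

```python
# Minimal sketch: build a restrictive CSP header value aimed at limiting
# exfiltration channels, and summarise a legacy report-uri violation report.
# The directive choices and the /csp-reports endpoint are assumptions.
import json
from typing import Optional, Tuple


def build_csp(report_uri: str) -> str:
    directives = {
        "default-src": ["'self'"],
        "script-src": ["'self'"],
        "connect-src": ["'self'"],   # fetch/XHR/WebSocket may only reach our origin
        "img-src": ["'self'"],       # image beacons are a common exfiltration channel
        "form-action": ["'self'"],   # does not fall back to default-src, so set it explicitly
        "report-uri": [report_uri],  # where browsers POST violation reports
    }
    return "; ".join(f"{name} {' '.join(values)}" for name, values in directives.items())


def summarize_report(body: str) -> Tuple[Optional[str], Optional[str]]:
    """Extract (violated directive, blocked URI) from a report-uri POST body."""
    report = json.loads(body)["csp-report"]
    return report.get("violated-directive"), report.get("blocked-uri")


header_value = build_csp("/csp-reports")
print("Content-Security-Policy-Report-Only:", header_value)
```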
Suraj Deshmukh, software engineer at Kinvolk
Using pod security policies to harden your Kubernetes cluster
This talk will start with the current state of Kubernetes security and how people are setting up their clusters, including the shortcuts they take to avoid changing old bad practices. It will explain the worst that can happen if they keep those practices; especially in a multi-tenant setup, this can lead to massive breakouts.
These topics lay the groundwork for appreciating the Kubernetes Pod Security Policy feature.
We then come to the core of the talk, where I will explain what a Pod Security Policy (PSP) is and how it can help harden the cluster. I will cover all the features PSP supports and which attack vector each one stops in a multi-tenant, untrusted environment.
I will also explain the benefits of having secure, hardened clusters from the development phase itself, and how this helps you catch issues that you might otherwise encounter only when deploying to production.
Bernd Erk, CEO at NETWAYS
Keynote: How convenience is killing open standards
All the technical diversity we enjoy in our industry is the result of internal evangelism 20 years ago. Now all three major cloud providers are pushing their serverless solutions to lure customers into a new form of vendor lock-in. I think it is time to remind ourselves about open standards.
Pulkit Vaishnav, DevOps engineer at Moengage
SSH Certificates: a way to scale SSH access
- Traditional public-key (asymmetric key) cryptography authentication (2 min)
- Traditional SSH authentication methods
- Password-based authentication
- Public-key (asymmetric) based authentication
- Generic Security Service Application Program Interface (an API to access servers)
- Centralized authentication approach & limitations (3 min)
- How LDAP/Kerberos work (in brief)
- Limitations of a centralized system
- An adventurous ride with SSH certificates (6 min)
- Working of SSH certificates
- Generate signed certificate from CA
- Configuration on the host system
- Configuration on the user system
- Demo (3 min)
- Features of SSH CA (3 min)
- Role-based access
- Host-based access
- Certificate validity
- Certificate identity
- Limitation & solutions (3 min)
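The "generate signed certificate from CA" step boils down to a single `ssh-keygen -s` call. The sketch below assembles those invocations; the key paths, identities and principals are hypothetical.

```python
# Minimal sketch: assemble the ssh-keygen invocations used to sign user and
# host keys with an SSH CA. Key paths, identities and principals are hypothetical.
import shlex
from typing import List


def sign_command(ca_key: str, public_key: str, identity: str,
                 principals: List[str], validity: str = "+52w",
                 host: bool = False) -> List[str]:
    cmd = ["ssh-keygen", "-s", ca_key, "-I", identity]   # -I: certificate identity (shows in logs)
    if host:
        cmd.append("-h")                                 # issue a host certificate
    cmd += ["-n", ",".join(principals),                  # users or hostnames the cert is valid for
            "-V", validity,                              # validity interval, e.g. +8h or +52w
            public_key]
    return cmd


user_cmd = sign_command("user_ca", "alice.pub", "alice@example.com", ["alice", "deploy"], "+8h")
host_cmd = sign_command("host_ca", "ssh_host_ed25519_key.pub", "web01",
                        ["web01.example.com"], host=True)
print(shlex.join(user_cmd))
print(shlex.join(host_cmd))
```

On the host side, `TrustedUserCAKeys` in `sshd_config` points at the user CA's public key (optionally with `AuthorizedPrincipalsFile` for role-based access) and `HostCertificate` points at the signed host certificate; on the user side, an `@cert-authority` line in `known_hosts` trusts the host CA. Short validity intervals are what make revocation largely unnecessary.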
How do you keep your secrets and how much does it cost?
Rather than starting from a particular solution and discussing the specific way in which it works, the discussion should be centered around why a particular solution was deployed in an organization and the thought process that went into it.
We need at least 4-5 real-world examples of the thought process that went into deploying a particular solution. In particular, we are interested in the following structure:
- What was the secret-keeping solution that was chosen?
- Why was it chosen? What are the parameters for choosing a particular solution?
The parameters we are looking to understand in depth are:
- Number of Secrets
- Nature of Secrets (Ephemeral, API Keys, Tokens etc.)
- Number of entities that access these secrets (500, 1000 etc.)
- Integration with Identity management.
- Cost for storing and managing secrets.
- Trust in the organization/community/person that developed and maintains the solution.
- Integration with existing tools/cloud providers/organization workflows etc.
Himanshu Kumar Das, security at CRED
Sponsored talk: Implementing security from day one at a fintech startup
The information security domain has become vast, with many industry standards, frameworks and tools. However, at the end of the day, all a business cares about is releasing a product securely with minimal friction, enabling tech to move fast while keeping security in place.
In this talk, we will touch on the approaches and key decisions we took to ensure security was in place from day one of our product launch. To keep things simple, I have segmented security into the following three buckets:
A cloud approach
A compliance approach
A product approach
A cloud approach: Most of our founding team members were well versed in a public cloud (AWS), so adopting an AWS-heavy infrastructure was a no-brainer.
Due diligence of shared responsibility: All managed workloads need a policy defined, e.g., an IAM role must not have excess permissions, and an admin user should not be able to delete a running ECS cluster.
Lifecycle of workloads/resources: e.g., security groups that enable temporary access across AWS resources need to be revoked as soon as possible.
Secret/Key Management: secrets are not meant to be hardcoded.
Incident Response: e.g., a Bitcoin miner on a hacked EC2 instance.
Least Privilege: Okta as SSO on separate AWS accounts (dev, stage, prod, PCI, central) with distinct user groups.
Continuous AWS Monitoring: https://www.cloudconformity.com/conformity-rules/
AWS GuardDuty: monitors CloudTrail, VPC Flow Logs and Route 53 logs, with SNS to email for all alerts.
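The "no excess permissions" rule can be partly automated. Below is a minimal sketch of a hypothetical linter that flags wildcard actions in IAM policy documents, alongside an explicit-deny guardrail like the ECS example above. It is not an AWS tool and ignores `NotAction`, conditions and resource scoping.

```python
# Minimal sketch of a least-privilege check for IAM policy documents:
# flag Allow statements with wildcard actions. Hypothetical helper; it ignores
# NotAction, Condition blocks and resource scoping.
from typing import Any, List


def as_list(value: Any) -> List[Any]:
    # IAM accepts either a single value or a list for Statement/Action/Resource.
    return value if isinstance(value, list) else [value]


def overly_broad(policy: dict) -> List[str]:
    findings = []
    for i, stmt in enumerate(as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue                         # Deny statements only narrow access
        actions = as_list(stmt.get("Action", []))
        resources = as_list(stmt.get("Resource", []))
        for action in actions:
            if action == "*" or action.endswith(":*"):
                findings.append(f"statement {i}: wildcard action {action!r}")
        if "*" in actions and "*" in resources:
            findings.append(f"statement {i}: full admin (Action '*' on Resource '*')")
    return findings


# Guardrail: an explicit Deny wins over any Allow, so admins cannot delete ECS clusters.
guardrail = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Deny", "Action": "ecs:DeleteCluster", "Resource": "*"}],
}
admin = {"Version": "2012-10-17",
         "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}]}

print(overly_broad(guardrail))
print(overly_broad(admin))
```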
A Compliance Approach: Fintech is a regulated business, and industry standards are a consequence of that. During the first month of our product launch, we were required to become compliant with NPCI guidelines for a UPI launch, followed by RBI’s data localization requirement (SAR) and then ISO 27001:2013.
Onboarding independent auditors to the concept of credit card bill payments.
Onboarding consultants to view product/business from a different angle.
Creating a process oriented culture to adhere to various compliance requirements.
A Product Approach: Our founder wanted our product to be as secure as possible before launch.
Dealing with rapid code+design changes.
Defensive versus offensive.
Proposing secure solutions for end user application flow.
Keeping track of changes in every alpha build: sit next to the developer and start with a simple code review. It need not be a tool-based approach; for every API call, check the corresponding codebase and think about what could go wrong.
There are too many tools and frameworks for attacking. Think about how to make every attack difficult, e.g., SSL pinning and code obfuscation (ProGuard followed by DexGuard).
Review all application flows, especially those that take user input, e.g., the OTP flow in our app.
Jaskaran Narula, satellite engineer at Red Hat
Securing infrastructure with OpenSCAP: the automation way
The audience will learn about OpenSCAP: the tools it uses, along with its profiles and components. I will also throw some light on how we can each have our own set of policies, and how to develop custom profiles and policies that are useful both to yourself and upstream. The talk will also cover how to deploy OpenSCAP and how to use its different tools, wrapping up with an analysis of the reports generated by the SCAP policies. By the end, you will have a clear picture of OpenSCAP and of managing its tools and the reports they produce.
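For the report-analysis step, here is a minimal sketch that summarises an XCCDF 1.2 results file of the kind `oscap xccdf eval --profile <profile-id> --results results.xml <datastream>.xml` writes. The sample document and rule IDs are hand-written, trimmed stand-ins, not real scanner output.

```python
# Minimal sketch: summarise OpenSCAP XCCDF 1.2 rule results.
# The SAMPLE document and its rule IDs are hypothetical stand-ins.
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List

XCCDF = "{http://checklists.nist.gov/xccdf/1.2}"


def summarize(results_xml: str) -> Counter:
    """Count rule results by outcome (pass, fail, notapplicable, ...)."""
    root = ET.fromstring(results_xml)
    counts: Counter = Counter()
    for rule_result in root.iter(f"{XCCDF}rule-result"):
        outcome = rule_result.findtext(f"{XCCDF}result")
        if outcome:
            counts[outcome.strip()] += 1
    return counts


def failed_rules(results_xml: str) -> List[str]:
    root = ET.fromstring(results_xml)
    return [rr.get("idref") for rr in root.iter(f"{XCCDF}rule-result")
            if (rr.findtext(f"{XCCDF}result") or "").strip() == "fail"]


SAMPLE = """<TestResult xmlns="http://checklists.nist.gov/xccdf/1.2" id="xccdf_example_testresult_demo">
  <rule-result idref="xccdf_example_rule_ssh_root_login"><result>fail</result></rule-result>
  <rule-result idref="xccdf_example_rule_auditd_enabled"><result>pass</result></rule-result>
  <rule-result idref="xccdf_example_rule_usb_storage"><result>notapplicable</result></rule-result>
</TestResult>"""

print(summarize(SAMPLE))
print(failed_rules(SAMPLE))
```

Feeding the failed rule IDs into a ticketing or alerting step is one way to automate follow-up on each scan.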
Shubham Mittal, co-founder at RedHuntLabs
OSINT for proactive defense
Below is an outline of the presentation:
- Overview of OSINT
- Why Security Teams should use OSINT
- Continuous Discovery and Monitoring of Assets
- Use OSINT Data for Periodic Attack Simulation
- Discovering Sensitive Information Leakage
- Monitoring Breached Passwords
- Proactively Identifying Security Incidents using SOCMint
- OSINT Countermeasures
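One common way to monitor breached passwords without disclosing them is the Have I Been Pwned "Pwned Passwords" range API, which uses k-anonymity: only the first five hex characters of the SHA-1 hash are sent. This is a minimal sketch of that check; the talk may use different sources.

```python
# Minimal sketch of a k-anonymity breached-password check against the
# Pwned Passwords range API: only a 5-character hash prefix leaves the machine.
import hashlib
import urllib.request
from typing import Tuple

RANGE_URL = "https://api.pwnedpasswords.com/range/{prefix}"


def split_hash(password: str) -> Tuple[str, str]:
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def breach_count(suffix: str, range_body: str) -> int:
    """The API answers with lines of 'SUFFIX:COUNT'; match ours locally."""
    for line in range_body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0


def times_pwned(password: str) -> int:
    """Network call: fetch every suffix sharing our prefix, then compare offline."""
    prefix, suffix = split_hash(password)
    with urllib.request.urlopen(RANGE_URL.format(prefix=prefix), timeout=10) as resp:
        return breach_count(suffix, resp.read().decode("utf-8"))
```

Calling `times_pwned(...)` performs a live HTTP request, so in a monitoring job you would typically batch and rate-limit these checks.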
Shadab Siddiqui, head of Information Security at Hotstar
Devil lies in the details: running a successful bug bounty programme in your organization
The agenda of this talk is to give a glimpse into the actual world of bug bounties, not just what we read in the news. These points of discussion will paint a complete picture for the audience:
- Introduction and benefits of having a bug bounty program
- Does it make sense to have a bug bounty program, or can we live without one?
- What take does leadership have on bug bounties: their concerns and expectations
- What could go wrong if we don't even bother
- When is the right time in an organization's timeline to open a connection with security researchers?
- What kind of organizations need such a program, and how do we decide for a non-IT organization?
- What platform makes sense? Should we buy or build our own?
- What problems pop up while building a platform, versus the drawbacks of signing up on one
- What processes need to be put in place across the organization for a successful program
- What is the bare minimum automation we need to scale up to all the bugs we receive?
- How do different teams react to it, such as the legal, finance and PR teams?
- What logistical problems show up towards the launch
- Do's and don'ts of a bug bounty program
Talina Shrotriya, software engineer at Trusting Social
Sponsored talk: SRE -- culture and strategy
The work of an SRE team serves a single purpose: shipping code in a fast, reliable and economical manner. Keeping these three factors in mind, we walk through the use cases below.
- Metric Collection:
We will discuss how we collected metrics from closely guarded systems with no access to the internet by using a reverse hole-punching mechanism.
- Scalable Storage:
We started off using Postgres as our database for storing metrics. Soon enough, the ingestion rate was filling up our storage almost exponentially. We will discuss how we designed a solution to push metrics to multiple storage systems without affecting existing deployments.
- Proactive Alerting:
Observability is more than just metrics collection. We will walk through use cases that helped us analyse traffic load to adjust resource allocation, keep track of all error codes sent by upstream services, and distinguish slowness from disconnection across networks.
We deal with deployments in closely guarded systems where we don’t control the infrastructure. We will discuss how we used existing tooling to share data between services deployed across VLANs.
Intent: Cost-Reduction and Speed
Network accessibility meant restricting access to certain sources. Whitelisting source addresses is neither economical nor fast when it comes to closely guarded systems; with a transparent proxy, we were able to manage access easily.
We will discuss how the Squid proxy replaced a 429 error with a 503, leading to a temporary denial of access to the upstream service.
We will discuss 2 war stories with respect to DNS - one where DNS resolution failed randomly and the other where DNS lookup limit was reached when too many services were accessing our central monitoring solution in a short span of time.
- File sharing:
Any deployment in a closely guarded environment would mean us having to upload files to the destination manually. We resolved this situation by implementing a small service which made file access easy across such environments.
- Topology mapping:
Constant manual intervention was needed to check whether x ports were open across y systems in z VLANs. We discuss how we automated this task.
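The talk does not detail its topology-mapping tool; as a hypothetical starting point, a host-by-port reachability matrix can be built from plain TCP connects. This reflects reachability only from the machine it runs on, so in a multi-VLAN setup it would have to run from a host in each source VLAN.

```python
# Minimal sketch (hypothetical, not the speakers' tool): build a host x port
# reachability matrix with plain TCP connects from the current machine.
import socket
from itertools import product
from typing import Dict, Iterable, Tuple


def is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:          # refused, timed out, unreachable, DNS failure...
        return False


def reachability(hosts: Iterable[str], ports: Iterable[int],
                 timeout: float = 1.0) -> Dict[Tuple[str, int], bool]:
    return {(h, p): is_open(h, p, timeout) for h, p in product(hosts, ports)}


# Demo against localhost: one listening socket, one bound but not listening.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen()
idle = socket.socket()
idle.bind(("127.0.0.1", 0))
open_port, closed_port = listener.getsockname()[1], idle.getsockname()[1]

matrix = reachability(["127.0.0.1"], [open_port, closed_port])
print(matrix)
listener.close()
idle.close()
```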
Intent: Reliability and Speed.
Infrastructure is the entry point for deploying code to production. While cloud providers do make this task easy, there were deeper problems we had to address, such as versioning, locking resources against concurrent updates, and enabling webhooks. We developed a tool called Tessellate to do just this.
With an intent to give more power to product teams to bring up their own infrastructure without having to build the skill set, we designed a tool called Escher which would take a YAML file as an input and provide a fully baked infrastructure to the user. Escher also made automating workflows easier alongside our orchestrator service.
Infosec: Authorization and Traceability:
Intent: Cost-Reduction and Reliability.
Any activity performed by the SRE team, be it service deployment, PR merge or infrastructure improvement, had to be tracked to ensure production systems were reliably maintained. We built multiple authentication and authorization workflows around each of these use cases. These simple solutions enabled us to confidently allow deployments to take place. We discuss solutions such as ACL on job scheduling, 2FA authentication for service deployments and LDAP for traceability.
The core belief of an SRE team is to solve problems for the larger good and not restrict ourselves to the problem at hand. Every tool we use was designed with this intent in mind. Each solution was a step towards better debuggability and accessibility of the issues faced in production, and each of these solutions made our on-call shifts manageable. The key takeaway from this talk is to follow the same approach: take a step back and think twice before doing something manually, asking whether the problem is repeatable, reusable and can be automated.
Defensive and offensive applications of open source intelligence
by Karan Saini, Shubham Mittal, Vandana Verma
This Birds of a Feather session does not have a particular decided flow for discussion. All inputs, including questions, techniques, experiences, war stories, et cetera, will be helpful for moving the conversation forward. However, it should be clarified that, for the purpose of this discussion, the use of open source intelligence will not cover the associated investigative or journalistic aspect, but rather that which deals with potential security implications, both from the red and blue perspective.
Ratnadeep Debnath, SRE at Zapier
Scale MySQL beyond limits with ProxySQL
- Setting up the scene: MySQL database flooded with connections, more than it can handle
- Vision: Achieve 10x scale without 10x cost
- An ideal solution?
- Solutions available: ProxySQL, MaxScale, Nginx, HAProxy
- Why ProxySQL?
- Benchmarking ProxySQL
- Connecting the missing dots
- Choosing an architecture for deployment and why
- Challenges and workarounds
- The end result!
- The Future
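The core reason a proxy like ProxySQL helps with a connection-flooded MySQL is multiplexing: many client requests share a small, fixed set of backend connections. The toy sketch below shows only that idea; it does not model ProxySQL, which is configured through its own SQL admin interface and adds query routing, hostgroups and more.

```python
# Toy illustration of connection multiplexing: 100 client requests share
# 4 backend connections. Connection objects are hypothetical stand-ins.
import queue
import threading
import time


class BackendPool:
    def __init__(self, size: int):
        self._idle: "queue.Queue[str]" = queue.Queue()
        for i in range(size):
            self._idle.put(f"backend-conn-{i}")    # stand-in for a real MySQL connection
        self._lock = threading.Lock()
        self.in_use = 0
        self.peak = 0

    def run(self, query_fn):
        conn = self._idle.get()                     # blocks until a backend connection is free
        with self._lock:
            self.in_use += 1
            self.peak = max(self.peak, self.in_use)
        try:
            return query_fn(conn)
        finally:
            with self._lock:
                self.in_use -= 1
            self._idle.put(conn)                    # hand the connection to the next client


def fake_query(conn: str) -> str:
    time.sleep(0.005)                               # pretend the backend does some work
    return conn


pool = BackendPool(size=4)
results = []
clients = [threading.Thread(target=lambda: results.append(pool.run(fake_query)))
           for _ in range(100)]
for t in clients:
    t.start()
for t in clients:
    t.join()
print(len(results), "requests served over at most", pool.peak, "backend connections")
```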
Denis Dsouza, DevOps engineer at Moonfrog Labs
Log Analytics with ELK Stack (Architecture for aggressive cost optimization and infinite data scale)
Business Requirements/Use Cases
- Log analysis platform (Application, Web-Server, Database logs)
- Data Ingestion rate: ~300GB/day
- Frequently accessed data: last 8 days
- Infrequently accessed data: 82 days (90 - 8 days)
- Uptime: 99.9%
- Hot Retention period: 90 days
- Cold Retention period: 90 days (with potential to increase)
- Cost effective solution
Areas of optimization
- Replica counts and its impacts
- How to run ELK on Spot instances correctly.
- EBS Costs can be high, how to set up Hot / Cold data storage
- Auto Scaling
- On-demand ELK Cluster
Infinite Data Retention
- How to set up S3 as a hot backup
- Recover on Demand
- Cost/GB data ingested
- Trade-offs made
- DR mechanisms
- Building a log analytics platform is not rocket science, but it can be painfully iterative if you are not aware of the options. Be aware of the trade-offs you are OK making, and you can roll out a solution specifically optimized for them.
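A back-of-the-envelope sketch shows why replica counts and hot/cold tiering dominate the cost for the requirements listed above. The assumptions are mine: on-disk size equals raw ingest (real indices compress and add overhead), one replica on hot data, and no replica on cold data because S3 snapshots serve as the backup.

```python
# Back-of-the-envelope storage sizing. ASSUMPTIONS: on-disk size == raw ingest,
# one replica on hot data, zero replicas on cold data (S3 snapshots as backup).
def storage_gb(daily_gb: float, days: int, replicas: int) -> float:
    """Primary plus replica copies for `days` of data."""
    return daily_gb * days * (1 + replicas)


DAILY_GB = 300
hot = storage_gb(DAILY_GB, days=8, replicas=1)       # frequently accessed window
cold = storage_gb(DAILY_GB, days=82, replicas=0)     # remainder of the 90 days
naive = storage_gb(DAILY_GB, days=90, replicas=1)    # everything hot with one replica

print(f"hot={hot} GB, cold={cold} GB, tiered total={hot + cold} GB, naive={naive} GB")
```

Under these assumptions, tiering needs 29,400 GB instead of 54,000 GB, and most of it can sit on cheaper storage than the hot tier.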
Neelu Tripathy, Lavakumar Kuppan, Vandana Verma, Shadab Siddiqui
Birds of a Feather: DevSecOps
We are looking to bring forth the following issues:
- Challenges in integrating security into a fast-paced DevOps cycle
- Current practices being followed for DevSecOps in their organizations
- How SAST and DAST are placed in the lifecycle
- Embedding security in the pipeline and automation
- What's different when dealing with containers and cloud
Trishna Guha, senior software engineer at Red Hat
Network automation with Ansible
The talk will cover key features such as provisioning and configuration management of networking devices, and how Ansible becomes one solution for all of it.
Agenda of the talk:
- What challenges network engineers face
- Why use Ansible for network automation to make their lives simpler
- What Ansible offers for network automation
- How to use Ansible to resolve challenges network engineers/operators face every day
- Live demo of configuring a networking device with Ansible
Piyush Verma, head of SRE at Trustingsocial.com
Software/site reliability of distributed systems
Consider a sample application:
A user sends an SMS of the form “Remind <date format> about <y>.” to a number. When the reminder is due, a service calls the user back. The user is charged for each SMS and for each reminder they answer.
Where all do you think this can start failing?
What are the right tools and strategies to measure and monitor these failure points?
What is the cost of measuring or leaving it un-measured?
There are Queues in the system. How do you monitor synchronous and asynchronous architectures?
The load has started to increase, but before we discuss strategies, let’s quickly discuss CAP.
How do we decide if we need sharding, better CPU or Clustering?
How do we add backups? Should they be asynchronous or synchronous?
Criteria to consider before picking up a strategy.
So far, we have been reactive about failures. How do we move to a proactive model?
And meanwhile, could you trace that request from that particular user for me?
At what stage, and how, do we start injecting reliability as a part of the software development process?
Lastly, while all of this is meant to improve and fix things, how do we prove that it does? How do you validate that MySQL replicas come back when the master dies? The only way to know is by simulating. How do we set up simulations? A decade ago it used to be called FMEA; now it’s called Chaos Engineering.
And oh, we should also discuss Site vs Software Reliability.
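A worked number helps with "where all can this start failing": if every step of the reminder flow must succeed, end-to-end availability is the product of the step availabilities. The component list and the 99.9% figures below are hypothetical, and failures are treated as independent.

```python
# Sketch: end-to-end availability of a serial request path.
# ASSUMPTIONS: hypothetical components at 99.9% each, independent failures.
from math import prod
from typing import Dict

MINUTES_PER_30_DAYS = 30 * 24 * 60


def serial_availability(components: Dict[str, float]) -> float:
    return prod(components.values())


def downtime_minutes(availability: float, period_minutes: int = MINUTES_PER_30_DAYS) -> float:
    return (1 - availability) * period_minutes


flow = {"sms_gateway": 0.999, "queue": 0.999, "scheduler": 0.999, "voice_call": 0.999}
a = serial_availability(flow)
print(f"end-to-end availability {a:.4%}, ~{downtime_minutes(a):.0f} min down per 30 days")
```

Four "three nines" components in series give roughly 99.6%, or close to three hours of downtime a month, which is why each added dependency needs to be measured.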
Load Balancing : an in-depth study to scale @ 80K TPS
- Referencing Willy Tarreau's 13-year-old article (2006)
5 categories of LB
Evaluation params of LBs
DNS load balancing in detail
- Layer 3/4 load balancing
HAProxy example and monitoring params
Layer 7 load balancing
Hardware and software routing (setups and cases of each)
LVS: history and implementation
- Direct routing
- Tunnel-based routing
- What to monitor
- Interrupt handling and CPU affinity
HA with Keepalived and Consul
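As one concrete example of weighting at the LB layer, here is a sketch of smooth weighted round-robin, the scheme nginx uses for weighted upstreams. It is not HAProxy's implementation, and the backend names and weights are hypothetical.

```python
# Sketch of smooth weighted round-robin: heavier backends get more picks,
# but picks are interleaved instead of sent in bursts.
from itertools import islice
from typing import Dict, Iterator


def smooth_wrr(weights: Dict[str, int]) -> Iterator[str]:
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    while True:
        for name, weight in weights.items():
            current[name] += weight                  # every backend earns its weight
        chosen = max(current, key=current.get)       # highest running score wins
        current[chosen] -= total                     # winner pays back the total
        yield chosen


picks = list(islice(smooth_wrr({"a": 5, "b": 1, "c": 1}), 7))
print(picks)   # ['a', 'a', 'b', 'a', 'c', 'a', 'a']
```

Over one full cycle each backend is chosen exactly in proportion to its weight, while the lighter backends are spread through the cycle rather than bunched at the end.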
Sudheer Kumar Meesala
Sponsored talk: Decomposing a monolith -- engineering and operational strategies we have gleaned from our experience at Endurance
Our APAC story of identifying and addressing the key issues in moving a ~$70 million monolithic domain provisioning and billing platform to microservices:
Need for microservices
Managing the transition
Scaling, Redundancy and Deployments
Sayan Chowdhury, Kushal Das
Birds of a Feather: Security paranoid OS
In the wake of one data breach after another, and in this age of surveillance, security has become serious business. Trust in big giants like Facebook and Google has diminished over the years. Security has been a huge concern recently for many of us, and many have grown a valid paranoia around security and privacy. The term “valid” signifies not just being paranoid but being methodical in your actions.
As we grow more and more dependent on internet-based services, we become more vulnerable to exploits, and you can easily notice how exploits have recently affected a large number of people.
And there could be no better place to start than your OS itself. There is a growing list of operating systems targeted towards security. There are a bunch of options these days, like Tails, Qubes OS, Silverblue and Whonix, each behaving a bit differently but trying to achieve the same goal: security and privacy.
We gather to discuss our ideas and concerns on this topic, our methods, and how we compartmentalize applications to fight vulnerabilities.
Aaditya Talwai, SRE at Confluent
Virtuous Cycles: Enabling SRE via automated feedback loops
In pictures, we will show a journey of instrumentation: how one can use app-level telemetry and tracing to build confidence that your auto-remediation strategies are doing the right things. Case studies include:
- Intelligent query timeouts that allow loaded workers to recover
- A backoff and jitter system for controlling the thundering herd on an internal service
- Watermark-based quota system for shaping traffic on a multitenant cluster
We will show that, using open-source tooling and good observability practices, you can turn an opaque, operationally taxing part of your system into a well-behaved component that remediates itself. We take a very visual approach to telling these stories, so expect graphs, and lots of them!
Ultimately, we want to give the audience a framework and strategy to answer these questions:
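For the backoff-and-jitter case study, here is a minimal sketch of capped exponential backoff with "full jitter": each retry waits a random time in [0, min(cap, base * 2**attempt)], which spreads retries out and avoids synchronized thundering herds. The parameter values are illustrative assumptions, not the speaker's system.

```python
# Sketch of capped exponential backoff with full jitter. Parameters are assumptions.
import random
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def full_jitter_delays(base: float, cap: float, attempts: int,
                       rng: random.Random) -> Iterator[float]:
    for attempt in range(attempts):
        yield rng.uniform(0, min(cap, base * 2 ** attempt))


def retry(fn: Callable[[], T], attempts: int = 5, base: float = 0.1, cap: float = 5.0,
          sleep: Callable[[float], None] = time.sleep,
          rng: Optional[random.Random] = None) -> T:
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delays = full_jitter_delays(base, cap, attempts - 1, rng or random.Random())
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise                      # out of attempts: surface the last error
            sleep(next(delays))
    raise AssertionError("unreachable")


# Demo: an upstream that fails twice, with sleeps recorded instead of performed.
calls = []


def flaky() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("upstream overloaded")
    return "ok"


slept = []
result = retry(flaky, sleep=slept.append, rng=random.Random(7))
print(result, [round(d, 3) for d in slept])
```

Injecting `sleep` and `rng` keeps the retry policy testable without real waiting, which also helps when experimenting with it outside production.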
- Is an ops procedure worth automating?
- How to get good feedback from internal telemetry in your application?
- How to use this feedback to drive auto-remediation?
- And most importantly, how to experiment on all this, without breaking production :)
Udit Kumar, software engineer at Nilenso
Scalable distributed systems from the ground up in Elixir
The outline of the talk would be roughly as follows:
1. Discussion of the Erlang/Elixir process model
2. Discussion of GenServers (other OTP constructs if required)
3. Discussion of distributed Erlang
4. Cycling through live game show architectures
Vivek Sridhar, senior product marketing manager at Microsoft
Virtual nodes to auto-scale applications on Kubernetes
Let’s see a demo application using a ServiceMonitor for Prometheus, an HPA (Horizontal Pod Autoscaler), and a custom container that counts the instances of the application and exposes them to Prometheus. Finally, a Grafana dashboard shows the metrics in real time.
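The scaling decision itself follows the rule documented for the Horizontal Pod Autoscaler: desired = ceil(current * currentMetric / targetMetric), skipped when the ratio is within a tolerance (0.1 by default), then clamped to the min/max bounds. A sketch of just that rule; the real controller adds readiness handling and stabilisation, and a Prometheus metric typically reaches the HPA through a metrics adapter.

```python
# Sketch of the HPA's core scaling rule; the bounds and example values are assumptions.
import math


def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current                    # close enough: avoid flapping
    else:
        desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


# e.g. 4 pods averaging 200 req/s against a target of 100 req/s per pod
print(desired_replicas(4, 200, 100))   # 8
```

With virtual nodes, the HPA's new replicas can be scheduled onto on-demand capacity instead of waiting for new cluster nodes.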
Deepak Goyal, software engineer at Walmart Labs
Kafka streams at scale
Problem Statement: Stateful Realtime Processing of multi-million events.
- Introduction to Kafka Streams and event flow (2 slides)
- Challenges in Kafka Streams
a. Fault Recovery
b. Horizontal Scalability
c. Cloud Readiness
d. Restricted RocksDB
e. Large Clusters
- Lay the background on why these are challenges.
- How we forked the code to solve each of these over the past year.
- Future Works
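Fault recovery in Kafka Streams rests on changelog-backed state: every write to a local state store is also recorded in a changelog topic, and a replacement instance rebuilds the store by replaying it. The toy Python model below illustrates that idea only; it is not the Kafka Streams API, and the event names are hypothetical.

```python
# Toy model of changelog-backed state for fault recovery. Not the Kafka
# Streams API; a list stands in for the changelog topic, a dict for RocksDB.
from typing import Dict, Iterable, List, Tuple

Changelog = List[Tuple[str, int]]


class ChangelogStore:
    def __init__(self, changelog: Changelog):
        self.changelog = changelog      # stands in for the changelog topic
        self.data: Dict[str, int] = {}  # stands in for the local state store

    def put(self, key: str, value: int) -> None:
        self.data[key] = value
        self.changelog.append((key, value))

    @classmethod
    def restore(cls, changelog: Changelog) -> "ChangelogStore":
        store = cls(changelog)
        for key, value in changelog:    # replay in order; the last write per key wins
            store.data[key] = value
        return store


def count_events(events: Iterable[str], store: ChangelogStore) -> None:
    for key in events:
        store.put(key, store.data.get(key, 0) + 1)


log: Changelog = []
live = ChangelogStore(log)
count_events(["login", "click", "login"], live)
recovered = ChangelogStore.restore(log)    # e.g. after the original instance crashed
print(recovered.data)                      # {'login': 2, 'click': 1}
```

Restore time grows with changelog length, which is one reason recovery becomes a challenge for large stateful applications.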
T K Sourabh, performance engineer at Red Hat
eBPF: exploring use cases of BPF kernel infrastructure
- Introduction to eBPF
- What is eBPF?
- tcpdump: Beginning of BPF
- eBPF use cases
- Networking: XDP, AF_XDP
- Why XDP: comparison/benchmark data between existing packet filtering (iptables) and a high-performance network data path (DPDK)
- When should you use XDP: Practical use-case in production.
- Tracing: bcc-tools, Bpftrace
- Overview of eBPF tracing tools and existing tracing tools.
- Writing bpftrace program to trace kernel
- Other scenarios and developing your own use-case
- eBPF Architecture
- Overview of in-kernel vm and verifier.
- What does an eBPF program look like?
- Security aspect of eBPF
- Features of the eBPF bytecode verifier
- Seccomp BPF
- Q & A