Rootconf 2019

On infrastructure security, DevOps and distributed systems.

About Rootconf 2019:

The seventh edition of Rootconf is a two-track conference with:

  1. Security talks and tutorials in auditoriums 1 and 2 on 21 June.
  2. Talks on DevOps, distributed systems and SRE in auditoriums 1 and 2 on 22 June.

Topics and schedule:

View full schedule here: https://hasgeek.com/rootconf/2019/schedule

Rootconf 2019 includes talks and Birds of a Feather (BOF) sessions on:

  1. OSINT and its applications
  2. Key management, encryption and its costs
  3. Running a bug bounty programme in your organization
  4. PolarDB, the cloud-native database architecture developed by Alibaba Cloud
  5. Vitess
  6. SRE and running distributed teams
  7. Routing security
  8. Log analytics
  9. Enabling SRE via automated feedback loops
  10. TOR for DevOps

Who should attend Rootconf?

  1. DevOps programmers
  2. DevOps leads
  3. Systems engineers
  4. Infrastructure security professionals and experts
  5. DevSecOps teams
  6. Cloud service providers
  7. Companies with heavy cloud usage
  8. Providers of the pieces on which an organization’s IT infrastructure runs – monitoring, log management, alerting, etc.
  9. Organizations dealing with large network systems where data must be protected
  10. VPs of engineering
  11. Engineering managers looking to optimize infrastructure and teams

For information about Rootconf and bulk ticket purchases, contact info@hasgeek.com or call 7676332020. Only community sponsorships are available.

Rootconf 2019 sponsors:

Platinum Sponsor

CRED

Gold Sponsors

Atlassian Endurance Trusting Social

Silver Sponsors

Digital Ocean GO-JEK Paytm

Bronze Sponsors

MySQL sumo logic upcloud
platform sh nilenso CloudSEK

Exhibition Sponsor

FreeBSD Foundation

Community Sponsors

Ansible PlanetScale

Hosted by

Rootconf is a forum for discussions about DevOps, infrastructure management, IT operations, systems engineering, SRE and security (from an infrastructure defence perspective).

DEEPAK GOYAL

@zonker

Kafka Streams at Scale

Submitted Mar 5, 2019

Walmart.com generates millions of events per second. At WalmartLabs, I work in a team called the Customer Backbone (CBB), where we wanted to build a platform capable of processing this event volume in real time and storing the state/knowledge of possibly all Walmart customers generated by that processing. Kafka Streams’ event-driven architecture seemed like the obvious choice. However, Walmart’s scale poses a few challenges:

  1. The clusters need to be large, with all the problems that brings.
  2. Infinite retention of changelog topics wastes valuable disk.
  3. Stand-by task recovery after a node failure is slow (changelog topics hold GBs of data).
  4. Kafka Streams has no support for repartitioning.

As part of this event-driven development, and to address the challenges above, I’m going to talk about some bold ideas that we developed as features/patches to Kafka Streams to deal with the scale required at Walmart:

  1. Cold Bootstrap: on a Kafka Streams node failure, instead of recovering from the changelog topic, we bootstrap the standby from the active’s RocksDB store using JSch, with zero event loss thanks to careful offset management.
  2. Dynamic Repartitioning: we added support for repartitioning in Kafka Streams, with state redistributed among the new partitions. We can now elastically scale to any number of partitions and any number of nodes.
  3. Cloud/Rack/AZ-aware task assignment: no active and standby tasks of the same partition are assigned to the same rack.
  4. Decreased partition-assignment size: with large clusters like ours (>400 nodes and 3 stream threads per node), the partition assignment of the Kafka Streams cluster runs to a few hundred MBs, so rebalances take a long time to settle; we shrank the assignment to speed them up.
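The rack-aware placement idea above can be sketched as a simple assignment loop: for each partition, place the active task on a node in one rack and the standby on a node in a different rack. This is an illustrative Python sketch under assumed inputs (the node and rack names are hypothetical), not the team's actual Kafka Streams patch:

```python
from itertools import cycle

def assign_tasks(partitions, nodes_by_rack):
    """Assign active and standby tasks so that the two replicas of a
    partition never share a rack (the rack/AZ-aware guarantee above).

    partitions: list of partition ids.
    nodes_by_rack: dict mapping rack name -> list of node names.
    Returns {partition: {"active": node, "standby": node}}.
    """
    racks = list(nodes_by_rack)
    if len(racks) < 2:
        raise ValueError("need at least two racks for rack-aware placement")
    # Round-robin over racks for actives, and over nodes within each rack.
    node_cyclers = {rack: cycle(nodes_by_rack[rack]) for rack in racks}
    rack_cycler = cycle(racks)
    assignment = {}
    for partition in partitions:
        active_rack = next(rack_cycler)
        # Pick any other rack for the standby replica.
        standby_rack = next(r for r in racks if r != active_rack)
        assignment[partition] = {
            "active": next(node_cyclers[active_rack]),
            "standby": next(node_cyclers[standby_rack]),
        }
    return assignment
```

A real assignor would also balance load and preserve stickiness across rebalances; this sketch only captures the placement constraint itself.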

Key takeaways:
  1. Basic understanding of Kafka Streams.
  2. Productionizing Kafka Streams at large scale.
  3. Using Kafka Streams as a distributed NoSQL DB.

Outline

Problem statement: stateful real-time processing of millions of events per second.

  1. Intro Kafka Streams and event flow (2 slides)
  2. Challenges in Kafka Streams
    a. Fault Recovery
    b. Horizontal Scalability
    c. Cloud Readiness
    d. Restricted RocksDB
    e. Large Clusters
  3. Background on why these are challenges.
  4. How we forked the code to solve each of these over the past year.
  5. Conclusion
  6. Future work

Speaker bio

Deepak works at Walmart Labs as a software engineer in the Customer Backbone team, where millions of events need to be processed in real time. He promotes event-driven architecture, enabled by a “Distributed NoSQL DB and Streaming Platform” based on Kafka Streams that the team is building. He is now working on a distributed graph algorithm using this event-driven architecture.

Slides

https://www.slideshare.net/DeepakGoyal25/kafka-streams-at-scale-preview

