Rootconf 2018

On scaling infrastructure and operations

Building and running 200K msgs/sec WebSockets platform @ Helpshift


Kapil Reddy


I will talk about how we built and maintained a WebSockets platform on AWS infra.
You can expect insights into:

  • How to build and evolve a WebSockets platform on AWS
  • How we made the platform more resilient to failures, both known and unknown
  • How we saved costs by using the right strategy for auto-scaling and load balancing
  • How to monitor a WebSockets platform


  • Building / Running a high-scale WebSockets service on AWS
    • Building and Evolving
      • JVM + Clojure + http-kit
        • WebSocket server
      • ZMQ
        • Transporting messages
        • ZMQ patterns
      • ZooKeeper
        • Using it with ZMQ brokers
      • How it all fits together: overview of the architecture
    • Monitoring
      • StatsD + Grafana
      • Debugging and Audit patterns using Grafana and Sensu
    • Compression and Costs
      • gzip support for WebSockets to save costs
    • Auto-Scaling
      • Load balancing using least load and Herald
      • Herald is an internal system that does feedback-based load balancing
    • Conclusion
      • Long-running connections pose different scaling challenges
      • Look at performance-impacting metrics, not the number of connections, when scaling and load balancing
      • Auditing and debugging are difficult for ephemeral data, but they are important for product quality
      • Get the protocol right so that adding new capabilities becomes simple
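
To illustrate the point about load metrics versus connection counts, here is a minimal sketch of least-load selection. It is not Herald's implementation, which the outline does not describe. The `Backend` fields, the `msgs_per_sec / capacity` load formula, and the server names are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    # All fields are hypothetical; a real system would report its own metrics.
    name: str
    connections: int      # open WebSocket connections
    msgs_per_sec: float   # assumed load metric reported by the server
    capacity: float       # assumed max msgs/sec the server can sustain

    @property
    def load(self) -> float:
        # Fraction of capacity in use (assumed definition of "load").
        return self.msgs_per_sec / self.capacity


def pick_least_loaded(backends):
    """Return the backend with the lowest reported load."""
    return min(backends, key=lambda b: b.load)


backends = [
    # Few connections, but each one is chatty.
    Backend("ws-1", connections=1000, msgs_per_sec=9000.0, capacity=10000.0),
    # Many connections, but mostly idle.
    Backend("ws-2", connections=5000, msgs_per_sec=2000.0, capacity=10000.0),
]

chosen = pick_least_loaded(backends)
print(chosen.name)
```

In this made-up data, picking by connection count would choose `ws-1`, while picking by load chooses `ws-2`, because long-lived connections can differ widely in how much traffic they carry.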

Speaker bio

Staff Engineer @Helpshift | I love to code on server, client and everything between! Like movies, anime, books/manga, Clojure, JS, mecha, GITS, algos, good music