Rootconf 2018

On scaling infrastructure and operations

Kapil Reddy

@kapilr

Building and running 200K msgs/sec WebSockets platform @ Helpshift

Submitted Mar 7, 2018

I will talk about how we built and maintained a WebSockets platform on AWS infrastructure.
You can expect insights into:

  • How to build and evolve a WebSockets platform on AWS
  • How we made the platform more resilient to failures, both known and unknown
  • How we saved costs by using the right strategy for auto-scaling and load balancing
  • How to monitor a WebSockets platform

Outline

  • Building and running a high-scale WebSockets service on AWS
    • Building and Evolving
      • JVM + Clojure + http-kit
        • Websocket server
      • ZMQ
        • Transporting messages
        • ZMQ patterns
      • Zookeeper
        • Using it with ZMQ brokers
      • How it all fits together: overview of the architecture
    • Monitoring
      • Statsd + Grafana
      • Debugging and Audit patterns using Grafana and Sensu
    • Compression and Costs
      • gzip support for websockets to save costs
    • Auto-Scaling
      • Load balancing using least load and Herald
      • Herald is an internal system which does feedback load balancing
    • Conclusion
      • Long-running connections pose different scaling challenges
      • Look at performance-impacting metrics instead of the number of connections when scaling and load balancing
      • Auditing and debugging are difficult for ephemeral data, but they are important for the quality of the product
      • Get the protocol right so that adding new capabilities becomes simple
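
To illustrate the "least load" idea from the auto-scaling and conclusion points above, here is a minimal Python sketch of picking a backend by a reported load metric rather than by connection count. It is an illustration only: Herald is an internal Helpshift system whose design is not described here, and the `Backend` type, the `load` field and the `pick_least_loaded` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    connections: int  # open WebSocket connections (hypothetical field)
    load: float       # assumed load metric reported by the server, e.g. CPU in [0, 1]


def pick_least_loaded(backends):
    """Return the backend with the lowest reported load.

    Ties on load are broken by the smaller connection count.
    """
    if not backends:
        raise ValueError("no backends available")
    return min(backends, key=lambda b: (b.load, b.connections))


# Made-up numbers: ws-2 holds the most connections but reports the lowest load.
backends = [
    Backend("ws-1", connections=1200, load=0.35),
    Backend("ws-2", connections=4000, load=0.20),
    Backend("ws-3", connections=900, load=0.80),
]
chosen = pick_least_loaded(backends)
print(chosen.name)
```

In this sketch a server with many idle connections can still be preferred over one with fewer but busier connections, which is the point of looking at performance-impacting metrics instead of connection counts.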

Speaker bio

Staff Engineer @Helpshift | I love to code on server, client and everything in between! I like movies, anime, books/manga, Clojure, JS, mecha, GITS, algos and good music.

Slides

https://speakerdeck.com/kapilr/building-and-scaling-a-websockets-pubsub-system
