Logging at scale using Graylog - Billion+ messages, 100K req/sec
Submitted by Rohit Gupta (@rohit01) on Friday, 29 April 2016
With the advent of micro-services and dozens of releases per day, logs are the bread and butter of a successful real-time technology platform like OlaCabs. In this talk, I will present our logging pipeline and the challenges we faced while running it at Ola scale.
- Infrastructure Overview:
  - Hundreds of micro-services
  - 100k requests per second
- The logging pipeline: Kafka, Graylog & Elasticsearch
- Scalability Issues, Resolution and lessons learnt:
  - Huge Lag for Application logs in Graylog UI
  - Docker service crashing due to Fluentd log driver
  - Exceptions in Graylog server due to 3MB log messages
  - Journal Utilisation too high, uncommitted messages deleted from journal - Part I
  - Journal Utilisation too high, uncommitted messages deleted from journal - Part II
  - Slow Output compared to Inputs from Kafka
Rohit is a technologist, explorer and a proud Indian.
As a technologist, he has worked in the area of convergence of telephony over the web. He has worked as an infrastructure engineer and product developer, done customer support, helped build teams and led a team of 10 DevOps developers. He loves open source, started a Linux User Group at his university and has an active GitHub profile.
As an explorer, he loves to try out new things. He loves trekking and photography, and has challenged himself with vipassana meditation and long-distance bike tours.
You can find out more about him at www.rohit.io