Handling logs, events and metrics using Heka
- Intended to benefit folks building and operating distributed systems
- Goal is to use one (clean, consistent and fast) pipeline for collecting data (logs, events, metrics) instead of using a mishmash of different technologies and tools (such as StatsD, Graphite, logstash, etc.)
For any reasonably large distributed / SOA system, good monitoring is a must for smooth operations. Services emit different kinds of data for diagnosability and instrumentation: logs, events and metrics. The semantics of these 3 types of data are fundamentally different. For example: events must be delivered with millisecond latencies, and not a single event can be dropped. Metrics can tolerate occasional loss, arrive in much higher volume, and can be aggregated at somewhat larger intervals (say, 1 minute). Log collection can have an even longer latency (~10 min); log volume is typically huge, and logs should ideally be indexed and archived.
Typically, complex distributed systems use a combination of tools and architectures to collect these different kinds of data. For example, StatsD (with Graphite or InfluxDB) is used for metrics collection. Logstash takes care of logs, which are piped to Kibana. Custom solutions (typically built on distributed queues) handle events.
We, at Exotel, have built a single pipeline for data collection using Heka. Heka is an incredibly powerful and versatile data collection and processing framework developed by Mozilla. Using a pipeline built on top of Heka, we collect all 3 types of data in a consistent way. We have written a library (currently in Go) which any service uses for logging, eventing and instrumentation. The goal of this talk is to explain our data pipeline architecture, in the hope that it is useful for others building and operating distributed systems.
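To make the idea of a single client library concrete, here is a minimal sketch of what such a unified API could look like. This is not Exotel's actual library: the `Message` fields, the `Client` type and its methods are all hypothetical, chosen only to illustrate that logs, events and metrics can share one envelope and one pluggable transport (UDP to a hekad input, Kafka, etc.).

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Message is an illustrative envelope in the spirit of a Heka message,
// carrying logs, events and metrics alike. Field names here are
// made up for the sketch, not Heka's exact schema.
type Message struct {
	Timestamp int64             `json:"timestamp"`
	Type      string            `json:"type"` // "log", "event" or "metric"
	Logger    string            `json:"logger"`
	Payload   string            `json:"payload"`
	Fields    map[string]string `json:"fields,omitempty"`
}

// Client is a hypothetical unified client: one entry point for all
// three data types, so every service emits the same envelope.
type Client struct {
	Service string
	Emit    func(Message) // pluggable transport (UDP to hekad, Kafka, ...)
}

func (c *Client) Log(line string) {
	c.send("log", line, nil)
}

func (c *Client) Event(name string, fields map[string]string) {
	c.send("event", name, fields)
}

func (c *Client) Metric(name string, value float64) {
	c.send("metric", name, map[string]string{"value": fmt.Sprintf("%g", value)})
}

func (c *Client) send(typ, payload string, fields map[string]string) {
	c.Emit(Message{
		Timestamp: time.Now().UnixNano(),
		Type:      typ,
		Logger:    c.Service,
		Payload:   payload,
		Fields:    fields,
	})
}

func main() {
	// For the sketch, the "transport" just prints the JSON-encoded message.
	c := &Client{Service: "billing", Emit: func(m Message) {
		b, _ := json.Marshal(m)
		fmt.Println(string(b))
	}}
	c.Log("request handled in 12ms")
	c.Event("call.completed", map[string]string{"sid": "abc123"})
	c.Metric("active_calls", 42)
}
```

Downstream, Heka filters and outputs can then route each message by its type: logs to archival and indexing, events to a low-latency queue, metrics to aggregation.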
Basic knowledge of StatsD, Graphite, Kafka and Logstash
I am co-founder and CTO @ Exotel. Prior to Exotel, I was with Microsoft. I love distributed systems: building them, scaling them, making them robust and performant, and, most importantly, keeping them maintainable.