Jan 31, 2026 (Sat), 11:00 AM – 12:30 PM IST
Prachi Jha
@prachiAwesomeJha
Submitted Jan 30, 2026
When building systems that handle many concurrent connections, TCP drops can happen silently at the kernel level. I discovered this while stress testing a Go auction server—spawning 1000 concurrent clients caused TCP_LISTEN_OVERFLOW drops because connections arrived faster than the server could accept them.
Traditional debugging was frustrating:
- tcpdump was too slow to run during load testing
- netstat -s showed system-wide drop counters but couldn't tell me which process or connection was affected

I needed real-time, per-process visibility into TCP drops to understand what was happening during high load.
Why this matters for distributed systems:
When services experience network issues (database connections timing out, microservice communication failures, container networking problems), the root cause is often kernel-level TCP drops that don’t surface in application logs.
I built an eBPF-based TCP drop monitor that hooks into the Linux kernel’s kfree_skb tracepoint (where packets get dropped). It captures drop events in the kernel, transfers them via ring buffer, and provides real-time output with process attribution.
Architecture:
- Kernel side: an eBPF program attaches to the kfree_skb tracepoint and captures the PID, drop reason code, and kernel function address
- Userspace side: resolves the function address via a binary search over /proc/kallsyms and formats output with a 256KB buffer

Drop reasons detected:
- TCP_LISTEN_OVERFLOW - listen queue overflowed (my stress test case)
- TCP_RETRANSMIT - retransmit failures
- NETFILTER_DROP - firewall drops
- TCP_CSUM - checksum errors

Caveats:
- kfree_skb is a hot path; at extreme drop rates, monitoring overhead becomes non-trivial

Example output:
[15:04:23] Drop | PID: 1234 | Reason: TCP_LISTEN_OVERFLOW | Function: tcp_v4_syn_rcv+0x8c
Now you know: Process 1234 is experiencing TCP drops. You can investigate the specific process and connection.
Demo: Live TCP drop detection using SYN flood to trigger TCP_LISTEN_OVERFLOW drops, showing real-time per-process monitoring.
1. eBPF for Distributed Systems Debugging
2. When to Use eBPF
Key insight: eBPF gives you visibility into kernel events, but attributing those events to userspace processes is tricky. For my use case (debugging listen queue overflows), the PID is usually correct. For other scenarios, you need to validate.
3. Real-World Applications
4. Production Considerations
Honest assessment: This tool helped me debug my load testing issue. For production use at scale, you’d want kernel-side aggregation and more rigorous testing of the PID attribution accuracy.
Key lesson: When distributed systems show symptoms but application logs don’t explain why, drop down to the kernel level. TCP drops are often the smoking gun.
Project status: Working prototype, will be open-sourced on GitHub. This is a learning share and debugging tool, not a product pitch.