Sub-Second Decisions: Design Principles for Real-Time Go Workflow Engines on GCP for Martech Scale
Submitted Apr 20, 2025
Topic of your submission:
Serverless architectures
Type of submission:
30 mins talk
I am submitting for:
Rootconf Annual Conference 2025
Description:
Real-time Martech decisions such as session personalization, dynamic creative optimization (DCO), next-best-action (NBA), and personalized ad targeting require processing massive event streams with complex, stateful logic under strict sub-second latency budgets. This logic is best modeled as workflows, but general-purpose workflow engines often add unacceptable overhead. So how do you orchestrate stateful workflows efficiently on a modern cloud platform without that overhead? This talk dives into the practical design principles and considerations behind a high-throughput, low-latency workflow engine built in Go, tailored to the demands of real-time engagement and personalization technology.
We explore key architectural choices focusing on performance and reliability. Learn how lightweight Go concurrency patterns (goroutines coordinated with errgroup) enable parallel execution of steps such as external API calls or data lookups. We’ll detail our approach to state management in this highly concurrent environment, emphasizing thread-safe execution contexts and immutable inputs. On reliability, we discuss a critical trade-off: checkpointing selectively and cheaply, only at steps with significant side effects (such as external network I/O), while relying primarily on the retry capabilities of the underlying cloud message queue (such as Pub/Sub) for resilience against transient failures and infrastructure events within our sub-second execution window.
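To make the ideas above concrete, here is a minimal sketch of parallel steps, a thread-safe execution context, and selective checkpointing. It is not the engine described in the talk: every name in it (`ExecContext`, `Step`, `RunParallel`, the step names) is hypothetical, a `sync.Map` stands in for an external checkpoint store, and the standard library's `sync.WaitGroup` plus `context` cancellation is used in place of `errgroup` so the example has no third-party dependencies.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// ExecContext holds per-message state: the input is copied once and only read
// afterwards, while step results are written under a lock.
type ExecContext struct {
	input   map[string]string
	mu      sync.RWMutex
	results map[string]string
}

func NewExecContext(input map[string]string) *ExecContext {
	cp := make(map[string]string, len(input))
	for k, v := range input {
		cp[k] = v // defensive copy: later caller mutations cannot leak in
	}
	return &ExecContext{input: cp, results: map[string]string{}}
}

func (c *ExecContext) Input(key string) string { return c.input[key] }

func (c *ExecContext) Set(step, val string) { c.mu.Lock(); c.results[step] = val; c.mu.Unlock() }

func (c *ExecContext) Get(step string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	val, ok := c.results[step]
	return val, ok
}

// Step is one workflow node; SideEffect marks the steps worth checkpointing.
type Step struct {
	Name       string
	SideEffect bool
	Run        func(ctx context.Context, ec *ExecContext) (string, error)
}

// RunParallel runs all steps concurrently and cancels the rest on the first
// error. store stands in for an external checkpoint store (assumption).
func RunParallel(ctx context.Context, ec *ExecContext, store *sync.Map, steps []Step) error {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()
	var wg sync.WaitGroup
	var once sync.Once
	var firstErr error
	for _, s := range steps {
		wg.Add(1)
		go func(s Step) {
			defer wg.Done()
			if v, ok := store.Load(s.Name); s.SideEffect && ok {
				ec.Set(s.Name, v.(string)) // reuse the result, skip the side effect
				return
			}
			val, err := s.Run(ctx, ec)
			if err != nil {
				once.Do(func() { firstErr = fmt.Errorf("step %s: %w", s.Name, err); cancel() })
				return
			}
			if s.SideEffect {
				store.Store(s.Name, val) // checkpoint only where a rerun would hurt
			}
			ec.Set(s.Name, val)
		}(s)
	}
	wg.Wait()
	return firstErr
}

// runRedeliveryDemo processes the same message twice, as a queue retry would.
func runRedeliveryDemo() (calls int, offer string, err error) {
	store := &sync.Map{}
	steps := []Step{
		{Name: "lookup_segment", Run: func(_ context.Context, ec *ExecContext) (string, error) {
			return "segment-of-" + ec.Input("user"), nil
		}},
		{Name: "call_partner_api", SideEffect: true, Run: func(context.Context, *ExecContext) (string, error) {
			calls++ // only this step's goroutine touches calls
			return "offer-42", nil
		}},
	}
	for attempt := 0; attempt < 2; attempt++ {
		ec := NewExecContext(map[string]string{"user": "u1"})
		if err = RunParallel(context.Background(), ec, store, steps); err != nil {
			return calls, "", err
		}
		offer, _ = ec.Get("call_partner_api")
	}
	return calls, offer, nil
}

func main() {
	fmt.Println(runRedeliveryDemo()) // prints: 1 offer-42 <nil>
}
```

In this sketch the second pass simulates a redelivered message: the pure lookup step is simply recomputed, while the side-effecting step is restored from its checkpoint and not executed again. A real store would presumably key checkpoints by message or workflow ID rather than by step name alone.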
Finally, a performant system demands effective observability that does not hinder speed. We’ll discuss pragmatic approaches to efficient, real-time step counting with Redis Hashes (and the design choices considered), and a strategy for selective asynchronous tracing: detailed execution state, including intermediate results, is captured and published asynchronously through Pub/Sub to a data warehouse (BigQuery/Bigtable). This provides essential debugging information and DAG visualization while managing the data volume, performance impact, and cost constraints inherent in high-throughput, low-latency Adtech systems. This session focuses on the practical application of design principles when standard solutions don’t fit demanding real-time constraints.
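As an illustration of the selective asynchronous tracing idea, the sketch below samples whole workflows by hashing their ID and hands events to a background goroutine through a buffered channel, dropping events rather than blocking when the buffer is full. It rests on assumptions: `Tracer`, `TraceEvent`, and the `sink` callback are hypothetical names, and `sink` stands in for a real publisher such as a Pub/Sub client. Step counting in Redis is not shown. It needs Go 1.19 or later for `atomic.Int64`.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync/atomic"
)

// TraceEvent is one step's captured state (field names are illustrative).
type TraceEvent struct {
	WorkflowID string
	Step       string
	Result     string
}

// Tracer samples whole workflows and hands their events to a background
// goroutine, so the hot path never waits on the sink.
type Tracer struct {
	sampleEvery uint32
	events      chan TraceEvent
	done        chan struct{}
	dropped     atomic.Int64
}

// NewTracer starts the drain goroutine. sink stands in for a real publisher
// such as a Pub/Sub client (assumption).
func NewTracer(sampleEvery uint32, buffer int, sink func(TraceEvent)) *Tracer {
	t := &Tracer{
		sampleEvery: sampleEvery,
		events:      make(chan TraceEvent, buffer),
		done:        make(chan struct{}),
	}
	go func() {
		defer close(t.done)
		for ev := range t.events {
			sink(ev)
		}
	}()
	return t
}

// Sampled hashes the workflow ID so every step of a chosen workflow is kept,
// which a DAG view needs. Roughly 1 in sampleEvery workflows is selected.
func (t *Tracer) Sampled(workflowID string) bool {
	if t.sampleEvery <= 1 {
		return true
	}
	h := fnv.New32a()
	h.Write([]byte(workflowID))
	return h.Sum32()%t.sampleEvery == 0
}

// Record enqueues without blocking; when the buffer is full the event is
// dropped and counted. It must not be called after Close.
func (t *Tracer) Record(ev TraceEvent) bool {
	if !t.Sampled(ev.WorkflowID) {
		return false
	}
	select {
	case t.events <- ev:
		return true
	default:
		t.dropped.Add(1)
		return false
	}
}

// Close stops intake and waits for buffered events to reach the sink.
func (t *Tracer) Close() {
	close(t.events)
	<-t.done
}

// runTracerDemo traces five steps of one workflow and returns (delivered, dropped).
func runTracerDemo() (int, int64) {
	var delivered []TraceEvent
	t := NewTracer(1, 8, func(ev TraceEvent) { delivered = append(delivered, ev) })
	for i := 1; i <= 5; i++ {
		t.Record(TraceEvent{WorkflowID: "wf-1", Step: fmt.Sprintf("step-%d", i), Result: "ok"})
	}
	t.Close() // after Close returns, the drain goroutine has finished
	return len(delivered), t.dropped.Load()
}

func main() {
	fmt.Println(runTracerDemo()) // prints: 5 0
}
```

Dropping on a full buffer trades trace completeness for bounded latency on the hot path; the dropped counter makes that loss visible so the buffer size and sampling rate can be tuned against volume and cost.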
Takeaways:
- Apply key design principles for building low-latency (<1s), high-throughput Go workflow systems on cloud platforms, balancing concurrency, state management, and strategies like selective checkpointing versus source-level retries.
- Implement practical, pragmatic observability (counting & selective state tracing) tailored for performance-critical, real-time environments, managing trade-offs between visibility, latency, and cost.
Audience Segment:
Backend Engineers, System Architects, Platform Engineers, SREs, and Go developers working on high-throughput, low-latency event processing systems on cloud platforms, especially within domains like Adtech/Martech, real-time bidding/data pipelines, or streaming data analysis where sub-second processing is critical.
Bio:
Name: Shubham Patil
Role: Senior Engineering Manager
Company: Zeotap
Bio: I am a Senior Engineering Manager at Zeotap (a leading privacy-first CDP for EU marketeers) and a tech enthusiast specializing in building scalable backend systems and real-time data infrastructure on major cloud platforms using Java and Go. I have extensive experience in Adtech, focusing on low-latency distributed systems and performance optimization, and I am passionate about pragmatic engineering solutions.