Rootconf 2025 Annual Conference CfP

Pravin Bange

The Hidden Complexity of Text-to-SQL: A Case Study from Cyber Security

Submitted Apr 16, 2025

At Uptycs, we tackled a massive data discovery challenge across a security data lake with over 12,000 denormalized tables. While most Text-to-SQL systems perform reasonably well with a few well-structured tables, we found they struggle as schema size grows, resulting in incorrect table selection, missing joins, and hallucinated queries. Our solution, Ask Uptycs, combines Retrieval-Augmented Generation (RAG), vector search, hierarchical clustering, and ontology-based join reasoning to ensure the right tables and columns are selected for each query.

In this talk, we’ll walk through how we built a production-grade Text-to-SQL pipeline for cybersecurity analytics. We’ll share our approach to handling noisy and overlapping schemas, using embeddings to retrieve relevant structures, applying hierarchical clustering for smarter narrowing, and building ontological relationships for accurate join logic. You’ll see real-world examples, mistakes made, and optimizations that helped us deliver a system where security analysts don’t need SQL knowledge but still get correct, performant queries.
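To make the retrieval step concrete, here is a minimal sketch of narrowing a large catalog to a few candidate tables before any SQL is generated. It is not the Ask Uptycs implementation: the catalog entries, the `embed` function (a toy bag-of-words stand-in for a real embedding model), and the `retrieve_tables` helper are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical table catalog; names and descriptions are invented for illustration.
CATALOG = {
    "process_events": "process execution events with pid, path, cmdline and user",
    "socket_events": "network socket connections with remote address, port and pid",
    "file_events": "file creation and modification events with path and hash",
    "users": "local user accounts with uid, username and shell",
}

def embed(text):
    """Toy bag-of-words vector; a real system would call an embedding model here."""
    return Counter(text.lower().replace(",", " ").split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve_tables(question, catalog, k=2):
    """Return up to k table names whose descriptions best match the question."""
    q = embed(question)
    scored = [
        (cosine(q, embed(name.replace("_", " ") + " " + desc)), name)
        for name, desc in catalog.items()
    ]
    scored.sort(reverse=True)
    # Drop tables with no overlap at all so unrelated schemas never reach the prompt.
    return [name for score, name in scored[:k] if score > 0]

candidates = retrieve_tables(
    "which process opened a network socket to a remote port", CATALOG
)
print(candidates)
```

Only the retrieved tables and their columns would then be placed in the LLM prompt. With thousands of tables, a first pass over clusters of related tables, followed by a pass like this one within the chosen clusters, is one plausible way to apply the hierarchical narrowing described above.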

Takeaways:

  • How to scale Text-to-SQL systems beyond toy schemas using retrieval and clustering.
  • Why ontology and real-world query patterns are essential for high-accuracy join inference.
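As a rough illustration of ontology-based join inference, the sketch below treats known table relationships as a graph and searches for the shortest chain of joins between two tables. The edge list, the key names, and the `build_graph` and `join_path` helpers are hypothetical; the talk's actual ontology and reasoning may differ.

```python
from collections import deque

# Hypothetical ontology: each edge says two tables can be joined on a shared key.
JOIN_EDGES = [
    ("process_events", "socket_events", "pid"),
    ("process_events", "users", "uid"),
    ("file_events", "process_events", "pid"),
]

def build_graph(edges):
    """Build an undirected adjacency list: table -> [(neighbor, join_key), ...]."""
    graph = {}
    for left, right, key in edges:
        graph.setdefault(left, []).append((right, key))
        graph.setdefault(right, []).append((left, key))
    return graph

def join_path(graph, start, goal):
    """Breadth-first search for the shortest chain of joins from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for neighbor, key in graph.get(table, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(table, neighbor, key)]))
    return None  # No known relationship connects the two tables.

graph = build_graph(JOIN_EDGES)
for left, right, key in join_path(graph, "socket_events", "users"):
    print(f"JOIN {right} ON {left}.{key} = {right}.{key}")
```

Constraining joins to edges that exist in the ontology keeps the generator from inventing join keys. Weighting edges by how often a join appears in real analyst queries would be one way to bring in the query patterns mentioned above.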

Target Audience:
This session will benefit ML/NLP engineers, platform architects, and anyone building natural language interfaces for large, complex databases—especially in enterprise or domain-heavy environments like security, finance, or health tech.

Bio:
I work at Uptycs, where I build AI-driven data systems to simplify data processing and data-driven investigations. My recent focus has been on bridging the gap between natural language and SQL for massive cybersecurity telemetry lakes using LLMs, embeddings, and structured reasoning.
I will be presenting with Anudeep, who leads the data engineering team at Uptycs.
