At the Birds-of-a-Feather (BOF) session held at Fifthel Winter on 4 Dec, we dug into a deceptively simple question: how do we actually give data meaning - especially now that more analytics flows through natural-language and AI tools?
It was a room full of practitioners sharing experiences: what’s working in production, what’s been painful, and what teams are doing to earn trust when the interface is “just ask a question.” The conversation covered how teams define and maintain metrics, how they reduce hallucinations in natural-language analytics, how they validate and audit AI answers, and how they enforce security without slowing everything down.
A recurring theme emerged: the “semantic layer” isn’t a single product or file - it’s an evolving system of definitions, workflows, guardrails, and human review that must balance governance, speed, cost, and trust.

- Semantic layers are socio-technical: the “human problems” (alignment, definitions, ownership, feedback) are as important as the technical modeling.
- Schema drift is the enemy of trust: without drift monitoring and review workflows, AI systems silently degrade over time.
- Citations are necessary but not sufficient: they help users validate answers, but you still need a governance workflow to resolve disagreements about metric definitions.
- Root-cause analysis requires controlled exploration: it is powerful, but needs guardrails to prevent infinite branching, high costs, and misleading narratives.
- Tool-first architecture scales better than prompt stuffing: treat catalogs, dictionaries, and access checks as callable tools rather than prompt payloads.
Tooling people are using today (and what breaks)
Participants referenced metrics-layer–style approaches (e.g., cubes / semantic models), catalogs and metadata systems (open metadata/data catalogs), and LLM-enabled “data discovery” pipelines that ingest schema and documentation to build a usable dictionary of concepts.
What broke in practice:
- Creation and maintenance overhead: defining metrics and semantic models is still very manual, and maintaining them over time is a major blocker to launching AI analytics internally.
- A semantic layer without strong foundations is fragile: if the warehouse is large (hundreds of tables, terabytes, 100–200M events), the “definition debt” becomes real operational drag.
Key insight: Most teams don’t fail because they lack a semantic layer idea - they fail because they can’t sustain the ongoing work of maintaining definitions across schema drift, ownership changes, and evolving business logic.
Hallucinations: reducing wrong answers and wrong meanings
The group quickly converged on hallucinations as the most painful failure mode - especially when LLMs infer “semantic meaning” from cryptic column names, inconsistent IDs, and legacy schemas.
Patterns that helped:
- Human-in-the-loop review for inferred semantics, especially during schema changes or for ambiguous sources (e.g., BigQuery tables with nested JSON and frequent drift).
- Schema drift detection and alerting → review queue: when the schema changes, generate an alert, route it to a human, and update the dictionary and definitions (a minimal sketch follows this list).
- Better context improves accuracy: clear naming (“transaction_id” vs. “TID”), better column descriptions, and pipeline documentation materially reduce hallucinations.
- Copilot mentality for the “first draft”: let the LLM propose an initial dictionary or model, then treat it as editable scaffolding rather than truth.
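To make the drift-to-review loop concrete, here is a minimal sketch in plain Python. It is not tied to any particular warehouse or catalog tool; the snapshot format ({table: {column: type}}) and the ReviewItem type are illustrative stand-ins for whatever your metadata system actually records.

```python
from dataclasses import dataclass

@dataclass
class ReviewItem:
    table: str
    change: str   # "added", "removed", or "type_changed"
    column: str
    detail: str = ""

def detect_drift(previous: dict, current: dict) -> list[ReviewItem]:
    """Diff two schema snapshots ({table: {column: type}}) and return one
    review item for every column that was added, removed, or retyped."""
    items: list[ReviewItem] = []
    for table, cols in current.items():
        old_cols = previous.get(table, {})
        for col, col_type in cols.items():
            if col not in old_cols:
                items.append(ReviewItem(table, "added", col, col_type))
            elif old_cols[col] != col_type:
                items.append(ReviewItem(table, "type_changed", col,
                                        f"{old_cols[col]} -> {col_type}"))
        for col in old_cols.keys() - cols.keys():
            items.append(ReviewItem(table, "removed", col))
    return items

# Yesterday's snapshot vs. today's introspected schema.
previous = {"orders": {"transaction_id": "STRING", "amount": "NUMERIC"}}
current = {"orders": {"transaction_id": "STRING", "amount": "FLOAT64",
                      "channel": "STRING"}}

for item in detect_drift(previous, current):
    # In a real pipeline this would land in the review queue (a ticket,
    # a Slack alert, or a PR against the semantic dictionary), not stdout.
    print(item)
```

The diff itself is the easy part; the discussion made clear that routing each change to an owner and updating the dictionary is where teams succeed or fail.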
Key insight: Naming things well helps a lot - but many organizations can’t rename everything, so the semantic system has to compensate with documentation, modeling, and review workflows.
Getting to “initial definitions” without boiling the ocean
A practical tension emerged: invest heavily up front to avoid hallucinations, or start imperfectly and iterate with user feedback.
Two viable approaches discussed:
a. Up-front investment (interviews, onboarding, curated context) to ensure first impressions are good - because if the first answers are wrong, business users quickly lose trust.
b. Progressive hardening: launch with “beta” semantics, collect feedback, then formalize definitions through a governed workflow.
Conclusion: There’s no free lunch. Teams either pay up front (curation and modeling) or pay later (trust repair and manual corrections). The best teams do a small up-front “minimum viable semantics” and then iterate with strong feedback loops.
Validating AI answers: citations, explain plans, and “show your work”
Participants emphasized that trust requires verifiability—especially as analytics moves into natural-language interfaces.
Ideas that resonated:
- Citations as a baseline: point users to the underlying tables, queries, or evidence so they can verify quickly.
- Explain-plan–style checks (or query analysis tools) as guardrails: have a toolchain step that inspects the SQL plan or cost and flags dangerous scans or inefficiencies (see the sketch after this list).
- Golden queries / exemplars: provide known-good query patterns to steer the model toward house standards—while acknowledging they can bias the model and sometimes harm open-ended tasks like root-cause analysis.
Key nuance: These guardrails improve correctness, but they also add latency and cost—so teams need to be selective and pragmatic.
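To illustrate the explain-plan idea, here is a minimal sketch. It uses SQLite's EXPLAIN QUERY PLAN only because it ships with Python; the plan_guardrail function and its full-scan heuristic are assumptions, and a real warehouse would use its own EXPLAIN or dry-run facility with cost thresholds instead.

```python
import sqlite3

def plan_guardrail(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Inspect the query plan before executing LLM-generated SQL and
    return warnings for any full-table scans."""
    warnings = []
    for row in conn.execute(f"EXPLAIN QUERY PLAN {sql}"):
        detail = row[-1]  # last column is the human-readable plan step
        if detail.startswith("SCAN"):
            warnings.append(f"full scan: {detail}")
    return warnings

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (transaction_id TEXT PRIMARY KEY, amount REAL)")

candidate_sql = "SELECT SUM(amount) FROM orders WHERE amount > 100"
issues = plan_guardrail(conn, candidate_sql)
if issues:
    # Route back to the model (or a human) instead of running blind.
    print("Flagged before execution:", issues)
else:
    print(conn.execute(candidate_sql).fetchall())
```

The value is the ordering: the plan is inspected before the generated SQL ever runs, so a dangerous query becomes a flag rather than a bill.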
Performance and cost: the hidden constraint on “safe AI analytics”
A clear operational reality emerged: dashboards are typically sub-second, but AI answers can take 30–60 seconds once you add reasoning steps, prompt transforms, retrieval, validation, optimization, and retries.
Cost and performance tactics discussed:
- Context engineering: don’t stuff a 1,000-table catalog into the prompt; expose catalog lookup as tools and let the LLM query only what it needs (see the sketch after this list).
- Pre-aggregation and rollups (database-side) to reduce downstream query cost.
- Memory and pattern reuse: most user questions are repetitive; store successful query patterns and reuse them with small substitutions.
- Limit loops and iterations: enforce caps on agent exploration (e.g., 5–10 iterations for root-cause analysis) to prevent runaway costs.
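Here is a minimal sketch of the tool-first pattern combined with the iteration cap from the last bullet. It is deliberately not tied to any particular LLM SDK; the catalog shape, the tool names (search_tables, get_columns), and the scripted stand-in model are illustrative assumptions.

```python
import json

# A 1,000-table catalog stays out of the prompt; the model asks for
# exactly the slices it needs via these tool functions.
CATALOG = {
    "orders": {"transaction_id": "STRING", "amount": "NUMERIC", "created_at": "TIMESTAMP"},
    "customers": {"customer_id": "STRING", "territory": "STRING"},
}

def search_tables(keyword: str) -> list[str]:
    """Return table names whose name or columns mention the keyword."""
    keyword = keyword.lower()
    return [t for t, cols in CATALOG.items()
            if keyword in t or any(keyword in c.lower() for c in cols)]

def get_columns(table: str) -> dict:
    """Return the column dictionary for one table."""
    return CATALOG.get(table, {})

TOOLS = {"search_tables": search_tables, "get_columns": get_columns}
MAX_ITERATIONS = 5  # hard cap on exploration, per the discussion

def run_agent(question: str, call_model) -> str:
    """Minimal tool-calling loop. `call_model` stands in for your LLM
    client; it receives the transcript and returns either
    {"tool": name, "args": {...}} or {"answer": "..."}."""
    transcript = [{"role": "user", "content": question}]
    for _ in range(MAX_ITERATIONS):
        step = call_model(transcript)
        if "answer" in step:
            return step["answer"]
        result = TOOLS[step["tool"]](**step["args"])
        transcript.append({"role": "tool", "name": step["tool"],
                           "content": json.dumps(result)})
    return "Stopped: iteration cap reached; escalate to a human or a heavier workflow."

# Scripted stand-in for an LLM, just to exercise the loop end to end.
def scripted_model(transcript):
    if len(transcript) == 1:
        return {"tool": "search_tables", "args": {"keyword": "transaction"}}
    return {"answer": "orders.transaction_id is the transaction identifier."}

print(run_agent("Which table holds transaction ids?", scripted_model))
```

The shape is what matters: the model pulls only the catalog slices it asks for, and the hard cap turns runaway exploration into an explicit stop that can be escalated.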
Key takeaway: The best systems don’t try to be maximally intelligent on every query. They optimize for the “90% daily questions” and reserve heavier reasoning workflows for power users.
Security and governance across layers (API and DB)
The group discussed the core enterprise requirement: users must only see what they’re entitled to see—whether they query through an LLM interface, an API, or directly in the warehouse.
Common strategies shared:
- No direct DB access for end users: enforce policy through an API or metrics layer so all access is mediated.
- Catalog-aware authorization: the system checks the catalog’s access model before querying anything.
- DB-enforced permissions where possible: use database-level security features (where supported) to ensure even tool calls cannot bypass policy.
- Group-based access mapping: map JWT claims (groups, roles, territories) into query constraints or policy context to avoid giving broad access.
- Hard constraints on query types: enforce read-only SQL, apply rate and resource limits, and prevent destructive queries (the sketch after this list combines this with claim-based constraints).
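As a sketch of how claim-based mapping and read-only constraints could combine in the mediating API (the enforce_policy helper, the territory column, and the claim names are assumptions, not a reference implementation; where the database supports row-level security, that remains the stronger enforcement point):

```python
import re

# Crude single-statement, read-only check; real deployments also lean on
# the database's own grants and row-level security where available.
READ_ONLY = re.compile(r"^\s*select\b", re.IGNORECASE)

def enforce_policy(sql: str, claims: dict) -> tuple[str, tuple]:
    """Reject anything that isn't a single SELECT, then wrap the query in
    a row filter derived from the caller's already-verified JWT claims.
    Assumes mediated queries always expose a `territory` column."""
    stripped = sql.rstrip().rstrip(";")
    if not READ_ONLY.match(stripped) or ";" in stripped:
        raise PermissionError("Only single read-only SELECT statements are allowed")

    territories = claims.get("territories", [])
    if not territories:
        raise PermissionError("No territory entitlement in token")

    placeholders = ", ".join("?" for _ in territories)
    constrained = (
        f"SELECT * FROM ({stripped}) AS q "
        f"WHERE q.territory IN ({placeholders})"
    )
    return constrained, tuple(territories)

# Claims as they would look after the API gateway has verified the JWT.
claims = {"sub": "analyst@example.com", "groups": ["sales"],
          "territories": ["EMEA", "APAC"]}
sql, params = enforce_policy(
    "SELECT transaction_id, amount, territory FROM orders", claims)
print(sql)
print(params)
```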
Key takeaway: Security is easiest when it’s designed end to end. “Bolting it on” later creates inconsistent rules between the semantic layer, API, and warehouse.
- You don’t “buy” a semantic layer - you operate one. Success depends on maintenance workflows (drift detection, review, versioning), not just an initial model.
- Trust comes from verifiability and governance. Citations, explain plans, and feedback → PR-style approval were repeatedly seen as credible patterns.
- Pragmatism wins. Teams should optimize for common questions, reuse patterns, and be selective about expensive guardrails to keep latency acceptable.
- Security must be end to end. The safest approach is mediated access through an API or metrics layer plus DB-enforced constraints where feasible.
Key takeaways
- Start with “minimum viable semantics” (core metrics and key entities), then expand through governed iteration.
- Use LLMs to draft; use humans to approve. Human-in-the-loop isn’t a failure - it’s the control system.
- Invest in drift monitoring so your semantic dictionary doesn’t decay as the warehouse evolves.
- Make answers auditable: citations, query visibility, and clear definitions reduce disputes and rebuild trust fast.
- Avoid prompt bloat: expose catalogs and dictionaries as tools; retrieve only what’s needed.
- Manage cost and latency deliberately: cap agent loops, reuse patterns, and pre-aggregate where it counts.
- Keep security consistent across layers: authorization checks before query execution, plus least-privilege enforcement.
Image credits: Guru Pratap volunteered to photograph the BOF sessions at The Fifth Elephant Winter edition.