At the Birds-of-a-Feather (BOF) session held at Fifthel Winter on 4 Dec, we dug into a deceptively simple question: how do we actually give data meaning - especially now that more analytics flows through natural-language and AI tools?
It was a room full of practitioners sharing experiences: what’s working in production, what’s been painful, and what teams are doing to earn trust when the interface is “just ask a question.” The conversation covered how teams define and maintain metrics, how they reduce hallucinations in natural-language analytics, how they validate and audit AI answers, and how they enforce security without slowing everything down.
A recurring theme emerged: the “semantic layer” isn’t a single product or file - it’s an evolving system of definitions, workflows, guardrails, and human review that must balance governance, speed, cost, and trust.

- Semantic layers are socio-technical: the “human problems” (alignment, definitions, ownership, feedback) are as important as the technical modeling.
- Schema drift is the enemy of trust: without drift monitoring and review workflows, AI systems silently degrade over time.
- Citations are necessary but not sufficient: they help users validate answers, but you still need a governance workflow to resolve disagreements about metric definitions.
- Root-cause analysis requires controlled exploration: it is powerful, but needs guardrails to prevent infinite branching, high costs, and misleading narratives.
- Tool-first architecture scales better than prompt stuffing: treat catalogs, dictionaries, and access checks as callable tools rather than prompt payloads.
Tooling people are using today (and what breaks)
Participants referenced metrics-layer–style approaches (e.g., cubes / semantic models), catalogs and metadata systems (open metadata/data catalogs), and LLM-enabled “data discovery” pipelines that ingest schema and documentation to build a usable dictionary of concepts.
What broke in practice:
- Creation and maintenance overhead: defining metrics and semantic models is still very manual, and maintaining them over time is a major blocker to launching AI analytics internally.
- A semantic layer without strong foundations is fragile: if the warehouse is large (hundreds of tables, terabytes, 100–200M events), the “definition debt” becomes real operational drag.
Key insight: Most teams don’t fail because they lack a semantic layer idea - they fail because they can’t sustain the ongoing work of maintaining definitions across schema drift, ownership changes, and evolving business logic.
Hallucinations: reducing wrong answers and wrong meanings
The group quickly converged on hallucinations as the most painful failure mode - especially when LLMs infer “semantic meaning” from cryptic column names, inconsistent IDs, and legacy schemas.
Patterns that helped:
- Human-in-the-loop review for inferred semantics, especially during schema changes or for ambiguous sources (e.g., BigQuery tables with nested JSON and frequent drift).
- Schema drift detection and alerting → review queue: when the schema changes, generate an alert, route it to a human, and update the dictionary and definitions (a minimal sketch follows this list).
- Better context improves accuracy: clear naming (“transaction_id” vs. “TID”), better column descriptions, and pipeline documentation materially reduce hallucinations.
- Copilot mentality for the “first draft”: let the LLM propose an initial dictionary or model, then treat it as editable scaffolding rather than truth.
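To make the drift-to-review loop concrete, here is a minimal sketch in plain Python. It is not tied to any particular warehouse or catalog tool; the snapshot format ({table: {column: type}}) and the ReviewItem type are illustrative stand-ins for whatever your metadata system actually records.

```python
from dataclasses import dataclass

@dataclass
class ReviewItem:
    table: str
    change: str   # "added", "removed", or "type_changed"
    column: str
    detail: str = ""

def detect_drift(previous: dict, current: dict) -> list[ReviewItem]:
    """Diff two schema snapshots ({table: {column: type}}) and return one
    review item for every column that was added, removed, or retyped."""
    items: list[ReviewItem] = []
    for table, cols in current.items():
        old_cols = previous.get(table, {})
        for col, col_type in cols.items():
            if col not in old_cols:
                items.append(ReviewItem(table, "added", col, col_type))
            elif old_cols[col] != col_type:
                items.append(ReviewItem(table, "type_changed", col,
                                        f"{old_cols[col]} -> {col_type}"))
        for col in old_cols.keys() - cols.keys():
            items.append(ReviewItem(table, "removed", col))
    return items

# Yesterday's snapshot vs. today's introspected schema.
previous = {"orders": {"transaction_id": "STRING", "amount": "NUMERIC"}}
current = {"orders": {"transaction_id": "STRING", "amount": "FLOAT64",
                      "channel": "STRING"}}

for item in detect_drift(previous, current):
    # In a real pipeline this would land in the review queue (a ticket,
    # a Slack alert, or a PR against the semantic dictionary), not stdout.
    print(item)
```

The diff itself is the easy part; the discussion made clear that routing each change to an owner and updating the dictionary is where teams succeed or fail.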
Key insight: Naming things well helps a lot - but many organizations can’t rename everything, so the semantic system has to compensate with documentation, modeling, and review workflows.
Getting to “initial definitions” without boiling the ocean
A practical tension emerged: invest heavily up front to avoid hallucinations, or start imperfectly and iterate with user feedback.
Two viable approaches discussed:
a. Up-front investment (interviews, onboarding, curated context) to ensure first impressions are good - because if the first answers are wrong, business users quickly lose trust.
b. Progressive hardening: launch with “beta” semantics, collect feedback, then formalize definitions through a governed workflow.
Conclusion: There’s no free lunch. Teams either pay up front (curation and modeling) or pay later (trust repair and manual corrections). The best teams do a small up-front “minimum viable semantics” and then iterate with strong feedback loops.
Validating AI answers: citations, explain plans, and “show your work”
Participants emphasized that trust requires verifiability—especially as analytics moves into natural-language interfaces.
Ideas that resonated:
- Citations as a baseline: point users to the underlying tables, queries, or evidence so they can verify quickly.
- Explain-plan–style checks (or query analysis tools) as guardrails: have a toolchain step that inspects the SQL plan or cost and flags dangerous scans or inefficiencies (see the sketch after this list).
- Golden queries / exemplars: provide known-good query patterns to steer the model toward house standards—while acknowledging they can bias the model and sometimes harm open-ended tasks like root-cause analysis.
Key nuance: These guardrails improve correctness, but they also add latency and cost—so teams need to be selective and pragmatic.
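To illustrate the explain-plan idea, here is a minimal sketch. It uses SQLite's EXPLAIN QUERY PLAN only because it ships with Python; the plan_guardrail function and its full-scan heuristic are assumptions, and a real warehouse would use its own EXPLAIN or dry-run facility with cost thresholds instead.

```python
import sqlite3

def plan_guardrail(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Inspect the query plan before executing LLM-generated SQL and
    return warnings for any full-table scans."""
    warnings = []
    for row in conn.execute(f"EXPLAIN QUERY PLAN {sql}"):
        detail = row[-1]  # last column is the human-readable plan step
        if detail.startswith("SCAN"):
            warnings.append(f"full scan: {detail}")
    return warnings

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (transaction_id TEXT PRIMARY KEY, amount REAL)")

candidate_sql = "SELECT SUM(amount) FROM orders WHERE amount > 100"
issues = plan_guardrail(conn, candidate_sql)
if issues:
    # Route back to the model (or a human) instead of running blind.
    print("Flagged before execution:", issues)
else:
    print(conn.execute(candidate_sql).fetchall())
```

The value is the ordering: the plan is inspected before the generated SQL ever runs, so a dangerous query becomes a flag rather than a bill.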
Performance and cost: the hidden constraint on “safe AI analytics”
A clear operational reality emerged: dashboards are typically sub-second, but AI answers can take 30–60 seconds once you add reasoning steps, prompt transforms, retrieval, validation, optimization, and retries.
Cost and performance tactics discussed:
- Context engineering: don’t stuff a 1,000-table catalog into the prompt; expose catalog lookup as tools and let the LLM query only what it needs (see the sketch after this list).
- Pre-aggregation and rollups (database-side) to reduce downstream query cost.
- Memory and pattern reuse: most user questions are repetitive; store successful query patterns and reuse them with small substitutions.
- Limit loops and iterations: enforce caps on agent exploration (e.g., 5–10 iterations for root-cause analysis) to prevent runaway costs.
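Here is a minimal sketch of the tool-first pattern combined with the iteration cap from the last bullet. It is deliberately not tied to any particular LLM SDK; the catalog shape, the tool names (search_tables, get_columns), and the scripted stand-in model are illustrative assumptions.

```python
import json

# A 1,000-table catalog stays out of the prompt; the model asks for
# exactly the slices it needs via these tool functions.
CATALOG = {
    "orders": {"transaction_id": "STRING", "amount": "NUMERIC", "created_at": "TIMESTAMP"},
    "customers": {"customer_id": "STRING", "territory": "STRING"},
}

def search_tables(keyword: str) -> list[str]:
    """Return table names whose name or columns mention the keyword."""
    keyword = keyword.lower()
    return [t for t, cols in CATALOG.items()
            if keyword in t or any(keyword in c.lower() for c in cols)]

def get_columns(table: str) -> dict:
    """Return the column dictionary for one table."""
    return CATALOG.get(table, {})

TOOLS = {"search_tables": search_tables, "get_columns": get_columns}
MAX_ITERATIONS = 5  # hard cap on exploration, per the discussion

def run_agent(question: str, call_model) -> str:
    """Minimal tool-calling loop. `call_model` stands in for your LLM
    client; it receives the transcript and returns either
    {"tool": name, "args": {...}} or {"answer": "..."}."""
    transcript = [{"role": "user", "content": question}]
    for _ in range(MAX_ITERATIONS):
        step = call_model(transcript)
        if "answer" in step:
            return step["answer"]
        result = TOOLS[step["tool"]](**step["args"])
        transcript.append({"role": "tool", "name": step["tool"],
                           "content": json.dumps(result)})
    return "Stopped: iteration cap reached; escalate to a human or a heavier workflow."

# Scripted stand-in for an LLM, just to exercise the loop end to end.
def scripted_model(transcript):
    if len(transcript) == 1:
        return {"tool": "search_tables", "args": {"keyword": "transaction"}}
    return {"answer": "orders.transaction_id is the transaction identifier."}

print(run_agent("Which table holds transaction ids?", scripted_model))
```

The shape is what matters: the model pulls only the catalog slices it asks for, and the hard cap turns runaway exploration into an explicit stop that can be escalated.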
Key takeaway: The best systems don’t try to be maximally intelligent on every query. They optimize for the “90% daily questions” and reserve heavier reasoning workflows for power users.
Security and governance across layers (API and DB)
The group discussed the core enterprise requirement: users must only see what they’re entitled to see—whether they query through an LLM interface, an API, or directly in the warehouse.
Common strategies shared:
- No direct DB access for end users: enforce policy through an API or metrics layer so all access is mediated.
- Catalog-aware authorization: the system checks the catalog’s access model before querying anything.
- DB-enforced permissions where possible: use database-level security features (where supported) to ensure even tool calls cannot bypass policy.
- Group-based access mapping: map JWT claims (groups, roles, territories) into query constraints or policy context to avoid giving broad access.
- Hard constraints on query types: enforce read-only SQL, apply rate and resource limits, and prevent destructive queries (the sketch after this list combines this with claim-based constraints).
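As a sketch of how claim-based mapping and read-only constraints could combine in the mediating API (the enforce_policy helper, the territory column, and the claim names are assumptions, not a reference implementation; where the database supports row-level security, that remains the stronger enforcement point):

```python
import re

# Crude single-statement, read-only check; real deployments also lean on
# the database's own grants and row-level security where available.
READ_ONLY = re.compile(r"^\s*select\b", re.IGNORECASE)

def enforce_policy(sql: str, claims: dict) -> tuple[str, tuple]:
    """Reject anything that isn't a single SELECT, then wrap the query in
    a row filter derived from the caller's already-verified JWT claims.
    Assumes mediated queries always expose a `territory` column."""
    stripped = sql.rstrip().rstrip(";")
    if not READ_ONLY.match(stripped) or ";" in stripped:
        raise PermissionError("Only single read-only SELECT statements are allowed")

    territories = claims.get("territories", [])
    if not territories:
        raise PermissionError("No territory entitlement in token")

    placeholders = ", ".join("?" for _ in territories)
    constrained = (
        f"SELECT * FROM ({stripped}) AS q "
        f"WHERE q.territory IN ({placeholders})"
    )
    return constrained, tuple(territories)

# Claims as they would look after the API gateway has verified the JWT.
claims = {"sub": "analyst@example.com", "groups": ["sales"],
          "territories": ["EMEA", "APAC"]}
sql, params = enforce_policy(
    "SELECT transaction_id, amount, territory FROM orders", claims)
print(sql)
print(params)
```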
Key takeaway: Security is easiest when it’s designed end to end. “Bolting it on” later creates inconsistent rules between the semantic layer, API, and warehouse.
- You don’t “buy” a semantic layer - you operate one. Success depends on maintenance workflows (drift detection, review, versioning), not just an initial model.
- Trust comes from verifiability and governance. Citations, explain plans, and feedback → PR-style approval were repeatedly seen as credible patterns.
- Pragmatism wins. Teams should optimize for common questions, reuse patterns, and be selective about expensive guardrails to keep latency acceptable.
- Security must be end to end. The safest approach is mediated access through an API or metrics layer plus DB-enforced constraints where feasible.
Key takeaways
- Start with “minimum viable semantics” (core metrics and key entities), then expand through governed iteration.
- Use LLMs to draft; use humans to approve. Human-in-the-loop isn’t a failure - it’s the control system.
- Invest in drift monitoring so your semantic dictionary doesn’t decay as the warehouse evolves.
- Make answers auditable: citations, query visibility, and clear definitions reduce disputes and rebuild trust fast.
- Avoid prompt bloat: expose catalogs and dictionaries as tools; retrieve only what’s needed.
- Manage cost and latency deliberately: cap agent loops, reuse patterns, and pre-aggregate where it counts.
- Keep security consistent across layers: authorization checks before query execution, plus least-privilege enforcement.
Image credits: Guru Pratap volunteered to photograph the BOF sessions at The Fifth Elephant Winter edition.