
Nishant Bangarwa (@nishantbangarwa), BOF facilitator

Mayur Singal (@mayur_singal), BOF facilitator

Jayaram Kasi Visweswaran (@jayaramkasi), Fifthel Winter editor

Summary of the Birds-of-a-Feather (BOF) session: exploring the semantic layer underlying your data

Submitted Dec 21, 2025

At the Birds-of-a-Feather (BOF) session held at Fifthel Winter on 4 Dec, we dug into a deceptively simple question: How do we actually give data meaning -- especially now that more analytics is flowing through natural language and AI tools?

It was a room full of practitioners sharing experiences: what’s working in production, what’s been painful, and what teams are doing to earn trust when the interface is “just ask a question.” The conversation covered how teams define and maintain metrics, how they reduce hallucinations in natural-language analytics, how they validate and audit AI answers, and how they enforce security without slowing everything down.

A recurring theme emerged: the “semantic layer” isn’t a single product or file - it’s an evolving system of definitions, workflows, guardrails, and human review that must balance governance, speed, cost, and trust.

BOF photo

Notable insights and discussion highlights

  1. Semantic layers are socio-technical: the “human problems” (alignment, definitions, ownership, feedback) are as important as the technical modeling.
  2. Schema drift is the enemy of trust: without drift monitoring and review workflows, AI systems silently degrade over time.
  3. Citations are necessary but not sufficient: they help users validate answers, but you still need a governance workflow to resolve disagreements about metric definitions.
  4. Root-cause analysis requires controlled exploration: it is powerful, but needs guardrails to prevent infinite branching, high costs, and misleading narratives.
  5. Tool-first architecture scales better than prompt stuffing: treat catalogs, dictionaries, and access checks as callable tools rather than prompt payloads.

Key topics discussed

  1. Tooling people are using today (and what breaks)
    Participants referenced metrics-layer–style approaches (e.g., cubes / semantic models), catalogs and metadata systems (open metadata/data catalogs), and LLM-enabled “data discovery” pipelines that ingest schema and documentation to build a usable dictionary of concepts.

What broke in practice:

  • Creation and maintenance overhead: defining metrics and semantic models is still very manual, and maintaining them over time is a major blocker to launching AI analytics internally.
  • A semantic layer without strong foundations is fragile: if the warehouse is large (hundreds of tables, terabytes, 100–200M events), the “definition debt” becomes real operational drag.

Key insight: Most teams don’t fail because they lack a semantic layer idea - they fail because they can’t sustain the ongoing work of maintaining definitions across schema drift, ownership changes, and evolving business logic.
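To make “definition debt” concrete, here is a minimal sketch of what a single governed metric definition could look like. The MetricDefinition structure and every field name are illustrative assumptions for this write-up, not the schema of any particular metrics layer mentioned in the session.

```python
from dataclasses import dataclass


@dataclass
class MetricDefinition:
    """One governed metric: the unit of 'definition debt' that has to be kept current."""
    name: str                  # canonical business name
    description: str           # plain-language meaning agreed with stakeholders
    sql_expression: str        # how the metric is computed in the warehouse
    source_tables: list[str]   # tables the definition depends on (the drift surface)
    owner: str                 # who approves changes
    version: int = 1           # bumped on every approved change


# A hypothetical starter definition; every field is something a human has to
# agree on, document, and keep in sync as the schema and business logic evolve.
mau = MetricDefinition(
    name="monthly_active_users",
    description="Distinct users with at least one event in the calendar month.",
    sql_expression="COUNT(DISTINCT user_id)",
    source_tables=["analytics.events"],
    owner="data-platform-team",
)
print(mau)
```

Multiplying this by hundreds of metrics and tables is where the maintenance burden described above comes from.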

  2. Hallucinations: reducing wrong answers and wrong meanings
    The group quickly converged on hallucinations as the most painful failure mode - especially when LLMs infer “semantic meaning” from cryptic column names, inconsistent IDs, and legacy schemas.
    Patterns that helped:
  • Human-in-the-loop review for inferred semantics, especially during schema changes or for ambiguous sources (e.g., BigQuery tables with nested JSON and frequent drift).
  • Schema drift detection and alerting → review queue: when the schema changes, generate an alert, route it to a human, and update the dictionary and definitions (see the sketch after this list).
  • Better context improves accuracy: clear naming (“transaction_id” vs. “TID”), better column descriptions, and pipeline documentation materially reduce hallucinations.
  • Copilot mentality for the “first draft”: let the LLM propose an initial dictionary or model, then treat it as editable scaffolding rather than truth.
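As a rough illustration of the drift-detection pattern above, the sketch below diffs a stored schema snapshot against the current one and turns every change into a review item for a human. The snapshot format and the ReviewItem structure are assumptions made here for illustration; in practice the snapshots would come from the warehouse’s information schema or a catalog API.

```python
from dataclasses import dataclass

# Schema snapshots: table name -> set of column names.
# In practice these would be pulled from information_schema or a catalog service.
stored_snapshot = {
    "orders": {"order_id", "customer_id", "amount"},
    "events": {"event_id", "user_id", "ts"},
}
current_snapshot = {
    "orders": {"order_id", "customer_id", "amount", "currency"},  # new column
    "sessions": {"session_id", "user_id"},                        # new table
}


@dataclass
class ReviewItem:
    kind: str               # "table_added", "table_removed", "column_added", "column_removed"
    table: str
    column: str | None = None


def detect_drift(stored, current):
    """Diff two schema snapshots and emit one review item per change."""
    items = []
    for table in current.keys() - stored.keys():
        items.append(ReviewItem("table_added", table))
    for table in stored.keys() - current.keys():
        items.append(ReviewItem("table_removed", table))
    for table in stored.keys() & current.keys():
        for col in current[table] - stored[table]:
            items.append(ReviewItem("column_added", table, col))
        for col in stored[table] - current[table]:
            items.append(ReviewItem("column_removed", table, col))
    return items


# Each item would be routed to a human reviewer before the semantic
# dictionary and metric definitions are updated.
for item in detect_drift(stored_snapshot, current_snapshot):
    print(item)
```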

Key insight: Naming things well helps a lot - but many organizations can’t rename everything, so the semantic system has to compensate with documentation, modeling, and review workflows.

  3. Getting to “initial definitions” without boiling the ocean
    A practical tension emerged: invest heavily up front to avoid hallucinations, or start imperfectly and iterate with user feedback.

Two viable approaches discussed:
a. Up-front investment (interviews, onboarding, curated context) to ensure first impressions are good - because if the first answers are wrong, business users quickly lose trust.
b. Progressive hardening: launch with “beta” semantics, collect feedback, then formalize definitions through a governed workflow.

Conclusion: There’s no free lunch. Teams either pay up front (curation and modeling) or pay later (trust repair and manual corrections). The best teams do a small up-front “minimum viable semantics” and then iterate with strong feedback loops.

  4. Validating AI answers: citations, explain plans, and “show your work”
    Participants emphasized that trust requires verifiability—especially as analytics moves into natural-language interfaces.

Ideas that resonated:

  • Citations as a baseline: point users to the underlying tables, queries, or evidence so they can verify quickly.
  • Explain-plan–style checks (or query analysis tools) as guardrails: have a toolchain step that inspects the SQL plan or cost and flags dangerous scans or inefficiencies (see the sketch after this list).
  • Golden queries / exemplars: provide known-good query patterns to steer the model toward house standards—while acknowledging they can bias the model and sometimes harm open-ended tasks like root-cause analysis.
  • Key nuance: These guardrails improve correctness, but they also add latency and cost—so teams need to be selective and pragmatic.
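A minimal sketch of an explain-plan-style guardrail, using SQLite’s built-in EXPLAIN QUERY PLAN purely as a stand-in for whatever plan-inspection facility the actual warehouse offers. The scan-detection heuristic is an assumption; a production check would look at estimated cost, bytes scanned, or partition pruning for the specific engine in use.

```python
import sqlite3

# Stand-in warehouse: an in-memory SQLite database with one indexed table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id INTEGER PRIMARY KEY, user_id INTEGER, ts TEXT)")
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")


def plan_guardrail(sql: str) -> list[str]:
    """Inspect the query plan before execution and return human-readable warnings."""
    warnings = []
    plan_rows = conn.execute(f"EXPLAIN QUERY PLAN {sql}").fetchall()
    for _, _, _, detail in plan_rows:
        # SQLite reports full table scans as "SCAN ..."; index lookups as "SEARCH ...".
        if detail.startswith("SCAN"):
            warnings.append(f"Full table scan detected: {detail}")
    return warnings


# An indexed lookup passes; an unfiltered query gets flagged for review.
print(plan_guardrail("SELECT * FROM events WHERE user_id = 42"))  # expected: []
print(plan_guardrail("SELECT * FROM events"))                     # expected: scan warning
```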
  5. Performance and cost: the hidden constraint on “safe AI analytics”
    A clear operational reality emerged: dashboards are typically sub-second, but AI answers can take 30–60 seconds once you add reasoning steps, prompt transforms, retrieval, validation, optimization, and retries.

Cost and performance tactics discussed:

  • Context engineering: don’t stuff a 1,000-table catalog into the prompt; expose catalog lookup as tools and let the LLM query only what it needs (see the sketch after this list).
  • Pre-aggregation and rollups (database-side) to reduce downstream query cost.
  • Memory and pattern reuse: most user questions are repetitive; store successful query patterns and reuse them with small substitutions.
  • Limit loops and iterations: enforce caps on agent exploration (e.g., 5–10 iterations for root-cause analysis) to prevent runaway costs.
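A rough sketch of the “tools, not prompt payloads” idea above, combined with the iteration cap: the catalog stays outside the prompt and is exposed as a small lookup function the model can call, with a hard budget on how many calls a single question may trigger. The catalog contents, function names, and the generic JSON-schema-style tool description are illustrative assumptions, not any specific framework’s API.

```python
# Hypothetical catalog: in reality this would be a metadata service or data catalog.
CATALOG = {
    "orders": "One row per order; columns: order_id, customer_id, amount, currency.",
    "events": "Raw product events; columns: event_id, user_id, ts.",
    "customers": "Customer master data; columns: customer_id, region, segment.",
}

MAX_TOOL_CALLS = 5  # cap agent exploration so costs cannot run away


def lookup_tables(keyword: str) -> dict[str, str]:
    """Return only the catalog entries relevant to the keyword, not the whole catalog."""
    kw = keyword.lower()
    return {t: desc for t, desc in CATALOG.items() if kw in t or kw in desc.lower()}


# Generic JSON-schema-style tool description handed to the LLM framework;
# the exact registration format depends on the framework in use.
LOOKUP_TOOL_SPEC = {
    "name": "lookup_tables",
    "description": "Find warehouse tables relevant to a keyword.",
    "parameters": {
        "type": "object",
        "properties": {"keyword": {"type": "string"}},
        "required": ["keyword"],
    },
}


def run_with_budget(tool_calls: list[str]) -> list[dict[str, str]]:
    """Execute the model's requested lookups, enforcing the call budget."""
    results = []
    for i, keyword in enumerate(tool_calls):
        if i >= MAX_TOOL_CALLS:
            raise RuntimeError("Tool-call budget exceeded; stopping exploration.")
        results.append(lookup_tables(keyword))
    return results


# The model asks only for what it needs instead of receiving 1,000 table schemas up front.
print(run_with_budget(["customer", "order"]))
```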

Key takeaway: The best systems don’t try to be maximally intelligent on every query. They optimize for the “90% daily questions” and reserve heavier reasoning workflows for power users.

  6. Security and governance across layers (API and DB)
    The group discussed the core enterprise requirement: users must only see what they’re entitled to see—whether they query through an LLM interface, an API, or directly in the warehouse.

Common strategies shared:

  • No direct DB access for end users: enforce policy through an API or metrics layer so all access is mediated.
  • Catalog-aware authorization: the system checks the catalog’s access model before querying anything.
  • DB-enforced permissions where possible: use database-level security features (where supported) to ensure even tool calls cannot bypass policy.
  • Group-based access mapping: map JWT claims (groups, roles, territories) into query constraints or policy context to avoid giving broad access (see the sketch after this list).
  • Hard constraints on query types: enforce read-only SQL, apply rate and resource limits, and prevent destructive queries.
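A minimal sketch of the last two patterns, assuming the JWT has already been verified upstream and its claims arrive as a plain dict. The group-to-territory mapping, table and column names, and the keyword-based read-only check are illustrative assumptions; a real deployment would lean on the warehouse’s row-level security and a proper SQL parser rather than string checks.

```python
# Assumed mapping from identity-provider groups to data-access constraints.
GROUP_TERRITORY = {
    "sales_emea": "EMEA",
    "sales_apac": "APAC",
}

READ_ONLY_PREFIXES = ("SELECT", "WITH")  # crude allow-list; use a real SQL parser in practice


def enforce_read_only(sql: str) -> None:
    """Reject anything that is not plainly a read query."""
    if not sql.lstrip().upper().startswith(READ_ONLY_PREFIXES):
        raise PermissionError("Only read-only queries are allowed through this interface.")


def constrain_query(sql: str, claims: dict) -> str:
    """Wrap the generated SQL so results are limited to the caller's territories."""
    enforce_read_only(sql)
    territories = [GROUP_TERRITORY[g] for g in claims.get("groups", []) if g in GROUP_TERRITORY]
    if not territories:
        raise PermissionError("No data-access groups present in token claims.")
    placeholders = ", ".join(f"'{t}'" for t in territories)
    # Apply the constraint by wrapping, so the model-generated SQL can never widen access.
    return f"SELECT * FROM ({sql}) AS scoped WHERE territory IN ({placeholders})"


claims = {"sub": "user-123", "groups": ["sales_emea"]}
print(constrain_query(
    "SELECT territory, SUM(amount) AS revenue FROM orders GROUP BY territory", claims
))
```

Wrapping rather than editing the generated SQL keeps the policy independent of whatever the model produced; database-enforced permissions, where supported, provide the same guarantee one layer lower.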

Key takeaway: Security is easiest when it’s designed end to end. “Bolting it on” later creates inconsistent rules between the semantic layer, API, and warehouse.

Conclusions from the discussion

  • You don’t “buy” a semantic layer - you operate one. Success depends on maintenance workflows (drift detection, review, versioning), not just an initial model.
  • Trust comes from verifiability and governance. Citations, explain plans, and feedback routed through PR-style approval workflows were repeatedly cited as credible patterns.
  • Pragmatism wins. Teams should optimize for common questions, reuse patterns, and be selective about expensive guardrails to keep latency acceptable.
  • Security must be end to end. The safest approach is mediated access through an API or metrics layer plus DB-enforced constraints where feasible.

Key takeaways

  • Start with “minimum viable semantics” (core metrics and key entities), then expand through governed iteration.
  • Use LLMs to draft; use humans to approve. Human-in-the-loop isn’t a failure - it’s the control system.
  • Invest in drift monitoring so your semantic dictionary doesn’t decay as the warehouse evolves.
  • Make answers auditable: citations, query visibility, and clear definitions reduce disputes and rebuild trust fast.
  • Avoid prompt bloat: expose catalogs and dictionaries as tools; retrieve only what’s needed.
  • Manage cost and latency deliberately: cap agent loops, reuse patterns, and pre-aggregate where it counts.
  • Keep security consistent across layers: authorization checks before query execution, plus least-privilege enforcement.

Image credits: Guru Pratap volunteered to photograph the BOF sessions at The Fifth Elephant Winter edition.

