The Fifth Elephant 2024 Annual Conference (12th & 13th July)

Maximising the Potential of Data — Discussions around data science, machine learning & AI

Abhijeet Kumar

@abhijeet3922

Unlock Data with NL2SQL: Building Low-code Data Assistant for Business using Code LLMs

Submitted May 15, 2024

This BoF session covers building a low-code data assistant for business using code LLMs. Generative AI (LLMs) for code has become a popular and powerful tool for developers, with the rise of enterprise solutions such as GitHub Copilot, Amazon CodeWhisperer and Google Duet AI, along with numerous open-source code-assist models that generate code in hundreds of programming languages, including Python, Java and SQL. These models take English comments or instructions and write code for developers.

Apart from improving developer productivity, these models can also be used to build impactful NL2SQL solutions in the data space. Such a solution:

  • Allows non-technical business users to answer day-to-day queries in natural language.
  • Speeds up data analysis for insights and report generation.
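
As a rough illustration of the idea, an NL2SQL request typically pairs the table schema with the user's question in a single prompt for the code model. The sketch below only builds that prompt. The `orders` schema, the `build_prompt` helper and the prompt wording are hypothetical and not taken from the talk, and the call to an actual model is left out.

```python
# Hypothetical schema, used only for illustration.
SCHEMA = """CREATE TABLE orders (
    order_id INTEGER,
    customer_name TEXT,
    amount REAL,
    order_date TEXT
);"""


def build_prompt(schema: str, question: str) -> str:
    """Combine the schema and the question into a prompt for a code LLM."""
    return (
        "-- You are given the following database schema:\n"
        f"{schema}\n\n"
        f"-- Question: {question.strip()}\n"
        "-- Write a single SQL query that answers the question.\n"
        "SELECT"
    )


prompt = build_prompt(SCHEMA, "What was the total order amount in June 2024?")
print(prompt)
```

Ending the prompt with `SELECT` is one common way to nudge a completion-style model towards emitting SQL directly; instruction-tuned models may not need it.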

Sridhar Ramaswamy, CEO of Snowflake Inc. (a cloud-based data company), said in a press briefing, “Our dream here is, within a year, to have an API that our customers can use so that business users can directly talk to data.” Natural language querying enables various stakeholders (including executives, employees, customers, prospects, and partners) to pose questions about data in natural language and receive relevant responses.

Who is the audience?

  • Data engineers, data scientists, or researchers who work closely with enterprise data.
  • Business analysts or business consumers who need data querying and analysis on a day-to-day basis.
  • Data leaders driving initiatives to democratize data across the enterprise.

The talk is relevant to any professional working on LLM and Gen AI use cases.

What problem are you trying to solve (for the audience)?

  • Many non-technical business users struggle to fetch data from data lakes and systems, which means data teams spend significant effort producing the same data on a regular basis. How can we automate this?
  • Many business and financial analysts perform ad hoc analysis on downloaded Excel and CSV data to answer business queries on the fly. How can we automate it?
  • Data governance teams in enterprises validate data migration processes with many test cases to ensure data quality. How can we reduce the extensive effort of writing data queries?
  • Data governance teams need to create metadata, in the form of sample SQL queries, for the Data Catalog.
  • The talk helps data scientists and data teams implement such a solution with the right tools and techniques.

Scope

  1. Inroads to Enterprise: Introduction to an enterprise problem, potential use cases, and applications of NL2SQL.

    • Ad hoc Data Queries: Automation
    • Validation in Data Migration
    • Enriching Data Catalog
  2. State of Code LLMs:

    • Closed vs Open models.
    • Generalist vs Specialist models
  3. Practical challenges: Potential challenges in scaling and strategies to the rescue.

    • Multi-table scenario: how to find the right tables in Snowflake?
    • Few-shot prompting: retrieving examples.
    • Metadata pruning.
    • Hallucination correction (SQLGlot)
    • Instructions in Prompt
    • Fine-Tuning
  4. Building Data Assistants: Implementation components & process flow.
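
Several of the strategies in item 3 (finding the right tables, metadata pruning, few-shot example retrieval, and instructions in the prompt) can be sketched together. The example below is a minimal illustration under stated assumptions, not the method presented in the talk: the catalog, the example pairs and all function names are hypothetical, and plain word overlap stands in for whatever retrieval technique a real system would use (for instance, embedding similarity).

```python
import re

# Hypothetical catalog: table name -> column names (illustration only).
CATALOG = {
    "orders": ["order_id", "customer_id", "amount", "order_date"],
    "customers": ["customer_id", "name", "region"],
    "web_logs": ["session_id", "url", "visited_at"],
}

# Hypothetical question/SQL pairs that serve as few-shot examples.
EXAMPLES = [
    ("Total order amount per region",
     "SELECT c.region, SUM(o.amount) FROM orders o "
     "JOIN customers c ON o.customer_id = c.customer_id GROUP BY c.region"),
    ("Number of sessions per url",
     "SELECT url, COUNT(*) FROM web_logs GROUP BY url"),
]


def tokens(text):
    """Lowercase word tokens; identifiers are split on underscores."""
    return set(re.findall(r"[a-z]+", text.lower()))


def prune_tables(question, catalog, top_k=2):
    """Keep up to top_k tables whose names and columns overlap the question."""
    q = tokens(question)
    scored = []
    for table, columns in catalog.items():
        overlap = len(q & tokens(" ".join([table] + columns)))
        scored.append((overlap, table))
    # Highest overlap first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [table for overlap, table in scored[:top_k] if overlap > 0]


def pick_examples(question, examples, top_k=1):
    """Return the top_k examples whose question text is closest to ours."""
    q = tokens(question)
    ranked = sorted(examples, key=lambda ex: -len(q & tokens(ex[0])))
    return ranked[:top_k]


def build_prompt(question):
    """Assemble instructions, pruned metadata and few-shot examples."""
    lines = ["-- Answer with one SQL query. Use only the tables below."]
    for table in prune_tables(question, CATALOG):
        lines.append(f"-- {table}({', '.join(CATALOG[table])})")
    for ex_question, ex_sql in pick_examples(question, EXAMPLES):
        lines.append(f"-- Q: {ex_question}\n{ex_sql};")
    lines.append(f"-- Q: {question}")
    return "\n".join(lines)


question = "What is the total amount of orders for each customer region?"
selected = prune_tables(question, CATALOG)
prompt = build_prompt(question)
print(prompt)
```

Pruning keeps the prompt short when the warehouse holds many tables, and the retrieved example shows the model the join pattern it is expected to follow.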

Benefit & Key takeaways for Participants

  • A potential idea for democratizing data for a business or enterprise.
  • Practical details on building a low-code/no-code data assistant capability that brings data closer to the business.
  • The current state of code-generating Large Language Models.
  • In-depth insights into the effort and challenges involved in creating an NL2SQL solution.
  • Techniques and tools to implement such a solution and overcome the challenges.
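
One of the challenges named in the scope is hallucination correction, for which the proposal mentions SQLGlot. As a dependency-free stand-in for that tool, the sketch below checks a generated query against a known schema using Python's built-in `sqlite3` module. This is a swapped-in technique, not the talk's approach: the schema and the `check_sql` helper are hypothetical, and SQLite's dialect differs from Snowflake's, so a real system would need a dialect-aware parser.

```python
import sqlite3

# Hypothetical schema used to check generated queries (illustration only).
SCHEMA = """
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
CREATE TABLE customers (customer_id INTEGER, name TEXT, region TEXT);
"""


def check_sql(sql, schema=SCHEMA):
    """Return (True, "") if sql compiles against the schema, else (False, error)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        # EXPLAIN compiles the statement without running it, so unknown
        # tables, unknown columns and syntax errors surface as exceptions.
        conn.execute("EXPLAIN " + sql)
        return True, ""
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        conn.close()


good = "SELECT region, COUNT(*) FROM customers GROUP BY region"
bad = "SELECT revenue FROM orders"  # 'revenue' is a made-up column

ok_good, _ = check_sql(good)
ok_bad, error = check_sql(bad)
print(ok_good, ok_bad, error)
```

The error message returned for a rejected query can be fed back to the model as part of a retry prompt, which is one simple way to correct a hallucinated column or table name.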
