Business users and non-technical professionals often need to quickly analyse or transform tabular data in spreadsheets for ad hoc business intelligence. However, they may lack the programming knowledge to do so themselves and must instead reach out to a data analyst. Such delays can incur significant opportunity costs when time-sensitive business decisions depend on accurate analysis of data.
Generative AI powered by Large Language Models (LLMs) is being used to create novel text, images, and even videos. LLMs specialising in generating code are already being used in enterprise solutions like GitHub Copilot, Gemini Code Assist by Google, watsonx by IBM, and Amazon Q Developer (previously Amazon CodeWhisperer) to boost productivity for developers and programmers. Along the same lines, there now exist LLMs specialising in generating Structured Query Language (SQL), which is widely used across enterprise domains to manage databases and analyse and transform tabular data.
In this workshop, we demonstrate how to build a web application from scratch using Streamlit and Ollama that lets users analyse and query CSV files with natural language, powered by LLMs.
- Quick overview of the workshop
- Demo of the application
- Discussion on running LLMs locally for data privacy
- Hands-on: Setting up Ollama model server
- Hands-on: Setting up Streamlit and building quick interactive front-end applications
- Hands-on: Pipeline for using natural language prompts to transform tabular data using CSV files
- Hands-on: Data processing techniques such as prompt pruning, and correcting LLM hallucinations using static analysis with sqlglot
- Discussion on how to create generic “Chat with X” capabilities
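To give a flavour of the hands-on pipeline, here is a minimal sketch of the core idea: load a CSV file into an in-memory SQLite database and execute SQL against it. The function names are illustrative assumptions, and the SQL is passed in directly here; in the workshop, that query is generated by an LLM served through Ollama.

```python
import csv
import sqlite3


def load_csv_into_sqlite(csv_path: str, table_name: str) -> sqlite3.Connection:
    """Load a CSV file into an in-memory SQLite table (all columns as TEXT)."""
    conn = sqlite3.connect(":memory:")
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        cols = ", ".join(f'"{c}"' for c in header)
        conn.execute(f'CREATE TABLE "{table_name}" ({cols})')
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(
            f'INSERT INTO "{table_name}" VALUES ({placeholders})', reader
        )
    return conn


def answer_question(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Execute SQL against the loaded table and return the result rows.

    In the full pipeline, `sql` would be produced by the LLM from a
    natural language question; here it is supplied by the caller.
    """
    return conn.execute(sql).fetchall()
```

SQLite is used here only because it ships with Python; the same approach works with any SQL engine the generated queries target.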
We will be using the following tools during the workshop. Participants may find it useful to familiarise themselves with these before the workshop.
- ollama
- sqlglot
- Model Quantisation
- streamlit
This workshop is intended for data engineers, data scientists, and researchers with basic Python experience who are working on Generative AI use-cases and want to leverage enterprise data. It may also interest business analysts and other business users who regularly require data querying and analysis.
Overall, any professional with at least some Python programming experience who is interested in getting started with Generative AI stands to benefit from this workshop, since it covers both the end-to-end data pipeline and how to prepare a demo-worthy front-end user interface.
- How to run LLMs locally or within your organisation network using Ollama
- How to quickly develop interactive web applications using Streamlit
- How to analyse tabular data in CSV format using English language queries
- How to create “Chat with X” applications for other data formats
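Prompt pruning, one of the techniques covered above, can be sketched simply: instead of sending an entire CSV file to the LLM, send only the column header and a handful of sample rows. The helper below is an assumed illustration (the function name and prompt wording are ours, not the workshop's exact code).

```python
import csv
from itertools import islice


def build_pruned_prompt(csv_path: str, table_name: str, question: str,
                        sample_rows: int = 3) -> str:
    """Build a compact NL-to-SQL prompt: only the header and a few sample
    rows are included, keeping the prompt small regardless of file size."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        samples = list(islice(reader, sample_rows))
    lines = [
        f"You are given a SQLite table `{table_name}` with columns: "
        + ", ".join(header) + ".",
        "Sample rows:",
    ]
    lines += [", ".join(row) for row in samples]
    lines.append(f"Write a single SQL query to answer: {question}")
    return "\n".join(lines)
```

The resulting string would then be sent to a locally running model, for example via the `ollama` Python client's chat interface, to obtain the SQL query.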
Here are some links to open-source and proprietary products currently available which leverage LLMs to generate SQL and power database interactions.
- Vanna: an MIT-licensed open-source Python RAG (Retrieval-Augmented Generation) framework for SQL generation and related functionality.
- Dataherald: a natural language-to-SQL engine built for enterprise-level question answering over relational data.
- ChatDB: build dashboards for your database with AI.
- DB-GPT: an open source AI native data app development framework for building infrastructure in the field of large models.