BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//HasGeek//NONSGML Funnel//EN
DESCRIPTION:Hands-on workshop going from zero to a functional query engine
  - Rootconf Topical Edition on Databases
X-WR-CALDESC:Hands-on workshop going from zero to a functional query engin
 e - Rootconf Topical Edition on Databases
NAME:Build a SQL query engine from scratch
X-WR-CALNAME:Build a SQL query engine from scratch
REFRESH-INTERVAL;VALUE=DURATION:PT12H
SUMMARY:Build a SQL query engine from scratch
TIMEZONE-ID:Asia/Kolkata
X-PUBLISHED-TTL:PT12H
X-WR-TIMEZONE:Asia/Kolkata
BEGIN:VEVENT
SUMMARY:Build a SQL query engine from scratch
DTSTART:20260612T080000Z
DTEND:20260612T120000Z
DTSTAMP:20260627T212705Z
UID:session/NNPRVgDuHrKS2Ck5Qp5v8t@hasgeek.com
SEQUENCE:30
CREATED:20260607T020618Z
DESCRIPTION:# 🚨 Venue changed. This workshop will take place at Sahaj S
 oftware.\n## Target audience\n- Engineers who use databases (especially an
 alytics databases) and want to understand what happens after they hit "run
 " on a query\n- No prior database internals knowledge required\n- We only 
 expect some basic familiarity with Python (see below for why python)\n\n##
  Workshop overview\nDuration: 4 hours (3 hours hands-on\, 40 mins of break
 s\, 20 mins of discussing real-world engines and questions)\n\nWe first lo
 ok at some simple SQL queries like\n- `select * from ...`\n- `select a\, b
  from ...`\n- `select ... where y > 0`\n- `select sum(x) ...`\n\nand write
  python scripts by hand for each them. This gives us a starting point for 
 _what_ the engine should do. We then start building a proper engine -- rea
 ding data from Parquet files\, SQL parsing\, "operator" model (aka Volcano
  model). We then build operators one-by-one: **projections**\, **filters**
 \, **aggregations** and **joins**.\n\nMid-way we switch the execution from
  row-based to **vectorized** (columnar)\, supported by real benchmarks. Th
 is gives us a feel for one of the most important optimizations in modern q
 uery engines.\n\nWe adopt a codecrafters-inspired model\, where each stage
  has tests and benchmarks that need to pass.\n\n## Learning outcomes\nBy t
 he end of the workshop\, participants will be able to understand:\n\n- Com
 piled vs pipelined execution models\n- How group-by (aggregations) and joi
 ns work internally\n- Volcano operator model (open / next / close)\n- SQL 
 → AST → logical plan → physical plan\n- Row vs columnar layout\n- Ro
 w-based vs vectorized execution\n\n## Workshop stages\n1. **Compiled Execu
 tion** — Python script that produces output for `select *`\, `select x w
 here y`\, etc. Starting point for _what_ the engine must do.\n2. **Tablesc
 an** — Here we start with the _how_. Read Parquet and output rows. Segue
  into the Volcano model while building this.\n3. **Volcano Model (Theory)*
 * — Operator trees using simple open()\, next()\, close() methods. Opera
 tors are the unit of composition in query engines.\n4. **SQL to Plan (Theo
 ry)** — Use sqlparser to turn a SQL string into an AST. Walk through AST
  → physical plan (actual functions).\n5. (SKIPPED) Logical plans: all re
 al engines first convert the AST to a logical plan. This is where query op
 timizations like join ordering\, converting subqueries to joins\, etc are 
 actually performed. We skip this to keep this workshop focused.\n6. **Proj
 ection** — the `select` part of a query.\n  - (SKIPPED) expression simpl
 ification\, dictionary optimizations are not implemented. We briefly menti
 on them.\n7. **Filter** — the `where` part of a query. We also introduce
  filter pushdown (at a row group level).\n  - (SKIPPED) partition pruning\
 , parquet page pruning\, later materialization are some optimizations that
  modern query engines use. We won't implement them.\n8. **Aggregation** 
 — group-bys and aggregate functions like sum()\, avg()\, etc are perform
 ed by this operator. We implement a row-based version and then segue into 
 vectorized execution.\n  - (SKIPPED) multi-threaded aggregations\, sort-ba
 sed aggregations are not covered.\n9. **Vectorized Execution** — Rewrite
  the engine (all operators) to process column batches instead of single ro
 ws. Benchmarks make the difference tangible.\n10. **Joins** — the `join`
  part of a query. We look at two join algorithms: nested-loop and hash joi
 n.\n   - (SKIPPED) outer joins\, anti-join\, semi-join. We only look at `i
 nner join`.\n   - (SKIPPED) sort-merge join\, perfect hash join\, multi-th
 readed hash join\, distributed broadcast join\, distributed shuffled join 
 are some alternate join implementations that we won't look at.\n11. **Furt
 her Reading** — Logical plan\, join ordering\, plan optimisation\, Parqu
 et file format\, multi-threaded execution\, distributed execution. Papers:
  [Volcano](https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b186
 061.pdf)\, [MonetDB](https://15721.courses.cs.cmu.edu/spring2024/papers/04
 -execution1/boncz-cidr2005.pdf)\, [Morsel-Driven](https://15721.courses.cs
 .cmu.edu/spring2024/papers/08-scheduling/p743-leis.pdf)\, [Compiled vs Vec
 torized](https://15721.courses.cs.cmu.edu/spring2024/papers/06-vectorizati
 on/p2209-kersten.pdf).\n\n## Tooling\n**Python**\, pyarrow\, sqloxide\n\nA
  skeleton repo with sample data will be provided. Please go through the se
 t up steps before attending the workshop - no additional time will be prov
 ided for setup. [`uv`](https://docs.astral.sh/uv/) will be the main tool w
 e will be using\, please use it! (We cannot help with issues arising from 
 raw `pip` or system python usage).\n\n[Code repository link](https://githu
 b.com/Samyak2/query-engine-from-scratch)\n\n### Why Python?\nIt's true tha
 t no query engine is written (purely) in Python. The engines we have worke
 d on are in Rust and Java. In this workshop\, we want to focus on the fund
 amentals instead of the specifics of a language. We believe that Python mo
 stly gets out of the way\, while also providing some of the libraries we n
 eed - namely pyarrow and sqlparse.\n\n### About LLM usage\nDuring the work
 shop\, please type out the code by hand. You are free to use LLMs to under
 stand things or ask questions\, but we would prefer if you ask us instead 
 :)\n[Here](https://haskellforall.com/2026/05/type-out-the-code) is an arti
 cle which explains why typing out the code is much better for learning tha
 n an LLM doing it for you (and better than the old-school way of copy-past
 ing code).\n\n## About the instructors\n**Aayush Naik** and **Samyak Sarna
 yak**: we work at the query engine team at e6data. We have worked on both 
 the older Java-based engine (which was pretty much written from scratch) a
 nd the new Rust-based engine (using Apache DataFusion as the base).\n- [Aa
 yush Naik](https://www.linkedin.com/in/naikaayush/): has worked on Decimal
 128 data-type support for the java engine and is currently working on addi
 ng DML support on Deltalake tables to the rust engine.\n- [Samyak Sarnayak
 ](https://www.linkedin.com/in/samyaks/): has worked on variant data-type s
 upport\, distributed shuffled hash joins for the java engine and is curren
 tly working on bringing in distributed query execution to the rust engine.
 \n\n## How to attend this workshop\nThis workshop is open for **[Rootconf 
 members](https://hasgeek.com/rootconf#memberships)** and for **[Rootconf D
 atabase Edition ticket buyers](https://hasgeek.com/rootconf/topical-editio
 n-on-databases#tickets)**\n\n**This workshop is open to 30 participants (i
 n-person) & hybrid access for remote attendees. Seats for in-person partic
 ipants will be available on first-come-first-served basis.** 🎟️    \n
 \n## Contact information ☎️\nFor inquiries about the workshop\, contac
 t +91-7676332020 or write to [info@hasgeek.com](mailto:info@hasgeek.com).
LAST-MODIFIED:20260611T162339Z
LOCATION:Bangalore - https://hasgeek.com/rootconf/build-a-sql-query-engine
 -from-scratch-workshop/
ORGANIZER;CN=Rootconf:MAILTO:no-reply@hasgeek.com
URL:https://hasgeek.com/rootconf/build-a-sql-query-engine-from-scratch-wor
 kshop/
BEGIN:VALARM
ACTION:display
DESCRIPTION:Build a SQL query engine from scratch in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
END:VCALENDAR
