Beyond the jargon: making research papers work in the real world

Name: Beyond the jargon: making research papers work in the real world
Start: 2025-07-18T17:45:00+05:30
End: 2025-07-18T19:30:00+05:30
Location: Thoughtworks Data Lab, Ground Floor

A workshop on developing skills to read ML-Sys papers

Fri, 18 Jul 2025, 05:45 PM – 07:30 PM IST

Thoughtworks Data Lab, Ground Floor, Bengaluru

Pinned update

Link to the slides of the workshop (this update is for participants only)

About the workshop

This workshop is designed to bring together engineers, researchers, and technologists at the intersection of machine learning and systems to learn how to effectively read, understand, and apply research papers in real-world scenarios.

The goal is not to passively read or summarize academic content, but to actively dissect the paper and understand its motivation, techniques, and trade-offs, just as we do when building or adapting a system ourselves.

Takeaways for participants

Equip participants with a framework to approach applied research papers.
Demonstrate how to extract reusable techniques and mental models from academic work.
Encourage critical thinking around performance bottlenecks, architectural trade-offs, and design decisions as outlined in state-of-the-art research.
Foster a community of engineers and researchers passionate about production-grade ML systems.

About the paper + session format

We’ll be discussing “Bullion: A Column Store for Machine Learning”.

Bullion is a next-gen columnar storage system tailored specifically for machine learning workloads, offering specialized support for deletion-compliance, long-sequence sparse features, and feature quantization. It optimizes wide-table metadata parsing and enables efficient multimodal and vector-based ML data handling, significantly reducing I/O costs and storage overhead compared to traditional column stores.

This is a deeply systems-oriented paper with real-world implications, which makes it a good starting point for demonstrating:

How to extract implementation ideas from a paper
How to reason about trade-offs
How to identify assumptions and possible extensions

We won’t just “go through” the paper. Instead, we’ll critically explore:

What problem is being solved?
What are the design decisions?
Could this apply in our context and/or day jobs?

What participants need to do or know

This is a hands-on reading workshop, not a lecture.

Before the session:

Read the abstract, introduction, and conclusion of the paper
Think about: What do you understand about the problem space? What would you do differently?
Bring a notepad or laptop to jot down observations
Optional but encouraged: skim how the Parquet file format works and how typical ML data loading pipelines are structured

Ideal for folks who:

Work at the systems/ML boundary
Are interested in research-to-production translation
Want to sharpen their research reading and technical reasoning skills

About the facilitators

Aditi Ahuja
Currently working in the Search team at Couchbase, focused on full-text and vector search. Has been a speaker at Fifth Elephant, GopherCon India, and PromDay (a KubeCon co-located event). Past internships include an LFX mentorship at the Thanos project (a CNCF project).

Harini Anand
CSE undergrad passionate about Computational Cognition, ML, and AI in Healthcare. SDE Intern at IBM Data & AI, working on watsonx™. Formerly at Niramai & IIT Hyderabad, researching ML for breast cancer and gene regulatory networks. Built cognitive tools for dementia prevention as a student entrepreneur. Google KaggleX Mentee, AWS Scholar, Harvard WE Tech Fellow, and Oxford & MIT Summer School alumna. Advocate for STEM representation, speaker, and published AI researcher.

Abhinav Upadhyay
Independent systems engineer who explores the internals of software and hardware through his writing. He publishes Confessions of a Code Addict, a newsletter focused on compilers, interpreters, operating systems, and performance engineering. With over a decade of experience in backend systems and machine learning, he enjoys diving deep into how things work and sharing those insights with fellow engineers.

About the organizers

Bengaluru Systems Meetup brings together Bengaluru’s systems enthusiasts in meetups that cover Databases, Distributed Systems, Compilers, Orchestration Systems, and Dataflow Systems.

The Fifth Elephant is a community of practitioners who share feedback on data, AI, and ML practices in the industry.

💬 Post a comment with your questions here.