This workshop is designed to bring together engineers, researchers, and technologists at the intersection of machine learning and systems to learn how to effectively read, understand, and apply research papers in real-world scenarios.
The goal is not to passively read or summarize academic content, but to actively dissect the paper and understand its motivation, techniques, and trade-offs, just as we do when building or adapting a system ourselves.
- Equip participants with a framework to approach applied research papers.
- Demonstrate how to extract reusable techniques and mental models from academic work.
- Encourage critical thinking around performance bottlenecks, architectural trade-offs, and design decisions as outlined in state-of-the-art research.
- Foster a community of engineers and researchers passionate about production-grade ML systems.
We’ll be discussing “Bullion: A Column Store for Machine Learning”.
Bullion is a next-generation columnar storage system tailored specifically for machine learning workloads, offering specialized support for deletion compliance, long-sequence sparse features, and feature quantization. It optimizes wide-table metadata parsing and enables efficient handling of multimodal and vector-based ML data, with the aim of significantly reducing I/O costs and storage overhead compared to traditional column stores.
This is a deeply systems-oriented paper with real-world implications, which makes it a good starting point for demonstrating:
- How to extract implementation ideas from a paper
- How to reason about trade-offs
- How to identify assumptions and possible extensions
We won’t just “go through” the paper. Instead, we’ll critically explore:
- What problem is being solved?
- What are the design decisions?
- Could this apply to our own context or day jobs?
This is a hands-on reading workshop, not a lecture.
Before the session:
- Read the abstract, introduction, and conclusion of the paper
- Think about: What do you understand about the problem space? What would you do differently?
- Bring a notepad or laptop to jot down observations
- Optional but encouraged: skim how the Parquet file format works and how typical ML data-loading pipelines are put together
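For the optional reading above, a tiny warm-up can help fix two ideas the session leans on: column-oriented layout and feature quantization. The sketch below is a toy illustration in plain Python, not Bullion’s or Parquet’s actual format. The dataset, the function names, and the fixed [0, 1] value range are all assumptions made for the example.

```python
from array import array

# Hypothetical toy dataset: three training examples, three features each.
rows = [
    {"user_id": 1, "clicks": 12, "score": 0.25},
    {"user_id": 2, "clicks": 7, "score": 0.5},
    {"user_id": 3, "clicks": 30, "score": 0.75},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def quantize(values, lo, hi):
    """Map floats in [lo, hi] onto unsigned 8-bit codes (0..255)."""
    scale = 255 / (hi - lo)
    return array("B", (round((v - lo) * scale) for v in values))


def dequantize(codes, lo, hi):
    """Approximately recover the original floats from 8-bit codes."""
    scale = (hi - lo) / 255
    return [lo + c * scale for c in codes]


columns = to_columns(rows)

# Reading a single feature only touches that column's values.
clicks = columns["clicks"]

# Store the float feature in one byte per value (assumed range [0, 1]).
codes = quantize(columns["score"], 0.0, 1.0)
restored = dequantize(codes, 0.0, 1.0)
```

Reading `columns["clicks"]` touches only one list, which is the intuition behind reading a subset of columns from a wide table. The quantized `score` column takes one byte per value at the cost of a small rounding error, which is the kind of trade-off worth keeping in mind while reading the paper.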
Ideal for folks who:
- Work at the systems/ML boundary
- Are interested in research-to-production translation
- Want to sharpen their research reading and technical reasoning skills
Aditi Ahuja
Currently working in the Search team at Couchbase, focused on full-text and vector search. Has been a speaker at Fifth Elephant, GopherCon India, and PromDay (a KubeCon co-located event). Past internships include an LFX mentorship at the Thanos project (a CNCF project).
Harini Anand
CSE undergrad passionate about Computational Cognition, ML, and AI in Healthcare. SDE Intern at IBM Data & AI, working on watsonx™. Formerly at Niramai & IIT Hyderabad, researching ML for breast cancer and gene regulatory networks. Built cognitive tools for dementia prevention as a student entrepreneur. Google KaggleX Mentee, AWS Scholar, Harvard WE Tech Fellow, and Oxford & MIT Summer School alumna. Advocate for STEM representation, speaker, and published AI researcher.
Abhinav Upadhyay
Independent systems engineer who explores the internals of software and hardware through his writing. He publishes Confessions of a Code Addict, a newsletter focused on compilers, interpreters, operating systems, and performance engineering. With over a decade of experience in backend systems and machine learning, he enjoys diving deep into how things work and sharing those insights with fellow engineers.
Bengaluru Systems Meetup brings together Bengaluru’s systems enthusiasts in meetups that cover Databases, Distributed Systems, Compilers, Orchestration Systems, and Dataflow Systems.
The Fifth Elephant is a community of practitioners who share feedback on data, AI, and ML practices in the industry.
💬 Post a comment with your questions here.
📞 Call The Fifth Elephant at (91) 7676332020
📧 For inquiries about registration, drop an email to info@hasgeek.com