Rootconf Mini 2024 (on 22nd & 23rd Nov)

Geeking out on systems and security since 2012

Siddharth Balyan

@sidbin

Chatting with Logs: An exploratory study on Finetuning LLMs for LogQL

Submitted Oct 12, 2024

Abstract

Monitoring and observability tools are a cornerstone of the debugging process at any large organization.

Log query languages and interfaces in popular log analysis tools present significant usability challenges that hinder productivity: no standardized query language, steep learning curves, and high complexity.

Large Language Models, while promising for log analysis, parsing, and summarization, have seen little exploration for log search.

This talk covers our study: creating the first text-to-LogQL dataset, fine-tuning LLMs for LogQL (Grafana’s log query language), and evaluating various approaches to using LLMs to write LogQL queries.
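To make the idea of a text-to-LogQL dataset concrete, here is a minimal sketch of what one record and a simple evaluation could look like. Everything in it is an assumption for illustration: the `TextToLogQLExample` record shape, the sample questions, the `app="nginx"` label, the made-up predictions, and the whitespace-insensitive exact-match metric are hypothetical and are not the dataset schema or the evaluation method used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextToLogQLExample:
    """One hypothetical record: a plain-English question and a reference LogQL query."""
    question: str
    logql: str


def normalize(query: str) -> str:
    """Collapse runs of whitespace so spacing differences are not counted as errors.

    Note: this also collapses whitespace inside quoted strings, which is
    acceptable for a sketch but too crude for a real evaluation.
    """
    return " ".join(query.split())


def exact_match_accuracy(examples, predictions) -> float:
    """Fraction of predictions equal to the reference query after normalization."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    if not examples:
        return 0.0
    hits = sum(
        normalize(ex.logql) == normalize(pred)
        for ex, pred in zip(examples, predictions)
    )
    return hits / len(examples)


# Hypothetical records; the label name and value are invented for illustration.
EXAMPLES = [
    TextToLogQLExample(
        question="Show nginx log lines that contain the word error",
        logql='{app="nginx"} |= "error"',
    ),
    TextToLogQLExample(
        question="Per-second rate of nginx error lines over 5 minute windows",
        logql='sum(rate({app="nginx"} |= "error" [5m]))',
    ),
]

# Made-up model outputs: the first differs only in spacing,
# the second uses a different range aggregation.
PREDICTIONS = [
    '{app="nginx"}   |=   "error"',
    'sum(count_over_time({app="nginx"} |= "error" [5m]))',
]

ACCURACY = exact_match_accuracy(EXAMPLES, PREDICTIONS)
print(f"exact-match accuracy: {ACCURACY:.2f}")
```

Exact match is the strictest possible check: two different queries can return the same logs, so a fuller evaluation would likely also compare query results rather than query text.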

We have been working on a paper on the same topic and are excited to share our learnings with fellow systems nerds!

Key Takeaways

  • The challenges of using LLMs to solve log search problems.
  • Evaluation and performance of various fine-tuned models and approaches for LogQL.
  • How to build a dataset for a domain-specific language from scratch!
  • How this work fits into the bigger goal of building a semantic search engine for observability data.

Audience

  • Anybody who has written a LogQL/DatadogQL/SPL query recently and used or thought about using LLMs for it.
  • Systems & Infra engineers interested in LLMs for observability.

Comments

  • Siddharth Balyan

    @sidbin Submitter

    hello fellow geeks!

    first of all, a big shoutout to @zainabbawa @saurabh_hirani and all the organizers and editors of rootconf.


    the research paper and work that powered this talk is now on arxiv!

    the dataset and models are on huggingface!

    and the training recipe, scripts and the infra set up on github.

    we want to talk to more observability and infra nerds for the next stages of this project!

    our idea is to build a fully semantic search system for logs. this will involve tuning our own embedding models and building our own storage solutions, as well as an actual usable frontend for querying it.

    if this problem space interests you, please reach out. a blue tokai coffee on me :D

    X: https://x.com/sidbing
    email: balyan.sid@gmail.com

    Posted 3 months ago (edited 3 months ago)