The Fifth Elephant 2023 Monsoon

On AI, industrial applications of ML, and MLOps

Samik Raychaudhuri

@samikr

Shub Jain

@shubjain

Transforming Document Curation: LMs and vector databases at scale

Submitted Jun 30, 2023

Abstract

Auquan is an AI startup that serves institutional investors and investment managers with curated news and documents to help them make better investment decisions.

In this presentation, I will discuss our approach to using a vector database and a tuned language model to curate news items at scale.

I will walk through the process and pitfalls of using these technologies in production and share best practices for achieving high performance. Specifically, I will discuss the metrics we used to select a vector database and to tune our language model.
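The abstract does not say which metrics were used to select a vector database. One common criterion is recall@k: the share of the exact top-k neighbours (from brute-force search) that a candidate's approximate index actually returns. The sketch below computes it on random vectors. `subset_search` is a hypothetical stand-in for an approximate index, not any real database's API or the method used in the talk.

```python
import math
import random


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def exact_top_k(query: list[float], corpus: list[list[float]], k: int) -> list[int]:
    """Ground truth: brute-force search over every vector in the corpus."""
    order = sorted(range(len(corpus)), key=lambda i: cosine(query, corpus[i]), reverse=True)
    return order[:k]


def subset_search(query, corpus, k, fraction, rng):
    """Hypothetical stand-in for an approximate index: it only scores a random
    fraction of the corpus, so it can miss true neighbours."""
    n = max(k, int(len(corpus) * fraction))
    candidates = rng.sample(range(len(corpus)), n)
    candidates.sort(key=lambda i: cosine(query, corpus[i]), reverse=True)
    return candidates[:k]


def recall_at_k(retrieved: list[int], relevant: list[int]) -> float:
    """Share of the exact top-k neighbours that the candidate index returned."""
    return len(set(retrieved) & set(relevant)) / len(relevant)


rng = random.Random(0)
corpus = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(500)]
query = [rng.gauss(0.0, 1.0) for _ in range(16)]
truth = exact_top_k(query, corpus, k=10)

for fraction in (0.2, 0.5, 1.0):
    found = subset_search(query, corpus, 10, fraction, rng)
    print(f"fraction={fraction}: recall@10={recall_at_k(found, truth):.2f}")
```

A real comparison would also time each query, since approximate indexes typically trade recall against latency through their tuning parameters.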

Audience

ML engineers, early-stage data scientists

Takeaways

  • How to use a vector database and a language model to curate news items at scale
  • Best practices for using vector databases in production
  • Pitfalls to avoid when using vector databases
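The proposal does not list the specific pitfalls. As an illustrative assumption, not a point taken from the talk, one common one is a mismatch between the similarity metric an index uses and the one the embeddings were built for. The sketch shows that ranking by raw dot product lets a large-norm, off-topic vector outrank an on-topic one, and that normalising vectors at both ingestion and query time makes dot product rank exactly like cosine similarity. All names and vectors are hypothetical.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm, so that dot product equals cosine similarity."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a: list[float], b: list[float]) -> float:
    return dot(normalize(a), normalize(b))


def rank(query, docs, score):
    """Document ids ordered from most to least similar under the given score."""
    return sorted(docs, key=lambda doc_id: score(query, docs[doc_id]), reverse=True)


query = [1.0, 0.0]
docs = {
    "on_topic": [0.9, 0.1],   # nearly the same direction as the query, small norm
    "off_topic": [3.0, 3.0],  # 45 degrees away, but a much larger norm
}

print(rank(query, docs, dot))     # ['off_topic', 'on_topic']: the norm dominates
print(rank(query, docs, cosine))  # ['on_topic', 'off_topic']

# Normalising at both ingestion and query time makes a dot-product index
# rank exactly like cosine similarity.
unit_docs = {doc_id: normalize(v) for doc_id, v in docs.items()}
print(rank(normalize(query), unit_docs, dot))  # ['on_topic', 'off_topic']
```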

Presentation outline

  • Introduction
    • About Auquan
    • Problem description
  • Vector databases for news curation
    • Choosing a vector database
    • Using embeddings with vector databases for different tasks
    • Offline population and real time inference
  • Tuning a language model for news curation
    • Tuning an LM
    • Using tuned model for embedding
  • Stack architecture
  • Conclusion and Q&A
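The outline separates offline population from real-time inference. Below is a minimal sketch of that split, assuming a brute-force in-memory store in place of a real vector database and a hashed bag-of-words `embed` function in place of the tuned language model. `InMemoryVectorStore`, `populate`, `curate` and the `min_score` threshold are all hypothetical and not Auquan's stack. Documents are embedded and loaded in a batch ahead of time; at request time only the query is embedded and searched.

```python
import hashlib
import math
import re

DIM = 256  # hypothetical embedding width


def embed(text: str) -> list[float]:
    """Hypothetical stand-in for the tuned LM: a hashed bag of words,
    scaled to unit length so dot product equals cosine similarity."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero on empty text
    return [x / norm for x in vec]


class InMemoryVectorStore:
    """Toy store: exact dot-product search over unit vectors (no ANN index)."""

    def __init__(self) -> None:
        self.ids: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, doc_id: str, vector: list[float]) -> None:
        self.ids.append(doc_id)
        self.vectors.append(vector)

    def search(self, vector: list[float], k: int) -> list[tuple[str, float]]:
        scored = [
            (doc_id, sum(a * b for a, b in zip(vector, v)))
            for doc_id, v in zip(self.ids, self.vectors)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


def populate(store: InMemoryVectorStore, documents: dict[str, str]) -> None:
    """Offline step: embed the whole batch ahead of time and load it into the store."""
    for doc_id, text in documents.items():
        store.add(doc_id, embed(text))


def curate(store: InMemoryVectorStore, topic: str, k: int = 3, min_score: float = 0.2) -> list[str]:
    """Real-time step: embed only the incoming topic, then keep hits above the threshold."""
    return [doc_id for doc_id, score in store.search(embed(topic), k) if score >= min_score]


news = {
    "n1": "Central bank raises interest rates to curb inflation",
    "n2": "New smartphone launch draws long queues",
    "n3": "Inflation data pushes bond yields higher",
}
store = InMemoryVectorStore()
populate(store, news)
print(curate(store, "interest rates and inflation"))
```

In production the store would be a vector database with an approximate index and `embed` would call the tuned model, but the split stays the same: embedding the corpus is the expensive part and happens offline, while the real-time path pays only for one query embedding plus a nearest-neighbour lookup.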

https://drive.google.com/file/d/1FubDUOYBJhzgaW-H6qxrlkEO49d3abDr/view?usp=sharing