[Know your Laws in Bharatiya Nyay Sanhita]

This submission has been added to the schedule

Submitted Mar 29, 2025

Choose the topic your submission falls under: Finetuning LLMs
Type of session: Demo - side project
I am submitting for: Blr OSAI meetup in April 2025

LLMs are well equipped to solve tasks that require world knowledge. However, there is a considerable gap between frontier LLMs on English and the best language models available for Indic languages. While there is considerable ongoing effort towards closing this gap, a more practical and scalable approach is well-designed alignment, where an existing LLM is strategically tuned for optimal performance on a domain and/or task in an Indic language. InstructLAB (https://github.com/instructlab) is an open source tool designed to make alignment easy for developers, data scientists, and any AI practitioner in an SME role, without requiring AI/ML technical skills. It is developed as a community project that solicits contributions from practitioners solving real-world problems via LLM alignment. In this work we will demonstrate how InstructLAB can be used to solve one such real-world problem: Legal Statute Identification in the Indian context, from a common Indian citizen’s perspective, in both English and Hindi. This work is co-developed by IBM Research, IITB, and BharatGen.

Legal Statute Identification (LSI) is a known task in the legal domain, where summaries of court proceedings are used to identify which laws apply to the proceedings. These summaries are often full of legalese, since they are prepared by dedicated personnel attending the proceedings, so it takes legal experts to work out which laws apply to the various segments of a summary. In contrast, legal help websites such as “Indian Kanoon”, “LawRato”, and “LegalKart” are thriving with posts from Indian citizens describing their problems and seeking advice under Indian law from designated legal experts. In this session we will show how “InstructLAB”, an open source language model alignment tool, can be used to align a small embedding model to this specific use case: given a complaint described in an ordinary citizen’s language, retrieve the laws from the Bharatiya Nyay Sanhita (BNS) that apply to the complaint scenario. More specifically, we will show how InstructLAB can be used to generate diverse and strategically curated synthetic data over the entire BNS corpus, and how good-quality synthetic data is enough to tune a small embedding model of only ~560M parameters to retrieve laws accurately, thereby solving the LSI task for common citizens.
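
To make the retrieval setup concrete, here is a minimal sketch of the shape of the task: embed the complaint, embed each law, and rank laws by cosine similarity. Everything in it is an assumption for illustration. The corpus entries and their IDs are invented placeholders (not actual BNS text or section numbers), and `embed` is a toy bag-of-words stand-in, not the tuned ~560M embedding model or anything produced by InstructLAB.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder corpus: NOT actual BNS text or section numbers.
LAW_CORPUS = {
    "section-A": "theft of movable property taken without the owner's consent",
    "section-B": "cheating and dishonestly inducing a person to deliver money",
    "section-C": "voluntarily causing hurt or physical injury to another person",
}


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(complaint, corpus, top_k=2):
    """Return the IDs of the top_k corpus entries most similar to the complaint."""
    query_vec = embed(complaint)
    scored = [(cosine(query_vec, embed(text)), law_id)
              for law_id, text in corpus.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [law_id for _, law_id in scored[:top_k]]


complaint = "Someone took my phone, my property, without my consent"
print(retrieve(complaint, LAW_CORPUS, top_k=1))
```

The toy `embed` only matches on exact word overlap, so it would miss a complaint phrased in everyday words that never appear in the legal text. That mismatch between a citizen’s language and legalese is the gap the tuned embedding model is meant to close.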

1. InstructLAB is an efficient and easy-to-use tool for domain alignment.
2. Legal Statute Identification starting from a citizen’s grievance can be solved by aligning a small embedding model over India’s latest legal document, the BNS (Bharatiya Nyay Sanhita).

The main audience is AI practitioners, including developers, SMEs, and CTOs looking to use LLMs on specific domains or tasks. Working professionals in the legal domain will also find this application useful.

I’m Jaydeep Sen, a senior research scientist at IBM Research, India, and manager of the speech and NLP technologies team there. I primarily work on Question Answering and related NLP technologies, and I am currently engaged in cutting-edge research on designing robust neural retrievers. I have authored 20+ papers in top conferences such as ACL, EMNLP, IJCAI, VLDB, and SIGMOD, hold 30+ patents, and am a designated Master Inventor at IBM.

Collaborators:
IBM Research: Rudra Murthy, Ashish Mittal, Sachindra Joshi, Amith Singhee
BharatGen: Amrith Krishna, Kundeshwar Pundalik, Ajay Nagpal, Piyushi Anand, Smitha Gautam, John Nirmal
IITB: Ganesh Ramakrishnan
