The Fifth Elephant OSAI meet-up - Hyderabad edition

Shruti Dhavalikar

@shrutidhavalikar

Evaluating Agentic Applications in the SDLC: Ensuring Reliability with OSAI

Submitted Sep 30, 2025


Abstract

Agentic applications, built using open-source large language models (LLMs) and frameworks, are redefining how we approach collaborative software development and intelligent workflows. Yet, their non-deterministic nature raises critical challenges for evaluation and testing. How do we ensure correctness, reliability, and consistency when working with inherently probabilistic systems?

This talk will showcase practical evaluation strategies for agentic applications within the software development lifecycle (SDLC), using a conversational agent built with open-source models as a running use case. We will break down the agent into its core components—query understanding, data orchestration, tool invocation, and response synthesis—and demonstrate methods to design deterministic evaluation frameworks around stochastic behaviours.
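
For instance, a step such as tool invocation can be pinned down with ordinary unit tests that assert on the structured decision the component emits, rather than on free-form model text. Below is a minimal sketch of that idea in Python; the names `invoke_agent_step` and `ToolCall` are hypothetical stand-ins, not the talk's actual framework.

```python
# A minimal sketch of a deterministic test around a stochastic
# component: we assert on the structured tool call the step emits,
# not on free-form model text. `invoke_agent_step` and `ToolCall`
# are hypothetical names; a real implementation would wrap an LLM
# call with temperature 0 and a constrained JSON output schema.
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    arguments: dict


def invoke_agent_step(query: str) -> ToolCall:
    # Stand-in for the agent's tool-invocation component.
    if "revenue" in query.lower():
        return ToolCall(name="run_sql", arguments={"table": "sales"})
    return ToolCall(name="web_search", arguments={"q": query})


def test_tool_selection_is_stable():
    # Repeat the call: the structured decision should be identical
    # across runs even when any free-text fields vary.
    calls = [invoke_agent_step("What was Q3 revenue?") for _ in range(5)]
    assert all(c.name == "run_sql" for c in calls)
    assert all(c.arguments["table"] == "sales" for c in calls)
```

Tests of this shape run deterministically under pytest even though the component they wrap is probabilistic, because the assertions target the structured output contract rather than the surface text.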


Key Takeaways

  • Practical methods to evaluate open-source agentic applications at the component and system level
  • Metrics beyond accuracy: goal completion, grounding, latency, and consistency (a rough sketch follows this list)
  • How to embed evaluation into the SDLC testing cycle, ensuring robustness from development to deployment
  • Lessons learned from real-world use cases of open-source agentic frameworks
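
To make the metrics takeaway concrete, here is a rough sketch of how such measures might be aggregated over repeated runs of the same query. The `AgentRun` record and the scoring rules are illustrative assumptions, not the talk's actual framework.

```python
# Illustrative metric aggregation over repeated runs of one query.
# The AgentRun record and the scoring rules below are assumptions
# made for this sketch.
import statistics
from dataclasses import dataclass


@dataclass
class AgentRun:
    goal_achieved: bool           # did the agent finish the user's task?
    cited_sources: list[str]      # sources the answer claims to use
    retrieved_sources: list[str]  # sources the tools actually fetched
    latency_s: float              # end-to-end wall-clock seconds
    answer: str


def grounding_rate(run: AgentRun) -> float:
    # Fraction of cited sources that were genuinely retrieved: a
    # crude proxy for whether the answer is grounded in evidence.
    if not run.cited_sources:
        return 0.0
    hits = sum(s in run.retrieved_sources for s in run.cited_sources)
    return hits / len(run.cited_sources)


def report(runs: list[AgentRun]) -> dict:
    answers = [r.answer for r in runs]
    return {
        "goal_completion": sum(r.goal_achieved for r in runs) / len(runs),
        "grounding": statistics.mean(grounding_rate(r) for r in runs),
        "mean_latency_s": statistics.mean(r.latency_s for r in runs),
        # Consistency: share of runs that produced the modal answer.
        "consistency": answers.count(statistics.mode(answers)) / len(answers),
    }
```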

Target Audience

  • Data scientists, AI/ML Engineers & Researchers
  • Architects working with AI and agentic use cases
  • Open Source AI model evaluators/explorers
  • AI enthusiasts exploring agentic applications

Prerequisites

  • Basic knowledge of Python
  • Basic understanding of SQL, APIs
  • Interest in building or maintaining production-grade AI applications (who isn't? :P)

Whether you’re building autonomous agents or conversational assistants, this session will equip you with the tools and frameworks to test open-source AI models with confidence in a world of unpredictability.


Speaker Bio

Shruti Dhavalikar is a Data Scientist at Sahaj Software with over six years of experience in building data-driven solutions. She specialises in transforming complex datasets into actionable business insights and has led end-to-end product cycles within Agile environments. Her work emphasises scalable and robust development practices across diverse technology stacks. In addition to her industry contributions, she engages in applied research aligned with real-world challenges and has presented and published her work at international conferences. Outside of work, she nurtures a keen interest in cosmology and space, and enjoys discovering new cuisines as an avid travelling foodie.
