The Fifth Elephant Open Source AI Hackathon 2024

GenAI makers and creators contest and showcase



Yash Malviya


Pushkar Aggrawal


K Santanu Sekhar Senapati


Knowledge Preservation

Submitted Feb 4, 2024

GitHub link (most up-to-date roadmap / proposal here):

Dynamic Survey

Core Idea

Knowledge Gathering: Using minimal initial input to generate targeted questions that maximize the information gathered through a dialogue with people.
Knowledge Organization: Streamlining the structuring and analysis of gathered data for enhanced insights.

Another use case with a lot of overlap with the current one: a journalist bot that conducts interviews with people to distill and preserve niche and novel information.


Traditionally, surveys have consisted predominantly of multiple-choice and close-ended short-answer questions, partly because analyzing responses to open-ended questions is challenging. Advances in Natural Language Processing (NLP), such as Large Language Models (LLMs), have made this issue easier to tackle. The efficiency and depth of traditional survey tools are often compromised by their inability to engage respondents fully, resulting in low participation rates and superficial data. The rigidity of preset questionnaires contributes to this problem: they fail to adapt to the respondent's unique perspective or probe for deeper understanding. The evolution of machine learning, especially in natural language processing, now offers a solution to these challenges by enabling more nuanced interpretation of text-based responses.

Open-ended questions allow survey participants to express their thoughts and opinions more freely, potentially revealing insights that the survey creator hadn’t anticipated. In contrast, multiple-choice questions can inadvertently influence responses by presenting predefined options. However, open-ended questions also pose challenges, such as vagueness or lack of specificity, which may hinder the extraction of useful insights. Moreover, the process of identifying when a respondent’s answers lack clarity or context—and therefore require follow-up for clarification or additional detail—is traditionally manual, leading to inconsistency and missed opportunities for deeper data collection. In such cases, follow-up questions can help prompt participants to provide more specific information. Our focus is on addressing the challenges associated with open-ended questions, particularly regarding vagueness and staying aligned with the purpose of the question. This challenge is often only recognizable after the fact, underscoring the need for a more dynamic and responsive survey mechanism.


The introduction of a Large Language Model (LLM)-based survey tool revolutionizes data collection by dynamically interacting with respondents in a conversational manner. This tool is designed to understand and evaluate users’ responses in real-time, enabling it to ask follow-up questions that are tailored to the individual’s answers and the nuances within them. By employing a combination of advanced language understanding capabilities and real-time response evaluation, this application not only enhances engagement and participation rates but also ensures the collection of more detailed and meaningful data.


Dynamic Survey Demo


Presentation Link

How to run?

  1. Set up the backend, frontend, and lm-server as described in their respective folders
  2. The UI can then be accessed at http://localhost:8501
  3. Recommended hardware: Nvidia T4 GPU (about 16 GB of GPU RAM is needed)

User Experience

User Journey

When the user opens our application: what do they see, and what can they do there? With every user action, what changes, and what are the next possible user actions?

Survey Creator

  1. Create a new Survey
    1. Add topic
      1. Describe topic
    2. Add questions
      1. We will suggest some possible questions based on the topic and previously added questions
      2. Select question type -
        1. MCQ
        2. Text
          1. Creator specified configs -
            1. Maximum follow up question depth
            2. Question objective
            3. Criteria to evaluate whether to ask follow up questions or not
              1. Was answer specific?
              2. Was answer ambiguous?
              3. Was an example given in the answer?
              4. Did user understand the question? Did they answer the asked question or something else?
              5. Did user find the question irrelevant?
              6. Is question objective reached?
          2. With every question, the creator gets a field to explain / rephrase the question differently
            1. Suggest options using LLM
  2. Survey Analysis
    1. Research
    2. Can we use analysis to improve the follow up questions (P4)

Survey Participant (Filler)

Basic UI where the user answers the configured questions one after the other

Solution Details

Survey Creation (High Level Design)

Survey Creation

Survey Bot Chain of Agents (High Level Design)

Survey Bot Chain of Agents

Tech Architecture

Tech Architecture

A frontend app written using Streamlit can be used to create surveys and to fill them in.
The frontend app interacts with a backend service written using FastAPI.
The backend service contains the survey bot, which uses two agents (an objective-met agent and a question-generation agent) to generate follow-up questions wherever needed.
The data for the survey questions, the conversation with a survey participant, and the survey state are stored in MongoDB.
For LLM capabilities we host the model using vLLM, which comes with a lot of LLM inference optimisations out of the box.
The LLM used is a quantised gemma-7b-it.
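As a rough sketch of the backend-to-lm-server hop: vLLM exposes an OpenAI-compatible `/v1/chat/completions` route, so the backend can post the conversation and read back the generated follow-up. The URL, model identifier, and system prompt below are assumptions, not the project's actual values:

```python
# Sketch of the backend's call into the lm-server. vLLM exposes an
# OpenAI-compatible /v1/chat/completions route; the URL, model name,
# and system prompt below are assumptions.
import json
import urllib.request

LM_SERVER = "http://localhost:8000/v1/chat/completions"  # assumed address
MODEL = "gemma-7b-it"  # the quantised model named in the architecture

def build_messages(answer: str, history: list[dict]) -> list[dict]:
    """Assemble the chat history the question-generation agent would see."""
    system = {"role": "system",
              "content": "You are a survey bot. Ask one concise follow-up question."}
    return [system, *history, {"role": "user", "content": answer}]

def generate_follow_up(answer: str, history: list[dict]) -> str:
    """Post the conversation to the lm-server and return the generated question."""
    payload = {"model": MODEL,
               "messages": build_messages(answer, history),
               "max_tokens": 128}
    req = urllib.request.Request(LM_SERVER,
                                 data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The same request shape works against any OpenAI-compatible server, which is one reason the vLLM route keeps the backend decoupled from the specific model.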

Automated Evaluation

Objective Met Agent

We generated 20 surveys with questions (about 3 questions per survey) and associated motivations (some motivations were also added manually). We generated associated survey-participant descriptions and question-answer conversations based on the survey questions. We then sliced each conversation into multiple segments matching the input expected by the agent, and manually annotated the data (i.e., manually marked which conversation slice had which objectives met). This gave us approximately 100 test cases, which we used to evaluate different prompts and thresholds.
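The prompt-and-threshold comparison over those annotated slices can be sketched as a simple sweep. Here `scores` stand for the agent's per-slice confidence values and `labels` for the manual annotations; accuracy is used purely as an illustrative metric:

```python
# Illustrative threshold sweep for the objective-met agent. Accuracy
# stands in for whatever metric was actually used in the evaluation.

def evaluate(scores: list[float], labels: list[bool], threshold: float) -> float:
    """Fraction of slices where (score >= threshold) matches the annotation."""
    correct = sum((s >= threshold) == l for s, l in zip(scores, labels))
    return correct / len(labels)

def best_threshold(scores, labels, candidates=(0.3, 0.5, 0.7, 0.9)):
    """Pick the candidate threshold with the highest accuracy."""
    return max(candidates, key=lambda t: evaluate(scores, labels, t))
```

Running the same sweep per prompt variant lets both the prompt and its threshold be chosen from the ~100 annotated test cases.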

All of these generations were done via prompt engineering using GPT.


Priority - P0 to P4

High Priority

  1. Multiple types of questions
    1. MCQ (Single select and multi select) P1
    2. Text paragraph P0
  2. Multilingual Support P1
  3. Survey Bot (Collection of agents) P0
  4. Authentication P1

Low priority

  1. Voice integration
    1. STT P3
    2. TTS P4


Yash Malviya, Santanu Senapati, and Pushkar Aggrawal, representing Search Relevance at Myntra, are honored to participate. With our collective expertise, we aim to innovate solutions. Our team has worked on GenAI-enabled features and deep learning tools, e.g. MyFashionGPT for Myntra.

Journal Bot (Previous Draft)


Preserving and circulating knowledge across diverse domains can be challenging due to the sheer volume of information generated in various conversations. There is a need for a streamlined process to capture, distill, and present conversational knowledge in a readable format for broader consumption.


  1. Arnab (Journalist Bot): Arnab’s role is to facilitate discussions, ask relevant questions, and extract valuable knowledge from these conversations.
  2. Cataloging Pipeline: Converts Arnab’s recordings into a readable format, creating a dynamic encyclopedia.
  3. Consumption Platform: A user-friendly platform for exploring, searching, and validating knowledge across domains.

Expected Outcome

  1. Knowledge Preservation: Captures valuable insights, preventing loss.
  2. Knowledge Circulation: Breaks down domain barriers, encouraging cross-disciplinary learning.
  3. Collaborative Validation: Allows users to cross-reference information for accuracy and to give feedback on the recorded information.
  4. Continuous Learning: A growing encyclopedia adapting to changing information, fostering continuous learning.


  1. The bot will interact with users in a non-confrontational, curious, and friendly manner, asking relevant questions and keeping the conversation alive and easy. The first POC is planned on the OpenAI GPT interface; however, to enhance the bot's conversational stickiness, fine-tuning on data sources such as interview and podcast transcripts might be needed.
  2. The bot should have multilingual support to enable a wide variety of people to interact with it and store their knowledge.
  3. Distilling the conversation into a transcript and writing unbiased, digestible content such as a blog.
  4. Building a Wikipedia-like repository of knowledge, keeping relevant data close together by cataloging the generated blogs.
  5. If multiple instances of the same information are present, they can be used to validate the information automatically. If differing opinions on the same topic are present, we can detect that there is a chance of subjectivity in the topic.
  6. The bot should not overwhelm users with questions; asking too many questions of certain types of users would be a bad experience for them.

Some Example Use cases

  1. Indigenous skills and cultural heritage such as Yoga, Ayurveda, weaving, looming, handicraft techniques, etc. can be lost to time with the advancement of technology; these can be preserved digitally with the help of this bot.
  2. Documentation of any tech product or scientific concepts.
  3. Journaling a trip.

Motivation References



Datasets, Models, APIs

Tasks to be solved -

  1. Translation
    1. If the language model is multilingual it’s not needed
    2. If not
      1. Bing translate API
      2. or
  2. TTS
    1. or
  3. STT
    1. or
  4. Language Model
    1. OpenAI ChatGPT API
    2. We could fine-tune an LLM to make the conversation more engaging and to make smaller models more accurate for the question-generation task
      1. Data sets
        1. Podcast datasets (to make the conversation more engaging)
          1. The dataset contained over 100,000 entries; one had to request access to it. (No longer available)
        2. We can reuse QnA datasets, instead of generating answers we generate questions. If we have different QnA on a single topic, merging a list of question and answers and expecting the bot to generate the next question is our task
          1. The SQuAD dataset is QnA over Wikipedia articles; the topic is given in the title field
          2. Clustering on QnA Dataset to group similar QnA topics together
      2. Models
        1. Mixtral
        2. Llama
  5. Vector Search (Context for RAG)
    1. Dataset for retrieval
      1. and other search tools
    2. Models
  6. Moderation
    1. Basic word-based blacklisting / tagging
    2. Toxic
    3. Dangerous
    4. Bias
    5. Subjectivity
  7. Summarisation
    1. Text summarisation with annotation
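The QnA-reuse idea above (generate the next question rather than the answer) can be sketched as a data-construction step. The title/question fields follow SQuAD's layout; the prompt format itself is an assumption:

```python
# Sketch of reusing a QnA dataset for next-question generation: given the
# topic and the QnA pairs seen so far, the training target is the next
# question. The prompt format below is illustrative.

def make_examples(title: str, qa_pairs: list[tuple[str, str]]) -> list[dict]:
    """Turn an ordered list of (question, answer) pairs into training examples."""
    examples = []
    for i in range(1, len(qa_pairs)):
        history = "\n".join(f"Q: {q}\nA: {a}" for q, a in qa_pairs[:i])
        examples.append({
            "input": f"Topic: {title}\n{history}\nNext question:",
            "target": qa_pairs[i][0],
        })
    return examples
```

Grouping or clustering QnA pairs by topic first, as noted above, would make the "next question" targets more coherent.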

Tech stack

  1. Bot
    1. Streamlit for UI
    2. Langchain and various other libraries for hosting models
  2. Cataloging pipeline
    1. Simple python script based indexing pipeline
    2. Periodic crons to generate summary of conversation topics
  3. Platform
    1. React for UI
    2. FastAPI for backend
    3. Databases
      1. Elastic search for search index database
      2. Mongo for document store
      3. Or depending on time left, in-memory databases
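The cataloging pipeline's indexing step could start as the following sketch, which flattens a stored conversation record into a document for the search index. The field names (topic, turns, summary) are assumptions about the saved format:

```python
# Sketch of the cataloging pipeline's indexing step: flatten a stored
# conversation record into a flat, searchable document. Field names are
# assumptions about the format the bot saves in the background.

def to_index_doc(conversation: dict) -> dict:
    """Build a flat, searchable document from a conversation record."""
    body = " ".join(turn["text"] for turn in conversation["turns"])
    return {
        "topic": conversation["topic"],
        "summary": conversation.get("summary", ""),
        "body": body,
        "turn_count": len(conversation["turns"]),
    }
```

The resulting dict could then be written to Elasticsearch with elasticsearch-py's `index()` call, while the raw record stays in MongoDB as the document store.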


Evaluating our prompts (basically the generated questions)

User-feedback based:

Generating questions in different contexts like -

  1. Artistic
  2. Political
  3. Technical

Evaluation: questioning/answering agents summary strategy

Task : Ask better questions

  1. Answering agent here becomes the examiner / evaluator
  2. Answering agent is provided a detailed context and exposed to the questioning agent. Answering agent is programmed to be vague and evasive.
  3. The questioning agent is exposed to the answering agent, and at the end of their interaction we match the input context with the questioning bot's final summary. SQuAD qualifies here as a prospective benchmarking dataset.

The answering agent can have different personalities, like easy-going, difficult to talk to, etc.
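The context-vs-summary match at the end of the interaction could begin as crude token recall before moving to a proper metric such as ROUGE. This scoring function is purely illustrative:

```python
# Crude token-recall score for matching the questioning bot's final summary
# against the hidden input context; a stand-in for a proper metric such as
# ROUGE. A high score means the questioner recovered most of the context.

def context_recall(context: str, summary: str) -> float:
    """Fraction of context tokens that the summary recovered."""
    ctx = set(context.lower().split())
    summ = set(summary.lower().split())
    return len(ctx & summ) / len(ctx) if ctx else 0.0
```

A deliberately vague or evasive answering agent should drive this score down, which is exactly what makes the pairing useful as a benchmark.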

Requirements and User Experience

  1. Chatbot - user interface
    1. Streamlit app for its simplicity and because the team is familiar with it
    2. The bot can start with something like: "What do you want to document today?"
    3. User describes the topic to some extent
    4. Two options (We will decide exact one with POC)
      1. Generating questions
        1. Single prompt with instructions to be curious and friendly
        2. Or sequence of agents - Curious agent chained to friendly agent
      2. Fetch other relevant context (to evaluate)
        1. Unless the bot is somewhat aware of the content and challenges involved it might not be able to ask good questions
    5. User feedback - repeated question, irrelevant question (Streamlit has integration to take feedback too)
    6. Repeat questioning
    7. When to stop (POC needs to be done)
      1. Explicit action in the UI for the user to end the conversation
      2. Stopping implied by the conversation
      3. Giving the user an option to pick the conversation back up at any time will be useful
    8. Stores the conversation in a structured format automatically in the background
  2. Cataloging
    1. Moderation flagging
    2. Tagging the content for unsafe content to filter out for showing to others
    3. Conversation topic summarisation
  3. Showcase portal (will make a basic website)
    1. User experience (Like stack overflow + wikipedia)
      1. Search
      2. Moderation filtering
      3. View the conversation web page
        1. Comment / feedback section
      4. Topic summary page (collates different conversation from conversation pages)
        1. Comment / feedback section
      5. Traceback information to conversation it came from
    2. Component
      1. Moderation
      2. Subjectivity detection
      3. Bias detection
      4. Noting human crowdsourced validation of recorded information

Task Breakdown

  1. Chatbot
    1. Prompting POC
    2. Basic Streamlit UI
    3. TTS and STT integration
    4. Prompt engineering experiments
    5. Store conversation information in the background
    6. User feedback UI
  2. Cataloging
    1. Convert from format saved by bot to write in database
    2. Moderation tagging
    3. Conversation topic summarisation
  3. Platform
    1. Frontend functionality
      1. Search
        1. Basic Search page
        2. Moderation filtering
      2. View the conversation web page
        1. Basic conversation web page
        2. Comment / feedback section
      3. Topic summary page (collates different conversation from conversation pages)
        1. Basic Topic summary page
        2. Comment / feedback section
        3. Traceback information to conversation it came from
    2. Backend functionality
      1. Build the search index
      2. Search API
      3. Moderation filtering functionality for search
      4. Conversation web page view API
      5. API to add comments / feedback to a conversation web page
      6. Topic summary page view API
      7. API to add comments / feedback to a topic summary page
      8. Show traceback information to the conversation it came from

