The Fifth Elephant 2019

Gathering of 1000+ practitioners from the data ecosystem

An open Assistive translation framework for Indic Language - Samantar

Submitted by Deepthi Chand (@deepthichand) on Thursday, 13 June 2019

Session type: Short talk of 20 mins
Status: Confirmed & Scheduled


India is a land of many languages. There are 23 official languages and many more unofficial ones in everyday use. Unfortunately, disseminating information in low-resource languages is difficult because of the geographic distances involved. Popular translation platforms have helped fill this gap for major languages, but their effectiveness is limited by the lack of suitable datasets and by their generic nature. The problem becomes especially evident when domain-specific information is involved.

We present Samantar, an open translation suggestion framework targeted at Indian languages. Samantar is built with open parallel corpora and open-source technologies. Its translation suggestions can be tuned to different target domains.
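As a rough illustration only (this is not Samantar's actual implementation), domain-tuned suggestion can be pictured as a fuzzy lookup over a parallel corpus in which entries from the requested domain are ranked higher. The toy corpus, the `suggest` function, and the `domain_boost` parameter below are all hypothetical, and the Hindi sentences are illustrative examples rather than data from any real corpus.

```python
from difflib import SequenceMatcher

# Hypothetical toy parallel corpus: (domain, English source, Hindi target).
CORPUS = [
    ("budget", "The budget was presented today.", "बजट आज पेश किया गया।"),
    ("judiciary", "The court adjourned the hearing.", "अदालत ने सुनवाई स्थगित कर दी।"),
    ("budget", "Tax revenue increased this year.", "इस वर्ष कर राजस्व बढ़ा।"),
]


def suggest(source, domain=None, corpus=CORPUS, top_k=1, domain_boost=0.2):
    """Return up to top_k (score, target) pairs for a source sentence.

    Score is the character-level similarity between the query and each
    corpus source sentence, plus a fixed bonus when the entry belongs to
    the requested domain (an assumed, simplistic form of domain tuning).
    """
    scored = []
    for entry_domain, src, tgt in corpus:
        score = SequenceMatcher(None, source.lower(), src.lower()).ratio()
        if domain is not None and entry_domain == domain:
            score += domain_boost
        scored.append((score, tgt))
    # Highest score first; ties keep corpus order because sort is stable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for score, target in suggest("The budget was presented.", domain="budget", top_k=2):
        print(f"{score:.2f}  {target}")
```

A real system would replace the string-similarity lookup with a trained statistical or neural model; the sketch only shows where a domain signal could enter the ranking.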


In this case study, we will discuss the following areas:

  • The need for translation systems in a social context
  • State of current translation systems
  • Collation of open corpora for various languages
  • Parallel corpora collection in sectors such as budgets and the judiciary
  • Various approaches of translation systems
  • Natural Language Processing techniques crucial to translation systems
  • Evaluation and usage of existing open-source translation systems such as Moses and OpenNMT
  • High-level architecture of Samantar
  • Ways of interacting and collaborating with the framework
  • Domain adaptation with the translation framework
  • Road ahead

This session addresses the following points/areas:

  • Overview of NLP for Indic Languages
  • Open translation systems and their applications
  • Open Parallel Corpora available
  • An Indic language translation framework
  • Challenges working with Indic Languages for NLP
  • Domain-based translation mechanisms

Speaker bio

Deepthi Chand has been at the forefront of the data-for-good movement in India. Over the last six years he has worked in a range of roles, from application developer in MNCs to data strategist for civil-society organizations and government agencies. He is co-founder and director of CivicDataLab, where he works to harness data, tech, design and social science to strengthen civic engagement in India. He has been leading DataKind Bangalore, a community of data scientists who volunteer their time over the weekends to help non-profits make data-driven decisions. He is determined to work on key issues in the social sector using open-source software, open data and algorithmic research.



  • Abhishek Balaji (@booleanbalaji) 11 months ago

    Hi Deepthi,

    Thank you for submitting a proposal. We need to see more detailed slides to evaluate your proposal. Your slides must cover the following:

    • Problem statement/context, which the audience can relate to and understand. The problem statement has to be a problem (based on this context) that can be generalized for all.
    • What were the tools/frameworks available in the market to solve this problem? How did you evaluate these, and what metrics did you use for the evaluation? Why did you pick the option that you did?
    • Explain what the situation was before the solution you picked/built and how it changed after implementing it. Show before-after scenario comparisons & metrics.
    • What compromises/trade-offs did you have to make in this process?
    • What is the one takeaway that you want participants to go back with at the end of this talk? What is it that participants should learn/be cautious about when solving similar problems?

    We need your updated slides by Jun 27, 2019 to evaluate your proposal. If we do not receive an update, we’d be moving your proposal for evaluation under a future event.

    • Abhishek Balaji (@booleanbalaji) 11 months ago

      Hi Deepthi,

      This proposal will not fit into the breadth of topics we’re discussing at this year’s edition. However, this is a very interesting project and we’d like to invite you to present this in the showcase session. We’re still working out the slots and timing for each slot, and will keep you posted. Let me know if you’d like to go ahead with this?
