Open Source Tools and Archive for Tackling Misinformation on ChatApps in India
Submitted by Keshav Joshi (@kmjoshi) via Tarunima Prabhakar (@flicker91) on Thursday, 7 November 2019
Session type: Short talk of 20 mins
Tattle is a civic tech project in India that is creating an archive of content circulated on WhatsApp and other chat apps, and building open source tools to navigate this archive. Such an archive is useful for research on information networks as well as for increasing the efficiency and reach of fact-checking efforts. One of Tattle’s goals is to open the archive, even if in a limited scope, to the general public.
We will describe some of the challenges of collecting data on encrypted platforms, and our approach to different kinds of search operations (duplicate, approximate, semantic) on multilingual and multimedia content. We will conclude with some of the ethical considerations in doing this work.
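To make the difference between duplicate and approximate search concrete, here is a minimal sketch. It is an illustration only and not Tattle's implementation: the function names (`exact_fingerprint`, `average_hash`, `hamming_distance`) and the tiny 2x2 "images" are assumptions invented for this example. A cryptographic hash finds byte-identical copies, while a toy perceptual hash tolerates the small pixel changes that recompression introduces.

```python
import hashlib
from typing import List


def exact_fingerprint(data: bytes) -> str:
    """SHA-256 digest: two files match only if they are byte-identical."""
    return hashlib.sha256(data).hexdigest()


def average_hash(pixels: List[List[int]]) -> int:
    """Toy perceptual hash: one bit per pixel, set when the pixel is
    brighter than the mean brightness of the whole image."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which the two hashes differ."""
    return bin(a ^ b).count("1")


def to_bytes(pixels: List[List[int]]) -> bytes:
    """Flatten a grayscale grid (values 0-255) into raw bytes."""
    return bytes(p for row in pixels for p in row)


# Hypothetical 2x2 grayscale images (values invented for illustration).
original = [[10, 200], [220, 30]]
recompressed = [[12, 198], [221, 33]]   # same picture, slightly altered pixels
different = [[200, 10], [30, 220]]      # a different picture

# Duplicate search: the recompressed copy is missed by an exact hash.
same_bytes = exact_fingerprint(to_bytes(original)) == exact_fingerprint(to_bytes(recompressed))

# Approximate search: a small Hamming distance suggests a near-duplicate.
near = hamming_distance(average_hash(original), average_hash(recompressed))
far = hamming_distance(average_hash(original), average_hash(different))
print(same_bytes, near, far)
```

In practice the perceptual hash would be computed on a downscaled image and compared against a distance threshold; the threshold is a tuning choice that this sketch does not address.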
- Motivation and Goals of the Project
  - How it aims to affect the misinformation challenge in India
- Data Collection
  - Ways of collecting media from Chat Apps
  - Collecting media from allied sources (fact-checking websites)
- Data Processing (Tools to navigate the archive)
  - Duplicate Detection
  - Approximate Search
  - Semantic Search
  - Use of embeddings over hashing
- Ethical Considerations in this work
  - Consent frameworks for data collection
  - Managing access and use
  - Managing violent and pornographic content
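The outline item on embeddings over hashing can be illustrated with a small sketch of semantic search by cosine similarity. This is a hedged illustration rather than the approach used in the project: the three-dimensional vectors, the message identifiers, and the `semantic_search` helper are all made up for the example, and real embeddings would come from a multilingual or multimedia model that is not shown here.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_search(
    query: List[float], archive: Dict[str, List[float]], top_k: int = 2
) -> List[Tuple[str, float]]:
    """Rank archive items by similarity to the query and keep the top_k."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in archive.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Made-up embeddings standing in for model output (assumption for illustration).
archive = {
    "msg_hi_vaccine_rumour": [0.9, 0.1, 0.0],   # Hindi message
    "msg_en_vaccine_rumour": [0.8, 0.2, 0.1],   # English paraphrase of the same claim
    "msg_en_cricket_score": [0.0, 0.1, 0.9],    # unrelated message
}

query = [0.85, 0.15, 0.05]  # hypothetical embedding of a fact-checker's query
results = semantic_search(query, archive)
for name, score in results:
    print(name, round(score, 3))
```

A hash of the Hindi text and a hash of its English paraphrase share nothing, so hashing cannot link them. Embeddings place both messages close together in vector space, which is what lets one query retrieve the same claim across languages.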
Requirements for attendees: an interest in misinformation!
Keshav Joshi is a data scientist at Tattle, working to build an archive of misinformation and to develop the project's data science stack. Keshav has several years of experience as a data scientist, researcher, and lecturer, and holds two master's degrees, in Physics and Computer Science, from Georgia Tech.