About the 2019 edition:
The schedule for the 2019 edition is published here: https://hasgeek.com/anthillinside/2019/schedule
The conference has three tracks:
- Talks in the main conference hall track
- Poster sessions featuring novel ideas and projects in the poster session track
- Birds of Feather (BOF) sessions for practitioners who want to use the Anthill Inside forum to discuss:
  - Myths and realities of labelling datasets for Deep Learning.
  - Practical experience with using Knowledge Graphs for different use cases.
  - Interpretability and its application in different contexts; challenges with GDPR and interpreting datasets.
  - Pros and cons of using custom and open source tooling for AI/DL/ML.
Who should attend Anthill Inside:
Anthill Inside is a platform for:
- Data scientists
- AI, DL and ML engineers
- Cloud providers
- Companies which make tooling for AI, ML and Deep Learning
- Companies working with NLP and Computer Vision who want to share their work and learnings with the community
For inquiries about tickets and sponsorships, call Anthill Inside on 7676332020 or write to firstname.lastname@example.org
Sponsorship slots for Anthill Inside 2019 are open. Click here to view the sponsorship deck.
Anthill Inside 2019 sponsors:
Naman Kumar, Robotics and AI Lead at TartanSense
What can software learn from robots and math
If you have no clue about what is going on, don’t worry. In this presentation, I will try to build your intuition with a series of simple examples. Then, with a little bit of math, I will demonstrate how the Kalman filter works its charm. Finally, I will end by giving you a glimpse of its numerous applications in different fields and how you can probably use it in your own project.
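For readers new to the Kalman filter, the following is a minimal sketch of the scalar (one-dimensional) case, assuming a constant underlying value observed through noisy measurements. It is not taken from the talk; the function name `kalman_1d` and its parameters are illustrative assumptions.

```python
def kalman_1d(measurements, process_var, measurement_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a (nearly) constant hidden value.

    Hypothetical helper for illustration only.
    x0, p0: initial state estimate and its variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed constant, so only uncertainty grows.
        p = p + process_var
        # Update: the gain weighs the measurement against the prediction.
        k = p / (p + measurement_var)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    readings = [5.0] * 10
    for step, estimate in enumerate(kalman_1d(readings, 0.0, 1.0), start=1):
        print(step, round(estimate, 3))
```

With zero process noise, each update moves the estimate part of the way towards the new measurement, and the gain shrinks as the filter grows more confident.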
Sandya Mannarswamy, Independent NLP researcher
Rigorous evaluation of NLP models for real-world deployment
We use a few historical use cases and examples to motivate why real-world deployment needs rigorous evaluation of NLP models beyond simple metrics such as F1-score and accuracy. We then talk about the “CleverHans Moment for NLP” (https://www.linkedin.com/posts/sandya_nlps-clever-hans-moment-has-arrived-activity-6573894455768768512-MDVW). We discuss the latest research around model evaluation for NLP. We then take up a sentiment analysis task as a case study and discuss the methodology for rigorous evaluation. We conclude by pointing out directions for future work on this topic.
Divij Joshi, Technology policy fellow at the Mozilla Foundation
Poster session: Model interpretability, explainable AI and the Right to Information (RTI)
Consequential machine decision making is now pervasive. Automated decisions (to different degrees of automation) are now applied in fields of welfare allocation, policing and criminal justice, finance and insurance and online content moderation, among others. Many of these tools use complex algorithmic systems, including machine learning techniques, which are conventionally difficult to interpret. Efforts toward interpretation have traditionally focused on model interpretation through explaining the ‘black box’ of algorithmic systems (for example through local linear explanations or models). However, these techniques of interpretability have limited significance where end-users are concerned, for a number of reasons, including the ability of a lay citizen to parse technical models, as well as the limited information it provides for achieving instrumental purposes of explanation (for example, the ability to use an explanation to overturn a decision). Some techniques have focused on explainability without opening the black box, including through methods like counterfactual explanations. However, limited work exists on how the non-interpretability of machine decisions impacts important constitutional concepts of due process and the right to information as well as legal mechanisms like the RTI Act which actualise these rights. The RTI Act, in particular, places positive obligations upon the state to explain certain decisions, including administrative decisions taken that impact individuals. The extent to which techniques of explainability in AI can be incorporated to ensure that the RTI remains a robust instrument for holding government systems accountable will be the focus of this session.
Mira Abboud, CTO and data scientist at Neotic.ai
Artificial Intelligence for automated investment
The talk will cover the following areas:
- AI in finance vs AI in other fields.
- Challenges faced while applying machine learning algorithms on stock market data (daily data, problems of over/under fitting, fat tails, etc).
- Limitations/problems of supervised and unsupervised learning.
- State of the art solutions.
Vijay Gabale, Co-founder and CTO of Infilect Technologies
Birds of Feather (BOF) session: Myths and realities of data labeling for Deep Learning
- Setting the context: data labeling for NLP and CV
- How to define a data labeling task: novice vs expert
- Does crowdsourcing of data labeling really work: advantages vs disadvantages
- How to manage in-house data labeling teams: advantages vs disadvantages
- What is the criticality of the correctness of data labels
- What experience and expertise is expected of data labelers
- How to ensure correctness of data labels: manual vs automated checks
- How to resolve labeling conflicts
- How does an engineer know if she has enough labeled data
- What are the time, cost and correctness trade-offs
- How to ensure and execute class-balanced data labeling
- How to plan and execute weakly supervised data labeling
- How to train models on a small set of labeled data and generate ‘soft tags’ for the rest of the unlabeled data
- How does one know if a model is performing well in practice on unseen and real-time inputs
- How does the feedback loop work when some of the unseen and real-time inputs are labeled to fine-tune the models
Nischal HP, VP of Engineering and data science at omni:us
Document digitization: rethinking with Deep Learning
This talk will outline:
- The problems we faced and the approaches we took when building deep learning networks for the information extraction process.
- Thought process on why and how we chose certain deep learning strategies.
- The requirement for supervised learning.
- Limitations of deep learning networks.
- Planning and executing research activities in short cycles.
- Evolution of team structures to support AI product building.
- Engineering practices required in building AI systems.
Radhika Radhakrishnan, Programme officer at the Centre for Internet and Society (CIS)
Why smart-device based virtual assistants are incapable of assisting with gender based violence concerns in India.
Part 1. Introduction to Gendered Biases
The talk will begin with a brief introduction to fairness and gendered bias concerns in Artificial Intelligence technologies with relevant examples.
Part 2. Are Smart-Device Based Virtual Assistants Capable of Assisting with Gender Based Violence Concerns in India?
I will present my research which critically examines the responses of five Virtual Assistants in India (Siri, Google Now, Bixby, Cortana, and Alexa) to a standardized set of concerns related to Gender-Based Violence (GBV). A set of concerns regarding Sexual Violence and Cyber Violence were posed in the Virtual Assistant’s natural language, English. Non-crisis concerns were asked to set a baseline. All crisis responses by the Virtual Assistants were characterized based on the ability to (1) recognize the crisis, (2) respond with respectful language, and (3) refer to an appropriate helpline, or other resources. The findings of my study indicate missed opportunities to leverage technology to improve referrals to crisis support services in response to gender-based violence. Read my paper here: https://itforchange.net/e-vaw/wp-content/uploads/2018/01/Are-Smart-Device-Based-Virtual-Assistants-Capable-of-Assisting-with-Gender-Based-Violence-Concerns-in-India-1.pdf
Part 3. Feminist Perspectives on the Social Media Construction of Artificial Intelligence
I will analyse how Microsoft’s Twitter bot Tay went from tweeting “can i just say that im stoked to meet u? humans are super cool” to “I .... hate feminists and they should all die and burn in hell” and how we can avoid designing such biased AI technologies for the future. Read my work here: https://gendermediacultureblog.wordpress.com/2018/12/24/feminist-perspectives-on-the-social-media-construction-of-artificial-intelligence/
Srujana Merugu, Independent machine learning researcher
ML application lifecycle: what is important at each stage
Building good ML systems is not very unlike developing good software. Just as developing good software requires mastering not only programming theory, tools, and design patterns, but also the process of software development itself, building a good ML system entails familiarity with the ML application lifecycle. In this talk, we will discuss the various stages of the ML application lifecycle (problem formulation, data definitions, modeling, production system design & implementation, testing, deployment & maintenance, online evaluation & evolution) and some key learnings that are relevant for each of these stages.
Bikram Sengupta, Director of Research and Innovation at iMerit
Why you need an enterprise grade data labelling pipeline to scale your ML/AI pipelines
In Software 2.0, data is code. A mindful approach to your data annotation pipeline and practices is critical to the outcomes of your ML algorithms. If it is not done right, your inability to scale this pipeline can often prove to be a major blocker to productionization. In this talk we focus on why and how to build your data labeling pipeline to be enterprise grade. We will describe the considerations and insights that go into making your data pipeline a mindful part of your development pipeline, so that you can follow the journey from PoC to production. We describe best practices and provide pointers to designing a high quality, iterative, and scalable data annotation practice. A pipeline designed for human judgement and incremental training on edge cases can provide that last mile of acceptability to roll out a machine learning solution in production. We will describe successful examples of this approach.
Keshav Joshi, Data Scientist at Tattle
Poster session: Open source tools and archive for tackling misinformation on chat apps in India
- Motivation and goals of the project
  - How does it aim to affect the misinformation challenge in India
- Data collection
  - Ways of collecting media from chat apps
  - Collecting media from allied sources (fact checking websites)
- Data processing (tools to navigate the archive)
  - Duplicate detection
  - Approximate search
  - Semantic search
  - Use of embeddings over hashing
- Ethical considerations in this work
  - Consent frameworks for data collection
  - Managing access and use
  - Managing violent and pornographic content
Nishant Sinha, Independent researcher and consultant at OffNote Labs
The shape of U
In this talk, we will showcase our efforts at OffNote Labs to improve the developer experience when programming with tensors. In particular, we will discuss:
- The idea of naming dimensions of tensors and how named shapes can make tensor programming dramatically less painful.
- The tsalib library, which allows using named dimensions in Python 3.x programs with multiple backend libraries (numpy, tensorflow, pytorch, …).
- The tsanley library, which builds on tsalib, and helps catch tricky tensor shape errors at runtime and annotate existing programs with named shapes.
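To make the idea of named dimensions concrete, here is a minimal sketch in plain Python. It does not use tsalib or tsanley and does not reflect their APIs; `check_shape` is a hypothetical helper, and the dimension names `B`, `T` and `D` are assumptions chosen for illustration.

```python
def check_shape(shape, spec, bindings):
    """Check a shape tuple against a comma-separated list of dimension names.

    Hypothetical helper, not part of tsalib or tsanley.
    `bindings` maps each name to the size first seen for it; a later
    mismatch raises ValueError instead of failing silently downstream.
    """
    names = [name.strip() for name in spec.split(",")]
    if len(names) != len(shape):
        raise ValueError(f"expected {len(names)} dims ({spec}), got {len(shape)}")
    for name, size in zip(names, shape):
        expected = bindings.setdefault(name, size)
        if expected != size:
            raise ValueError(f"dim {name}: expected {expected}, got {size}")
    return bindings


if __name__ == "__main__":
    dims = {}
    check_shape((32, 10, 64), "B,T,D", dims)  # binds B=32, T=10, D=64
    check_shape((32, 64), "B,D", dims)        # consistent with the bindings
    try:
        check_shape((10, 32), "B,T", dims)    # axes swapped by mistake
    except ValueError as err:
        print("shape error:", err)
```

Once each axis has a name, a swapped or mismatched axis is reported against that name rather than surfacing later as an opaque broadcasting error.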
Willem Pienaar, Lead of data science platform at GO-JEK
Closing talk: Feast - feature store for Machine Learning
GOJEK, Indonesia’s first billion-dollar startup, has seen an explosive growth in both users and data over the past three years. Today, it uses big data-powered machine learning to inform decision making in its ride-hailing, lifestyle, logistics, food delivery, and payment products, from selecting the right driver to dispatch to dynamically setting prices to serving food recommendations to forecasting real-world events. Hundreds of millions of orders per month, across 18 products, are all driven by machine learning.
Features are at the heart of what makes these machine learning systems effective. However, many challenges still exist in the feature lifecycle. Developing features from big data is often an engineering heavy task, with challenges in both the scaling of data processes and the serving of features in production systems. Teams also face challenges in enabling discovery, reducing duplication, improving understanding, and providing standardization of features throughout organizations.
Willem will explain the need for features at organizations like GOJEK and discuss the challenges faced in creating, managing, and serving them in production. He’ll describe how in partnership with Google, they designed and built a feature store called Feast to address these challenges and explore their motivations, the lessons they learned along the way, and the impact the feature store had on GOJEK. Finally, he will talk about the open source plans for Feast and their roadmap going forward.