Anthill Inside 2019

A conference on AI and Deep Learning


Hacking Self-attention architectures to address Unsupervised text tasks

Submitted by Venkata Dikshit Pappu (@vdpappu) on Thursday, 11 April 2019

Preview video


Self-attention architectures like BERT, OpenAI GPT, and MT-DNN are the current state-of-the-art feature extractors for several supervised downstream text tasks. However, their performance on unsupervised tasks such as document/sentence similarity is inconclusive. In this talk, I intend to give a brief overview of self-attention architectures for language modelling, and to cover fine-tuning and feature-selection approaches that can be applied to a variety of unsupervised tasks. This talk is for NLP practitioners interested in using self-attention architectures in their applications.


  1. Overview of Transformer/Self-attention architectures - BERT
  2. Document representations using BERT
  3. Formulating a sentence relevance score with BERT features
  4. Searching and ranking feature sub-spaces for specific tasks
  5. Other reproducible hacks
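A minimal sketch of the kind of pipeline items 2 and 3 describe, assuming mean-pooled per-token BERT features as the document representation and cosine similarity as the relevance score. The pooling choice, the `relevance` helper, and the random stand-in features are illustrative assumptions, not the talk's actual method:

```python
import numpy as np

def doc_vector(token_features):
    # Mean-pool per-token features (e.g. BERT's final hidden states,
    # shape [num_tokens, hidden_size]) into a single document vector.
    return token_features.mean(axis=0)

def relevance(a, b):
    # Cosine similarity between two pooled document vectors,
    # used here as a stand-in sentence/document relevance score.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Random arrays stand in for real BERT features in this sketch.
rng = np.random.default_rng(0)
doc_a = rng.normal(size=(12, 768))   # 12 tokens, hidden size 768
doc_b = rng.normal(size=(15, 768))

score = relevance(doc_vector(doc_a), doc_vector(doc_b))
```

Item 4 (searching feature sub-spaces) would then amount to restricting the hidden dimensions fed into `relevance` and ranking candidate sub-spaces by task performance.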

Speaker bio

Venkat is an ML Architect at Ether Labs, based out of Bangalore, with 6+ years of experience in ML and related fields. He has worked on machine vision and NLP solutions for the retail, consumer electronics, and embedded verticals. Venkat leads the ML team at Ether Labs, which is responsible for building scalable AI components for the Ether video collaboration platform: vision, NLU, and graph learning.




  • Anwesha Sarkar (@anweshaalt) 8 months ago

    Thank you for submitting the proposal. Submit your slides and preview video by 20th April (latest); this helps us close the review process.

  • Venkata Dikshit Pappu (@vdpappu) Proposer 8 months ago

    Sure. Will do that.

  • Venkata Dikshit Pappu (@vdpappu) Proposer 7 months ago

    Dear Admin, I would be submitting the video by tomorrow. Hope that’s fine. Also, I intend to add some metrics/code samples into the slides. Please consider.

  • Venkata Dikshit Pappu (@vdpappu) Proposer 7 months ago

    Dear Admin, Missed out sharing the slides for my video. Hope that’s okay.

  • Zainab Bawa (@zainabbawa) Reviewer 7 months ago

    This proposal fits into the tutorial format, and is appropriate for Anthill Inside.

    The way to structure the presentation for a 60-90 min tutorial on BERT itself is:

    1. Mention the background knowledge that participants should have for attending this tutorial.
    2. What is this concept of self-attention architectures? What is the scope and application of the concept?
    3. Who can use this concept – in terms of specific domains and organizations at what scale in their life-cycles?
    4. Why hack self-attention architectures? Who can hack them?
    5. Show examples of real-life use cases and applicability.
    6. Explain pros and cons of the proposed approach.
    7. Demo for participants + time allotted for participants to try this.

    Next steps: submit slides incorporating the above comments and questions and structure the proposal as a tutorial. All of the above has to be done by or before 21 May to close the decision.

  • Chris Stucchio (@stucchio) 4 months ago

    I’ve been asked to leave a review, so here’s what I can come up with.

    1) It’s not clear to me who this talk is for. I’m very much not an expert in NLP, so perhaps it’s not for me? But who is it for?
    2) The slides give me a rough idea of what BERT does - it generates word embeddings that purport to contain context as well.
    3) I am not able to tell from the slides what problem this solves, or why I want something that does what BERT does.
    4) As of (2), BERT supposedly puts context into the word embedding. But how could I test if this claim is true or false? How would things be different if BERT didn’t do this?

    Ultimately what would be really helpful to fully evaluate this would be some answers to the questions (3) and (4).

  • Abhishek Balaji (@booleanbalaji) Reviewer 3 months ago

    Hi Venkata, as suggested by the reviewers above, you’ll need to rework the presentation to a tutorial format and address the questions put forth in the comments.
