Anthill Inside 2018


On the current state of academic research, practice and development regarding Deep Learning and Artificial Intelligence.

Karanbir Chahal


A Hitchhiker's Guide to Modern Object Detection: A deep learning journey since 2012

Submitted Mar 31, 2018

The ability to detect objects in images has captured the world’s imagination. One can make a decent case that this application is the poster child of deep learning, the one that really put it on the map.
But few people really understand how computers have learned to detect objects in images with high accuracy, which is surprising, since it is the backbone of the technology powering self-driving cars, drones and many other vision-related tasks. This talk aims to go through the history of object detection from 2012 to the present day. It will focus on two types of algorithms, two-step approaches (Faster R-CNN) and one-step approaches (YOLO, SSD), and culminate in the current state of the art.
This talk intends to go into the internals of object detection, focusing on exactly which transformations an image goes through. This information is not easily found on the open web, as most explanations skim over very important little details. The aim is for listeners to understand the internals of these systems inside and out, so that after the talk they can go home and implement these complex algorithms on their own in a day or two.
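As a small illustration of "the transformations an image goes through", here is a minimal plain-Python sketch of the standard arithmetic for how convolution and pooling layers shrink a feature map. The layer list is a hypothetical ResNet-style backbone (a 7x7 stride-2 convolution, a 3x3 stride-2 max-pool, then three stride-2 stages approximated as 3x3 stride-2 convolutions). It is an assumption for illustration, not the exact configuration of any model covered in the talk.

```python
def conv_output_size(size, kernel, stride=1, padding=0, dilation=1):
    """Spatial size after a convolution or pooling layer (floor formula)."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


def trace_shapes(size, layers):
    """Return (name, spatial size) after each (name, kernel, stride, padding) layer."""
    sizes = []
    for name, kernel, stride, padding in layers:
        size = conv_output_size(size, kernel, stride, padding)
        sizes.append((name, size))
    return sizes


# Hypothetical ResNet-style backbone; the three stride-2 stages are
# approximated here as single 3x3/2 convolutions with padding 1.
LAYERS = [
    ("conv1 7x7/2", 7, 2, 3),
    ("maxpool 3x3/2", 3, 2, 1),
    ("stage 3x3/2", 3, 2, 1),
    ("stage 3x3/2", 3, 2, 1),
    ("stage 3x3/2", 3, 2, 1),
]

if __name__ == "__main__":
    input_size = 800
    for name, s in trace_shapes(input_size, LAYERS):
        print(f"{name:>14}: {s}x{s}  (stride {input_size // s})")
```

For an 800-pixel input this trace gives 400, 200, 100, 50 and finally 25, i.e. a total stride of 32: each cell of the last feature map steps over 32 pixels of the input. Anchor-based detectors such as Faster R-CNN, SSD and RetinaNet place their anchors at the cells of feature maps like these.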


  1. Short explanation of convolutional and pooling layers and the concept of a feature map (brief, assuming the listeners already know about this)
  2. The two directions in which object detectors are built: the two-step and the one-step approach.
  3. Starting with the two-step approach
  4. The problems with this approach
  5. Now onto the one-step approaches, YOLO and SSD (including the latest version of YOLO that came out in February)
  6. The problems with the one-step approaches
  7. The multi-scale problem: dealing with images of various scales and resolutions (not explained well in current blog posts). Will explain the FPN model, which solves these resolution/scale problems.
  8. The newest and most promising approach, RetinaNet, which tries to tackle the problems of both earlier approaches by using a novel loss function.
  9. Explain the intuition behind the loss and its implications.
  10. Code samples in PyTorch, to build intuition through simple, easy-to-understand code.
  11. Resources where people can find out more about this topic.
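RetinaNet's loss (items 8 and 9 above) is the focal loss from the Focal Loss paper listed below: FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class. The sketch below is a plain-Python version for a single binary prediction, using γ = 2 and α = 0.25, the defaults reported in that paper. The function names are illustrative assumptions; a real PyTorch implementation would operate on tensors of logits rather than single floats.

```python
import math


def cross_entropy(p, y, eps=1e-12):
    """Standard binary cross-entropy for one prediction.

    p: predicted probability that the anchor contains an object (after sigmoid)
    y: ground-truth label, 1 for object, 0 for background
    """
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(max(p_t, eps))  # clamp to avoid log(0)


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    modulating = (1.0 - p_t) ** gamma  # shrinks towards 0 for easy (high p_t) examples
    return -alpha_t * modulating * math.log(max(p_t, eps))


if __name__ == "__main__":
    for name, p, y in [("easy background", 0.01, 0), ("hard object", 0.1, 1)]:
        print(f"{name}: CE={cross_entropy(p, y):.6f}  FL={focal_loss(p, y):.8f}")
```

For an easy background anchor (p = 0.01 for the object class), cross-entropy is about 0.01 while the focal loss is smaller by a factor of roughly 13,000; for a hard, misclassified object (p = 0.1) the loss only drops from about 2.30 to about 0.47. That is the intuition: the flood of easy negatives produced by dense anchors no longer dominates the total loss. In the paper, the loss summed over all anchors is normalized by the number of anchors assigned to ground-truth boxes.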


The participants must be aware of CNNs and understand backpropagation, gradients, etc.
They must have experience with a Python deep learning framework such as TensorFlow, Keras or PyTorch.

Speaker bio

I am a 21-year-old software engineer and a Computer Engineering graduate of Netaji Subhas Institute of Technology (NSIT), a premier engineering institution in India. I am currently working as a software engineer at HSBC in Pune, where I work on interesting deep learning projects from time to time.

I have been very interested in deep learning since my second year of college, and I have always tried to apply it in my projects, both in college and at HSBC.

I recently won the NIPS Paper Implementation Challenge 2017 and was featured in an article, which you can read here =>

I would be the best person to deliver this talk, as I am currently writing a survey paper on modern object recognition and have been following the field for a while now. I have dug through the code of the best object detection repositories and can confidently say that I know how each part of the various models works. I also aim to provide code samples showing how each part works, so that people can actually apply the knowledge gained in this talk, and I will make all the code open source so that people can play with it. The biggest shortcoming of the material on object recognition found online is that it is very scattered, and unless you dive into the code, many questions remain unanswered. As I have some experience with the dirty little tricks and internals of object detectors, I am well placed to explain all of it rather than provide just a high-level description.

  • Aim to cover all the developments of these highly cited papers in the field.
  • Mask R-CNN. Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. IEEE International Conference on Computer Vision (ICCV), 2017.
  • Focal Loss for Dense Object Detection. Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. IEEE International Conference on Computer Vision (ICCV), 2017.
  • Feature Pyramid Networks for Object Detection. Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
  • Aggregated Residual Transformations for Deep Neural Networks. Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
  • R-FCN: Object Detection via Region-based Fully Convolutional Networks. Jifeng Dai, Yi Li, Kaiming He, and Jian Sun. Conference on Neural Information Processing Systems (NIPS), 2016.
  • Deep Residual Learning for Image Recognition. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  • YOLO
  • Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Conference on Neural Information Processing Systems (NIPS), 2015.
  • Fast R-CNN. Ross Girshick. IEEE International Conference on Computer Vision (ICCV), 2015.




Hosted by

Anthill Inside is a forum for conversations about risk mitigation and governance in Artificial Intelligence and Deep Learning. AI developers, researchers, startup founders, ethicists, and AI enthusiasts are encouraged to: