A Hitchhiker's Guide to Modern Object Detection: A deep learning journey since 2012
Submitted by Karanbir Chahal (@karanchahal) on Friday, 30 March 2018
The ability to detect objects in images has captured the world’s imagination. One can make a decent case that object detection is the poster child of deep learning, the application that really put it on the map.
But few people really understand how computers have begun to detect objects in images with high accuracy. This is surprising, since detection is the backbone of the technology powering self-driving cars, drones and many other vision tasks. This talk traces the history of object detection from 2012 to the present day. It focuses on two families of algorithms, 2-step approaches (Faster R-CNN) and 1-step approaches (YOLO, SSD), and culminates in the current state of the art.
This talk goes into the internals of object detection, focusing on exactly which transformations an image goes through. This information is not easy to find on the open web, as most explanations skim over important small details. The aim is for listeners to understand these systems inside and out, so that after the talk they can go home and implement these complex algorithms on their own in a day or two.
- A short explanation of convolutional and pooling layers and the concept of a feature map (brief, since listeners are assumed to know this already)
- The 2 directions in which object detectors are built: the 2-step and the 1-step approach
- Starting with the 2-step approach (Faster R-CNN)
- The problems with this approach
- Next, the 1-step approaches, YOLO and SSD (including the latest version of YOLO, released in February)
- The problems with the 1-step approaches
- The multi-scale problem: dealing with images of various scales and resolutions (not explained well in current blog posts). I will explain the Feature Pyramid Network (FPN), which addresses these scale and resolution problems.
- The newest and most promising approach, RetinaNet, which tackles the problems of both earlier approaches with a novel loss function
- The intuition behind the loss and its implications
- Code samples in PyTorch that convey the intuition through simple, easy-to-understand code
- Resources where people can learn more about this topic
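The outline opens with convolutional layers, pooling layers and feature maps. The sketch below is minimal and framework-free: it uses plain Python rather than the PyTorch the talk promises. It applies the standard output-size formula, out = floor((in + 2·padding − kernel) / stride) + 1, which covers both convolution and pooling with floor rounding. Tracing this formula through a stack of layers shows how a 224×224 image shrinks into a coarse feature map. The layer list is a hypothetical, simplified stack loosely modelled on a ResNet-style backbone, and the function names are illustrative only.

```python
from typing import List, Tuple


def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size after a convolution or pooling layer (square input, no dilation, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical, simplified backbone: (name, kernel, stride, padding).
LAYERS: List[Tuple[str, int, int, int]] = [
    ("conv 7x7, stride 2", 7, 2, 3),
    ("max pool 3x3, stride 2", 3, 2, 1),
    ("conv 3x3, stride 2", 3, 2, 1),
    ("conv 3x3, stride 2", 3, 2, 1),
    ("conv 3x3, stride 2", 3, 2, 1),
]


def trace(size: int, layers: List[Tuple[str, int, int, int]]) -> List[int]:
    """Return the feature-map size before the first layer and after every layer."""
    sizes = [size]
    for _, kernel, stride, padding in layers:
        size = conv_output_size(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


sizes = trace(224, LAYERS)
for (name, *_), size in zip(LAYERS, sizes[1:]):
    print(f"{name:<24} -> {size}x{size}")
print("total stride:", sizes[0] // sizes[-1])  # 224 // 7 = 32
```

Neighbouring cells of the final 7×7 map are 32 input pixels apart, a total stride of 32. This is one reason small objects are hard to detect from the last feature map alone.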
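Both detector families in the outline must decide which predefined anchor boxes should learn to detect an object. In the 2-step approach this happens in the region proposal network of Faster R-CNN, and in the 1-step approaches in detectors such as SSD. In both cases the anchors are compared to the ground-truth boxes using intersection over union (IoU). The minimal sketch below makes several assumptions:

- Boxes are given as (x1, y1, x2, y2) corners.
- The 0.7/0.3 thresholds are those reported for the Faster R-CNN region proposal network; other detectors use different values.
- The extra Faster R-CNN rule that also marks the best-matching anchor for each ground-truth box as positive is omitted for brevity.
- The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors: Sequence[Box], gt_boxes: Sequence[Box],
                  pos_thresh: float = 0.7, neg_thresh: float = 0.3) -> List[int]:
    """Label each anchor 1 (object), 0 (background) or -1 (ignored in the loss)."""
    labels = []
    for anchor in anchors:
        # The highest overlap with any ground-truth box decides the label.
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)
    return labels


gt = [(0, 0, 10, 10)]
anchors = [(0, 0, 10, 10), (5, 0, 15, 10), (20, 20, 30, 30)]
print([round(iou(a, gt[0]), 3) for a in anchors])  # [1.0, 0.333, 0.0]
print(label_anchors(anchors, gt))                   # [1, -1, 0]
```

Anchors labelled -1 fall between the two thresholds and are left out of the classification loss during training.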
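The RetinaNet items in the outline centre on its loss. The minimal sketch below implements the binary form of the focal loss described for RetinaNet, FL(p_t) = −α_t (1 − p_t)^γ log(p_t). It works in plain Python on one anchor at a time and uses the reported defaults α = 0.25 and γ = 2. The modulating factor (1 − p_t)^γ shrinks the loss of easy, well-classified examples, as the two cases below show:

- An easy background anchor with p_t = 0.99 has its cross-entropy scaled by α_t(1 − p_t)^γ = 0.75 × 0.01² = 7.5 × 10⁻⁵.
- A hard positive with p_t = 0.1 keeps 0.25 × 0.9² ≈ 0.2 of its cross-entropy.

The helper names and the epsilon clamp are assumptions of this sketch; a real implementation would operate on PyTorch tensors of logits.

```python
import math
from typing import Sequence

EPS = 1e-7  # assumption: clamp probabilities to avoid log(0)


def cross_entropy(p: float, y: int) -> float:
    """Binary cross-entropy for one anchor; p is the predicted probability of 'object'."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(min(max(p_t, EPS), 1.0))


def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Binary focal loss for one anchor: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, EPS), 1.0)
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def total_focal_loss(probs: Sequence[float], labels: Sequence[int], **kwargs) -> float:
    """Sum over anchors, normalised by the number of positive anchors (at least 1)."""
    num_pos = max(1, sum(1 for y in labels if y == 1))
    return sum(focal_loss(p, y, **kwargs) for p, y in zip(probs, labels)) / num_pos


easy_neg = (0.01, 0)  # background anchor, confidently classified
hard_pos = (0.1, 1)   # object anchor, badly classified
for name, (p, y) in [("easy negative", easy_neg), ("hard positive", hard_pos)]:
    ce, fl = cross_entropy(p, y), focal_loss(p, y)
    print(f"{name}: CE={ce:.5f} FL={fl:.2e} ratio={fl / ce:.2e}")
```

With γ = 0 the focal loss reduces to α-weighted cross-entropy, which is a handy sanity check when porting it to tensors.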
Participants must be familiar with CNNs and understand backpropagation, gradients, etc.
They must have experience with a Python deep learning framework such as TensorFlow, Keras or PyTorch.
I am a 21-year-old software engineer and a Computer Engineering graduate of Netaji Subhash Institute of Technology (NSIT), a premier engineering institution in India. I am currently a software engineer at HSBC in Pune, where I work on interesting deep learning projects from time to time.
I have been interested in deep learning since my second year of college, and I have always tried to apply it in my projects, both in college and at HSBC.
I recently won the NIPS Paper Implementation Challenge 2017, and nurture.ai published a feature on me, which you can read here: https://medium.com/nurture-ai/karanbir-singh-chahal-implementing-ai-research-paper-for-the-first-time-72670b1763bc
I would be the best person to deliver this talk, as I am currently writing a survey paper on modern object recognition and have been following the field for a while now. I have dug through the code of the best object detection repositories and can confidently say that I know how each part of the various models works. I also aim to provide code samples showing how each part works, so that people can actually apply the knowledge they gain in this talk. I will make all the code open source so that people can play with it. The biggest shortcoming of the object recognition material found online is that it is very scattered, and unless you dive into the code, many questions go unanswered. As I have some experience with the dirty little tricks and internals of object detectors, I would be a good person to explain all of it rather than give just a high-level description.