Inference in Deep Neural Networks
Submitted by saurabh agarwal (@saurabh-agl) on Wednesday, 1 November 2017
A lot of focus currently goes into training neural networks and designing better architectures, but we rarely focus on inference because, well, we are busy making our models work. Yet a model's inference path typically runs millions of times more often than its training, and inference frequently has to run on embedded devices. This talk will go into the details of how advancements in hardware have made Deep Learning possible. We will also cover optimizations that can speed up computation when deploying a model on a CPU, and demystify the terms GeMM, SIMD, BLAS and SIMT along the way.
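As a small taste of the GeMM idea mentioned above, here is a hedged NumPy sketch of my own (not material from the talk): a 2-D convolution lowered to a single matrix multiplication via the classic im2col trick, which is exactly the kind of operation a BLAS library accelerates.

```python
import numpy as np

def conv2d_direct(img, k):
    # Naive "valid" 2-D convolution (cross-correlation, as DL frameworks do it).
    H, W = img.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def conv2d_gemm(img, k):
    # im2col: unroll each receptive field into a row, then one GEMM does all the work.
    H, W = img.shape
    kh, kw = k.shape
    oh, ow = H - kh + 1, W - kw + 1
    cols = np.empty((oh * ow, kh * kw))
    for i in range(oh):
        for j in range(ow):
            cols[i * ow + j] = img[i:i + kh, j:j + kw].ravel()
    # A single BLAS-backed matrix product replaces the nested compute loops.
    return (cols @ k.ravel()).reshape(oh, ow)

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
k = rng.standard_normal((3, 3))
assert np.allclose(conv2d_direct(img, k), conv2d_gemm(img, k))
```

Real frameworks do the same lowering for batched, multi-channel convolutions, which is why so much inference time ends up inside a GEMM kernel.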
- Intro to DL networks
- What typical Deep Learning architectures look like
- A short section, using one CNN and one LSTM as examples, on the mathematical operations they perform
- Advancements in hardware
  - Intel Knights Landing CPUs
  - Volta GPUs
- How exactly the operations are done on garden-variety hardware
  - Different types of architectures: CPUs and GPUs
  - How they work, and where their bottlenecks lie
  - The role memory access plays in speed
  - How memory, rather than compute, is often the bottleneck
- Changes made to algorithms to utilise these hardware features
  - Example: Google's Inception V3 model
  - Two different types of RNNs
- How to make your model more efficient at inference
  - Some practical examples
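To make the memory-versus-compute point in the outline concrete, here is a small NumPy illustration of my own (an assumption-laden sketch, not material from the talk): the same data can be cheap or expensive to traverse purely because of its layout in memory.

```python
import numpy as np

a = np.arange(12, dtype=np.float32).reshape(3, 4)
# Row-major (C) layout: stepping along a row moves 4 bytes; along a column, 16.
assert a.strides == (16, 4)

# Transposing is "free" (just swapped strides), but the result is no longer
# contiguous, so traversing a.T row by row jumps around in memory --
# a cache-unfriendly access pattern that can dominate runtime.
assert a.T.strides == (4, 16)
assert a.flags['C_CONTIGUOUS'] and not a.T.flags['C_CONTIGUOUS']

# One common fix: copy into a cache-friendly layout once, before a hot loop.
b = np.ascontiguousarray(a.T)
assert b.flags['C_CONTIGUOUS']
```

On larger arrays the contiguous traversal can be several times faster even though the arithmetic performed is identical, which is exactly the "memory is the bottleneck instead of compute" situation.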
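One practical inference optimization of the kind the last outline item hints at, sketched here under my own assumptions rather than taken from the talk: casting weights to a lower precision halves the bytes moved from memory, which helps directly when a layer is memory-bandwidth bound rather than compute bound.

```python
import numpy as np

rng = np.random.default_rng(42)
# Stand-ins for a fully connected layer's weights and one input vector.
W = rng.standard_normal((512, 512)).astype(np.float64)
x = rng.standard_normal(512).astype(np.float64)

# Inference rarely needs float64; float32 halves the memory footprint
# and the memory traffic per forward pass.
W32, x32 = W.astype(np.float32), x.astype(np.float32)
assert W32.nbytes * 2 == W.nbytes

y64 = W @ x
y32 = W32 @ x32

# The outputs agree to well within typical model tolerances.
assert np.allclose(y64, y32, atol=1e-3)
```

Production toolchains push this further (float16, int8 quantization), but the principle is the same: fewer bytes per weight means less pressure on the memory system at inference time.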
Saurabh has been working at MAD Street Den, Chennai as a Machine Learning Engineer for the past year and a half, working specifically on Deep Learning based products. He loves to train Convolutional Neural Networks of all types and sizes for different applications. Apart from CNNs, he has a special interest in recurrent architectures and discovering their powers. When he is not working on DL, he loves to play around with micro-controllers.