The Fifth Elephant 2016

India's most renowned data science conference


RNNs for multimodal information fusion

Submitted by Om Deshmukh (@omdesh) on Thursday, 9 June 2016

Section: Crisp talk Technical level: Intermediate


Data generated from real-world events are usually temporal and carry multimodal information, such as audio, video, depth, and sensor streams, which must be intelligently combined for classification tasks. I will discuss a novel, generalized deep neural network architecture in which temporal streams from multiple modalities can be combined. The hybrid Recurrent Neural Network (RNN) exploits the complementary nature of the multimodal temporal information by allowing the network to learn both modality-specific temporal dynamics and the dynamics in a shared multimodal feature space. The efficacy of the proposed model is demonstrated on multiple datasets.
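The idea of combining modality-specific temporal dynamics with dynamics in a shared multimodal feature space can be sketched roughly as follows. This is a minimal illustration, not the speaker's actual model: the modality names (audio, video), feature dimensions, number of classes, and the use of plain tanh RNNs with random weights are all assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn(xs, W_x, W_h, b):
    """Vanilla tanh RNN; returns the final hidden state."""
    h = np.zeros(W_h.shape[0])
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h + b)
    return h

# Hypothetical setup: T timesteps, 13-d audio features, 20-d video features,
# hidden size 8. Real models would learn these weights; here they are random.
T, d_a, d_v, H = 5, 13, 20, 8
audio = rng.normal(size=(T, d_a))
video = rng.normal(size=(T, d_v))

# Modality-specific RNNs capture per-stream temporal dynamics.
Wa_x, Wa_h, ba = rng.normal(size=(H, d_a)), rng.normal(size=(H, H)), np.zeros(H)
Wv_x, Wv_h, bv = rng.normal(size=(H, d_v)), rng.normal(size=(H, H)), np.zeros(H)
h_audio = rnn(audio, Wa_x, Wa_h, ba)
h_video = rnn(video, Wv_x, Wv_h, bv)

# A fusion RNN runs over the concatenated multimodal feature stream,
# capturing dynamics in the joint feature space.
fused_in = np.concatenate([audio, video], axis=1)  # shape (T, d_a + d_v)
Wf_x, Wf_h, bf = (rng.normal(size=(H, d_a + d_v)),
                  rng.normal(size=(H, H)), np.zeros(H))
h_fused = rnn(fused_in, Wf_x, Wf_h, bf)

# The classifier sees both the modality-specific and the fused dynamics.
features = np.concatenate([h_audio, h_video, h_fused])  # shape (3 * H,)
W_out = rng.normal(size=(4, 3 * H))  # 4 hypothetical classes
pred = int(np.argmax(W_out @ features))
```

The key design point the talk addresses is exactly this split: per-modality recurrences preserve each stream's own temporal structure, while the fusion recurrence lets complementary streams interact at every timestep rather than only at the final classifier.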


Outline: Deep Learning overview, RNN overview, the proposed model, performance and comparisons

