The Fifth Elephant 2016

India's most renowned data science conference

Om Deshmukh


RNNs for multimodal information fusion

Submitted Jun 9, 2016

Data generated from real-world events is usually temporal and multimodal, carrying audio, visual, depth, and other sensor information that must be combined intelligently for classification tasks. I will discuss a novel, generalized deep neural network architecture that combines temporal streams from multiple modalities. The hybrid Recurrent Neural Network (RNN) exploits the complementary nature of multimodal temporal information by letting the network learn both the modality-specific temporal dynamics and the dynamics in a multimodal feature space. The efficacy of the proposed model is demonstrated on multiple datasets.
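To make the idea of learning modality-specific dynamics alongside dynamics in a multimodal feature space more concrete, here is a minimal sketch in plain Python. It is not the architecture from the talk, whose details are not given in this abstract. It assumes one simple Elman-style recurrent cell per modality, a fusion cell that consumes the concatenated per-modality hidden states at every time step, and a softmax classifier on the final fusion state. All class names, layer sizes, and the toy inputs are hypothetical, and the weights are random and untrained.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SimpleRNNCell:
    """Plain Elman cell: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        self.w_x = make_matrix(hidden_size, input_size, rng)
        self.w_h = make_matrix(hidden_size, hidden_size, rng)
        self.b = [0.0] * hidden_size

    def step(self, x, h_prev):
        wx = matvec(self.w_x, x)
        wh = matvec(self.w_h, h_prev)
        return [math.tanh(a + c + b) for a, c, b in zip(wx, wh, self.b)]


class HybridFusionRNN:
    """Hypothetical hybrid model: per-modality RNNs feeding a fusion RNN."""

    def __init__(self, input_sizes, modality_hidden, fusion_hidden, num_classes, seed=0):
        rng = random.Random(seed)
        # One recurrent cell per modality learns modality-specific dynamics.
        self.modality_cells = [SimpleRNNCell(n, modality_hidden, rng) for n in input_sizes]
        # The fusion cell models dynamics in the concatenated (multimodal) space.
        self.fusion_cell = SimpleRNNCell(modality_hidden * len(input_sizes), fusion_hidden, rng)
        self.w_out = make_matrix(num_classes, fusion_hidden, rng)

    def forward(self, streams):
        """streams: one sequence per modality, each a list of T feature vectors."""
        steps = len(streams[0])
        if any(len(stream) != steps for stream in streams):
            raise ValueError("all modality streams must have the same length")
        h_mod = [[0.0] * cell.hidden_size for cell in self.modality_cells]
        h_fus = [0.0] * self.fusion_cell.hidden_size
        for t in range(steps):
            # Advance each modality-specific RNN by one time step.
            h_mod = [
                cell.step(stream[t], h)
                for cell, stream, h in zip(self.modality_cells, streams, h_mod)
            ]
            # Concatenate the hidden states and advance the fusion RNN.
            joint = [value for h in h_mod for value in h]
            h_fus = self.fusion_cell.step(joint, h_fus)
        # Classify from the final fusion state.
        return softmax(matvec(self.w_out, h_fus))


# Toy usage: a 3-dimensional "audio" stream and a 2-dimensional "visual" stream.
model = HybridFusionRNN(input_sizes=[3, 2], modality_hidden=4, fusion_hidden=5, num_classes=3)
audio = [[0.1, 0.2, 0.3] for _ in range(6)]
video = [[0.5, -0.5] for _ in range(6)]
probs = model.forward([audio, video])
```

Here `probs` is a list of three class probabilities. Because the weights are untrained, the values carry no meaning; the sketch only shows how information flows from the separate streams into a shared recurrent state. A practical version would likely use gated cells such as LSTMs and train all parameters jointly with backpropagation through time.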


Deep Learning overview, RNN overview, proposed model, performance and comparisons

Speaker bio

