The Fifth Elephant 2023 Monsoon

On AI, industrial applications of ML, and MLOps




Representation and Reasoning on dynamically composed multimodal structures

Submitted Jun 20, 2023

1. A deep understanding of multimodality as a phenomenon
2. The gist of the axiomatic treatment of multimodality, covering challenges pertaining to representation, alignment, reasoning, generation, transference and quantification
3. The heart of Graph Neural Network modeling frameworks, including GCNs and Graph Attention, and their capacity to model heterogeneity and spatio-temporality
4. How a graph generation process coupled with language-model-based reasoning provides useful insights by reducing the complexity of multimodal challenges
5. Examples pertaining to a monologue product review, a conversational dialogue and a group discussion will be used to convey the trials, successes, challenges and failures of the modeling effort.

Screenplay:

This talk sits at the intersection of Graph Neural Nets, graph embeddings, Large Language Models and causal reasoning. We will start with a tour of the McGurk effect, intermodal perception and intersensory redundancy, then lay out the challenges and the state of the art in the treatment of multimodality. We will establish heterogeneity and interconnectedness as the central principles of interest. Next, we will expound on the intuition behind graph neural nets and the expressivity of graph embeddings through GCNs, GAT and spatio-temporal graphs. Key works from the literature, from Dan Jurafsky (2012), Peter V and Jure Leskovec up to L P Morency (2022), will be covered in summary. Having established that heterogeneity and interconnectedness are inherent and cardinal in both the multimodal and graph worlds (GNN-based frameworks and embeddings), we will delve into the world of LLMs. The specific simplification that Large Language Models offer on multimodally composed input will be described with examples, successes, failures and challenges.
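To make the intuition behind GCN-style aggregation concrete, here is a minimal sketch of one propagation step, ReLU(D^-1/2 (A + I) D^-1/2 X W), written in plain Python. The three-node graph, the feature values and the identity weight matrix are hypothetical illustrations and are not taken from the talk; a real model would learn the weights and use a tensor library.

```python
import math

def gcn_layer(adj, feats, weight):
    """One GCN-style step: ReLU(D^-1/2 (A + I) D^-1/2 X W), using nested lists."""
    n = len(adj)
    # Add self-loops so every node keeps a share of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization: scale each edge by 1 / sqrt(deg_i * deg_j).
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    n_in, n_out = len(weight), len(weight[0])
    # Aggregate: each node takes a normalized sum over itself and its neighbours.
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n)) for f in range(n_in)]
           for i in range(n)]
    # Transform the aggregated features with the weight matrix, then apply ReLU.
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(n_in)))
             for o in range(n_out)]
            for i in range(n)]

# Hypothetical path graph 0 - 1 - 2 with two features per node.
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
features = [[1.0, 0.0],
            [0.0, 1.0],
            [1.0, 1.0]]
weights = [[1.0, 0.0],
           [0.0, 1.0]]  # identity, so the output shows pure aggregation

hidden = gcn_layer(adjacency, features, weights)
```

With identity weights, each row of `hidden` is simply the normalized mix of a node's own features and those of its neighbours, which is the aggregation idea that attention-based variants such as GAT refine by learning the mixing coefficients.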


Takeaways for beginners – Since we touch on multiple vast areas such as Graph Nets, reasoning with LLMs and multimodality, beginners can be introduced to one or more of them.

Takeaways for intermediate – A deep, intuitive understanding of multimodality and Graph Nets. For example, in the world of Graph Neural Nets, aggregation schemes and embeddings are not so intuitive even for the initiated.

Takeaways for the advanced – An alternative view on how to compose multimodal information using graphs, and the use of LLMs for some smart simplifications.
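As a rough illustration of composing multimodal information as a graph and handing it to a language model, the sketch below builds a small heterogeneous graph for a monologue product review and serializes it to plain text. Everything here is an assumption made for illustration: the `Node` and `MultimodalGraph` classes, the `co_occurs_with` relation, the sample content and the prompt layout are hypothetical and do not describe the speakers' actual pipeline.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    modality: str  # hypothetical labels such as "text", "audio", "video"
    content: str   # a short textual description of the observation

@dataclass
class MultimodalGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (source, relation, target)

    def add_node(self, node_id, modality, content):
        self.nodes[node_id] = Node(node_id, modality, content)

    def add_edge(self, src, relation, dst):
        # Reject edges whose endpoints have not been added yet.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((src, relation, dst))

    def to_prompt(self):
        """Serialize typed nodes and edges into text an LLM could reason over."""
        lines = ["Nodes:"]
        for node in self.nodes.values():
            lines.append(f"- {node.node_id} [{node.modality}]: {node.content}")
        lines.append("Edges:")
        for src, relation, dst in self.edges:
            lines.append(f"- {src} --{relation}--> {dst}")
        return "\n".join(lines)

# Hypothetical monologue product review observed in three modalities.
review = MultimodalGraph()
review.add_node("t1", "text", "The battery lasts all day")
review.add_node("a1", "audio", "flat, hesitant tone")
review.add_node("v1", "video", "speaker frowns")
review.add_edge("a1", "co_occurs_with", "t1")
review.add_edge("v1", "co_occurs_with", "t1")

prompt = review.to_prompt()
```

The resulting `prompt` string could be placed ahead of a question such as whether the spoken praise is sincere; typed nodes keep the heterogeneity visible, and edges carry the interconnectedness between modalities.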



