Datta Nimmaturi

@im_datta0

Transformer architectures Royal Rumble

Submitted Mar 24, 2025

Transformer architectures have evolved substantially since their introduction in 2017. This talk surveys these different architectures and examines the benefits and drawbacks of each.

Agenda

  1. Transformer architecture overview
  2. How the attention mechanism evolved over time
    a. MHA, MQA and GQA
    b. MLA from DeepSeek
  3. Recent improvements such as Diff Transformer and nGPT
  4. How the differences stack up against each other
  5. Inference-time implications of each
  6. Comparison on small-scale models
  7. Analysis and conclusion
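Agenda items 2 and 5 are connected by a simple piece of arithmetic: MQA and GQA shrink the number of key/value heads that must be cached at inference time. The sketch below illustrates that arithmetic; the function, its parameters, and the roughly Llama-2-7B-like shapes are assumptions for illustration, not figures from the talk.

```python
def kv_cache_bytes(n_layers, n_kv_heads, d_head, seq_len, bytes_per_param=2):
    # Keys + values (factor of 2) cached per token, per layer, per KV head
    return 2 * n_layers * n_kv_heads * d_head * seq_len * bytes_per_param

# Assumed shapes: 32 layers, head dim 128, fp16 cache, 4096-token context
mha = kv_cache_bytes(32, 32, 128, 4096)  # MHA: one KV head per query head
gqa = kv_cache_bytes(32, 8, 128, 4096)   # GQA: 8 KV heads shared by 32 query heads
mqa = kv_cache_bytes(32, 1, 128, 4096)   # MQA: a single shared KV head
print(mha // 2**20, gqa // 2**20, mqa // 2**20)  # cache sizes in MiB: 2048 512 64
```

The cache shrinks linearly with the number of KV heads, which is why GQA is often presented as a middle ground: most of MQA's memory savings with less of its quality loss.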

Takeaways

Intuition and a functional explanation of why these changes have been incorporated since the inception of transformers.

Target audience

ML enthusiasts and professionals who want to understand the evolution of transformers and where they may be headed.

