This is a Call for Proposals for paper discussion sessions.
This group seeks to curate sessions and discussions around papers in the domains of Artificial Intelligence, Machine Learning, Deep Learning, and Large Language Models - whether research, applications, or surveys of the landscapes relevant to these areas.
Those interested can propose a session under the Submissions tab above by highlighting the paper of their choice, along with a short note on why they want to discuss it, keeping in mind the guidelines below.
- Artificial Intelligence, Robotics, and Reinforcement Learning research and applications
- Machine Learning and Deep Learning research and applications
- Large language models, Multimodal models, and Large Visual Models
- Advances in Hardware and Infrastructure to handle data science operations and workloads
- Best practices around:
-- implementation and training
-- data augmentation
-- inference and deployment
-- applications with respect to safety, ethics, security, etc.
- The paper taken up for the session must be highly cited and reviewed - one of the popular, key papers in the domain of AI/ML/DL and LLMs.
- The presenter must prepare slides that distill the paper into an easily understandable essence and the topics to focus on.
- Code notebooks showing how the concepts in the paper can be applied are encouraged.
- The slides will be reviewed, and the presenter's understanding of the paper and the relevant material will be checked, before the session is confirmed.
- Once this is done, a discussant from the relevant domain will be matched with the presenter to anchor the discussion and session.
Bharat Shetty Barkur has worked across different organizations such as IBM India Software Labs, Aruba Networks, Fybr, Concerto HealthAI, and Airtel Labs. He has worked on products and platforms across diverse verticals such as retail, IoT, chat and voice bots, ed-tech, and healthcare leveraging AI, Machine Learning, NLP, and software engineering. His interests lie in AI, NLP research, and accessibility.
Simrat Hanspal, Technical Evangelist (CEO’s office) and AI Engineer at Hasura, has over a decade of experience as an NLP practitioner. She has worked with multiple startups like Mad Street Den, Fi Money, Nirvana Insurance, and large organizations like Amazon and VMware. She will anchor and lead the discussion.
Sachin Dharashivkar is the founder of AthenaAgent, a company that creates AI-powered cybersecurity solutions. Before this, he worked as a Reinforcement Learning engineer. In these roles, he developed agents for high-volume equity trading at JPMorgan and for playing video games at Unity.
Sidharth Ramachandran works at a large European media company and has been applying text-to-image techniques as part of building data products for a streaming platform. He is also a part-time instructor and has co-authored a book published by O’Reilly.
The Fifth Elephant is a community-funded organization. If you like the work that The Fifth Elephant does and want to support meet-ups and activities - online and in-person - contribute by picking up a membership.
For inquiries, leave a comment or call The Fifth Elephant at +91-7676332020.
RWKV: Reinventing RNNs for the transformer era
If you stepped into language modeling and Natural Language Processing (NLP) in the last three years, you are excused for being less familiar with Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) Networks. Why?
RNNs could not keep up with the unparalleled capabilities (pun intended) of Transformers and have since fallen out of favor as the go-to architecture in the modern deep learning practitioner’s toolbox for modeling language.
The promise of Receptance Weighted Key Value (RWKV) is that this novel architecture combines the desirable aspects of both RNNs and Transformers: the massively parallelizable Transformer-esque training and the RNN’s constant computational and memory complexity during inference. RWKV (pronounced “RwaKuv,” for some reason) is an attention-free language model, theoretically capable of handling an “infinite” context length.
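To give a flavor of the inference-time trade-off, here is a toy sketch (these are not the actual RWKV equations, which involve receptance, key, and value channels with learned time decay; the decay constant and functions below are purely illustrative): an RNN-style model carries a fixed-size state from token to token, while attention-based inference keeps a cache that grows with every token.

```python
# Toy contrast between constant-memory recurrent inference (RWKV-like)
# and a growing key/value cache (Transformer-like). Illustrative only.

def rnn_style_step(state, token_embedding, decay=0.9):
    """One inference step: the state has the same size no matter how
    many tokens have been processed (constant memory)."""
    return [decay * s + (1 - decay) * x
            for s, x in zip(state, token_embedding)]

def attention_style_step(cache, token_embedding):
    """One inference step for attention: the cache grows by one entry
    per token, so memory scales linearly with context length."""
    cache.append(token_embedding)
    return cache

d = 4  # toy embedding dimension
tokens = [[float(i + j) for j in range(d)] for i in range(100)]

state, cache = [0.0] * d, []
for tok in tokens:
    state = rnn_style_step(state, tok)
    cache = attention_style_step(cache, tok)

print(len(state))  # 4   - constant, independent of sequence length
print(len(cache))  # 100 - one entry per token processed
```

The session will walk through how RWKV achieves this constant-state recurrence while still training in a parallelizable, Transformer-like fashion.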
In this session, we’ll:
- Provide an intuitive understanding of RWKV’s formulation, using math and code.
- Discuss how it performs on benchmarks and the scaling laws.
- Demo RWKV’s inference prowess.