Call for Papers

The Fifth Elephant Papers Reading community





RWKV: Reinventing RNNs for the Transformer Era

Submitted Feb 7, 2024

If you stepped into language modeling and Natural Language Processing (NLP) in the last three years, you can be excused for being less familiar with Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks. Why?
RNNs could not keep up with the unparalleled (pun intended) training throughput of Transformers and have since fallen out of favor as the go-to architecture in the modern deep learning practitioner's toolbox for modeling language.

The promise of Receptance Weighted Key Value (RWKV) is that this novel architecture combines the desirable aspects of both RNNs and Transformers: the massively parallelizable Transformer-esque training, and the RNN's constant per-token computational and memory cost during inference. RWKV (pronounced "RwaKuv," for some reason) is an attention-free language model, theoretically capable of handling an "infinite" context length.
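To give a taste of why inference cost stays flat, here is a minimal sketch of the "WKV" time-mixing recurrence at RWKV's core. This is our own simplified, single-channel illustration (the function name and scalar setup are ours, not from the paper): the real model runs this per channel, uses a numerically stabilized form, and wraps it in token-shift mixing and gating.

```python
import math

def wkv_recurrence(k, v, w, u):
    """Simplified single-channel sketch of RWKV's WKV recurrence.

    k, v : lists of per-step keys and values
    w    : positive decay rate; u : "bonus" weight for the current token
    Returns a list of outputs while carrying only O(1) state,
    no matter how long the sequence is -- that is the whole point.
    """
    a = 0.0  # running sum of e^{k_i} * v_i, decayed each step
    b = 0.0  # running sum of e^{k_i} (the normalizer), decayed each step
    out = []
    for kt, vt in zip(k, v):
        ek = math.exp(kt)
        # the current token enters the average with weight e^{u + k_t}
        out.append((a + math.exp(u) * ek * vt) / (b + math.exp(u) * ek))
        # decay the state, then absorb the current token into it
        a = math.exp(-w) * a + ek * vt
        b = math.exp(-w) * b + ek
    return out
```

Each output is a weighted average of the values seen so far, with weights that decay exponentially with distance, which is how RWKV replaces pairwise attention with a recurrence it can also unroll in parallel at training time.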

In this session, we’ll:

  1. Provide an intuitive understanding of RWKV’s formulation, using math and code.
  2. Discuss how it performs on benchmarks and how it scales with model size.
  3. Demo RWKV’s inference prowess.


