This is a Call for Proposals for paper discussion sessions.
This group seeks to curate sessions and discussions around papers in the domains of Artificial Intelligence, Machine Learning, Deep Learning, and Large Language Models - whether research, applications, or surveys of the landscapes relevant to these areas.
Those interested can propose a session under the Submissions tab above by highlighting the paper of their choice, along with a short note on why they want to discuss it, keeping in mind the guidelines below.
- Artificial Intelligence, Robotics, and Reinforcement Learning research and applications
- Machine Learning and Deep Learning research and applications
- Large language models, Multimodal models, and Large Visual Models
- Advances in Hardware and Infrastructure to handle data science operations and workloads
- Best practices around:
-- implementation and training
-- data augmentation
-- inference and deployment
-- applications with respect to safety, ethics, security, etc.
- The paper taken up for the session must be highly cited and reviewed - one of the popular, key papers in the domain of AI/ML/DL and LLMs.
- The presenter must prepare slides that distill the paper into an easily understandable essence and the topics to focus on.
- Code notebooks showing how the concepts in the paper can be applied are encouraged.
- The slides will be reviewed, and the presenter's understanding of the paper and the relevant material will be checked, before the session is confirmed.
- Once this is done, a discussant from the relevant domain will be matched with the presenter to anchor the discussion and session.
Bharat Shetty Barkur has worked across different organizations such as IBM India Software Labs, Aruba Networks, Fybr, Concerto HealthAI, and Airtel Labs. He has worked on products and platforms across diverse verticals such as retail, IoT, chat and voice bots, ed-tech, and healthcare leveraging AI, Machine Learning, NLP, and software engineering. His interests lie in AI, NLP research, and accessibility.
Simrat Hanspal, Technical Evangelist (CEO’s office) and AI Engineer at Hasura, has over a decade of experience as an NLP practitioner. She has worked with multiple startups like Mad Street Den, Fi Money, Nirvana Insurance, and large organizations like Amazon and VMware. She will anchor and lead the discussion.
Sachin Dharashivkar is the founder of AthenaAgent, a company that creates AI-powered cybersecurity solutions. Before this, he worked as a Reinforcement Learning engineer. In these roles, he developed agents for high-volume equity trading at JPMorgan and for playing video games at Unity.
Sidharth Ramachandran works at a large European media company and has been applying text-to-image techniques as part of building data products for a streaming platform. He is also a part-time instructor and has co-authored a book published by O’Reilly.
The Fifth Elephant is a community-funded organization. If you like the work that The Fifth Elephant does and want to support meet-ups and activities - online and in-person - contribute by picking up a membership.
For inquiries, leave a comment or call The Fifth Elephant at +91-7676332020.
RWKV: Reinventing RNNs for the transformer era
If you stepped into language modeling and Natural Language Processing (NLP) in the last three years, you are excused for being less familiar with Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) Networks. Why?
RNNs could not keep up with the unparalleled capabilities (pun intended) of Transformers and have since fallen out of favor as the go-to architecture in the modern deep learning practitioner’s toolbox for modeling language.
The promise of Receptance Weighted Key Value (RWKV) is that this novel architecture combines the desirable aspects of both RNNs and Transformers: the massively parallelizable Transformer-esque training and the RNN’s constant computational and memory complexity during inference. RWKV (pronounced “RwaKuv,” for some reason) is an attention-free language model, theoretically capable of handling an “infinite” context length.
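To give a flavor of the inference-time trade-off, here is a toy sketch (these are not the actual RWKV equations, which involve receptance, key, and value channels with learned time decay; the decay constant and functions below are purely illustrative): an RNN-style model carries a fixed-size state from token to token, while attention-based inference keeps a cache that grows with every token.

```python
# Toy contrast between constant-memory recurrent inference (RWKV-like)
# and a growing key/value cache (Transformer-like). Illustrative only.

def rnn_style_step(state, token_embedding, decay=0.9):
    """One inference step: the state has the same size no matter how
    many tokens have been processed (constant memory)."""
    return [decay * s + (1 - decay) * x
            for s, x in zip(state, token_embedding)]

def attention_style_step(cache, token_embedding):
    """One inference step for attention: the cache grows by one entry
    per token, so memory scales linearly with context length."""
    cache.append(token_embedding)
    return cache

d = 4  # toy embedding dimension
tokens = [[float(i + j) for j in range(d)] for i in range(100)]

state, cache = [0.0] * d, []
for tok in tokens:
    state = rnn_style_step(state, tok)
    cache = attention_style_step(cache, tok)

print(len(state))  # 4   - constant, independent of sequence length
print(len(cache))  # 100 - one entry per token processed
```

The session will walk through how RWKV achieves this constant-state recurrence while still training in a parallelizable, Transformer-like fashion.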
In this session, we’ll:
- Provide an intuitive understanding of RWKV’s formulation, using math and code.
- Discuss how it performs on benchmarks and the scaling laws.
- Demo RWKV’s inference prowess.