Mar 2025
10 Mon 06:00 PM – 07:00 PM IST
21 Fri 06:00 PM – 07:30 PM IST
Submitted Mar 23, 2025
The discussion centered on DeepSeek R1, its reinforcement learning (RL) training methodology, and the implications of its approach for AI development. Harshad Saykhedkar provided insights into DeepSeek’s core mechanics, lessons learned from experimentation, and takeaways for India’s AI ecosystem.
DeepSeek R1 employs reinforcement learning (RL), where models learn through trial and error, receiving rewards for desirable behaviors. This technique is crucial for fine-tuning language models, helping them improve based on human or AI feedback.
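The trial-and-error loop described above can be illustrated with a toy policy-gradient sketch. This is not DeepSeek R1's training code: the three "responses", their fixed rewards in `REWARDS`, the learning rate, and the function name `train_policy` are all assumptions chosen for illustration. The policy samples a response, receives its reward, and shifts probability toward responses that score above a running average.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(rewards, steps=2000, lr=0.1, seed=0):
    """REINFORCE-style updates over a fixed set of candidate responses.

    rewards[i] is a hypothetical, fixed reward for choosing response i.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    baseline = 0.0  # running average reward, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        # Trial: sample one response from the current policy.
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log-probability of the sampled action w.r.t. each logit.
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)


# Hypothetical rewards: response 2 is the most desirable behavior.
REWARDS = [0.1, 0.5, 1.0]
final_probs = train_policy(REWARDS)
print([round(p, 3) for p in final_probs])
```

After training, most of the probability mass should sit on the highest-reward response. Real language-model RL replaces the three fixed options with generated text and the fixed rewards with a reward signal from human or AI feedback.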
DeepSeek R1’s RL training follows these key steps:
DeepSeek R1’s methodology is similar to that of OpenAI’s GPT and Anthropic’s Claude, but with a stronger emphasis on open-source accessibility, allowing researchers and developers to build on its findings.
RL training requires vast GPU resources, making large-scale RLHF experiments expensive.
DeepSeek R1 shows that RL scales well but is inefficient for smaller models, posing a challenge for startups and research teams with limited resources.
A major issue is overfitting to the reward model: if biases exist in the reward system, the AI model inherits them, leading to skewed results.
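A small sketch can make the reward-model bias concrete. Everything here is hypothetical: the `Candidate` class, the `true_quality` scores, and the `biased_reward` function, which adds a spurious bonus per word to stand in for a flawed reward model. Optimizing against the biased score selects a long, wrong answer over a short, correct one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    true_quality: float  # hypothetical ground-truth usefulness in [0, 1]


def biased_reward(candidate, length_bias=0.05):
    """Hypothetical reward model: true quality plus a spurious per-word bonus."""
    return candidate.true_quality + length_bias * len(candidate.text.split())


def pick_best(candidates, score):
    """Return the candidate with the highest score under the given function."""
    return max(candidates, key=score)


CANDIDATES = [
    Candidate("Paris.", 0.9),
    Candidate("The capital of France is Paris.", 1.0),
    Candidate(
        "Great question! "
        + "There are many things to consider. " * 5
        + "It might be Lyon.",
        0.1,
    ),
]

# What the optimizer prefers versus what is actually best.
chosen_by_proxy = pick_best(CANDIDATES, biased_reward)
chosen_by_truth = pick_best(CANDIDATES, lambda c: c.true_quality)

print("reward model prefers:", chosen_by_proxy.text[:40])
print("actually best:      ", chosen_by_truth.text)
```

The policy never sees `true_quality`; it only sees the reward model's score, so any systematic bias in that score becomes a behavior the trained model learns to exploit.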
India’s AI sector lacks large-scale GPU clusters, which are essential for RL training.
DeepSeek’s success highlights the power of open-source AI, reducing dependency on proprietary models.
India needs more AI specialists trained in RLHF and reward modeling to stay competitive.
High RL training costs suggest a need for government and private investment to support AI research.
Yes, but access to high-quality human feedback data is a major challenge.
Concerns include model collapse, where excessive optimization leads to poor generalization.
DeepSeek R1’s RL training process demonstrates the potential of RLHF in AI fine-tuning, while also revealing challenges in compute availability, model alignment, and cost-effectiveness. The discussion emphasized the need for: