DeepSeek used GRPO (Group Relative Policy Optimization), a variant of PPO (Proximal Policy Optimization), for training.

GRPO differs from PPO in several ways (a sketch follows below):
- Samples a group of G completions per prompt instead of just one
- Doesn't use a separate value network; the group's rewards serve as a Monte Carlo baseline for the advantage
- Uses a reference policy (the SFT model) with a KL-divergence penalty to prevent drift
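To make the group baseline concrete, here is a minimal sketch of GRPO-style advantage estimation; a toy illustration of the idea above, not DeepSeek's actual implementation:

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """rewards: shape (G,), one scalar reward per completion in the group.

    The group mean is the Monte Carlo baseline that replaces PPO's learned
    value network; dividing by the std normalizes the advantage scale.
    """
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Example: 4 completions sampled for the same prompt, rewarded 0/1 for correctness
print(grpo_advantages(torch.tensor([1.0, 0.0, 0.0, 1.0])))
```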
The model is based on a distilled version of Qwen 7B (7.62 billion parameters).
They used the 8-bit FP8 E4M3 floating-point format (4 exponent bits, 3 mantissa bits) for lower-precision training.
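For illustration, recent PyTorch builds (2.1+) expose this format as torch.float8_e4m3fn; the round-trip below only demonstrates the format's coarse precision and is not DeepSeek's training stack:

```python
import torch

x = torch.randn(4)                    # FP32 reference values
x_fp8 = x.to(torch.float8_e4m3fn)     # 1 sign, 4 exponent, 3 mantissa bits
x_back = x_fp8.to(torch.float32)      # upcast to inspect the rounding error

print(x)        # original values
print(x_back)   # values snapped to the coarse FP8 grid
```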
```python
# train_grpo.py
#
# See https://github.com/willccbb/verifiers for ongoing developments
#
import re
import torch
from datasets import load_dataset, Dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer
```
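The snippet above only pulls in dependencies. Based on TRL's documented GRPOTrainer API, a plausible minimal wiring might look like the sketch below; the dataset, reward function, and hyperparameters are placeholder assumptions, not the original script's:

```python
# Hypothetical continuation of train_grpo.py: dataset, reward, and
# hyperparameters below are placeholders, not the original script's.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# GRPOTrainer expects a dataset with a "prompt" column.
dataset = Dataset.from_dict({"prompt": ["Write a haiku about the sea."] * 64})

def reward_len(completions, **kwargs):
    # Toy reward: prefer shorter completions. Real scripts score
    # correctness or format compliance parsed out of the text.
    return [-float(len(c)) for c in completions]

training_args = GRPOConfig(
    output_dir="grpo-out",
    num_generations=8,           # G: completions sampled per prompt
    max_completion_length=256,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # any causal-LM checkpoint id
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```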
I am writing a prompt for an LLM (gpt-4o) as part of building a conversational assistant. The LLM is expected to predict one of the available commands based on the instructions given. Here is the current prompt:

```
Your task is to analyze the current conversation context and generate a list of actions to start new business processes that we call flows, to extract slots, or respond to small talk and knowledge requests.
These are the flows that can be started, with their description and slots:
transfer_money: send money to friends and family
  slot: transfer_money_recipient (the name of a person)
  slot: transfer_money_amount_of_money (the amount of money without any currency designation)
```
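For context, here is a minimal sketch of sending that prompt to gpt-4o with the official openai Python client; the user message and the command syntax in the final comment are illustrative assumptions, not part of the original setup:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = """Your task is to analyze the current conversation context
and generate a list of actions to start new business processes that we call
flows, to extract slots, or respond to small talk and knowledge requests.
..."""  # the full prompt from above

resp = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.0,  # deterministic command prediction
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "I want to send 50 to Alice"},
    ],
)
print(resp.choices[0].message.content)
# e.g. StartFlow(transfer_money); SetSlot(transfer_money_recipient, Alice)
# (command syntax is hypothetical; use whatever format your prompt specifies)
```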
GPT-3's computational requirements depend heavily on model size, precision, and hardware optimization. For the full 175B-parameter model, running inference requires at least 5 NVIDIA A100 GPUs (80GB each) to meet the 350GB memory requirement when using FP16 precision[4][9][22]. However, practical implementations often use 8 GPUs for improved parallelism and throughput[3][22].
Memory:
- 175B parameters require 350GB of VRAM at FP16 precision (2 bytes/parameter; see the arithmetic check after this list)[4][9][14].
- Consumer GPUs like RTX 3090s (24GB VRAM) can technically work in multi-GPU setups (e.g., 8x24GB = 192GB), but require aggressive memory optimizations like 8-bit quantization[2][28].

Hardware Recommendations:
- Data center GPUs: 5–8 NVIDIA A100/A800 (80GB) GPUs for stable deployment[3][4][13].
- Consumer GPUs: 8x RTX 3090/4090 with PCIe 5.0 and NVLink for reduced communication bottlenecks[2][28].
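A quick sanity check of those memory figures; this counts weights only, so activations and KV cache would add more on top:

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Raw weight storage only; activations and KV cache come on top."""
    return n_params * bytes_per_param / 1e9

for fmt, bytes_pp in [("FP32", 4), ("FP16", 2), ("INT8", 1)]:
    print(f"{fmt}: {weight_memory_gb(175e9, bytes_pp):,.0f} GB")
# FP16 -> 350 GB, i.e. at least five 80GB A100s just to hold the weights
```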
The field of artificial intelligence has seen remarkable progress in recent years, particularly in the domain of large language models (LLMs). This article explores the journey from the foundational Transformer architecture to the cutting-edge DeepSeek-R1 model, highlighting key developments and breakthroughs along the way.
The Transformer architecture, introduced in 2017, revolutionized natural language processing. Its attention mechanism allowed for more efficient processing of sequential data, paving the way for larger and more capable language models.
Ollama is an engine that allows you to run large language models (LLMs) such as Deepseek R1 on your local machine.
- Download Ollama from the Ollama Download Page.
- Follow the installation instructions specific to your operating system.
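Once a model has been pulled (for example with `ollama pull` on the command line), Ollama serves a local REST API on port 11434. A minimal sketch of querying it from Python; the model tag and prompt are assumptions, so match the tag to what you actually pulled:

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1",  # assumed tag; use the one you pulled
        "prompt": "Summarize what makes DeepSeek R1 notable.",
        "stream": False,         # return a single JSON object, not a stream
    },
    timeout=300,
)
print(resp.json()["response"])
```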
Introducing DeepSeek R1: A New Era in Open-Source AI
Version 1.58-bit Dynamic
Released: January 27, 2025
Authors: Andrew
DeepSeek R1 is making headlines by challenging OpenAI's O1 reasoning model, all while being completely open-source. We've worked on making it more accessible for local users by reducing the model's size from 720GB to 131GB, an impressive 80% reduction, without compromising its functionality.
By analyzing DeepSeek R1's structure, the Unsloth team was able to selectively quantize certain layers to higher bits (like 4-bit) while keeping most MoE layers at 1.5-bit. This approach prevents the model from producing errors like endless loops or nonsensical outputs, which occur if all layers are naively quantized.
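As a purely illustrative toy of that idea (the layer-name patterns and bit choices below are hypothetical, not Unsloth's actual rules), a per-layer bit-width policy might look like:

```python
def pick_bits(layer_name: str) -> float:
    """Toy mixed-precision policy: sensitive layers keep more bits."""
    if "attn" in layer_name or "embed" in layer_name:
        return 4.0    # keep attention / embeddings at higher precision
    if "experts" in layer_name:
        return 1.5    # the bulk of MoE expert weights go ultra-low-bit
    return 6.0        # a middle ground for everything else

for name in ["layers.0.attn.q_proj", "layers.3.mlp.experts.7.w1"]:
    print(name, "->", pick_bits(name), "bits")
```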
The release of DeepSeek-R1 marks a pivotal moment in the progress of AI, particularly for the machine learning research community. This innovative model stands out due to its open weights, compact distilled versions, and a transparent training process aimed at replicating reasoning-focused models, such as OpenAI O1. Below, we explore its development and unique approach.
Large Language Models (LLMs), like DeepSeek-R1, are designed to generate tokens sequentially while excelling in tasks like reasoning and mathematics. This success is rooted in the model's ability to generate "thinking tokens" that articulate its thought process. To better understand its training methodology, let's break it into three established stages (a toy sketch of the first stage's objective follows the list):
- Base Model Training: The model learns to predict the next token from massive web datasets, forming its foundational capabilities.
- Supervised Fine-Tuning (SFT): The model is trained on instruction-response pairs so it learns to follow user requests helpfully.
- Preference Tuning: The model is further aligned with human preferences, typically via reinforcement learning from human feedback.
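To make the first stage concrete, here is a minimal sketch of the next-token prediction objective; the model outputs and token ids are random toy stand-ins:

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len, batch = 100, 16, 4
logits = torch.randn(batch, seq_len, vocab_size)         # toy model outputs
tokens = torch.randint(0, vocab_size, (batch, seq_len))  # toy input token ids

# Shift so position t predicts token t+1, then apply cross-entropy.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
print(loss.item())
```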
🚀 Ready to supercharge your productivity? Dive into the world of AI with DeepSeek-R1 and Browser-Use, a game-changing combination that turns your computer into a personal AI powerhouse. This comprehensive guide will transform you from a curious tech enthusiast to an AI maestro, capable of creating an assistant that browses the web, gathers data, and delivers insightful analysis—all while keeping your data 100% private and under your control.
By the end of this guide, you'll be able to: