@willccbb
willccbb / grpo_demo.py
Last active May 14, 2025 09:41
GRPO Llama-1B
# train_grpo.py
#
# See https://github.com/willccbb/verifiers for ongoing developments
#
import re
import torch
from datasets import load_dataset, Dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine-tuning", learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@rain-1
rain-1 / LLM.md
Last active May 15, 2025 04:52
LLM Introduction: Learn Language Models

Purpose

Bootstrap knowledge of LLMs ASAP. With a bias/focus to GPT.

Avoid being a link dump. Try to provide only valuable well tuned information.

Prelude

Neural network links before starting with transformers.

@Niranjankumar-c
Niranjankumar-c / plotweights.py
Created October 11, 2019 06:24
plotting weights
def plot_weights(model, layer_num, single_channel = True, collated = False):
    #extracting the model features at the particular layer number
    layer = model.features[layer_num]
    #checking whether the layer is convolution layer or not
    if isinstance(layer, nn.Conv2d):
        #getting the weight tensor data
        weight_tensor = model.features[layer_num].weight.data
@f213
f213 / fabfile.py
Created January 6, 2019 10:11
Fabfile example
from fabric.api import env, run as fabric_run
from fabric.contrib.project import rsync_project
env.app_path = '/home/backend'
env.user = 'circle'
env.use_ssh_config = True
env.disable_known_hosts = True
env.colorize_errors = True
@mburakerman
mburakerman / package.json
Last active September 26, 2022 17:32
Webpack 4 config.js (SCSS to CSS and Babel) 👌 The Simplest Usage 👌
{
"name": "webpack-sass",
"version": "1.0.0",
"scripts": {
"start": "webpack-dev-server --open --mode development",
"build": "webpack -p"
},
"devDependencies": {
"babel-core": "^6.26.0",
"babel-loader": "^7.1.4",
@kopiro
kopiro / three.doc.js
Created March 21, 2017 16:12
ThreeJS DeviceOrientationControls
/**
* @author richt / http://richt.me
* @author WestLangley / http://github.com/WestLangley
*
* W3C Device Orientation control (http://w3c.github.io/deviceorientation/spec-source-orientation.html)
*/
THREE.DeviceOrientationControls = function( object ) {
var scope = this;
@alecjacobson
alecjacobson / real-time-latex-in-browser.html
Last active September 26, 2020 03:49
Real-time LaTeX web application using MathJaX (e.g., for doing math during a Skype call)
<!DOCTYPE html>
<html>
<head>
<title>Real-time LaTeX in browser</title>
<!-- https://github.com/mathjax/MathJax/blob/master/test/sample-dynamic-2.html -->
<!-- Copyright (c) 2012-2015 The MathJax Consortium -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1">
@htp
htp / curl-websocket.sh
Last active February 8, 2025 05:24
Test a WebSocket using curl.
curl --include \
--no-buffer \
--header "Connection: Upgrade" \
--header "Upgrade: websocket" \
--header "Host: example.com:80" \
--header "Origin: http://example.com:80" \
--header "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \
--header "Sec-WebSocket-Version: 13" \
http://example.com:80/
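A server that accepts the upgrade request above should answer with `101 Switching Protocols` and a `Sec-WebSocket-Accept` header derived from the `Sec-WebSocket-Key` that was sent. The sketch below is not part of the gist. It computes the value to look for in the curl output, following the handshake rule in RFC 6455: SHA-1 of the key concatenated with a fixed GUID, then base64. The function name `expected_accept` is made up for illustration.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 defines for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value a compliant server should send.

    `expected_accept` is a hypothetical helper name, not something from the gist.
    """
    # Concatenate the key exactly as sent (still base64 text) with the GUID,
    # hash with SHA-1, and base64-encode the raw digest.
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Key taken from the curl command above.
    print(expected_accept("SGVsbG8sIHdvcmxkIQ=="))
```

If the header in the server's response differs from the printed value, the endpoint is probably not completing a standard WebSocket handshake.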
@jogonba2
jogonba2 / A5.py
Last active November 18, 2021 16:31
A5/1 python implementation
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# A5.py
#
# Author: overxfl0w13
#
def header():
print """