@willccbb
willccbb / grpo_demo.py
Last active June 29, 2025 03:17
GRPO Llama-1B
# train_grpo.py
#
# See https://github.com/willccbb/verifiers for ongoing developments
#
"""
citation:
@misc{brown2025grpodemo,
title={Granular Format Rewards for Eliciting Mathematical Reasoning Capabilities in Small Language Models},
author={Brown, William},

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine-tuning", learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@rain-1
rain-1 / LLM.md
Last active June 28, 2025 14:59
LLM Introduction: Learn Language Models

Purpose

Bootstrap knowledge of LLMs ASAP. With a bias/focus to GPT.

Avoid being a link dump. Try to provide only valuable well tuned information.

Prelude

Neural network links before starting with transformers.

@Niranjankumar-c
Niranjankumar-c / plotweights.py
Created October 11, 2019 06:24
plotting weights
def plot_weights(model, layer_num, single_channel = True, collated = False):
  #extracting the model features at the particular layer number
  layer = model.features[layer_num]
  #checking whether the layer is convolution layer or not
  if isinstance(layer, nn.Conv2d):
    #getting the weight tensor data
    weight_tensor = model.features[layer_num].weight.data
@f213
f213 / fabfile.py
Created January 6, 2019 10:11
Fabfile example
from fabric.api import env, run as fabric_run
from fabric.contrib.project import rsync_project
env.app_path = '/home/backend'
env.user = 'circle'
env.use_ssh_config = True
env.disable_known_hosts = True
env.colorize_errors = True
@mburakerman
mburakerman / package.json
Last active September 26, 2022 17:32
Webpack 4 config.js (SCSS to CSS and Babel) 👌 The Simplest Usage 👌
{
"name": "webpack-sass",
"version": "1.0.0",
"scripts": {
"start": "webpack-dev-server --open --mode development",
"build": "webpack -p"
},
"devDependencies": {
"babel-core": "^6.26.0",
"babel-loader": "^7.1.4",
@kopiro
kopiro / three.doc.js
Created March 21, 2017 16:12
ThreeJS DeviceOrientationControls
/**
* @author richt / http://richt.me
* @author WestLangley / http://github.com/WestLangley
*
* W3C Device Orientation control (http://w3c.github.io/deviceorientation/spec-source-orientation.html)
*/
THREE.DeviceOrientationControls = function( object ) {
var scope = this;
@alecjacobson
alecjacobson / real-time-latex-in-browser.html
Last active September 26, 2020 03:49
Real-time LaTeX web application using MathJaX (e.g., for doing math during a Skype call)
<!DOCTYPE html>
<html>
<head>
<title>Real-time LaTeX in browser</title>
<!-- https://github.com/mathjax/MathJax/blob/master/test/sample-dynamic-2.html -->
<!-- Copyright (c) 2012-2015 The MathJax Consortium -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1">
@htp
htp / curl-websocket.sh
Last active May 28, 2025 02:24
Test a WebSocket using curl.
curl --include \
--no-buffer \
--header "Connection: Upgrade" \
--header "Upgrade: websocket" \
--header "Host: example.com:80" \
--header "Origin: http://example.com:80" \
--header "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \
--header "Sec-WebSocket-Version: 13" \
http://example.com:80/
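
A server that accepts the upgrade request above should answer with `101 Switching Protocols` and a `Sec-WebSocket-Accept` header derived from the `Sec-WebSocket-Key` that was sent. The following is a minimal sketch, not part of the original gist, of how that expected value could be computed for comparison with curl's output. The function name `expected_accept` is hypothetical; the GUID constant and the SHA-1 plus base64 derivation are taken from RFC 6455.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 appends to the client's key during the handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value a compliant server should send.

    Hypothetical helper for illustration: it concatenates the key with the
    GUID, hashes the result with SHA-1, and base64-encodes the digest.
    """
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Key copied from the curl command above.
    print(expected_accept("SGVsbG8sIHdvcmxkIQ=="))
```

If the value printed here matches the `Sec-WebSocket-Accept` header in the response, the server completed the handshake as specified. The key used in the curl command decodes to the 13-byte string `Hello, world!`, whereas RFC 6455 asks for a 16-byte nonce, so a strict server might reject it.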
@jogonba2
jogonba2 / A5.py
Last active November 18, 2021 16:31
A5/1 python implementation
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# A5.py
#
# Author: overxfl0w13
#
def header():
print """