# train_grpo.py
import re
from dataclasses import dataclass, field
from typing import *

import torch
from datasets import load_dataset, Dataset, load_from_disk
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer, TrlParser
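A minimal sketch of how these imports come together in a GRPO run, closely following TRL's documented quickstart; the model id, dataset, and length-based reward below are illustrative stand-ins, not part of this gist:

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions close to 200 characters.
    return [-abs(200 - len(completion)) for completion in completions]

training_args = GRPOConfig(output_dir="grpo-out", logging_steps=10)
trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",  # illustrative model id
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()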
@infoslack
infoslack / grpo_demo.py
Created January 27, 2025 17:59
Group Relative Policy Optimization (GRPO) implementation
# This implementation is based on the paper: https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf
#
# pip install torch transformers
# python grpo_demo.py
import torch
import torch.nn as nn
import torch.optim as optim
from transformers import BertTokenizer, BertModel
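The heart of GRPO, per the DeepSeek-R1 paper linked above, is replacing a learned value baseline with rewards normalized within each group of G samples drawn for the same prompt. A small illustrative sketch of that step (shapes and names are assumptions, not taken from this gist):

import torch

def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # rewards: (num_prompts, G) raw rewards for G completions per prompt.
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    # Each completion is scored relative to its own group.
    return (rewards - mean) / (std + 1e-8)

rewards = torch.tensor([[1.0, 0.0, 0.5, 0.0]])
print(group_relative_advantages(rewards))  # positive for above-average samples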
@willccbb
willccbb / grpo_demo.py
Last active April 25, 2025 22:27
GRPO Llama-1B
# train_grpo.py
#
# See https://github.com/willccbb/verifiers for ongoing developments
#
import re
import torch
from datasets import load_dataset, Dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer
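The full gist pairs these imports with rule-based reward functions. A representative format-check reward in the style it uses, for conversational completions (the exact tag pattern and score are illustrative):

def format_reward_func(completions, **kwargs) -> list[float]:
    # Reward completions that follow a <reasoning>...</reasoning><answer>...</answer> layout.
    pattern = r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>"
    responses = [completion[0]["content"] for completion in completions]
    return [0.5 if re.search(pattern, r, re.DOTALL) else 0.0 for r in responses]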
@abodacs
abodacs / whisper-static-cache.ipynb
Created June 3, 2024 09:53 — forked from huseinzol05/whisper-static-cache.ipynb
example of whisper static cache
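The notebook itself does not render here, but roughly, a "static cache" in recent transformers releases is a pre-allocated KV cache that keeps decoding shapes fixed (and hence torch.compile-friendly). A sketch of what the notebook likely demonstrates; the model id and API details below are assumptions, not taken from the notebook:

import numpy as np
from transformers import AutoProcessor, WhisperForConditionalGeneration

processor = AutoProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

# One second of silence as a stand-in for real audio.
inputs = processor(np.zeros(16000, dtype=np.float32), sampling_rate=16000, return_tensors="pt")
generated = model.generate(inputs.input_features, cache_implementation="static")
print(processor.batch_decode(generated, skip_special_tokens=True))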
@lewtun
lewtun / sft_trainer.py
Last active April 21, 2025 16:04
Fine-tuning Mistral 7B with TRL & DeepSpeed ZeRO-3
# This is a modified version of TRL's `SFTTrainer` example (https://github.com/huggingface/trl/blob/main/examples/scripts/sft_trainer.py),
# adapted to run with DeepSpeed ZeRO-3 and Mistral-7B-v0.1. The settings below were run on 1 node of 8 x A100 (80GB) GPUs.
#
# Usage:
# - Install the latest transformers & accelerate versions: `pip install -U transformers accelerate`
# - Install deepspeed: `pip install deepspeed==0.9.5`
# - Install TRL from main: `pip install git+https://github.com/huggingface/trl.git`
# - Clone the repo: `git clone https://github.com/huggingface/trl.git`
# - Copy this Gist into trl/examples/scripts
# - Run from root of trl repo with: accelerate launch --config_file=examples/accelerate_configs/deepspeed_zero3.yaml --gradient_accumulation_steps 8 examples/scripts/sft_trainer.py
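Reduced to its core, the script builds an SFTTrainer around the base model. A sketch using TRL's current SFTConfig API (the 2023-era script used TrainingArguments plus extra SFTTrainer arguments; the dataset below is illustrative):

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")
args = SFTConfig(
    output_dir="mistral-7b-sft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # matches the launch command above
    bf16=True,
)
trainer = SFTTrainer(model="mistralai/Mistral-7B-v0.1", args=args, train_dataset=dataset)
trainer.train()

Note that ZeRO-3 itself comes from the accelerate config file passed on the command line, not from the Python code.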
@younesbelkada
younesbelkada / train_adapters_transformers.py
Created August 3, 2023 09:43
Train adapters using transformers integration of PEFT
from datasets import load_dataset
import torch
from peft import LoraConfig, prepare_model_for_int8_training
from trl import SFTTrainer
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
dataset_name = "timdettmers/openassistant-guanaco"
dataset = load_dataset(dataset_name, split="train")
model_name = "facebook/opt-350m"
@younesbelkada
younesbelkada / finetune_mpt30b_guanaco.py
Last active August 30, 2023 06:04
Fine-tune MPT-30B on the Guanaco dataset and turn it into a chatbot - read the docstrings to install the correct versions of the required libraries.
# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
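The preview above only shows the license header. A condensed sketch of the kind of 4-bit QLoRA loading such a script typically performs for MPT-30B (all settings here are illustrative assumptions):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mosaicml/mpt-30b"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    trust_remote_code=True,  # MPT ships custom modeling code
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)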
@younesbelkada
younesbelkada / finetune_sft_trl.py
Last active August 8, 2024 20:21
Benchmarking SFT trainer with 8bit models
# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
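Again only the license header survives in the preview. For the benchmarking the description mentions, a small hedged harness around any configured trainer (names are illustrative, not from the gist):

import time
import torch
from transformers import Trainer

def benchmark_train(trainer: Trainer) -> None:
    # Report peak GPU memory and wall-clock time for one training run.
    torch.cuda.reset_peak_memory_stats()
    start = time.time()
    trainer.train()
    print(f"peak memory: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
    print(f"wall time:   {time.time() - start:.1f} s")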
@abodacs
abodacs / GPT4all-langchain-demo.ipynb
Created April 4, 2023 10:52 — forked from psychemedia/GPT4all-langchain-demo.ipynb
Example of running GPT4all local LLM via langchain in a Jupyter notebook (Python)
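This notebook also fails to render; a minimal sketch of the pattern it describes, using langchain's GPT4All wrapper as it existed in early 2023 (the model path is a placeholder; current langchain moves this class to `langchain_community.llms` and prefers `.invoke()`):

from langchain.llms import GPT4All

llm = GPT4All(model="./models/gpt4all-converted.bin")  # path to a local GPT4All weights file
print(llm("What is a large language model?"))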
| ContractAddress | TokenName | TokenSymbol | Holder count | Transfer count | # of times appears in list | Notes |
|---|---|---|---|---|---|---|
| 0x420b595d8b648971b3bfcf46e66544c384860536 | VenmoCash | VMO | 1 | 6 | 2 | |
| 0xdeeb40536e94be7226b77fb89d7d3cd65a82fb85 | Zoom Protocol | ZOM | 1 | 9 | 2 | |
| 0xe670848d54788997942ecf938cd23b09550bae73 | TARO | TARO | 1 | 4 | 2 | |
| 0xf28fec34928a1dc19b650104ae082665b66f720e | ETH/BTC Long-Only Alpha | XTF.SWCEBL | 1 | 4 | 2 | |
| 0x030385efc63ebda6021d9098b1fcc422547d83d3 | Tacos @ Taconomics.io | $TACO | 2 | 5 | 2 | |
| 0x03bb9bbf0423e44370e88ec5fc31eecf4e2b4ac2 | STVKE.Network | STV | 2 | 9 | 2 | |
| 0x05e850909664a3cf926ca4777c3ec1577d36ec18 | OnFlow | Flow | 2 | 8 | 2 | |
| 0x06ca771a689d6d5f5e435be2ef1d1ffc6bdb3b4c | Wing Token | WING | 2 | 8 | 2 | |
| 0x08a958bdc9e0beb0c3ee2ec6e9c0013f14ce66e5 | Harold Returns | KEKW | 2 | 6 | 2 | |