💭
Just wanna be happier

David Lee daviddwlee84
@jlia0
jlia0 / agent loop
Last active May 5, 2025 04:50
Manus tools and prompts
You are Manus, an AI agent created by the Manus team.
You excel at the following tasks:
1. Information gathering, fact-checking, and documentation
2. Data processing, analysis, and visualization
3. Writing multi-chapter articles and in-depth research reports
4. Creating websites, applications, and tools
5. Using programming to solve various problems beyond development
6. Various tasks that can be accomplished using computers and the internet
@daviddwlee84
daviddwlee84 / fingpt_forecaster.ipynb
Last active June 13, 2024 09:00
fingpt_forecaster.ipynb

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine-tuning", learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

This book is all about patterns for doing ML. It's broken up into two key parts, building and serving. The two are intertwined, so it makes sense to read through the whole thing; there is plenty of good advice from seasoned professionals. The parts you can safely ignore are those that specifically use GCP. The other issue with the book is that it's very heavily focused on deep learning cases, and not all modeling problems require deep learning. Regardless, let's dive in. I've included the stuff that was relevant to me in the notes.

Most Interesting Bullets:

  • Machine learning models are not deterministic, so there are a number of ways we deal with them when building software, including setting random seeds in models during training and allowing for stateless functions, freezing layers, checkpointing, and generally making sure that flows are as reproducible as possib
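As a minimal sketch of the seeding and stateless-function idea in the bullet above (not code from the book; `sample_split` and its parameters are hypothetical names chosen for illustration), a data split can take its seed as an explicit argument and use a private random generator, so the same inputs always yield the same split:

```python
import random


def sample_split(items, train_fraction, seed):
    """Shuffle a copy of `items` with a seeded, private RNG and split it.

    Hypothetical helper for illustration: the seed is an explicit argument,
    so the function has no hidden global state and is reproducible.
    """
    rng = random.Random(seed)  # local generator, independent of random.seed()
    shuffled = list(items)     # copy so the caller's data is left untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Two calls with the same seed produce identical train/test splits.
first = sample_split(range(10), 0.8, seed=42)
second = sample_split(range(10), 0.8, seed=42)
print(first == second)
```

Passing the seed in, instead of calling `random.seed()` once at module level, keeps each call reproducible regardless of what other code has drawn from the global generator.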
@ustayready
ustayready / gpt.py
Created January 16, 2023 23:49
CloudGPT - Use ChatGPT to analyze AWS policies for vulnerabilities
import openai
import boto3
import json
import time
from typing import Dict, List
openai.api_key = '### SET YOUR OPENAI API KEY HERE ###'
session = boto3.session.Session()
client = session.client('iam')
@yhatt
yhatt / diagrams.md
Last active April 4, 2025 22:49
Marp diagram plugin example (Powered by kroki.io)
marp: true
theme: default

Marp diagram plugin example

Powered by kroki.io

@DannyQuah
DannyQuah / 2022.01-D.Quah-Obsidian-iPad-syncing-via-iSH-git.md
Last active January 4, 2025 09:44
Obsidian on iPad syncing via iSH, git, and GitHub

Obsidian iPad syncing via iSH git

by Danny Quah, Jan 2022

This gist describes using Obsidian on iPad while syncing to other Obsidian platforms. The procedure uses git in iSH on iOS, and thus differs from using either Obsidian Sync or Working Copy as described in Obsidian/iOS+app.

(To be clear, Obsidian is one of my favourite Apps, and I'm all for supporting the team financially. Moreover, everything I've heard suggests the paid Obsidian Sync is excellent. However, I don't want my syncing processes to proliferate --- each service using a different client sync flow --- so I keep my systems minimal: just syncthing and git. After writing this I found an Obsidian Forum writeup which uses the same tools I do to achieve the same goal, but you'll want to read that with its accumulated contributions dispersed across the comments. So at least I was thinking

@4skinSkywalker
4skinSkywalker / VPVR.pine
Last active April 30, 2025 19:40
Volume Profile Visible Range in Pine Script
// This source code is subject to the terms of the Mozilla Public License 2.0 at https://mozilla.org/MPL/2.0/
// © Fr3d0C0rl30n3
//@version=4
study("Fr3d0's Volume Profile Visible Range", "VPVR", overlay=true, max_boxes_count=500)
DEFAULT_COLOR = color.new(color.gray, 0)
BORDER_COLOR = color.new(color.black, 80)
BUY_COLOR = color.new(color.green, 0)
SELL_COLOR = color.new(color.red, 0)
@karolzlot
karolzlot / tqdm_cpu_ram.py
Last active March 24, 2025 07:13
Monitoring real time cpu and ram usage with tqdm. If you like it please upvote this answer: https://stackoverflow.com/a/69511430/8896457
from tqdm import tqdm
from time import sleep
import psutil
with tqdm(total=100, desc='cpu%', position=1) as cpubar, tqdm(total=100, desc='ram%', position=0) as rambar:
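    while True:
        # Overwrite each bar's position with the latest percentage reading.
        rambar.n = psutil.virtual_memory().percent
        cpubar.n = psutil.cpu_percent()
        rambar.refresh()
        cpubar.refresh()
        sleep(0.5)  # pause between samples so the loop does not busy-spin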
@schaumb
schaumb / redirect.py
Last active February 4, 2025 12:20
streamlit redirect
import streamlit as st
import io
import contextlib
import sys
import re
import threading
class _Redirect:
    class IOStuff(io.StringIO):