Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.

Yoav Goldberg, April 2023.
With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine-tuning", learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much
{-# LANGUAGE TypeSynonymInstances #-}
data Dual d = D Float d deriving Show
type Float' = Float
diff :: (Dual Float' -> Dual Float') -> Float -> Float'
diff f x = y'
  where D y y' = f (D x 1)
class VectorSpace v where
  zero :: v
// start with this
let widget = new ListWidget()
//
// Set colors
let bidenColor = new Color("1a68ff")
let trumpColor = new Color("ff4a43")
let bidenColor2 = new Color("1a68ff", 0.8)
let trumpColor2 = new Color("ff4a43", 0.8)
Note: This tutorial is written for Linux-based systems.
To install the latest version of R, please follow the download and install instructions at https://cloud.r-project.org/
Neovim (nvim) is a continuation and extension of the Vim editor that aims to keep the good parts of Vim and add more features. In this tutorial I will be using Neovim (nvim); however, most of the steps apply equally to Vim. Please follow the download and installation instructions on nvim's GitHub wiki https://github.com/neovim/neovim/wiki/Installing-Neovim.
I often train machine learning/deep learning models, and training takes a very long time to finish. Even a single epoch of a moderately complex model takes close to half an hour. So I constantly need to check on (babysit) the training process.
To help reduce the pain, I need a way to be notified of the training metrics. The idea is to send the training metrics (messages) as notifications to a mobile phone using PyTorch Callbacks.
I have written some Python code snippets that help me send my training metrics log as mobile push notifications using the Pushover service. They have a limit of 7500 requests per month per user, which is fine for my use case.
If you'd like to have something like this, you can grab those little hacky scripts.
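As a rough illustration of the idea (not the scripts mentioned above), here is a minimal sketch of sending one epoch's metrics through Pushover's message endpoint using only the Python standard library. The function names `format_metrics`, `build_request` and `notify` are hypothetical, and the app token and user key are placeholders you would get from your own Pushover account.

```python
import urllib.parse
import urllib.request

# Pushover's message endpoint; token, user, title and message are its form fields.
PUSHOVER_URL = "https://api.pushover.net/1/messages.json"


def format_metrics(epoch, metrics):
    """Turn a dict of metric name -> float into one short notification line."""
    parts = [f"{name}={value:.4f}" for name, value in sorted(metrics.items())]
    return f"epoch {epoch}: " + ", ".join(parts)


def build_request(app_token, user_key, message, title="Training update"):
    """Build the form-encoded POST request without sending it."""
    payload = urllib.parse.urlencode(
        {"token": app_token, "user": user_key, "title": title, "message": message}
    ).encode("utf-8")
    return urllib.request.Request(PUSHOVER_URL, data=payload, method="POST")


def notify(app_token, user_key, epoch, metrics, opener=urllib.request.urlopen):
    """Send the metrics for one epoch; returns True on an HTTP 200 response.

    `opener` is injectable (an assumption of this sketch) so the function
    can be exercised without network access.
    """
    request = build_request(app_token, user_key, format_metrics(epoch, metrics))
    with opener(request, timeout=10) as response:
        return response.status == 200
```

Calling `notify(APP_TOKEN, USER_KEY, epoch, {"loss": loss, "acc": acc})` at the end of each epoch (for example from whatever callback hook your training loop provides) would stay well within the monthly request limit for typical runs.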
"""
A bare bones examples of optimizing a black-box function (f) using
Natural Evolution Strategies (NES), where the parameter distribution is a
gaussian of fixed standard deviation.
"""
import numpy as np
np.random.seed(0)
# the function we want to optimize
/**
 * Get the date and days within a week from week number.
 * eg: date range for 8th week in 2013 is 17th Feb to 23rd Feb. This
 * code snippet will give you.
 *
 * It is not my code completely, Bit of modification from something
 * i found on net. Cant find it anymore so keeping a backup.
 *
 * @param {[Integer]} weekNo [From week 1 to Week 52/53 based on the system date setting]
 * @return {[Date]} [description]