Bootstrap knowledge of LLMs ASAP, with a bias/focus toward GPT.
Neural network background to cover before starting on transformers:
- https://www.youtube.com/watch?v=aircAruvnKk&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
- https://www.3blue1brown.com/topics/neural-networks
- http://neuralnetworksanddeeplearning.com/
- https://distill.pub/
- 🟢 = easy, 🟡 = medium, 🔴 = hard
- ⏱️ = short, 🕰️ = long
- 🔈 = low quality audio
- 🟢🕰️ Łukasz Kaiser Attention is all you need; Attentional Neural Network Models: this talk is from 6 years ago.
- 🟢🕰️ Andrej Karpathy The spelled-out intro to language modeling: building makemore: basic; builds a bigram name-generator model first by counting, then with a neural network, using PyTorch (a count-based bigram sketch appears after this list).
- 🟢🕰️ Andrej Karpathy Building makemore Part 2: MLP.
- 🕰️ Andrej Karpathy Building makemore Part 3: Activations & Gradients, BatchNorm.
- 🕰️ Andrej Karpathy Building makemore Part 4: Becoming a Backprop Ninja.
- 🟢⏱️ Hedu AI Visual Guide to Transformer Neural Networks - (Episode 1) Position Embeddings: tokens are embedded into a semantic space; the sine/cosine position encoding is explained very well (see the positional-encoding sketch after this list).
- 🟢⏱️ Hedu AI Visual Guide to Transformer Neural Networks - (Episode 2) Multi-Head & Self-Attention: clear overview of multi-head attention (a minimal masked self-attention sketch appears after this list).
- 🟢⏱️ Hedu AI Visual Guide to Transformer Neural Networks - (Episode 3) Decoder's Masked Attention: further details on the transformer architecture.
- 🟡🕰️ Andrej Karpathy Let's build GPT: from scratch, in code, spelled out: builds up a Shakespeare GPT-2-like model from scratch, starting with a bigram model and adding features one by one, in PyTorch.
- 🔴🕰️ Chris Olah CS25 I Stanford Seminar - Transformer Circuits, Induction Heads, In-Context Learning: interpretability; a deep look into the mechanics of induction heads. Companion article below.
- 🟢⏱️ Jay Alammar The Illustrated Word2vec - A Gentle Intro to Word Embeddings in Machine Learning
- 🟢⏱️ Jay Alammar How GPT3 Works - Easily Explained with Animations: an extremely high-level overview.
- 🟢🕰️ Jay Alammar The Narrated Transformer Language Model: a much deeper, more detailed look at the architecture. Companion article.
- Sebastian Raschka L19: Self-attention and transformer networks: academic-style lecture series on self-attention and transformers.
- 🟢🕰️🔈 Mark Chen Transformers in Language: The development of GPT Models including GPT3: a chunk of this lecture is about applying GPT to images. Same lecture series as the Chris Olah one; rest of the series. Papers listed in the talk:
  - "GPT-1": Liu et al. Generating Wikipedia by Summarizing Long Sequences
  - "GPT-2": Radford et al. Language Models are Unsupervised Multitask Learners; code: github.com/openai/gpt-2; OpenAI blog: Better Language Models; Fermat's Library.
  - "GPT-3": Brown et al. Language Models are Few-Shot Learners (I think this is it; I can't find the quoted text inside this paper)
- 🟡 Jay Mody GPT in 60 Lines of NumPy
- 🟡 PyTorch Language Modeling with nn.Transformer and TorchText
- 🟡 Sasha Rush et al. The Annotated Transformer
- 🟢 Jay Alammar The Illustrated Transformer; companion video above.
- 🔥 Chris Olah et al. In-context Learning and Induction Heads; companion video lecture above.
- Sebastian Raschka Understanding Large Language Models -- A Transformative Reading List: this article lists some of the most important papers in the area.
- OpenAI Research Index
- Radford et al. Improving Language Understanding by Generative Pre-Training; a page accompanying this paper on the OpenAI blog: Improving language understanding with unsupervised learning.
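
The count-based bigram sketch referenced in the makemore entry above: a minimal, hypothetical illustration of counting character bigrams and sampling names from the counts. The tiny inline word list and all names here are my own stand-ins, not Karpathy's actual code or dataset.

```python
from collections import defaultdict
import random

words = ["emma", "olivia", "ava", "isabella", "sophia"]  # placeholder data

# Count how often each character follows each other character ("." marks start/end).
counts = defaultdict(lambda: defaultdict(int))
for w in words:
    chars = ["."] + list(w) + ["."]
    for a, b in zip(chars, chars[1:]):
        counts[a][b] += 1

def sample_name():
    """Sample a new name by walking the bigram table probabilistically."""
    out, ch = [], "."
    while True:
        next_chars = list(counts[ch].keys())
        weights = list(counts[ch].values())
        ch = random.choices(next_chars, weights=weights)[0]
        if ch == ".":
            return "".join(out)
        out.append(ch)

print(sample_name())
```

The video then replaces the count table with a one-layer neural network trained to predict the same next-character distribution.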
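The positional-encoding sketch referenced in the Hedu AI Episode 1 entry: the fixed sine/cosine encoding from "Attention Is All You Need", PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). A minimal NumPy version; the function name and shapes are my own choices.

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Fixed sine/cosine position encoding (assumes an even d_model).

    Returns an array of shape (seq_len, d_model); row p is added to the
    embedding of the token at position p.
    """
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # (1, d_model / 2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions: cosine
    return pe

print(positional_encoding(4, 8).round(3))
```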
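The masked self-attention sketch referenced in the Hedu AI Episode 2 and 3 entries: single-head scaled dot-product self-attention, softmax(QK^T / sqrt(d_k))V, with the decoder-style causal mask that stops a position from attending to later positions. A minimal PyTorch sketch; the function name, weight shapes, and example sizes are illustrative assumptions, not code from any of the linked resources.

```python
import torch
import torch.nn.functional as F

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention with a causal mask.

    x: (seq_len, d_model) token embeddings; w_q/w_k/w_v: (d_model, d_head)
    projection matrices. Multi-head attention runs several of these in
    parallel and concatenates the results.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v                # (seq_len, d_head) each
    d_head = q.shape[-1]
    scores = (q @ k.T) / d_head ** 0.5                  # (seq_len, seq_len)
    # Causal (decoder) mask: position i may only attend to positions <= i.
    seq_len = x.shape[0]
    mask = torch.tril(torch.ones(seq_len, seq_len)).bool()
    scores = scores.masked_fill(~mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)                 # each row sums to 1
    return weights @ v                                  # (seq_len, d_head)

# Tiny usage example with random data.
torch.manual_seed(0)
x = torch.randn(5, 16)                                  # 5 tokens, d_model = 16
w_q, w_k, w_v = (torch.randn(16, 8) for _ in range(3))
print(causal_self_attention(x, w_q, w_k, w_v).shape)    # torch.Size([5, 8])
```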