Bootstrap knowledge of LLMs ASAP.
Neural network background links, to work through before starting on transformers:
- https://www.youtube.com/watch?v=aircAruvnKk&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
- https://www.3blue1brown.com/topics/neural-networks
- http://neuralnetworksanddeeplearning.com/
- https://distill.pub/
- 🟢 = easy, 🟠 = medium, 🔴 = hard
- ⏱️ = short, 🕰️ = long
- 🟢🕰️ Łukasz Kaiser Attention is all you need; Attentional Neural Network Models: an overview talk by a co-author of the original Transformer paper. Note that the talk is several years old.
- 🟢🕰️ Andrej Karpathy The spelled-out intro to language modeling: building makemore: basics. A bigram name-generator model, built first by counting and then with a neural network, using PyTorch (a count-based sketch appears after this list).
- 🟢🕰️ Andrej Karpathy Building makemore Part 2: MLP: extends the character-level model to a multi-layer perceptron language model.
- 🕰️ Andrej Karpathy Building makemore Part 3: Activations & Gradients, BatchNorm
- 🕰️ Andrej Karpathy Building makemore Part 4: Becoming a Backprop Ninja
- 🟢⏱️ Hedu AI Visual Guide to Transformer Neural Networks - (Episode 1) Position Embeddings: how tokens are embedded into a semantic space; the sine/cosine position encoding is explained very well (see the encoding sketch after this list).
- 🟢⏱️ Hedu AI Visual Guide to Transformer Neural Networks - (Episode 2) Multi-Head & Self-Attention: a clear overview of multi-head self-attention (a single-head sketch covering this episode and the next appears after this list).
- 🟢⏱️ Hedu AI Visual Guide to Transformer Neural Networks - (Episode 3) Decoder’s Masked Attention: further details on the transformer architecture, in particular the causal mask in the decoder.
- 🟠🕰️ Andrej Karpathy Let's build GPT: from scratch, in code, spelled out: builds up a GPT-2-like Shakespeare model from scratch in PyTorch, starting with a bigram model and adding features one by one.
- 🔴🕰️ Chris Olah CS25 I Stanford Seminar - Transformer Circuits, Induction Heads, In-Context Learning: interpretability. A deep look into the mechanics of induction heads. Companion article below.
- 🟢⏱️ Jay Alammar The Illustrated Word2vec - A Gentle Intro to Word Embeddings in Machine Learning
- 🟢⏱️ Jay Alammar How GPT3 Works - Easily Explained with Animations: an extremely high-level overview.
- 🟢🕰️ Jay Alammar The Narrated Transformer Language Model: a much deeper, more detailed look at the architecture. Companion article below.
- Sebastian Raschka L19: Self-attention and transformer networks: academic-style lecture series on self-attention and transformers.
- 🟠 Jay Mody GPT in 60 Lines of NumPy
- 🟠 PyTorch Language Modeling with nn.Transformer and TorchText
- 🟠 Sasha Rush et al. The Annotated Transformer
- 🟢 Jay Alammar The Illustrated Transformer: companion article to the video above.
- 🔴 Chris Olah et al. In-context Learning and Induction Heads: companion article to the video lecture above.
- Sebastian Raschka Understanding Large Language Models -- A Transformative Reading List: lists some of the most important papers in the area.
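
To make the makemore material concrete: the count-based bigram model from the first video fits in a few lines. Here is a minimal sketch in plain Python rather than Karpathy's PyTorch code; the `names.txt` filename and all identifiers are illustrative, not his exact code.

```python
# Count-based bigram name generator (illustrative sketch).
# Assumes names.txt contains one lowercase name per line.
import random
from collections import defaultdict

words = open("names.txt").read().splitlines()

# Count bigram frequencies; "." marks the start/end of a name.
counts = defaultdict(lambda: defaultdict(int))
for w in words:
    chars = ["."] + list(w) + ["."]
    for c1, c2 in zip(chars, chars[1:]):
        counts[c1][c2] += 1

def sample_name():
    """Walk the bigram distribution until the end marker is drawn."""
    out, ch = [], "."
    while True:
        chars, weights = zip(*counts[ch].items())
        ch = random.choices(chars, weights=weights)[0]
        if ch == ".":
            return "".join(out)
        out.append(ch)

for _ in range(5):
    print(sample_name())
```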
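
The sine/cosine position encoding in Episode 1 is the one from the original Transformer paper: `PE(pos, 2i) = sin(pos / 10000^(2i/d_model))` and `PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))`. A NumPy sketch, with illustrative names and assuming an even `d_model`:

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal position encodings, shape (seq_len, d_model)."""
    pos = np.arange(seq_len)[:, None]       # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]    # (1, d_model/2)
    angles = pos / np.power(10000, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)  # odd dimensions get cosine
    return pe

print(positional_encoding(4, 8).round(3))
```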
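
Episodes 2 and 3 both reduce to scaled dot-product attention, `softmax(QK^T / sqrt(d_k)) V`, plus a causal mask in the decoder so each position attends only to itself and earlier positions. A single-head NumPy sketch under those assumptions (names are illustrative); multi-head attention just runs several of these in parallel over learned projections and concatenates the results:

```python
import numpy as np

def masked_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.
    q, k, v: arrays of shape (seq_len, d_k)."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)        # (seq_len, seq_len)
    # Causal mask: block attention to future positions.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -1e9, scores)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                     # (seq_len, d_k)

x = np.random.randn(5, 8)  # toy example: use x as Q, K, and V
print(masked_self_attention(x, x, x).shape)  # (5, 8)
```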