Bootstrap knowledge of LLMs ASAP.
Neural network background links, to work through before starting on transformers:
- https://www.youtube.com/watch?v=aircAruvnKk&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
- https://www.3blue1brown.com/topics/neural-networks
- http://neuralnetworksanddeeplearning.com/
- https://distill.pub/
- 🟢 = easy, 🟠 = medium, 🔴 = hard
- ⏱️ = short, 🕰️ = long
- 🟢🕰️ Łukasz Kaiser Attention is all you need; Attentional Neural Network Models: an overview talk by a co-author of the original Transformer paper. Note that the talk is several years old.
- 🟢🕰️ Andrej Karpathy The spelled-out intro to language modeling: building makemore: basics. A bigram name-generator model, built first by counting and then with a neural network, using PyTorch (a count-based sketch appears after this list).
- 🟢🕰️ Andrej Karpathy Building makemore Part 2: MLP: extends the character-level model to a multi-layer perceptron language model.
- 🕰️ Andrej Karpathy Building makemore Part 3: Activations & Gradients, BatchNorm
- 🕰️ Andrej Karpathy Building makemore Part 4: Becoming a Backprop Ninja
- 🟢⏱️ Hedu AI Visual Guide to Transformer Neural Networks - (Episode 1) Position Embeddings: how tokens are embedded into a semantic space; the sine/cosine position encoding is explained very well (see the encoding sketch after this list).
- 🟢⏱️ Hedu AI Visual Guide to Transformer Neural Networks - (Episode 2) Multi-Head & Self-Attention: a clear overview of multi-head self-attention (a single-head sketch covering this episode and the next appears after this list).
- 🟢⏱️ Hedu AI Visual Guide to Transformer Neural Networks - (Episode 3) Decoder’s Masked Attention: further details on the transformer architecture, in particular the causal mask in the decoder.
- 🟠🕰️ Andrej Karpathy Let's build GPT: from scratch, in code, spelled out: builds up a GPT-2-like Shakespeare model from scratch in PyTorch, starting with a bigram model and adding features one by one.
- 🔴🕰️ Chris Olah CS25 I Stanford Seminar - Transformer Circuits, Induction Heads, In-Context Learning: interpretability. A deep look into the mechanics of induction heads. Companion article below.
- 🟢⏱️ Jay Alammar The Illustrated Word2vec - A Gentle Intro to Word Embeddings in Machine Learning
- 🟢⏱️ Jay Alammar How GPT3 Works - Easily Explained with Animations: an extremely high-level overview.
- 🟢🕰️ Jay Alammar The Narrated Transformer Language Model: a much deeper, more detailed look at the architecture. Companion article below.
- Sebastian Raschka L19: Self-attention and transformer networks: academic-style lecture series on self-attention and transformers.
- 🟠 Jay Mody GPT in 60 Lines of NumPy
- 🟠 PyTorch Language Modeling with nn.Transformer and TorchText
- 🟠 Sasha Rush et al. The Annotated Transformer
- 🟢 Jay Alammar The Illustrated Transformer: companion article to the video above.
- 🔴 Chris Olah et al. In-context Learning and Induction Heads: companion article to the video lecture above.
- Sebastian Raschka Understanding Large Language Models -- A Transformative Reading List: lists some of the most important papers in the area.
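
To make the makemore material concrete: the count-based bigram model from the first video fits in a few lines. Here is a minimal sketch in plain Python rather than Karpathy's PyTorch code; the `names.txt` filename and all identifiers are illustrative, not his exact code.

```python
# Count-based bigram name generator (illustrative sketch).
# Assumes names.txt contains one lowercase name per line.
import random
from collections import defaultdict

words = open("names.txt").read().splitlines()

# Count bigram frequencies; "." marks the start/end of a name.
counts = defaultdict(lambda: defaultdict(int))
for w in words:
    chars = ["."] + list(w) + ["."]
    for c1, c2 in zip(chars, chars[1:]):
        counts[c1][c2] += 1

def sample_name():
    """Walk the bigram distribution until the end marker is drawn."""
    out, ch = [], "."
    while True:
        chars, weights = zip(*counts[ch].items())
        ch = random.choices(chars, weights=weights)[0]
        if ch == ".":
            return "".join(out)
        out.append(ch)

for _ in range(5):
    print(sample_name())
```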
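
The sine/cosine position encoding in Episode 1 is the one from the original Transformer paper: `PE(pos, 2i) = sin(pos / 10000^(2i/d_model))` and `PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))`. A NumPy sketch, with illustrative names and assuming an even `d_model`:

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal position encodings, shape (seq_len, d_model)."""
    pos = np.arange(seq_len)[:, None]       # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]    # (1, d_model/2)
    angles = pos / np.power(10000, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)  # odd dimensions get cosine
    return pe

print(positional_encoding(4, 8).round(3))
```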
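
Episodes 2 and 3 both reduce to scaled dot-product attention, `softmax(QK^T / sqrt(d_k)) V`, plus a causal mask in the decoder so each position attends only to itself and earlier positions. A single-head NumPy sketch under those assumptions (names are illustrative); multi-head attention just runs several of these in parallel over learned projections and concatenates the results:

```python
import numpy as np

def masked_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.
    q, k, v: arrays of shape (seq_len, d_k)."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)        # (seq_len, seq_len)
    # Causal mask: block attention to future positions.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -1e9, scores)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                     # (seq_len, d_k)

x = np.random.randn(5, 8)  # toy example: use x as Q, K, and V
print(masked_self_attention(x, x, x).shape)  # (5, 8)
```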