Bootstrap knowledge of LLMs ASAP, with a bias/focus toward GPT.
Neural network background to cover before starting on transformers:
- https://www.youtube.com/watch?v=aircAruvnKk&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
- https://www.3blue1brown.com/topics/neural-networks
- http://neuralnetworksanddeeplearning.com/
- https://distill.pub/
- 🟢 = easy, 🟡 = medium, 🔴 = hard
- ⏱️ = short, 🕰️ = long
- 🔈 = low quality audio
- 🟢🕰️ Łukasz Kaiser Attention is all you need; Attentional Neural Network Models: this talk is from 6 years ago.
- 🟢🕰️ Andrej Karpathy The spelled-out intro to language modeling: building makemore: basic; builds a bigram name-generator model first by counting, then with a neural network, using PyTorch (a count-based bigram sketch appears after this list).
- 🟢🕰️ Andrej Karpathy Building makemore Part 2: MLP.
- 🕰️ Andrej Karpathy Building makemore Part 3: Activations & Gradients, BatchNorm.
- 🕰️ Andrej Karpathy Building makemore Part 4: Becoming a Backprop Ninja.
- 🟢⏱️ Hedu AI Visual Guide to Transformer Neural Networks - (Episode 1) Position Embeddings: tokens are embedded into a semantic space; the sine/cosine position encoding is explained very well (see the positional-encoding sketch after this list).
- 🟢⏱️ Hedu AI Visual Guide to Transformer Neural Networks - (Episode 2) Multi-Head & Self-Attention: clear overview of multi-head attention (a minimal masked self-attention sketch appears after this list).
- 🟢⏱️ Hedu AI Visual Guide to Transformer Neural Networks - (Episode 3) Decoder's Masked Attention: further details on the transformer architecture.
- 🟡🕰️ Andrej Karpathy Let's build GPT: from scratch, in code, spelled out: builds up a Shakespeare GPT-2-like model from scratch, starting with a bigram model and adding features one by one, in PyTorch.
- 🔴🕰️ Chris Olah CS25 I Stanford Seminar - Transformer Circuits, Induction Heads, In-Context Learning: interpretability; a deep look into the mechanics of induction heads. Companion article below.
- 🟢⏱️ Jay Alammar The Illustrated Word2vec - A Gentle Intro to Word Embeddings in Machine Learning
- 🟢⏱️ Jay Alammar How GPT3 Works - Easily Explained with Animations: an extremely high-level overview.
- 🟢🕰️ Jay Alammar The Narrated Transformer Language Model: a much deeper, more detailed look at the architecture. Companion article.
- Sebastian Raschka L19: Self-attention and transformer networks: academic-style lecture series on self-attention and transformers.
- 🟢🕰️🔈 Mark Chen Transformers in Language: The development of GPT Models including GPT3: a chunk of this lecture is about applying GPT to images. Same lecture series as the Chris Olah one; rest of the series. Papers listed in the talk:
  - "GPT-1": Liu et al. Generating Wikipedia by Summarizing Long Sequences
  - "GPT-2": Radford et al. Language Models are Unsupervised Multitask Learners; code: github.com/openai/gpt-2; OpenAI blog: Better Language Models; Fermat's Library.
  - "GPT-3": Brown et al. Language Models are Few-Shot Learners (I think this is it; I can't find the quoted text inside this paper)
- 🟡 Jay Mody GPT in 60 Lines of NumPy
- 🟡 PyTorch Language Modeling with nn.Transformer and TorchText
- 🟡 Sasha Rush et al. The Annotated Transformer
- 🟢 Jay Alammar The Illustrated Transformer; companion video above.
- 🔥 Chris Olah et al. In-context Learning and Induction Heads; companion video lecture above.
- Sebastian Raschka Understanding Large Language Models -- A Transformative Reading List: this article lists some of the most important papers in the area.
- OpenAI Research Index
- Radford et al. Improving Language Understanding by Generative Pre-Training; a page accompanying this paper on the OpenAI blog: Improving language understanding with unsupervised learning.
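
The count-based bigram sketch referenced in the makemore entry above: a minimal, hypothetical illustration of counting character bigrams and sampling names from the counts. The tiny inline word list and all names here are my own stand-ins, not Karpathy's actual code or dataset.

```python
from collections import defaultdict
import random

words = ["emma", "olivia", "ava", "isabella", "sophia"]  # placeholder data

# Count how often each character follows each other character ("." marks start/end).
counts = defaultdict(lambda: defaultdict(int))
for w in words:
    chars = ["."] + list(w) + ["."]
    for a, b in zip(chars, chars[1:]):
        counts[a][b] += 1

def sample_name():
    """Sample a new name by walking the bigram table probabilistically."""
    out, ch = [], "."
    while True:
        next_chars = list(counts[ch].keys())
        weights = list(counts[ch].values())
        ch = random.choices(next_chars, weights=weights)[0]
        if ch == ".":
            return "".join(out)
        out.append(ch)

print(sample_name())
```

The video then replaces the count table with a one-layer neural network trained to predict the same next-character distribution.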
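The positional-encoding sketch referenced in the Hedu AI Episode 1 entry: the fixed sine/cosine encoding from "Attention Is All You Need", PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). A minimal NumPy version; the function name and shapes are my own choices.

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Fixed sine/cosine position encoding (assumes an even d_model).

    Returns an array of shape (seq_len, d_model); row p is added to the
    embedding of the token at position p.
    """
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # (1, d_model / 2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions: cosine
    return pe

print(positional_encoding(4, 8).round(3))
```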
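The masked self-attention sketch referenced in the Hedu AI Episode 2 and 3 entries: single-head scaled dot-product self-attention, softmax(QK^T / sqrt(d_k))V, with the decoder-style causal mask that stops a position from attending to later positions. A minimal PyTorch sketch; the function name, weight shapes, and example sizes are illustrative assumptions, not code from any of the linked resources.

```python
import torch
import torch.nn.functional as F

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention with a causal mask.

    x: (seq_len, d_model) token embeddings; w_q/w_k/w_v: (d_model, d_head)
    projection matrices. Multi-head attention runs several of these in
    parallel and concatenates the results.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v                # (seq_len, d_head) each
    d_head = q.shape[-1]
    scores = (q @ k.T) / d_head ** 0.5                  # (seq_len, seq_len)
    # Causal (decoder) mask: position i may only attend to positions <= i.
    seq_len = x.shape[0]
    mask = torch.tril(torch.ones(seq_len, seq_len)).bool()
    scores = scores.masked_fill(~mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)                 # each row sums to 1
    return weights @ v                                  # (seq_len, d_head)

# Tiny usage example with random data.
torch.manual_seed(0)
x = torch.randn(5, 16)                                  # 5 tokens, d_model = 16
w_q, w_k, w_v = (torch.randn(16, 8) for _ in range(3))
print(causal_self_attention(x, w_q, w_k, w_v).shape)    # torch.Size([5, 8])
```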