🔥 Flash Attention derived and coded from first principles with Triton (Python)
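
A quick taste of the core idea: Flash Attention never materializes the full attention matrix because softmax can be computed online, carrying only a running maximum and a running normalizer across blocks. A minimal sketch of that recurrence in plain PyTorch (illustrative only, not the Triton kernel from the video):

```python
import torch

def online_softmax(blocks):
    """Softmax over concatenated score blocks in a single streaming pass.
    Keeps only a running max `m` and running normalizer `l` (the Flash Attention trick)."""
    m = torch.tensor(float("-inf"))  # running maximum
    l = torch.tensor(0.0)            # running sum of exp(x - m)
    for block in blocks:
        m_new = torch.maximum(m, block.max())
        l = l * torch.exp(m - m_new) + torch.exp(block - m_new).sum()
        m = m_new
    x = torch.cat(blocks)
    return torch.exp(x - m) / l      # the real kernel rescales an output accumulator instead
```

Sanity check: for 1-D tensors `a` and `b`, `online_softmax([a, b])` matches `torch.softmax(torch.cat([a, b]), dim=0)`.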

🌅 Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation

🪢 ML Interpretability: feature visualization, adversarial examples, interpretability for language models

๐Ÿ•ธ๏ธ Kolmogorov-Arnold Networks: MLP vs KAN, Math, B-Splines, Universal Approximation Theorem

🎯 Direct Preference Optimization (DPO) explained: Bradley-Terry model, log probabilities, math derivations
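
The punchline of DPO is that the Bradley-Terry preference model collapses RLHF into a single classification-style loss on log-probability ratios against a frozen reference model. A minimal sketch (variable names are mine; the inputs are per-response summed log probabilities):

```python
import torch.nn.functional as F

def dpo_loss(pi_chosen_logp, pi_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """-log sigmoid(beta * [(log pi - log ref)(chosen) - (log pi - log ref)(rejected)])"""
    logits = beta * ((pi_chosen_logp - ref_chosen_logp)
                     - (pi_rejected_logp - ref_rejected_logp))
    return -F.logsigmoid(logits).mean()
```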

๐Ÿ“ Reinforcement Learning from Human Feedback explained with math derivations and the PyTorch code

๐Ÿ Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrent, Convolution, Math

🌈 Mistral 7B and Mixtral 8x7B Explained: Sliding Window Attention, Sparse Mixture of Experts, Rolling Buffer KV Cache, Model Sharding
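
Of those pieces, sliding window attention is the easiest to pin down: ordinary causal attention, except each token may only look back W positions. A minimal mask sketch (window size illustrative):

```python
import torch

def sliding_window_mask(seq_len, window=4096):
    """True where attention is allowed: causal AND within the last `window` positions."""
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    return (j <= i) & (i - j < window)
```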

🔬 Distributed Training with PyTorch: complete tutorial with cloud infrastructure and code
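
The workhorse here is DistributedDataParallel: one process per GPU, gradients all-reduced automatically during backward(). A minimal sketch of the boilerplate, assuming launch via torchrun:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")       # torchrun sets RANK / WORLD_SIZE / LOCAL_RANK
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(torch.nn.Linear(128, 10).cuda(local_rank), device_ids=[local_rank])
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)

    x = torch.randn(32, 128, device=f"cuda:{local_rank}")
    model(x).sum().backward()                     # gradient all-reduce happens here
    opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch: torchrun --nproc_per_node=<gpus> script.py
```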

โš›๏ธ Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training

๐Ÿ—ƒ๏ธ Retrieval Augmented Generation (RAG) Explained: Embedding, Sentence BERT, Vector Database (HNSW)

👨 BERT explained: Training, Inference, BERT vs GPT/LLaMA, Fine-tuning, [CLS] token

🌄 Coding Stable Diffusion From Scratch

🦙 Coding LLaMA 2 From Scratch

🦙 LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU
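
Of those ingredients, RMS Norm is the quickest to show: LayerNorm without mean-centering or bias, just a learned gain over RMS-normalized activations. A sketch matching the usual formula x · g / √(mean(x²) + ε):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))   # learned gain g

    def forward(self, x):
        # no mean subtraction, no bias -- only RMS rescaling
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps) * self.weight
```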

๐ŸŒ Segment Anything - Model explanation with code

🧮 LoRA: Low-Rank Adaptation of Large Language Models - Explained visually + PyTorch code from scratch
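
LoRA in one equation: keep the pretrained W frozen and learn a low-rank update, h = Wx + (α/r)·BAx, with B zero-initialized so training starts exactly at the pretrained model. A from-scratch sketch (hyperparameters illustrative):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)                      # freeze pretrained W
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))    # zero init: no change at step 0
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.t() @ self.B.t())
```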

⛓ LongNet: Scaling Transformers to 1,000,000,000 tokens - Python Code + Explanation

🖼 How diffusion models work - explanation and code!
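
The equation that makes diffusion training practical is the closed-form forward process, x_t = √ᾱ_t·x_0 + √(1−ᾱ_t)·ε, which lets you jump straight to any noise level. A minimal sketch:

```python
import torch

def forward_diffusion(x0, t, alpha_bar):
    """Sample x_t ~ q(x_t | x_0) in closed form.
    x0: clean data; t: timestep index; alpha_bar: (T,) cumulative product of (1 - beta)."""
    eps = torch.randn_like(x0)
    xt = alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * eps
    return xt, eps   # the network is trained to predict eps given (xt, t)
```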

โš™๏ธ Variational Autoencoder - Model, ELBO, loss function and maths explained easily!

🎛 Coding a Transformer from scratch in PyTorch, with full explanation, training and inference

🪬 Attention is all you need (Transformer) - Model explanation (including math), Inference and Training
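
And the formula everything above builds on, Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V, fits in a few lines:

```python
import math
import torch

def attention(q, k, v, mask=None):
    """Scaled dot-product attention; q, k, v: (..., seq_len, d_k); mask: True = attend."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```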