🔥 Flash Attention derived and coded from first principles with Triton (Python)
🌅 Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation
🪢 ML Interpretability: feature visualization, adversarial examples, interpretability for language models
🕸️ Kolmogorov-Arnold Networks: MLP vs KAN, Math, B-Splines, Universal Approximation Theorem
📐 Reinforcement Learning from Human Feedback explained with math derivations and PyTorch code
🐍 Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrence, Convolution, Math
🔬 Distributed Training with PyTorch: complete tutorial with cloud infrastructure and code
⚛️ Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training
🗃️ Retrieval Augmented Generation (RAG) Explained: Embedding, Sentence BERT, Vector Database (HNSW)
👨 BERT explained: Training, Inference, BERT vs GPT/LLaMA, Fine-tuning, [CLS] token
🌄 Coding Stable Diffusion From Scratch
🦙 LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU
🌍 Segment Anything - Model explanation with code
🧮 LoRA: Low-Rank Adaptation of Large Language Models - Explained visually + PyTorch code from scratch
⛓ LongNet: Scaling Transformers to 1,000,000,000 Tokens - Python Code + Explanation
🖼 How diffusion models work - explanation and code!
⚙️ Variational Autoencoder - Model, ELBO, loss function and math explained easily!
🎛 Coding a Transformer from scratch in PyTorch, with full explanation, training and inference
🪬 Attention Is All You Need (Transformer) - Model explanation (including math), Inference and Training