Umar Jamil

Hi! I’m Umar Jamil 👨🏽, a machine learning engineer from Milan, Italy. I speak Italian, English, Urdu and Mandarin. My wife’s family calls me 小乌 (xiǎowū).

You can connect with me on 💼 LinkedIn.

I run a 🎬 YouTube channel to teach machine learning and AI concepts in a simple way.

Here are some of my projects:

🕸️ Kolmogorov-Arnold Networks: MLP vs KAN, Math, B-Splines, Universal Approximation Theorem

🎯 Direct Preference Optimization (DPO) explained: Bradley-Terry model, log probabilities, math derivations

📐 Reinforcement Learning from Human Feedback explained with math derivations and the PyTorch code

🐍 Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrent, Convolution, Math

🌈 Mistral 7B and Mixtral 8x7B Explained: Sliding Window Attention, Sparse Mixture of Experts, Rolling Buffer (KV) Cache, Model Sharding

🔬 Distributed Training with PyTorch: complete tutorial with cloud infrastructure and code

⚛️ Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training

🗃️ Retrieval Augmented Generation (RAG) Explained: Embedding, Sentence BERT, Vector Database (HNSW)

👨 BERT explained: Training, Inference, BERT vs GPT/LLaMA, Fine-tuning, [CLS] token

🌄 Coding Stable Diffusion From Scratch

🦙 Coding LLaMA 2 From Scratch

🦙 LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU

🌍 Segment Anything - Model explanation with code

🧮 LoRA: Low-Rank Adaptation of Large Language Models - Explained visually + PyTorch code from scratch

LongNet: Scaling Transformers to 1,000,000,000 tokens: Python Code + Explanation

🖼 How diffusion models work - explanation and code!

⚙️ Variational Autoencoder - Model, ELBO, loss function and maths explained easily!

🎛 Coding a Transformer from scratch in PyTorch, with full explanation, training and inference.

🪬 Attention is all you need (Transformer) - Model explanation (including math), Inference and Training