I’ve always learned by starting from the foundations and building up to something complex. In the era of reasoning LLMs, I believe a top-down approach works better. Imagine you want to learn a new deep learning architecture.

  1. Read the paper. Don’t stop at things you don’t understand; force yourself to read the entire thing. Even if you only grasp 5% of the paper on the first pass, that’s fine. The point is to get a general overview of how it works and what it does.
  2. Skim through the code and run it, commenting it as you go. The main objective is to make the code runnable and debuggable (see the first sketch after this list).
  3. Read the paper again. This time you’ll understand more than you did on the first pass, and the architecture and the math behind it will start to come into focus.
  4. Leverage GPT to explain any part or section of the paper you don’t understand.
  5. Read the paper again. Keep going back and forth between the paper, the code and GPT. You’ll start to understand more and more.
  6. Implement the architecture from scratch, using the reference code as a guide. It doesn’t matter if you have to copy some sections line by line; force yourself to deliver a working implementation first, and only then focus on understanding the little details (see the second sketch below).
  7. Document the learning process with a blog post or article. If you can explain it, you’ve understood it.
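
For step 2, here’s a minimal sketch of what “commenting the code as you go” can look like. The names `model`, `batch`, `embed`, and `blocks` are hypothetical stand-ins for whatever the reference repo actually exposes:

```python
# Hypothetical walkthrough of a reference forward pass (step 2).
# `model` and `batch` are placeholders for the repo's actual objects.
def debug_forward(model, batch):
    x = batch["input_ids"]    # (batch, seq_len): token ids
    h = model.embed(x)        # (batch, seq_len, d_model): token embeddings
    breakpoint()              # drop into pdb and inspect h.shape, h.dtype, ...
    for block in model.blocks:
        h = block(h)          # each block should preserve (batch, seq_len, d_model)
    return h
```

Annotating every tensor with its shape is cheap, and it forces you to check, line by line, that the code matches your mental model of the paper.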
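
For step 6, a from-scratch sketch of one building block shows the level of granularity a “working implementation” means here: single-head scaled dot-product attention in plain NumPy, unbatched, so the math stays visible.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # q, k, v: (seq_len, d_k). Single head, no batching, to keep shapes readable.
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (seq_len, seq_len) similarity matrix
    return softmax(scores) @ v               # attention-weighted sum of values

q = k = v = np.random.randn(4, 8)
print(attention(q, k, v).shape)              # (4, 8)
```

Once this matches the reference implementation’s output on the same inputs, adding batching and multiple heads is mechanical.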

Learning is painful because it makes you feel stupid and incompetent. But if you don’t feel stupid, you’re not learning.

Start feeling stupid. Start sucking at things. It’s the only way to get better.