So I just finished DeepLearning.AI’s Post-Training of LLMs course, and honestly? It was pretty much exactly what I needed—a straightforward intro to how you actually fine-tune these big language models after they’ve done their initial training.
What the Course Covers
They break it down into three main ways to do this stuff:
Supervised Fine-Tuning (SFT) is basically when you want to make big changes to how your model behaves. Want to turn a regular foundation model into something that actually follows instructions? Or maybe teach it to use tools? That’s SFT territory. The big takeaway here is that quality beats quantity every time—1,000 really good, diverse examples will crush a million mediocre ones.
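Just to ground that for myself, here's roughly what an SFT run looks like with Hugging Face's TRL library. To be clear, this is my own minimal sketch, not the course's notebooks; the model name and the two training examples are placeholders, and argument names shift a bit between TRL versions.

```python
# Minimal SFT sketch (assumes Hugging Face TRL; exact argument names and
# supported dataset formats vary by version).
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# A tiny toy instruction dataset. The course's point: ~1,000 high-quality,
# diverse examples like these beat millions of mediocre ones.
train_dataset = Dataset.from_list([
    {"prompt": "Summarize: The cat sat on the mat.", "completion": "A cat sat on a mat."},
    {"prompt": "Translate to French: Good morning.", "completion": "Bonjour."},
])

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # placeholder base model, picked only for illustration
    train_dataset=train_dataset,
    args=SFTConfig(output_dir="sft-demo", max_steps=10),
)
trainer.train()
```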
Direct Preference Optimization (DPO) is kind of like showing the model examples of “do this, not that.” You give it both good and bad responses so it learns what you actually want. This works great for smaller adjustments like improving safety, multilingual behavior, or instruction following. Pro tip: start with a model that can already answer questions, then use DPO to polish it up.
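Those “do this, not that” pairs are literally how the training data is shaped. Here's a rough sketch, again using TRL as an assumed library (the chosen/rejected column names follow TRL's convention; the model and the single preference pair are placeholders):

```python
# Minimal DPO sketch (assumes Hugging Face TRL; newer versions call the
# tokenizer argument `processing_class`, older ones call it `tokenizer`).
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder: a model that can already answer questions
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Each record pairs a response we want ("chosen") with one we don't ("rejected").
preference_data = Dataset.from_list([
    {
        "prompt": "How do I reset my password?",
        "chosen": "Go to Settings > Account > Reset Password and follow the prompts.",
        "rejected": "Just keep guessing passwords until one works.",
    },
])

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="dpo-demo", beta=0.1, max_steps=10),
    train_dataset=preference_data,
    processing_class=tokenizer,
)
trainer.train()
```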
Online Reinforcement Learning is where things get really interesting (and complicated). The model generates responses in real-time, gets scored by humans or other models, and then updates itself based on that feedback. Think about how ChatGPT was trained with PPO, or what DeepSeek does with GRPO.
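Here's the rough shape of that loop, sketched with TRL's GRPOTrainer. The toy length-based reward stands in for the real thing, since actual pipelines like ChatGPT's or DeepSeek's score completions with trained reward models, human raters, or verifiable checks; the model and prompts are placeholders too.

```python
# Rough online-RL sketch with GRPO (assumes Hugging Face TRL; the prompts,
# model, and reward function are toy placeholders).
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

prompts = Dataset.from_list([
    {"prompt": "Write a one-sentence apology for a late delivery."},
    {"prompt": "Explain overfitting in one sentence."},
])

def reward_concise(completions, **kwargs):
    # Toy reward: prefer completions close to ~100 characters. A real setup
    # would score with a reward model or human feedback instead.
    return [-abs(len(completion) - 100) / 100 for completion in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder instruct model
    reward_funcs=reward_concise,
    train_dataset=prompts,
    args=GRPOConfig(output_dir="grpo-demo", max_steps=10),
)
trainer.train()
```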
What I Actually Liked About It
The best part? They actually tell you when to use each method instead of just throwing theory at you. You get real advice on how to curate your data, what mistakes to avoid (like when DPO gets obsessed with surface-level patterns), and how much memory each approach is going to eat up.
Plus, they handle all the setup inside the course's built-in Jupyter notebooks, which is honestly a relief when you just want to learn the concepts without spending half your time fighting with dependencies.
The Not-So-Great Parts
Okay, real talk—some of the hands-on stuff felt a bit like when your older sibling lets you “play” video games but gives you the controller that’s not actually plugged in. 😄 You’re going through the motions, but you’re not really in control. Still, it gives you a decent foundation if you want to actually implement this stuff yourself later.
Also, this definitely isn’t for people who are new to LLMs. You should already get the basics of how language models work before jumping into the fine-tuning world.
For me, this course was pretty much perfect for what I needed—an intro to post-training methods without having to slog through dense academic papers. It’s short, well-organized, and gives you enough understanding to figure out which rabbit holes are actually worth exploring.
Resources
- Post-Training of LLMs Course - The main course
- Additional Code Examples - Extra code that expands on what’s covered in the course