This note covers adaptation: changing a model's weights so its default behavior changes. It spans the post-training spectrum of supervised fine-tuning (SFT) and preference alignment (RLHF, DPO), the parameter-efficient methods that make tuning cheap (LoRA, QLoRA), the central hazard of catastrophic forgetting, and distillation (compressing a big model into a small one). It excludes pretraining (a foundations topic) and the subjects of earlier notes, retrieval and evaluation, though it depends on both: adaptation needs evaluation to verify that a tune didn't break anything, and it competes with retrieval as a way to inject knowledge.
The earlier disciplines change what the model sees. Adaptation changes what the model is. Prompting and retrieval feed information in at inference time but leave the weights untouched — which is the right move most of the time. Some goals, though, can't be reached that way: a consistent house style or output format, a skill internalized so it needn't be re-explained on every call, behaviors too nuanced to spell out in a prompt, or lower latency from a smaller specialized model.
Reaching those means editing the weights — and weights are expensive and fragile to edit. Training needs curated data, compute, and care, and it risks catastrophic forgetting: a model losing its general abilities after being tuned on a narrow task, because the new training overwrites knowledge from pretraining.[5] So the real discipline is changing the behavior you want while doing the least collateral damage — and only when a cheaper lever wouldn't have worked.
SFT trains the model on prompt → ideal-response pairs, nudging it to make the good response more likely. It learns by imitation: shown enough high-quality examples, it gets better at following instructions and matching a desired style.[2] It is the foundation step that turns a raw model into something that reliably follows instructions, and its success hinges on dataset quality and diversity across task types.[3]
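To make "nudging it to make the good response more likely" concrete, here is a minimal sketch of the loss SFT typically minimizes: the average negative log-likelihood of the ideal response's tokens. Masking out the prompt tokens so that only the response is scored is a common convention assumed here, not something the note specifies, and the tiny three-token vocabulary and logits are invented for illustration.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def sft_loss(logits_per_position, target_ids, loss_mask):
    """Mean negative log-likelihood of target tokens where loss_mask is 1.

    logits_per_position[t] holds the model's scores for the token at step t.
    loss_mask is 0 on prompt tokens and 1 on response tokens (an assumed
    convention: only the ideal response is imitated).
    """
    total, count = 0.0, 0
    for logits, target, keep in zip(logits_per_position, target_ids, loss_mask):
        if keep:
            total -= log_softmax(logits)[target]
            count += 1
    return total / max(count, 1)

# Toy data: 3-token vocabulary, 3 positions, the first position is prompt.
targets = [0, 1, 2]
mask = [0, 1, 1]
logits_before = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 1.0]]
# A model that scores the ideal response tokens higher...
logits_after = [[2.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 4.0]]

loss_before = sft_loss(logits_before, targets, mask)
loss_after = sft_loss(logits_after, targets, mask)
# ...gets a lower loss, which is the direction gradient descent pushes.
print(loss_before > loss_after)
```

In a real training run the logits come from the model and the loss is minimized by gradient descent over many such pairs; the arithmetic per example is the same.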
Some qualities, such as helpfulness, harmlessness, and honesty, are hard to demonstrate with a single "correct" answer, so they are taught from comparisons instead: A is better than B.[3] Two methods dominate. RLHF runs three stages: SFT, then training a separate reward model on human preference rankings, then optimizing the model against that reward with reinforcement learning. At training time it juggles roughly four model instances at once (in a typical PPO setup: the policy, a frozen reference copy, the reward model, and a value model).[1] DPO skips the separate reward model and the RL loop, folding preference directly into a single classification loss over chosen/rejected pairs. It needs only two models, the policy being trained and a frozen reference, which makes it simpler and more stable to run.[1] The conceptual split: SFT imitates, preference methods optimize. That optimization has a cost, sometimes trading away output diversity to chase the goal.[2]
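The "single classification loss over chosen/rejected pairs" can be sketched for one preference pair as follows. The inputs are summed token log-probabilities of each response under the policy being trained and under the frozen reference; `beta` is a strength hyperparameter whose default of 0.1 here is only an illustrative choice. This shows the per-pair arithmetic, not a training loop.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    Each argument is the log-probability of a full response given the
    prompt. The loss is -log(sigmoid(margin)), where the margin measures
    how much more the policy has shifted toward the chosen response than
    toward the rejected one, relative to the reference model.
    """
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_shift - rejected_shift)
    # Stable evaluation of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Policy identical to the reference: the margin is zero, the loss is log(2).
neutral = dpo_loss(-12.0, -15.0, -12.0, -15.0)
# Policy has moved toward the chosen response and away from the rejected one.
improved = dpo_loss(-10.0, -18.0, -12.0, -15.0)
print(improved < neutral)
```

Only two sets of log-probabilities appear, which is why DPO needs just the policy and the reference in memory, with no reward model or value model.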
Full fine-tuning updates every weight and stores a complete copy of the model for each task — heavy on memory and storage.[5] LoRA takes a smarter route: it freezes the base model and injects small low-rank matrices that approximate the needed weight update, training only about 0.5–5% of the parameters. The resulting adapters are megabytes rather than gigabytes, portable and easy to merge, and often match full fine-tuning.[4][6] QLoRA pushes further by quantizing the frozen base to 4-bit, so even 10B+ parameter models can be fine-tuned on a single consumer GPU.[5] A side benefit: because the base weights stay frozen, LoRA is more resistant to catastrophic forgetting than full tuning.[4]
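A rough sketch of the arithmetic behind these claims, under stated assumptions. For one weight matrix of shape `d_out x d_in`, LoRA trains two small matrices `A` (`rank x d_in`) and `B` (`d_out x rank`) and adds their scaled product to the frozen output. The `alpha / rank` scaling follows the usual LoRA formulation; the layer size, rank, and the memory estimate (weights only, ignoring quantization constants, activations, adapters, and optimizer state) are illustrative assumptions.

```python
def lora_param_fraction(d_out, d_in, rank):
    """Trainable LoRA parameters as a fraction of one full weight matrix."""
    full = d_out * d_in
    lora = rank * (d_out + d_in)  # A is rank x d_in, B is d_out x rank
    return lora / full

def matvec(matrix, vector):
    """Plain-Python matrix-vector product."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, rank):
    """Frozen base output plus the scaled low-rank update: W x + (alpha/rank) B A x."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]

def weight_memory_gb(n_params, bits_per_weight):
    """Approximate memory for the weights alone, in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

# One 4096 x 4096 projection adapted at rank 8 (sizes are assumptions).
fraction = lora_param_fraction(4096, 4096, 8)

# Tiny forward pass: with B at zero the adapter changes nothing.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
x = [1.0, 2.0]
unchanged = lora_forward(W, A, [[0.0], [0.0]], x, alpha=2, rank=1)
adapted = lora_forward(W, A, [[0.5], [0.0]], x, alpha=2, rank=1)

# A 10B-parameter base at 16-bit versus 4-bit weights.
fp16_gb = weight_memory_gb(10_000_000_000, 16)
int4_gb = weight_memory_gb(10_000_000_000, 4)
print(fraction, unchanged, adapted, fp16_gb, int4_gb)
```

The model-wide trainable fraction depends on which matrices receive adapters and at what rank, so the single-matrix figure is a lower-level view of the same idea. The memory comparison shows why 4-bit quantization of the frozen base is what brings large models within reach of one GPU.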
Fine-tuning on a narrow dataset can overwrite what pretraining taught, and can even erode safety guardrails baked into the base model.[5] PEFT mitigates this but does not eliminate it: research finds that LoRA still forgets, that the amount of forgetting rises predictably with the number of trained parameters and training steps, and that early stopping alone does not avoid it.[7] The practical rule: after any tune, re-test general capability, not just the target task. This is precisely where adaptation hands the baton to evaluation.
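The practical rule can be sketched as a small regression check that compares scores before and after a tune. The benchmark names, the scores, and the 0.01 tolerance below are made-up placeholders, not measurements; `find_regressions` is a hypothetical helper name.

```python
def find_regressions(baseline, tuned, tolerance=0.01):
    """Return {benchmark: drop} for scores that fell by more than tolerance.

    baseline and tuned map benchmark name -> score (higher is better).
    Every benchmark in baseline must also be scored for the tuned model,
    so a general-capability check cannot be skipped silently.
    """
    regressions = {}
    for name, before in baseline.items():
        if name not in tuned:
            raise KeyError(f"tuned model was not scored on {name!r}")
        drop = before - tuned[name]
        if drop > tolerance:
            regressions[name] = round(drop, 4)
    return regressions

# Illustrative, invented numbers: the target task improves while
# two general-capability checks slip.
baseline = {"general_qa": 0.71, "math": 0.48, "safety_refusals": 0.95, "target_task": 0.60}
tuned = {"general_qa": 0.705, "math": 0.41, "safety_refusals": 0.88, "target_task": 0.85}

regressed = find_regressions(baseline, tuned)
print(sorted(regressed))
```

A non-empty result is the signal to reduce the intervention (fewer steps, fewer trained parameters) or to accept the trade-off explicitly rather than discover it in production.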
Distillation compresses a large, capable "teacher" model into a small, cheap "student." The common recipe: run bulk inference with the teacher to generate labeled data, then train the smaller student to imitate those outputs.[8] The student serves far faster and cheaper, though its predictions are usually a notch below the teacher's.[8] One important constraint: many commercial APIs' terms forbid using their outputs to train competing models, so enterprise distillation typically restricts itself to open models.[10]
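The first half of that recipe, generating labeled data with the teacher, might look like the sketch below. `teacher` is any callable that maps a prompt to a completion; `toy_teacher` is a stand-in so the example runs, and `build_distillation_set` is a hypothetical helper name. In practice the callable would wrap bulk inference against an open model whose license permits this use.

```python
from typing import Callable, Iterable

def build_distillation_set(teacher: Callable[[str], str],
                           prompts: Iterable[str]) -> list:
    """Run the teacher over prompts and keep non-empty outputs as training pairs."""
    examples = []
    seen = set()
    for prompt in prompts:
        if prompt in seen:  # skip duplicate prompts
            continue
        seen.add(prompt)
        completion = teacher(prompt).strip()
        if completion:  # drop empty teacher outputs
            examples.append({"prompt": prompt, "completion": completion})
    return examples

def toy_teacher(prompt: str) -> str:
    """Placeholder for a large model: it just upper-cases the prompt."""
    return prompt.upper()

dataset = build_distillation_set(toy_teacher, ["summarize: a", "summarize: a", "translate: b", ""])
print(len(dataset))
```

The resulting prompt and completion pairs are then used as ordinary SFT data for the smaller student, which is why the student tends to inherit both the teacher's strengths and its mistakes.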
Adaptation is not always the answer, and it is the most expensive lever. A rough guide to what to reach for, and when.[9]
Prompting: Start here. Cheapest and instant. Often enough on its own for behavior and formatting nudges.
Retrieval: When knowledge changes often or you need source citations. Update documents without retraining.
Fine-tuning: When you need a consistent behavior, tone, or style, or specialized reasoning, or lower latency than retrieval.
Distillation: When a capable model works but is too slow or costly to serve, and you need a smaller deployable version.
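The guide above can be encoded as a small lookup, shown here only as a sketch. The need labels (`"fresh_knowledge"`, `"citations"`, and so on) and the function name are invented for illustration; the mapping simply restates the four rows, and it returns a list because the levers are often combined.

```python
def recommend_levers(needs):
    """Map a set of stated needs to levers, cheapest first.

    Prompting is always included because the guide says to start there.
    """
    levers = ["prompting"]
    if needs & {"fresh_knowledge", "citations"}:
        levers.append("retrieval")
    if needs & {"consistent_style", "specialized_reasoning", "low_latency"}:
        levers.append("fine-tuning")
    if "cheaper_serving" in needs:
        levers.append("distillation")
    return levers

print(recommend_levers({"citations"}))
print(recommend_levers({"consistent_style", "cheaper_serving"}))
```

Reading the list left to right gives the order in which to try things: each later lever costs more and should only be reached for once the earlier ones have fallen short.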
Run as a sequence, adaptation is a refinement chain on top of a base model: teach it to follow instructions with SFT, optionally steer it with preference alignment, optionally compress it with distillation, then deploy.
The throughline: prompting and retrieval change the model's inputs; adaptation changes the model itself, working inward from imitation (SFT) to fine preference steering (RLHF/DPO), with PEFT making each step affordable and distillation making the result cheap to serve. The discipline ties tightly back to the rest of the roadmap. It competes with retrieval — fine-tune for durable behavior and style, retrieve for facts that change — and the two are routinely combined. And it depends on evaluation: because every tune risks forgetting, you must regression-test general capability afterward, and the training data must stay firewalled from the golden test set, or your scores become meaningless.
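One simple way to enforce the firewall between training data and the golden test set is an overlap check before every tune. The sketch below catches only exact matches after case and whitespace normalization, which is an assumption about what counts as a leak; paraphrased or near-duplicate items would need fuzzier matching. The helper names are hypothetical.

```python
import hashlib
import re

def fingerprint(text):
    """Hash of the text after lower-casing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def find_leaks(train_prompts, golden_prompts):
    """Return the training prompts that also appear in the golden test set."""
    golden = {fingerprint(p) for p in golden_prompts}
    return [p for p in train_prompts if fingerprint(p) in golden]

train = ["What is LoRA?", "  what is   lora? ", "Explain distillation."]
golden = ["what is lora?", "Define catastrophic forgetting."]

leaks = find_leaks(train, golden)
print(len(leaks))
```

Any prompt this returns should be removed from the training set (or the test item retired) before scores from the golden set are trusted again.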
Forgetting is not fully solvable. Even parameter-efficient tuning with early stopping leaves some of it, so budget for re-testing general ability after every tune and prefer the smallest intervention that achieves the goal.
Fine-tuning is reached for too early. A large share of "we need to fine-tune" problems are solved by better prompting or retrieval at a fraction of the cost and risk — and distilling a closed-API model may breach its terms. Adapt last, not first.