Model Adaptation for AI Systems: Fine-Tuning, Alignment, and Distillation

TL;DR — Scope

This note covers adaptation: changing a model's weights so its default behavior changes. It spans the post-training spectrum, from supervised fine-tuning (SFT) to preference alignment (RLHF, DPO). It also covers the parameter-efficient methods that make tuning cheap (LoRA, QLoRA), the central hazard of catastrophic forgetting, and distillation (compressing a big model into a small one). It excludes pretraining (a foundations topic) and the levers covered in earlier notes, retrieval and evaluation, though it interacts with both: adaptation depends on evaluation to verify it didn't break anything, and it competes with retrieval as a way to inject knowledge.

The Problem

The earlier disciplines change what the model sees. Adaptation changes what the model is. Prompting and retrieval feed information in at inference time but leave the weights untouched — which is the right move most of the time. Some goals, though, can't be reached that way: a consistent house style or output format, a skill internalized so it needn't be re-explained on every call, behaviors too nuanced to spell out in a prompt, or lower latency from a smaller specialized model.

Reaching those means editing the weights — and weights are expensive and fragile to edit. Training needs curated data, compute, and care, and it risks catastrophic forgetting: a model losing its general abilities after being tuned on a narrow task, because the new training overwrites knowledge from pretraining.^[5] So the real discipline is changing the behavior you want while doing the least collateral damage — and only when a cheaper lever wouldn't have worked.

FIG. 1 — The training spectrum. Pretraining builds general ability; the two adaptation stages add instruction-following, then fine behavioral steering. Moving right means less data but more control over how the model behaves.

The Concepts

Supervised Fine-Tuning — learning by imitation

SFT trains the model on prompt → ideal-response pairs, nudging it to make the good response more likely. It learns by imitation: shown enough high-quality examples, it gets better at following instructions and matching a desired style.^[2] It is the foundation step that turns a raw model into something that reliably follows instructions, and its success hinges on dataset quality and diversity across task types.^[3]
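A minimal sketch of how one SFT example is commonly assembled, assuming token IDs already come from some tokenizer. The prompt and ideal response are concatenated, and prompt positions are masked so the loss only rewards reproducing the response. The loss is then the mean negative log-likelihood of the response tokens. The `-100` mask value matches the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The function names and token IDs are hypothetical. The one-position shift between inputs and next-token targets is omitted for clarity.

```python
import math
from typing import List, Tuple

# Label value skipped by the loss (matches PyTorch CrossEntropyLoss's default ignore_index).
IGNORE_INDEX = -100


def build_sft_example(prompt_ids: List[int], response_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Concatenate prompt and response; mask prompt positions so only the response is imitated."""
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


def sft_loss(target_probs: List[float], labels: List[int]) -> float:
    """Mean negative log-likelihood over unmasked (response) positions.

    target_probs[i] is the model's probability for the correct token at position i
    (a hypothetical input; a real trainer derives it from the model's logits).
    """
    nll = [-math.log(p) for p, y in zip(target_probs, labels) if y != IGNORE_INDEX]
    return sum(nll) / len(nll)


input_ids, labels = build_sft_example([101, 7592], [2088, 102])
print(labels)                                            # [-100, -100, 2088, 102]
print(round(sft_loss([0.9, 0.9, 0.5, 0.5], labels), 4))  # 0.6931: prompt positions don't count
```

Masking the prompt is a design choice. It spends the training signal on *how to answer* rather than on predicting the user's question.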

Preference Alignment — RLHF vs DPO

Some qualities, such as helpfulness, harmlessness, and honesty, are hard to demonstrate with a single "correct" answer, so they are taught from comparisons instead: A is better than B.^[3] Two methods dominate. RLHF runs three stages: SFT, then training a separate reward model on human preference rankings, then optimizing the model against that reward with reinforcement learning. In its common PPO-based form it juggles roughly four model instances during training: the policy being tuned, a frozen reference copy, the reward model, and a value model.^[1] DPO skips the separate reward model and the RL loop, folding preference directly into a single classification loss over chosen/rejected pairs. It needs only two models, the policy and a frozen reference, which makes it simpler and more stable to run.^[1] The conceptual split: SFT imitates, while preference methods optimize. That optimization has a cost, sometimes trading away output diversity to chase the goal.^[2]
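Both methods reduce a comparison to the same pairwise logistic loss; they differ in where the score comes from. A minimal plain-Python sketch, assuming summed per-response log-probabilities are already computed. `reward_model_loss` is the Bradley-Terry-style loss commonly used to train an RLHF reward model. `dpo_loss` follows the DPO objective, where `beta` controls how far the policy may drift from the reference; the 0.1 default is an assumed value. All function names are hypothetical.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """RLHF stage 2: push the reward model to score the chosen response above the rejected one."""
    return -log_sigmoid(r_chosen - r_rejected)


def dpo_loss(pol_chosen: float, pol_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO: the same pairwise form, but the 'reward' is the policy's log-prob gain over a frozen reference.

    Arguments are summed log-probabilities of each full response under the policy
    being trained (pol_*) and the frozen reference model (ref_*).
    """
    chosen_margin = pol_chosen - ref_chosen
    rejected_margin = pol_rejected - ref_rejected
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


print(round(dpo_loss(-10.0, -10.0, -10.0, -10.0), 4))  # 0.6931 (log 2): no preference learned yet
print(dpo_loss(-8.0, -12.0, -10.0, -10.0) < math.log(2))  # True: policy now favors the chosen answer
```

The two model instances DPO needs show up directly in the signature: the policy being trained and the frozen reference. No reward model or RL loop appears anywhere.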

FIG. 2 — Two roads to the same goal. RLHF routes preference through a reward model and an RL loop; DPO collapses the whole thing into one training step. DPO is simpler; RLHF still anchors the most capable production systems.

Parameter-Efficient Fine-Tuning — LoRA & QLoRA

Full fine-tuning updates every weight and stores a complete copy of the model for each task — heavy on memory and storage.^[5] LoRA takes a smarter route: it freezes the base model and injects small low-rank matrices that approximate the needed weight update, training only about 0.5–5% of the parameters. The resulting adapters are megabytes rather than gigabytes, portable and easy to merge, and often match full fine-tuning.^[4]^[6] QLoRA pushes further by quantizing the frozen base to 4-bit, so even 10B+ parameter models can be fine-tuned on a single consumer GPU.^[5] A side benefit: because the base weights stay frozen, LoRA is more resistant to catastrophic forgetting than full tuning.^[4]
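A minimal pure-Python sketch of the LoRA update for one linear layer. It follows the original LoRA formulation: the low-rank path is scaled by `alpha / r` and `B` is initialized to zero. The matrices are toy values and the helper names are hypothetical. `weight_memory_gb` is only back-of-envelope arithmetic for the weights themselves. It assumes exactly 4 bits per weight for QLoRA's quantized base and ignores activations, optimizer state, and quantization constants.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float, r: int) -> Vector:
    """h = W x + (alpha / r) * B (A x).

    W (d x k) is the frozen pretrained weight; only A (r x k) and B (d x r) are trained.
    B starts at zero, so at step 0 the adapted layer behaves exactly like the base layer.
    """
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * dv for b, dv in zip(base, delta)]


def lora_param_fraction(d: int, k: int, r: int) -> float:
    """Trainable LoRA parameters r*(d+k) as a fraction of the full d*k matrix."""
    return r * (d + k) / (d * k)


def weight_memory_gb(n_params: float, bits: int) -> float:
    """Rough memory for the weights alone (no activations, optimizer state, or quantization constants)."""
    return n_params * bits / 8 / 1e9


W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (d = k = 2)
A = [[0.5, 0.5]]               # rank r = 1
B_init = [[0.0], [0.0]]        # zero-initialized
print(lora_forward(W, A, B_init, [2.0, 4.0], alpha=1.0, r=1))  # [2.0, 4.0], identical to W x
print(lora_param_fraction(4096, 4096, 16))                     # 0.0078125, i.e. ~0.8% of one matrix
print(weight_memory_gb(10e9, 16), weight_memory_gb(10e9, 4))   # 20.0 5.0
```

The trainable fraction depends on the rank and on which layers receive adapters, which is why quoted percentages vary. The last line shows the arithmetic behind QLoRA: cutting bits per weight from 16 to 4 quarters the footprint of the frozen base.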

FIG. 3 — How LoRA works. The big pretrained matrix W is frozen; a tiny pair of low-rank matrices (B·A) learns the task-specific adjustment. You train ~1% of the parameters and store a few megabytes per task instead of a full model copy.

Catastrophic Forgetting — the central hazard

Fine-tuning on a narrow dataset can overwrite what pretraining taught — and can even erode safety guardrails baked into the base model.^[5] PEFT mitigates this but does not eliminate it: research finds that LoRA still forgets, with the amount of forgetting rising predictably as you train more parameters for more steps, and not escapable simply by stopping early.^[7] The practical rule: after any tune, re-test general capability, not just the target task — which is precisely where adaptation hands the baton to evaluation.
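The re-testing rule can be made mechanical. Score the base and tuned models on the same general-capability suite, and block the release if any benchmark drops past a tolerance. A minimal sketch; the benchmark names, scores, and 0.02 tolerance are all hypothetical. A real suite would also need to account for run-to-run noise rather than trusting a single score.

```python
from typing import Dict


def forgetting_report(before: Dict[str, float], after: Dict[str, float],
                      tolerance: float = 0.02) -> Dict[str, float]:
    """Benchmarks whose score fell by more than `tolerance` (absolute) after tuning."""
    drops = {}
    for name, old in before.items():
        drop = round(old - after[name], 4)  # rounding avoids float noise like 0.10000000000000009
        if drop > tolerance:
            drops[name] = drop
    return drops


# Hypothetical scores before and after a narrow fine-tune.
base_scores = {"target_task": 0.61, "general_qa": 0.80, "coding": 0.55, "safety_refusals": 0.97}
tuned_scores = {"target_task": 0.88, "general_qa": 0.70, "coding": 0.54, "safety_refusals": 0.90}

regressions = forgetting_report(base_scores, tuned_scores)
print(regressions)  # {'general_qa': 0.1, 'safety_refusals': 0.07}
print("ship" if not regressions else "block: general capability regressed")
```

The target task improved sharply, but the gate still blocks. That is the point of testing beyond the target: the safety row is exactly the kind of eroded guardrail described above.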

Distillation — teacher into student

Distillation compresses a large, capable "teacher" model into a small, cheap "student." The common recipe: run bulk inference with the teacher to generate labeled data, then train the smaller student to imitate those outputs.^[8] The student serves far faster and cheaper, though its predictions are usually a notch below the teacher's.^[8] One important constraint: many commercial APIs' terms forbid using their outputs to train competing models, so enterprise distillation typically restricts itself to open models.^[10]
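A minimal plain-Python sketch. `build_distillation_set` is the step the text describes: teacher-labeled data, on which the student is then fine-tuned like any SFT set. `soft_target_loss` is a swapped-in variant, the classic temperature-softened KL objective from Hinton et al. It needs the teacher's full output distribution rather than just its text. The toy teacher and the temperature of 2.0 are assumptions for illustration.

```python
import math
from typing import Callable, List, Tuple


def build_distillation_set(teacher: Callable[[str], str], prompts: List[str]) -> List[Tuple[str, str]]:
    """Bulk-run the teacher to label prompts; the student is later trained to imitate these pairs."""
    return [(p, teacher(p)) for p in prompts]


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(teacher_logits: List[float], student_logits: List[float],
                     temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


def toy_teacher(prompt: str) -> str:
    """Hypothetical stand-in for a large open model."""
    return f"[teacher answer to: {prompt}]"


print(build_distillation_set(toy_teacher, ["refund policy?", "reset password"]))
print(soft_target_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # 0.0: student matches teacher
```

The temperature softens both distributions so the student also learns the teacher's ranking of wrong answers, not just its top choice. The T^2 factor keeps gradient magnitudes comparable across temperatures.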

FIG. 4 — Distillation. The student learns to mimic the teacher's outputs, inheriting much of its skill in a fraction of the size — the standard move for making a capable model cheap enough to serve at scale.

The Decision — adapt, retrieve, or just prompt?

Adaptation is not always the answer, and it is the most expensive lever. A rough guide to what to reach for, and when.^[9]

PROMPT

Start here. Cheapest and instant. Often enough on its own for behavior and formatting nudges.

RAG

When knowledge changes often or you need source citations. Update documents without retraining.

FINE-TUNE

When you need consistent behavior, tone, or style, specialized reasoning, or lower latency than retrieval.

DISTILL

When a capable model works but is too slow or costly to serve, and you need a smaller deployable version.
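The guide reads naturally as an escalation ladder, cheapest lever first. Here is a toy encoding, assuming each trigger reduces to a yes/no flag; real decisions are rarely that crisp. The `Need` fields and `choose_levers` are hypothetical. RAG and fine-tuning can both be returned because the two are routinely combined.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Need:
    """Hypothetical checklist; each flag mirrors one trigger in the guide."""
    prompting_is_enough: bool = False
    knowledge_changes_often: bool = False
    needs_citations: bool = False
    needs_consistent_behavior: bool = False   # tone, style, format, specialized reasoning
    too_slow_or_costly_to_serve: bool = False


def choose_levers(need: Need) -> List[str]:
    """Return levers in escalation order, cheapest first."""
    levers = ["prompt"]  # always the starting point
    if need.prompting_is_enough:
        return levers
    if need.knowledge_changes_often or need.needs_citations:
        levers.append("rag")
    if need.needs_consistent_behavior:
        levers.append("fine-tune")
    if need.too_slow_or_costly_to_serve:
        levers.append("distill")
    return levers


print(choose_levers(Need(prompting_is_enough=True)))                              # ['prompt']
print(choose_levers(Need(needs_citations=True, needs_consistent_behavior=True)))  # ['prompt', 'rag', 'fine-tune']
```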

How It All Fits Together

Run as a sequence, adaptation is a refinement chain on top of a base model: teach it to follow with SFT, optionally steer it with preference alignment, optionally compress it with distillation, then deploy.

Start: Base model (general ability from pretraining) → Step 1: SFT (learn to follow instructions) → Optional: Align (RLHF / DPO for preferences) → Optional: Distill (shrink for cheap serving) → Ship: Deploy (serve the adapted model)

FIG. 5 — The adaptation chain. Solid nodes are near-universal; dashed nodes are added only when the goal requires them. Every step except SFT and deploy is optional, and most projects skip one or both of the optional stages.
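The chain can be sketched as one function. SFT always runs, alignment and distillation run only if supplied, and a check follows every stage, since each tune risks forgetting. A minimal sketch: models are stand-in strings, and `adapt`, `tag`, and the `check` callback are hypothetical names.

```python
from typing import Callable, Optional

Model = str  # stand-in for a checkpoint: a string recording which stages have run
Stage = Callable[[Model], Model]


def tag(name: str) -> Stage:
    """Hypothetical stage that just records its name on the 'model'."""
    return lambda model: f"{model}+{name}"


def adapt(base: Model, sft: Stage, align: Optional[Stage] = None,
          distill: Optional[Stage] = None,
          check: Callable[[Model], bool] = lambda model: True) -> Model:
    """SFT always runs; alignment and distillation run only if supplied.

    `check` stands in for a general-capability regression test run after every stage.
    """
    model = base
    for stage in (sft, align, distill):
        if stage is None:
            continue  # optional stage skipped
        model = stage(model)
        if not check(model):
            raise RuntimeError(f"regression detected after producing {model!r}")
    return model  # ready to deploy


print(adapt("base", tag("sft")))                    # base+sft
print(adapt("base", tag("sft"), align=tag("dpo")))  # base+sft+dpo
```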

The throughline: prompting and retrieval change the model's inputs; adaptation changes the model itself, working inward from imitation (SFT) to fine preference steering (RLHF/DPO), with PEFT making each step affordable and distillation making the result cheap to serve. The discipline ties tightly back to the rest of the roadmap. It competes with retrieval — fine-tune for durable behavior and style, retrieve for facts that change — and the two are routinely combined. And it depends on evaluation: because every tune risks forgetting, you must regression-test general capability afterward, and the training data must stay firewalled from the golden test set, or your scores become meaningless.
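Firewalling the training data from the golden set can be checked, not just promised. A minimal sketch that flags golden-set items appearing in the training data. An item is flagged if it matches verbatim after normalization or shares a word n-gram with the training text. The 8-gram window and the normalization rule are assumptions. Overlap checks like this catch copying, not paraphrase.

```python
import re
from typing import Iterable, List, Set, Tuple


def normalize(text: str) -> str:
    """Lowercase and drop punctuation/extra whitespace so trivial edits don't hide duplicates."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    tokens = normalize(text).split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contaminated(train: Iterable[str], golden: Iterable[str], n: int = 8) -> List[str]:
    """Golden items that match training text after normalization or share any word n-gram with it."""
    train_list = list(train)
    train_exact = {normalize(t) for t in train_list}
    train_grams = set().union(*(ngrams(t, n) for t in train_list))
    return [item for item in golden
            if normalize(item) in train_exact or ngrams(item, n) & train_grams]


train_set = ["What is the capital of France? Paris.",
             "The quick brown fox jumps over the lazy dog near the river bank today"]
golden_set = ["what is the capital of france paris",
              "Name a prime number larger than ten.",
              "Yesterday the quick brown fox jumps over the lazy dog near a barn"]
print(contaminated(train_set, golden_set))  # flags the first and third golden items
```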

①

Forgetting is not fully solvable. Even parameter-efficient tuning with early stopping leaves some of it, so budget for re-testing general ability after every tune and prefer the smallest intervention that achieves the goal.

②

Fine-tuning is reached for too early. A large share of "we need to fine-tune" problems are solved by better prompting or retrieval at a fraction of the cost and risk — and distilling a closed-API model may breach its terms. Adapt last, not first.
