How small models learn from big ones — soft labels, temperature, and the techniques behind DistilBERT and TinyLlama.