Where training data comes from when real data is scarce or private: generation techniques, quality metrics, model collapse risk, and production pipelines.