How AI maps text, images, audio, and video into one shared vector space, enabling cross-modal search and multimodal RAG.