Chain-of-Caption: Training-free improvement of multimodal large language model on referring expression comprehensionPublished in ICASSP, 2026Share on Bluesky Facebook LinkedIn X (formerly Twitter) Previous Next