Chain-of-Caption: Training-free improvement of multimodal large language model on referring expression comprehension

Published in ICASSP, 2026