ImageMining tests whether visual agents can actually search with their eyes
Z.ai’s new ImageMining benchmark asks multimodal agents to inspect images, crop details, search outward, and reason across sources. That is a better test for many real visual workflows than another captioning score.