From upenn BLINK multimodal large language models can see but not perceive.
From UPenn.
Multimodal large language models can see but not perceive.
We introduce Blink, a new benchmark for multimodal language models (LLMs) that focuses on core visual perception abilities not found in other evaluations.
Join the discussion on this paper page.
Comments are closed.