arxiv:2603.27982

CDH-Bench: A Commonsense-Driven Hallucination Benchmark for Evaluating Visual Fidelity in Vision-Language Models

Published on Apr 1
Abstract

Vision-language models exhibit commonsense-driven hallucination when visual evidence conflicts with commonsense knowledge, as demonstrated by a new benchmark measuring model behavior under such conditions.

AI-generated summary

Vision-language models (VLMs) achieve strong performance on many benchmarks, yet a basic reliability question remains underexplored: when visual evidence conflicts with commonsense, do models follow what is shown or what commonsense suggests? A characteristic failure in this setting is that the model overrides the visual evidence and outputs the commonsense alternative. We term this phenomenon commonsense-driven hallucination (CDH). To evaluate it, we introduce CDH-Bench, a benchmark designed to create explicit visual evidence--commonsense conflicts. CDH-Bench covers three dimensions: counting anomalies, relational anomalies, and attribute anomalies. We evaluate frontier VLMs under binary question answering (QA) and multiple-choice QA, and report metrics including Counterfactual Accuracy (CF-Acc), Commonsense Accuracy (CS-Acc), Counterfactual Accuracy Drop (CFAD), Commonsense Collapse Rate (CCR), and Relative Prior Dependency (RPD). Results show that even strong models remain vulnerable to prior-driven normalization when visual evidence contradicts commonsense, and CDH-Bench provides a controlled diagnostic of visual fidelity under such conflicts.
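The abstract names five metrics but does not define them, so the sketch below is only an illustrative guess at how such metrics might be computed: `cdh_metrics`, its arguments, and the exact formulas (e.g. CFAD as the gap between CS-Acc and CF-Acc, RPD as that gap normalized by CS-Acc) are assumptions, not the paper's specification.

```python
# Hypothetical sketch of CDH-Bench-style metrics. The metric definitions
# below are assumed for illustration; the paper's actual formulas may differ.

def cdh_metrics(counterfactual_correct, commonsense_correct, answered_with_prior):
    """
    counterfactual_correct: list[bool], per-item correctness on images that
        contradict commonsense (counterfactual scenes).
    commonsense_correct: list[bool], per-item correctness on ordinary,
        commonsense-consistent control images.
    answered_with_prior: list[bool], True where the model's answer on a
        counterfactual item was the commonsense alternative.
    """
    cf_acc = sum(counterfactual_correct) / len(counterfactual_correct)  # CF-Acc (assumed)
    cs_acc = sum(commonsense_correct) / len(commonsense_correct)        # CS-Acc (assumed)
    cfad = cs_acc - cf_acc  # CFAD: accuracy drop under conflict (assumed)

    # CCR: among counterfactual errors, the fraction that "collapse" to the
    # commonsense answer (assumed definition).
    errors = [not c for c in counterfactual_correct]
    ccr = (
        sum(p for p, e in zip(answered_with_prior, errors) if e) / sum(errors)
        if any(errors) else 0.0
    )

    # RPD: relative prior dependency, assumed here as the accuracy drop
    # normalized by commonsense accuracy.
    rpd = cfad / cs_acc if cs_acc > 0 else 0.0

    return {"CF-Acc": cf_acc, "CS-Acc": cs_acc, "CFAD": cfad, "CCR": ccr, "RPD": rpd}


if __name__ == "__main__":
    m = cdh_metrics(
        counterfactual_correct=[True, False, False, True],
        commonsense_correct=[True, True, True, True],
        answered_with_prior=[False, True, True, False],
    )
    print(m)
```

Under this assumed scoring, a model that answers every counterfactual error with the commonsense alternative gets CCR = 1.0, matching the "commonsense collapse" failure mode the abstract describes.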


Get this paper in your agent:

hf papers read 2603.27982
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Datasets citing this paper 1
