CVPR 2026
While large vision–language models (LVLMs) achieve strong performance on multimodal tasks, they frequently generate hallucinations—unfaithful outputs misaligned with the visual input. To address this issue, we introduce CIPHER (Counterfactual Image Perturbations for Hallucination Extraction and Removal), a training-free method that suppresses vision-induced hallucinations via lightweight feature-level correction. Unlike prior training-free approaches that primarily focus on text-induced hallucinations, CIPHER explicitly targets hallucinations arising from the visual modality.
CIPHER operates in two phases. In the offline phase, we construct OHC-25K (Object-Hallucinated Counterfactuals, 25,000 samples), a counterfactual dataset consisting of diffusion-edited images that intentionally contradict the original ground-truth captions. We pair these edited images with the unchanged ground-truth captions and process them through an LVLM to extract hallucination-related representations. Contrasting these representations with those from authentic (image, caption) pairs reveals structured, systematic shifts spanning a low-rank subspace characterizing vision-induced hallucination.
In the inference phase, CIPHER suppresses hallucinations by projecting intermediate hidden states away from this subspace. Experiments across multiple benchmarks show that CIPHER significantly reduces hallucination rates while preserving task performance.
CIPHER estimates feature directions triggered by hallucination-inducing visual cues and suppresses these components during inference. The method operates in two stages: (1) an offline phase, in which we construct a counterfactual dataset (OHC-25K) and estimate the hallucination subspace from the LVLM's hidden representations; and (2) an inference phase, in which we nullify hallucination-prone components during generation by projecting hidden states onto the orthogonal complement of the learned hallucination directions.
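The two stages above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes hidden states have already been extracted as matrices `h_hall` (from counterfactual image–caption pairs) and `h_clean` (from authentic pairs), estimates the low-rank hallucination subspace from the top right singular vectors of their centered differences, and removes those components by orthogonal projection. The function names, the use of plain SVD, and the fixed `rank` parameter are all assumptions for exposition.

```python
import numpy as np

def estimate_hallucination_subspace(h_hall, h_clean, rank):
    """Hypothetical sketch of the offline phase.

    h_hall:  (N, d) hidden states from counterfactual (edited image, caption) pairs
    h_clean: (N, d) hidden states from authentic (image, caption) pairs
    Returns V: (rank, d) orthonormal basis spanning the hallucination subspace.
    """
    diffs = h_hall - h_clean                    # systematic representation shifts
    diffs -= diffs.mean(axis=0, keepdims=True)  # center before factorization
    # Top right singular vectors capture the dominant low-rank shift directions.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:rank]

def project_out(h, V):
    """Inference-phase correction: remove components along the
    hallucination directions, i.e. h - (h V^T) V."""
    return h - (h @ V.T) @ V
```

In practice such a projection would be applied to intermediate hidden states inside the LVLM (e.g. via forward hooks on selected layers) at every decoding step; the sketch only shows the linear-algebra core.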
| Method | LLaVA-1.5 CHAIRS↓ | LLaVA-1.5 CHAIRI↓ | LLaVA-1.5 BLEU↑ | MiniGPT-4 CHAIRS↓ | MiniGPT-4 CHAIRI↓ | MiniGPT-4 BLEU↑ | mPLUG-Owl2 CHAIRS↓ | mPLUG-Owl2 CHAIRI↓ | mPLUG-Owl2 BLEU↑ |
|---|---|---|---|---|---|---|---|---|---|
| Greedy | 20.40±2.80 | 7.08±0.33 | 15.72±0.10 | 32.40±2.20 | 12.20±0.42 | 14.57±0.11 | 22.90±0.90 | 8.62±0.11 | 15.01±0.24 |
| Beam Search | 19.50±2.30 | 6.84±0.79 | 15.99±0.14 | 30.10±0.30 | 11.87±0.37 | 15.35±0.24 | 20.30±0.70 | 7.62±0.19 | 15.43±0.05 |
| DoLa (ICLR'24) | 20.20±2.80 | 6.75±0.54 | 15.68±0.10 | 31.90±3.30 | 12.15±0.89 | 14.54±0.12 | 22.40±1.80 | 8.36±0.04 | 15.13±0.21 |
| OPERA (CVPR'24) | 17.50±0.50 | 6.07±0.32 | 16.02±0.02 | 29.70±0.30 | 11.96±0.29 | 14.82±0.05 | 20.07±2.07 | 7.18±0.39 | 15.41±0.12 |
| VCD (CVPR'24) | 20.30±1.10 | 7.28±0.10 | 14.53±0.01 | 29.00±2.80 | 12.64±1.19 | 14.42±0.01 | 22.80±0.80 | 8.68±0.17 | 15.14±0.13 |
| Woodpecker (SCIS'24) | 23.85±4.62 | 7.50±0.01 | 17.05±0.00 | 28.87±2.20 | 10.20±0.85 | 15.30±0.01 | 26.33±1.98 | 8.43±0.80 | 16.43±0.00 |
| LURE (ICLR'24) | 19.48±2.35 | 6.50±0.38 | 15.97±0.01 | 27.88±2.25 | 10.20±0.85 | 15.03±0.01 | 21.27±0.06 | 7.67±0.16 | 15.65±0.15 |
| HALC (ICML'24) | 16.90±2.10 | 5.72±0.55 | 16.02±0.04 | 25.20±2.00 | 9.42±0.41 | 14.91±0.13 | 18.80±1.20 | 7.00±0.01 | 15.33±0.24 |
| Nullu (CVPR'25) | 15.20±0.60 | 5.30±0.03 | 15.69±0.04 | 21.40±1.00 | 8.99±0.36 | 14.81±0.06 | 15.60±1.20 | 5.77±0.01 | 15.45±0.01 |
| CIPHER (Ours) | 13.05±0.57 | 4.53±0.38 | 15.82±0.25 | 18.48±1.20 | 8.33±0.17 | 15.10±0.43 | 13.60±1.06 | 4.92±0.15 | 16.25±0.47 |
Table 1: CHAIR and BLEU scores (mean±std) across three LVLMs. CHAIRS (sentence-level) and CHAIRI (instance-level) measure hallucination, so lower is better; higher BLEU indicates better fluency.
@inproceedings{dastmalchi2026cipher,
title = {Fighting Hallucinations with Counterfactuals: Diffusion-Guided Perturbations for LVLM Hallucination Suppression},
author = {Dastmalchi, Hamidreza and An, Aijun and Cheraghian, Ali and Barzamini, Hamed},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2026}
}