Method
We propose a novel method for reducing hallucinations in Vision-Language Models (VLMs) by leveraging listwise preference optimization. Our approach identifies and masks a critical object in an image to form a negative sample, then interpolates the masked region between this negative image and the original (positive) image to produce a sequence of incrementally more complete images. The model is trained to rank these images in ascending order of object visibility, reducing hallucinations while retaining visual fidelity.
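
To make the pipeline concrete, the sketch below shows one way the interpolated image sequence and the listwise objective could be implemented in PyTorch. It is a minimal sketch under stated assumptions: the bounding box `bbox` for the critical object, the grey fill value, the number of interpolation steps, and the Plackett-Luce (ListMLE-style) form of the ranking loss are illustrative choices, not details fixed by the description above.

```python
import numpy as np
import torch
from PIL import Image


def build_interpolated_sequence(image, bbox, num_steps=4, fill=(127, 127, 127)):
    """Create images ranging from fully masked (negative) to fully visible (positive).

    bbox is (left, top, right, bottom) around the critical object. Each step
    alpha-blends the masked region back toward the original pixels, so the
    object becomes incrementally more visible along the sequence.
    """
    pos = np.asarray(image).astype(np.float32)
    neg = pos.copy()
    l, t, r, b = bbox
    neg[t:b, l:r] = np.array(fill, dtype=np.float32)

    sequence = []
    for k in range(num_steps + 1):
        alpha = k / num_steps  # 0.0 = fully masked, 1.0 = original image
        blended = neg.copy()
        blended[t:b, l:r] = (1 - alpha) * neg[t:b, l:r] + alpha * pos[t:b, l:r]
        sequence.append(Image.fromarray(blended.astype(np.uint8)))
    return sequence  # ordered by ascending object visibility


def listwise_ranking_loss(scores):
    """Plackett-Luce (ListMLE-style) loss for one sequence of model scores.

    `scores` is a 1-D tensor ordered by ascending object visibility. The target
    permutation places the most visible image first, so we reverse the scores
    and maximise the log-likelihood of that ordering.
    """
    s = scores.flip(0)  # most visible first
    # Suffix logsumexp: log-normaliser over the items not yet ranked.
    suffix_lse = torch.logcumsumexp(s.flip(0), dim=0).flip(0)
    return -(s - suffix_lse).sum()
```

In use, `scores` would come from whatever preference or reward head scores each image against the text query (e.g., a hypothetical `vlm_score(model, prompt, image)`); minimising the loss pushes the model to assign monotonically higher scores as more of the critical object becomes visible.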
