[ICML 2025] Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage
This repository contains the evaluation code and data for our ICML 2025 paper, Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage.
- openai>=1.14.1
- python-dotenv==1.0.1
```python
from huggingface_hub import hf_hub_download

# Download the evaluation image archive from the Hugging Face Hub.
local_path = hf_hub_download(
    repo_id="saehyungl/CapMAS",
    filename="images_capmas.tar.gz",
    repo_type="dataset",
)
print("Downloaded to:", local_path)
```
Alternatively, you can download it directly using this URL. Our evaluation uses a subset of the DOCCI images.
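The downloaded archive must be unpacked before use. A minimal sketch, assuming you want the images under a local `images/` directory (the directory name is illustrative; point `--image-dir` at it later):

```python
import tarfile

# Unpack images_capmas.tar.gz into a local "images" directory.
# local_path comes from the hf_hub_download call above.
with tarfile.open(local_path, "r:gz") as tar:
    tar.extractall(path="images")
```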
Please generate captions for the 1,000 downloaded images for the captioning evaluation. Collect the generated captions into a dictionary whose keys are the corresponding image file names, and save it as a .json file (a minimal sketch of this step follows the example below):
```json
{
    "aar_test_04600.jpg": <caption_aar_test_04600>,
    "aar_test_04601.jpg": <caption_aar_test_04601>,
    ...
    "test_00599.jpg": <caption_test_00599>
}
```
You may refer to the sample captions for guidance.
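As a minimal sketch of building and saving this dictionary, assuming a hypothetical `generate_caption(image_path)` function that wraps your captioning model:

```python
import json
import os

image_dir = "images"  # wherever the 1,000 evaluation images were extracted
captions = {}
for fname in sorted(os.listdir(image_dir)):
    if fname.endswith(".jpg"):
        # generate_caption is a placeholder for your captioning model.
        captions[fname] = generate_caption(os.path.join(image_dir, fname))

# Save the captions keyed by image file name.
with open("captions.json", "w") as f:
    json.dump(captions, f, indent=2)
```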
We provide evaluation code for the three metrics used in our paper: Factuality, Coverage, and CLAIR (Chan et al., EMNLP 2023). These evaluations rely on GPT-4o, so please fill in your OpenAI API key OR Azure OpenAI credentials in the conf/gpt4o file.
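The exact format of the conf/gpt4o file is repository-specific. As an optional sanity check that your OpenAI credentials work at all, you can run something like the following sketch, assuming the key is exposed through the standard `OPENAI_API_KEY` environment variable (e.g., via a local `.env` file loaded with python-dotenv):

```python
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()      # reads OPENAI_API_KEY from a local .env file, if present
client = OpenAI()  # picks up OPENAI_API_KEY from the environment

# One tiny request to confirm credentials and model access.
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```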
```bash
python eval_factuality.py --image-dir <the image directory path> --captions-file <the caption .json file path>
python eval_coverage.py --vqa-dir data/COVERAGE_TEST_VQA --captions-file <the caption .json file path>
python eval_clair.py --captions-file <the caption .json file path>
```
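Before launching the evaluations, it can help to verify that the captions file matches the expected format (1,000 entries keyed by .jpg file names). A small sketch, assuming the file is named `captions.json`:

```python
import json

with open("captions.json") as f:
    captions = json.load(f)

# The evaluation set contains 1,000 DOCCI images, keyed by file name.
assert len(captions) == 1000, f"expected 1000 captions, got {len(captions)}"
assert all(k.endswith(".jpg") for k in captions), "keys should be .jpg file names"
assert all(isinstance(v, str) and v for v in captions.values()), "captions should be non-empty strings"
```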
- DOCCI (Onoe et al., ECCV 2024)
- ImageInWords (Garg et al., EMNLP 2024)
- CLAIR (Chan et al., EMNLP 2023)
If you use the CapMAS dataset, filtering pipeline, or code from this repository, please cite the paper:
```bibtex
@article{lee2024toward,
  title={Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage},
  author={Lee, Saehyung and Yoon, Seunghyun and Bui, Trung and Shi, Jing and Yoon, Sungroh},
  journal={arXiv e-prints},
  pages={arXiv--2412},
  year={2024}
}
```
The evaluation code and needle set data are licensed under the Adobe Research License. The license prohibits commercial use and permits non-commercial research use.