VisualThinker-R1-Zero is a replication of DeepSeek-R1-Zero in visual reasoning. We are the first to successfully observe the emergent “aha moment” and increased response length in visual reasoning on just a 2B non-SFT model.
For more details, please refer to our Notion report.
Training dynamics of VisualThinker-R1-Zero, starting from the Qwen2-VL-2B base model without SFT or reward models. An aha moment and increasing response length are observed in a multimodal model for the first time.
- We are the first to successfully produce the emergent “aha moment” and increased response length for multimodal reasoning on just a non-SFT 2B model.
- We showed that vision-centric tasks could also benefit from improved reasoning capabilities.
Similar to DeepSeek-R1, self-reflection behavior is also observed during our RL training on vision-centric reasoning tasks. The model exhibits an emergent ability to rethink and correct its mistakes:
. . .
Therefore, dark brown wooden bed with white blanket is not above the doorway.
But wait! I can think of something else.
Maybe it's just higher than above the doorway, but slightly lower than above the doorway.
. . .
- 2025-03-16: 🤗We released the model checkpoint on Hugging Face!
- 2025-02-26: 🔥We shared our main findings in this Notion blog.
- 2025-02-26: 🔥We released the VisualThinker-R1-Zero repo.
Method | Bits | GPU Memory (2B model)* |
---|---|---|
GRPO Full Fine-Tuning | AMP | 4×80GB |

\* estimated
bash setup.sh
cd src/data/SAT
bash prepare_dataset.sh
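If you prefer an isolated environment, create one before running the commands above. This is only a sketch, assuming conda is available; the environment name and Python version are illustrative rather than required by the repo:

conda create -n visualthinker python=3.10 -y   # illustrative environment name and Python version
conda activate visualthinker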
To reproduce the multimodal aha moment, run the following code to train the non-SFT model with GRPO on SAT:
cd src/open-r1-multimodal
bash run_grpo_SAT.sh # Adjust open-r1-multimodal/configs/zero3.yaml or zero2.yaml accordingly
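The in-script comment refers to the DeepSpeed configs: zero3.yaml shards model parameters for the lowest per-GPU memory, while zero2.yaml shards only optimizer states and gradients and is typically faster when memory allows. One possible adjustment is switching the launch script to ZeRO-2; this sketch assumes run_grpo_SAT.sh references the config by the literal filename zero3.yaml, so inspect the script first:

grep -n zero3.yaml run_grpo_SAT.sh                 # confirm where the config is referenced
sed -i 's/zero3.yaml/zero2.yaml/' run_grpo_SAT.sh  # point the script at the ZeRO-2 config instead
bash run_grpo_SAT.sh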
To obtain the SFT model for comparison, run the following code to fine-tune the base (non-SFT) model on SAT:
cd src/open-r1-multimodal
bash run_sft.sh # Adjust open-r1-multimodal/configs/zero3.yaml or zero2.yaml accordingly
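On a shared machine, either training run can be pinned to specific GPUs with the standard CUDA_VISIBLE_DEVICES variable (respected by the usual torchrun/accelerate launchers); the device indices below are just an example:

CUDA_VISIBLE_DEVICES=0,1,2,3 bash run_sft.sh   # example indices; match them to your hardware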
We provide the following commands to reproduce our evaluation results on CV-Bench. First, change to the evaluation directory:
cd src/eval
To evaluate Base + GRPO (VisualThinker R1 Zero) model:
python evaluate_Qwen2_VL_CVBench-base.py --model_path <path_to_your_model> \
--bs 8 \
--use_reasoning_prompt
To evaluate Base model:
python evaluate_Qwen2_VL_CVBench-base.py --model_path <path_to_your_model> \
--bs 8 \
--no-use_reasoning_prompt
To evaluate Instruct + GRPO model:
python evaluate_Qwen2_VL_CVBench.py --model_path <path_to_your_model> \
--bs 8 \
--use_reasoning_prompt
To evaluate Instruct model:
python evaluate_Qwen2_VL_CVBench.py --model_path <path_to_your_model> \
--bs 8 \
--no-use_reasoning_prompt
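As a concrete end-to-end example, the Base vs. Base + GRPO comparison could be reproduced as below. The local checkpoint path is hypothetical (substitute the output directory of your GRPO run or the released checkpoint), and we assume --model_path also accepts a Hugging Face model id for the untrained base model, as is typical for transformers-based evaluation scripts:

# Base model without the reasoning prompt (Hugging Face model id assumed to be accepted)
python evaluate_Qwen2_VL_CVBench-base.py --model_path Qwen/Qwen2-VL-2B --bs 8 --no-use_reasoning_prompt
# Base + GRPO with the reasoning prompt (hypothetical local checkpoint path)
python evaluate_Qwen2_VL_CVBench-base.py --model_path path/to/your_grpo_checkpoint --bs 8 --use_reasoning_prompt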
Full experiment log: Upcoming
Model checkpoint: 🤗VisualThinker-R1-Zero on Hugging Face
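To pull the released weights locally, the standard Hugging Face CLI works; the repository id below is inferred from the project name, so verify the exact id on the linked Hugging Face page:

# Repo id assumed from the project name; check the Hugging Face page for the exact id.
huggingface-cli download turningpoint-ai/VisualThinker-R1-Zero --local-dir ./VisualThinker-R1-Zero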
We are always open to engaging discussions, collaborations, or even just sharing a virtual coffee. To get in touch or join our team, visit TurningPoint AI's homepage for contact information.
We sincerely thank DeepSeek, Open-R1, QwenVL, Open-R1-Multimodal, R1-V, SAT, and CV-Bench for providing open source resources that laid the foundation of our project.
Here are the key contributors from TurningPoint AI to this project:
Hengguang Zhou¹\*, Xirui Li¹\*, Ruochen Wang¹†, Minhao Cheng², Tianyi Zhou³ and Cho-Jui Hsieh¹,⁴

\* Project Leads, † Main Advisor
¹University of California, Los Angeles, ²Penn State University, ³University of Maryland and ⁴Google Research
If you find our work useful for your projects, please kindly cite the following BibTeX:
@misc{zhou2025r1zerosahamomentvisual,
title={R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model},
author={Hengguang Zhou and Xirui Li and Ruochen Wang and Minhao Cheng and Tianyi Zhou and Cho-Jui Hsieh},
year={2025},
eprint={2503.05132},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2503.05132},
}