Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
Sample code demonstrating fine-tuning of a multimodal LLM with LLaMA-Factory
Use PaliGemma to auto-label data for use in training fine-tuned vision models.
Minimalist implementation of PaliGemma 2 & PaliGemma VLM from scratch
PyTorch implementation of PaliGemma 2
Notes for the Vision Language Model implementation by Umar Jamil
AI-powered tool to translate text from images into your desired language, using the Gemma vision model together with a multilingual model.
Serve the DOCCI fine-tuned variant of PaliGemma 2 using LitServe.
Image Captioning with PaliGemma 2 Vision Language Model.
Serve the PaliGemma 2 mix model variant using LitServe.
MAESTRO is an AI-powered research application designed to streamline complex research tasks.