image-to-text

Here are 121 public repositories matching this topic...

lucidrains / CoCa-pytorch

Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in PyTorch

deep-learning transformers artificial-intelligence image-to-text attention-mechanism multimodal contrastive-learning

Updated Dec 12, 2023
Python

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.

Updated Jun 11, 2025
Python

Flame-Code-VLM / Flame-Code-VLM

Flame is an open-source multimodal AI system designed to translate UI design mockups into high-quality React code. It leverages vision-language modeling, automated data synthesis, and structured training workflows to bridge the gap between design and front-end development.

react open-source front-end ai vue deep-learning frontend code-generation image-to-text vlm frontend-development multimodal data-synthesis design-to-code llm vision-language-model deepseek image-to-code screen-to-code

Updated Mar 26, 2025
Python

Yushi-Hu / tifa

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

image-to-text text-to-image visual-question-answering large-language-models

Updated Apr 29, 2024
Python

NormXU / nougat-latex-ocr

Codebase for fine-tuning / evaluating nougat-based image2latex generation models

image-to-text

Updated Sep 25, 2024
Python

yardstick17 / image_text_reader

This module extracts text from images using the Tesseract OCR engine. Text in images is often blurry or unevenly sized, so the image is pre-processed to make it easier for the OCR engine to read. The module first draws bounding boxes around the text in the image and then normalizes it to 300 dpi, a resolution suitable for the OCR engine.

ocr tesseract-ocr image-to-text image-reader read-image ocr-text-reader

Updated Apr 3, 2019
Python

shoryasethia / markdrop

A Python package for converting PDFs to markdown while extracting images and tables, and for generating text descriptions of the extracted tables and images using several LLM clients, among other functionality. Markdrop is available on PyPI.

open-source pdf-to-text image-to-text marker agents pypi-package table-to-text markitdown llm pdf-to-markdown docling markdrop

Updated Mar 27, 2025
Python

MIMICLab / L-Verse

L-Verse: Bidirectional Generation Between Image and Text

deep-learning pytorch transformer image-captioning image-to-text text-to-image vq-vae pytorch-lightning l-verse

Updated Apr 1, 2025
Python

BEPb / image_to_ascii

Everything is very simple: you either download a picture file or specify a link to one when running the Python script, and as output you get a text file. You can also immediately preview the result of the conversion on the command line.
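
The image-to-ASCII conversion described above can be illustrated with a minimal sketch. It is not taken from the image_to_ascii repository: the function names, the character ramp, and the use of a plain 2D list of grayscale values are assumptions made for illustration only.

```python
# Hypothetical sketch of grayscale-to-ASCII mapping.
# ASCII_RAMP, pixel_to_char, and to_ascii are illustrative names,
# not the API of the image_to_ascii repository.

ASCII_RAMP = "@%#*+=-:. "  # ordered from darkest to lightest


def pixel_to_char(value: int, ramp: str = ASCII_RAMP) -> str:
    """Map a grayscale value in 0..255 to one character of the ramp."""
    value = max(0, min(255, value))  # clamp out-of-range input
    index = value * (len(ramp) - 1) // 255
    return ramp[index]


def to_ascii(pixels: list[list[int]], ramp: str = ASCII_RAMP) -> str:
    """Convert rows of grayscale values into newline-separated text."""
    return "\n".join(
        "".join(pixel_to_char(v, ramp) for v in row) for row in pixels
    )


if __name__ == "__main__":
    # A small horizontal gradient stands in for real image data.
    gradient = [[x * 255 // 15 for x in range(16)] for _ in range(4)]
    print(to_ascii(gradient))
```

In practice the grayscale values would come from an image library rather than a hand-built list, and the resulting string could be written to a text file or printed to the terminal.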