Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch
Updated Dec 12, 2023 - Python
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.
Flame is an open-source multimodal AI system designed to translate UI design mockups into high-quality React code. It leverages vision-language modeling, automated data synthesis, and structured training workflows to bridge the gap between design and front-end development.
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
Codebase for fine-tuning / evaluating nougat-based image2latex generation models
The module extracts text from images using the Tesseract OCR engine. Text in images is often blurry or unevenly sized, so the image is pre-processed to make it easier for the OCR engine to read. The module first draws bounding boxes around the text in the image and then normalizes the image to 300 dpi, a resolution suitable for the OCR engine.
A Python package for converting PDFs to Markdown while extracting images and tables, and for generating descriptive text for the extracted tables and images using several LLM clients, among other features. Markdrop is available on PyPI.
L-Verse: Bidirectional Generation Between Image and Text
Everything is very simple: you either download an image file or specify its link when running a Python script, and you get a text file as output. You can immediately preview the result of the conversion on the command line.
Extracts details from Indian national identification cards such as PAN (completed) and Aadhaar, passport, and driving license (WIP) in a structured format
OCR with Google's AI technology (Cloud Vision API)
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models. ECCV 2024
macOS OCR command-line tool for almost any image format
Telegram bot to convert images to text using Python
DangoOCR: screenshot OCR text recognition, supporting multiple languages, translation after recognition, and audio playback
Image to text with CLIP ViT-L/14 in ComfyUI
Easiest way to use AI models without coding (Web UI & API support)
A simple image to text converter with GUI!
A little Python application to auto tag your photos with the power of machine learning.