LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
-
Updated
May 19, 2025 - Python
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
A simple, high-quality voice conversion tool focused on ease of use and performance.
High-quality and streaming Speech-to-Speech interactive agent in a single file. 只用一个文件实现的流式全双工语音交互原型智能体!
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
A real-time speech-to-speech chatbot powered by Whisper Small, Llama 3.2, and Kokoro-82M.
MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction models along with training and inference code, covering but not limited to end-to-end speech interaction, end-to-end speech translation and speech recognition.
Samantha OS1 is a conversational AI assistant powered by the Realtime API from OpenAI
This is an on-CPU real-time conversational system for two-way speech communication with AI models, utilizing a continuous streaming architecture for fluid conversations with immediate responses and natural interruption handling.
An easy-to-use, fast, and easily integrable tool for evaluating audio LLM
Cascading voice assistant combining real-time speech recognition, AI reasoning, and neural text-to-speech capabilities.
🗣️ Real‑time, low‑latency voice, vision, and conversational‑memory AI assistant built on LiveKit and local LLMs ✨
Code for NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation".
Codename's rvc fork version 3, based on Applio.
Code for the INTERSPEECH 2023 paper "Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models"
Speech to text to speech using Elevenlabs
A user-friendly interface for ElevenLabs' API with added audio transcription capability.
Dự án công cụ chuyển đổi giọng nói dành cho người Việt
Systems submitted to IWSLT 2022 by the MT-UPC group.
A cutting-edge Cascading voice assistant combining real-time speech recognition, AI reasoning, and neural text-to-speech capabilities.
An advanced speech-to-speech (S2S) voice assistant utilizing OpenAI’s Realtime API for ultra-low-latency, two-way audio streaming, real-time natural language understanding, and responsive, interactive dialogue through direct WebSocket communication.
Add a description, image, and links to the speech-to-speech topic page so that developers can more easily learn about it.
To associate your repository with the speech-to-speech topic, visit your repo's landing page and select "manage topics."