evaluation
Here are 621 public repositories matching this topic...
Easily fine-tune, evaluate and deploy Qwen3, DeepSeek-R1, Llama 4 or any open source LLM / VLM!
Updated Jun 13, 2025 - Python
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
Updated Jun 13, 2025 - Python
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
Updated May 4, 2025 - Python
Python package for the evaluation of odometry and SLAM
Updated Jun 8, 2025 - Python
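For context, a minimal sketch of how a SLAM trajectory might be scored with evo's Python API (the file names groundtruth.txt and estimate.txt are placeholders, and module paths follow recent evo releases):

    from evo.core import metrics, sync
    from evo.tools import file_interface

    # Load ground-truth and estimated trajectories in TUM format (placeholder files)
    traj_ref = file_interface.read_tum_trajectory_file("groundtruth.txt")
    traj_est = file_interface.read_tum_trajectory_file("estimate.txt")

    # Associate poses by timestamp, then compute Absolute Pose Error (translation part)
    traj_ref, traj_est = sync.associate_trajectories(traj_ref, traj_est)
    ape = metrics.APE(metrics.PoseRelation.translation_part)
    ape.process_data((traj_ref, traj_est))
    print(ape.get_statistic(metrics.StatisticsType.rmse))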
The easiest tool for fine-tuning LLMs, generating synthetic data, and collaborating on datasets.
Updated Jun 14, 2025 - Python
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
Updated Mar 24, 2023 - Python
A unified evaluation framework for large language models
Updated May 30, 2025 - Python
Accelerating the development of large multimodal models (LMMs) with a one-click evaluation module - lmms-eval.
Updated Jun 13, 2025 - Python
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
Updated Jun 13, 2025 - Python
UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.
Updated Aug 18, 2024 - Python
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
Updated Jan 10, 2025 - Python
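A minimal sketch of how a metric can be loaded and scored with 🤗 Evaluate (the toy label vectors are made up for illustration):

    import evaluate

    # Load a ready-made metric and score toy predictions against references
    accuracy = evaluate.load("accuracy")
    results = accuracy.compute(references=[0, 1, 1, 0],
                               predictions=[0, 1, 0, 0])
    print(results)  # e.g. {'accuracy': 0.75}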
Avalanche: an End-to-End Library for Continual Learning based on PyTorch.
Updated Mar 11, 2025 - Python
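A minimal sketch of how a class-incremental benchmark might be built with Avalanche (assuming a recent release; the 5-experience split is just an example):

    from avalanche.benchmarks.classic import SplitMNIST

    # Split MNIST into 5 incremental experiences (2 classes each)
    benchmark = SplitMNIST(n_experiences=5)

    for experience in benchmark.train_stream:
        print("Experience", experience.current_experience,
              "classes:", experience.classes_in_this_experience)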
☁️ 🚀 📊 📈 Evaluating the state of the art in AI
Updated Jun 11, 2025 - Python
(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"
Updated Apr 3, 2024 - Python
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
Updated Jun 13, 2025 - Python
Multi-class confusion matrix library in Python
Updated Jun 11, 2025 - Python
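A minimal sketch of how pycm builds a multi-class confusion matrix from two label vectors (the toy vectors are illustrative):

    from pycm import ConfusionMatrix

    # Build a multi-class confusion matrix from actual and predicted labels
    cm = ConfusionMatrix(actual_vector=[2, 0, 2, 2, 0, 1],
                         predict_vector=[0, 0, 2, 2, 0, 2])

    print(cm.classes)      # discovered class labels
    print(cm.Overall_ACC)  # overall accuracy
    print(cm)              # full per-class statistics report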
Mental Health LLM (LLM x Mental Health): pre- and post-training, datasets, evaluation, deployment, and RAG, with InternLM / Qwen / Baichuan / DeepSeek / Mixtral / Llama / GLM series models
Updated May 18, 2025 - Python
Evaluation code for various unsupervised automated metrics for Natural Language Generation.
Updated Aug 20, 2024 - Python
Building blocks for rapid development of GenAI applications
Updated Jun 13, 2025 - Python