UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM hallucination detection
Code scanner to check for issues in prompts and LLM calls
Cost-of-Pass: An Economic Framework for Evaluating Language Models
LLM-as-a-judge for Extractive QA datasets
JudgeGPT - (Fake) News Evaluation, a research project
A Python library providing evaluation metrics to compare generated texts from LLMs, often against reference texts. Features streamlined workflows for model comparison and visualization.
CLI tool to evaluate LLM factuality on MMLU benchmark.
Automatic multi-metric evaluation of human-bot dialogues using LLMs (Claude, GPT-4o) across different datasets and settings. Built for the Artificial Intelligence course at the University of Salerno.
An open-source evaluation suite for testing LLMs on refusal handling, tone control, and reasoning. Built to explore model behavior across nuanced user cases.
Interactive Python toolkit for exploring, testing, and benchmarking LLM tokenization, prompt behaviors, and sequence efficiency in a safe, modular sandbox environment.
Training and evaluation of the OpenNMT neural machine translation engine on a corpus of inflected forms and then on lemmas
This repository contains a study comparing the web search capabilities of four AI assistants: Gemini 2.0 Flash, ChatGPT-4 Turbo, DeepSeek R1, and Grok 3.
A diagnostic Gradio tool to simulate feedback loops in Retrieval-Augmented Generation (RAG) pipelines and detect Model Autophagy Disorder (MAD) risks.
A modular system for automated, multi-metric AI prompt evaluation—featuring expert models, an orchestrator, and a modern web UI.