DSIR large-scale data selection framework for language model training
Updated Apr 7, 2024 - Python
🐂 🔥 Official repository for the paper "LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning".
InstructionGPT-4
[ACL 2023] The code for our ACL'23 paper Cold-Start Data Selection for Few-shot Language Model Fine-tuning: A Prompt-Based Uncertainty Propagation Approach
Code for ACL 2025 Main paper "Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning".
[ACL2025 Findings] Official code for MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space
[ACL 2025 main] SCAR: Data Selection via Style Consistency-Aware Response Ranking for Efficient Instruction-Tuning of Large Language Models
This is an official repository for "Performance Scaling via Optimal Transport: Enabling Data Selection from Partially Revealed Sources" (NeurIPS 2023).
Enhancing Efficiency in Multidevice Federated Learning through Data Selection
Keras sentence classification
Dynamic Transfer Learning for Low-Resource Neural Machine Translation
A Python package for studying neural learning
Code for NeurIPS 2023 Paper (Imitation Learning from Imperfection: Theoretical Justifications and Algorithms)
An Approach to Enhancing the Efficacy of Post-Training Using Synthetic Data by Iterative Data Selection
CORE: Mitigating Catastrophic Forgetting in Continual Learning through Cognitive Replay (CogSci 2024 Oral)
Code for Generative Deduplication For Social Media Data Selection (Findings of EMNLP 2024)
This repository contains the data and code for the paper "Self-training with Two-phase Self-augmentation for Few-shot Dialogue Generation" (EMNLP2022-Findings).
Official Repository for the Paper: Chasing Random: Instruction Selection Strategies Fail to Generalize
Quilt: Robust Data Segment Selection against Concept Drifts (AAAI 2024)