Byte-Pair Encoding (BPE) (subword-based tokenization) algorithm implementaions from scratch with python
-
Updated
Jan 30, 2023 - Python
Byte-Pair Encoding (BPE) (subword-based tokenization) algorithm implementaions from scratch with python
A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust
Code repo for the paper "AutoGO: Automated Computation Graph Optimization for Neural Network Evolution", accepted to NeurIPS 2023.
Byte-Pair Encoding tokenizer for training large language models on huge datasets
Генерация новостных заголовков
Byte Pair Encoding (BPE)
Code for the publication of WWW'22
Modern Eager TensorFlow implementation of Attention Is All You Need
Transformer implementation in pytorch trained on NVIDIA A100 in fp16
Order-agnostic lossless compressor using BPE and Huffman Coding.
an efficient ranked retrieval system for English corpora, optimised with VBE and BPE.
Byte-pair encoding implementation in Python.
Decoder-only LLM trained on the Harry Potter books.
A byte-level Byte Pair Encoding (BPE) algorithm for tokenization in Large Language Models (LLMs), similar to those used in GPT, Llama, and Mistral.
🖋️ A sleek, BPE-powered tokenizer that understands the richness of Marathi.
This Legal Document Analyzer is a proof-of-concept NLP project demonstrating the potential of transformers for legal document summarization.
Add a description, image, and links to the byte-pair-encoding topic page so that developers can more easily learn about it.
To associate your repository with the byte-pair-encoding topic, visit your repo's landing page and select "manage topics."