MTEB: Massive Text Embedding Benchmark
Updated Sep 29, 2025 - Python
[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
SGPT: GPT Sentence Embeddings for Semantic Search
Generative Representational Instruction Tuning
Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard
Codebase for RetroMAE and beyond.
Code & data accompanying the KDD 2017 paper "KATE: K-Competitive Autoencoder for Text"
Efficient LLM inference on Slurm clusters using vLLM.
Go module for fetching embeddings from embeddings providers
A vector embedding database with multiple storage engines and AI embedding integrations
Simple script to compute CLIP-based scores given a DALL-E trained model.
Simple customizable evaluation for text retrieval performance of Sentence Transformers embedders on PDFs
A text embedding viewer for the Jupyter environment
Official codebase for the ACL 2025 Findings paper: Optimized Text Embedding Models and Benchmarks for Amharic Passage Retrieval.
Perform topic classification on news articles in several limited-labeled data regimes.
Code for embedding and retrieval research.
Simple script to re-rank images using OpenAI's CLIP https://github.com/openai/CLIP.
Topic Embedding, Text Generation and Modeling using diffusion
🧠 ML-Article-Classifier is a modular Python project for classifying articles using advanced NLP techniques. It features sentence embeddings, clustering, and classification utilities, with Jupyter notebook demos, extensible helper functions, and best practices for research and production use.
SERVER: Multi-modal Speech Emotion Recognition using Transformer-based and Vision-based Embeddings