A simple command-line tool for text-to-image generation, using OpenAI's CLIP and a BigGAN (see the sketch after this list). The technique was originally created by https://twitter.com/advadnoun
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to tackle any computer task through strong reasoning, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
A Comparative Framework for Multimodal Recommender Systems
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Automated modeling and machine learning framework FEDOT
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models
LLM2CLIP makes a SOTA pretrained CLIP model even stronger.
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
A CLI tool / Python module for generating images from text using guided diffusion and CLIP from OpenAI.
Towards Generalist Biomedical AI
A knowledge base construction engine for richly formatted data
DANCE: a deep learning library and benchmark platform for single-cell analysis
[ICLR 2025] This is the official repository of our paper "MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine"
An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
An open-source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multimodal AI that uses just a decoder to generate both text and images
Attention-based multimodal fusion for sentiment analysis
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
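Several of the tools above (the CLIP+BigGAN generator at the top of the list and the CLIP-guided diffusion CLI) share one core technique: optimize a generator's input so that the rendered image's CLIP embedding matches the CLIP embedding of a text prompt. Below is a minimal sketch of that loop, assuming the `clip` and `pytorch-pretrained-biggan` packages; the prompt, learning rate, and step count are illustrative and not taken from any of the repositories above.

```python
import torch
import torch.nn.functional as F

import clip  # OpenAI's CLIP: pip install git+https://github.com/openai/CLIP.git
from pytorch_pretrained_biggan import BigGAN  # pip install pytorch-pretrained-biggan

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen pretrained models: CLIP scores image/text similarity, BigGAN renders images.
clip_model, _ = clip.load("ViT-B/32", device=device)
gan = BigGAN.from_pretrained("biggan-deep-256").to(device).eval()

# Encode the prompt once; the text embedding is the fixed optimization target.
tokens = clip.tokenize(["a painting of a lighthouse at night"]).to(device)
with torch.no_grad():
    text_emb = F.normalize(clip_model.encode_text(tokens), dim=-1)

# The only trainable parameters are BigGAN's inputs: a latent vector and
# class logits (softmaxed into a soft one-hot over the 1000 ImageNet classes).
latent = torch.randn(1, 128, device=device, requires_grad=True)
class_logits = torch.zeros(1, 1000, device=device, requires_grad=True)
opt = torch.optim.Adam([latent, class_logits], lr=0.05)

# CLIP's input normalization constants.
mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

for step in range(200):
    opt.zero_grad()
    img = gan(latent, torch.softmax(class_logits, dim=-1), truncation=1.0)
    img = (img.clamp(-1, 1) + 1) / 2                     # BigGAN outputs in [-1, 1]
    img = F.interpolate(img, size=224, mode="bilinear")  # CLIP ViT-B/32 expects 224x224
    img_emb = F.normalize(clip_model.encode_image((img - mean) / std), dim=-1)
    loss = -(img_emb * text_emb).sum()                   # negative cosine similarity
    loss.backward()
    opt.step()
```

The CLIP-guided diffusion variant applies the same similarity signal to steer a diffusion sampler's denoising steps instead of a GAN's latent vector, but the objective is identical.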