TrialMatchAI - Clinical Trial Matching System

An application that matches cancer patients to relevant clinical trials using biomedical natural language processing and weighted scoring. Designed for healthcare professionals, researchers, and medical institutions.

Project Overview

TrialMatchAI takes a natural language patient description, extracts medical entities with a biomedical language model (PubMedBERT, with a rule-based fallback), and scores those entities against a database of cancer clinical trials. Each matched trial is returned with a confidence score and an explanation.

Key Features

Medical NLP: Uses PubMedBERT and rule-based extraction for medical entity recognition
Intelligent Matching: Weighted scoring system prioritizing conditions, demographics, and trial phases
Confidence Scoring: Provides match confidence percentages and explanations
Analytics Dashboard: Real-time charts and visualizations of matching results
Professional Interface: Modern, responsive design for healthcare workflows

Enhanced Features

Medical Entity Extraction

Named Entity Recognition: Automatically identifies medical conditions, demographics, treatments, and lab values
Categorized Entities: Groups entities into logical categories for better matching
Fallback System: Robust rule-based extraction when advanced models aren't available
Entity Visualization: Interactive display of extracted medical entities
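
The rule-based fallback mentioned above could look roughly like the following. This is a minimal sketch and not the project's actual MedicalEntityExtractor: the function name extract_entities_rule_based, the category names, and the keyword lists are all assumptions made for illustration.

```python
import re

# Hypothetical vocabularies for illustration; a real fallback needs far broader coverage.
CONDITION_TERMS = ("breast cancer", "prostate cancer", "lung cancer",
                   "acute lymphoblastic leukemia", "melanoma")
TREATMENT_TERMS = ("chemotherapy", "radiation therapy", "immunotherapy", "surgery")

SEX_PATTERN = re.compile(r"\b(female|male)\b", re.IGNORECASE)
# Matches "45 years old" or a bare number right after the sex, as in "male, 68".
AGE_PATTERN = re.compile(
    r"\b(\d{1,3})\s*years?\s*old\b|\b(?:female|male),\s*(\d{1,3})\b", re.IGNORECASE
)
BIOMARKER_PATTERN = re.compile(r"\b(HER2|EGFR|BRAF)\s+(positive|negative)\b", re.IGNORECASE)


def extract_entities_rule_based(text):
    """Return extracted entities grouped by (assumed) category."""
    lowered = text.lower()
    entities = {
        "conditions": [t for t in CONDITION_TERMS if t in lowered],
        # Negation ("no prior chemotherapy") is NOT handled in this sketch.
        "treatments": [t for t in TREATMENT_TERMS if t in lowered],
        "biomarkers": [f"{m.group(1).upper()} {m.group(2).lower()}"
                       for m in BIOMARKER_PATTERN.finditer(text)],
        "demographics": {},
    }
    sex = SEX_PATTERN.search(text)
    if sex:
        entities["demographics"]["sex"] = sex.group(1).lower()
    age = AGE_PATTERN.search(text)
    if age:
        entities["demographics"]["age"] = int(age.group(1) or age.group(2))
    return entities


result = extract_entities_rule_based(
    "female, 45 years old, breast cancer, HER2 positive, no prior chemotherapy, non-smoker"
)
```

Because the sketch ignores negation, "no prior chemotherapy" still yields a chemotherapy entity. A production fallback would need to handle that case before the result feeds into matching.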

Intelligent Trial Matching

Weighted Scoring: Prioritizes critical fields (conditions, age, sex) over secondary criteria
Confidence Metrics: Provides match confidence percentages (0-100%)
Match Explanations: Human-readable explanations for why trials matched
Fuzzy Matching: Handles variations in medical terminology
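
One way to tolerate spelling variations in medical terms is a string-similarity threshold. The sketch below uses difflib.SequenceMatcher from the Python standard library. The README does not say which fuzzy matching technique the project uses, so the helper names and the 0.8 threshold are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio in [0, 1] between two terms, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def best_fuzzy_match(term, candidates, threshold=0.8):
    """Return (candidate, score) for the closest candidate, or (None, score) if below threshold."""
    best, best_score = None, 0.0
    for candidate in candidates:
        score = similarity(term, candidate)
        if score > best_score:
            best, best_score = candidate, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score


match, score = best_fuzzy_match("leukaemia", ["leukemia", "lymphoma", "melanoma"])
no_match, _ = best_fuzzy_match("glioma", ["leukemia", "lymphoma"])
```

Here the British spelling "leukaemia" resolves to "leukemia", while "glioma" is rejected because no candidate is similar enough. Character-level similarity does not capture synonyms (for example, a brand name versus a generic drug name), which would need a vocabulary lookup instead.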

Analytics Dashboard

Confidence Distribution: Histogram showing match quality distribution
Condition Analysis: Bar charts of most common conditions in matches
Real-time Metrics: Live statistics on total matches, average confidence, and recruiting trials
Export Functionality: Download results as CSV for further analysis
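
The dashboard figures listed above can be derived from the match results with a few aggregations. The following is a sketch using only the standard library. The dictionary keys confidence, status, and condition are hypothetical and may not match the project's actual column names.

```python
from collections import Counter
from statistics import mean


def summarize_matches(matches):
    """Compute headline metrics from a list of match dicts (keys are assumed)."""
    confidences = [m["confidence"] for m in matches]
    return {
        "total_matches": len(matches),
        "average_confidence": round(mean(confidences), 1) if confidences else 0.0,
        "recruiting_trials": sum(1 for m in matches if m["status"].lower() == "recruiting"),
        "top_conditions": Counter(m["condition"] for m in matches).most_common(3),
    }


def confidence_histogram(confidences, bin_width=10):
    """Count confidences per bin; keys are bin start values, 100 falls into the last bin."""
    bins = Counter()
    for value in confidences:
        start = min(int(value // bin_width) * bin_width, 100 - bin_width)
        bins[start] += 1
    return dict(sorted(bins.items()))


# Made-up example data, for illustration only.
example_matches = [
    {"confidence": 82.0, "status": "Recruiting", "condition": "Breast Cancer"},
    {"confidence": 55.5, "status": "Completed", "condition": "Breast Cancer"},
    {"confidence": 30.0, "status": "Recruiting", "condition": "Lung Cancer"},
]
summary = summarize_matches(example_matches)
histogram = confidence_histogram([m["confidence"] for m in example_matches])
```

The histogram dictionary and the top_conditions list are in a shape that a charting library such as Plotly could plot directly as bar charts.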

User Experience

Sample Cases: Pre-loaded patient scenarios for quick testing
Interactive Interface: Modern, responsive design with custom styling
Error Handling: Comprehensive error handling with informative messages
Help System: Built-in guidance for optimal usage

Technical Stack

Python 3.8+ - Core programming language
Streamlit - Modern web application framework
Transformers - Hugging Face library for medical NLP models
Plotly - Interactive data visualization
Pandas - Advanced data manipulation and analysis
scikit-learn - Machine learning utilities
PubMedBERT - Biomedical language model for medical entity extraction

Quick Start

Live Demo

A Streamlit Cloud deployment is planned but not yet available.

Local Installation

# Clone the repository
git clone https://github.com/padg9912/TrialMatchAI.git
cd TrialMatchAI

# Install dependencies
pip install -r requirements.txt

# Run the application
streamlit run app.py

The app will be available at http://localhost:8501

Usage Guide

Basic Usage

Enter Patient Data: Use the text area to describe patient demographics and medical conditions
Choose Sample Cases: Select from pre-loaded patient scenarios or enter custom data
Analyze Entities: Click "Analyze Entities" to see what medical terms were extracted
Find Matches: Click "Find Matching Trials" to get ranked results with confidence scores
Review Results: Explore matching trials with detailed explanations and confidence metrics

Sample Patient Cases

The application includes pre-loaded sample cases:

Breast Cancer Patient: female, 45 years old, breast cancer, HER2 positive, no prior chemotherapy, non-smoker
Prostate Cancer Patient: male, 68, prostate cancer, stage II, hypertension, prior surgery
Pediatric Leukemia: male, 10, acute lymphoblastic leukemia, no CNS involvement, first relapse
Lung Cancer Patient: female, 62, lung cancer, smoker, stage III, prior radiation therapy
Advanced Melanoma: male, 55, metastatic melanoma, BRAF positive, immunotherapy naive

Tips for Best Results

Include age, sex, and specific cancer type
Mention cancer stage and biomarker status (HER2, EGFR, etc.)
Include treatment history and smoking status
Be specific about geographic preferences if relevant

Features in Detail

Medical Entity Extraction

Medical NER: Automatically identifies medical conditions, demographics, treatments, and lab values
Categorization: Groups entities into logical categories for better matching
Visualization: Shows extracted entities in an organized, color-coded display

Intelligent Matching

Weighted Scoring: Conditions (3x), Demographics (2x), Trial Phases (1.5x), Other fields (1x or less)
Confidence Scoring: Provides match percentages and explanations
Fuzzy Matching: Handles medical terminology variations
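
To make the weights above concrete, here is a minimal sketch of how they could be turned into a confidence percentage. Only the weight values come from this README. The normalization (earned weight divided by total weight), the field names, and the function score_trial are assumptions, and the real TrialMatcher may compute confidence differently.

```python
# Weights as listed above; "other" stands in for all remaining fields (assumed 1x).
FIELD_WEIGHTS = {"conditions": 3.0, "demographics": 2.0, "phase": 1.5, "other": 1.0}


def score_trial(field_matches, weights=FIELD_WEIGHTS):
    """Return (confidence_percent, reasons).

    field_matches maps a field name to a match strength in [0, 1];
    missing fields count as 0 (no match).
    """
    total = sum(weights.values())
    earned = 0.0
    reasons = []
    for field, weight in weights.items():
        strength = max(0.0, min(1.0, field_matches.get(field, 0.0)))
        earned += weight * strength
        if strength > 0:
            reasons.append(f"{field} matched ({strength:.0%}, weight {weight}x)")
    confidence = round(100.0 * earned / total, 1)
    return confidence, reasons


confidence, reasons = score_trial({"conditions": 1.0, "demographics": 1.0})
```

With a full match on conditions and demographics only, the earned weight is 5.0 out of 7.5, giving a confidence of 66.7%. The reasons list doubles as a simple human-readable match explanation.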

Analytics Dashboard

Confidence Distribution: Visual histogram of match quality
Condition Analysis: Bar chart of most common conditions in matches
Real-time Metrics: Live statistics and KPIs
Export Options: Download results as CSV for further analysis
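
As an illustration of the CSV export, the sketch below serializes match results to CSV text with the standard csv module. The column names and the function matches_to_csv are hypothetical. Since Pandas is part of the stack, the project may well use DataFrame.to_csv instead.

```python
import csv
import io


def matches_to_csv(matches, columns=("trial_id", "title", "confidence")):
    """Serialize match dicts to CSV text, keeping only the listed columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(columns), extrasaction="ignore")
    writer.writeheader()
    for match in matches:
        writer.writerow(match)  # missing keys are written as empty cells
    return buffer.getvalue()


# Placeholder rows, not real trials.
csv_text = matches_to_csv([
    {"trial_id": "EXAMPLE-1", "title": "Example trial A", "confidence": 82.5, "internal": "x"},
    {"trial_id": "EXAMPLE-2", "title": "Example trial B", "confidence": 41.0},
])
rows = list(csv.DictReader(io.StringIO(csv_text)))
```

In a Streamlit app, text like csv_text is typically handed to a download button so the user can save it as a file.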

Architecture

TrialMatchAI/
├── app.py                 # Main Streamlit application
├── models/
│   └── nlp_model.py      # Medical entity extraction (PubMedBERT + rules)
├── utils/
│   └── matcher.py        # Intelligent trial matching algorithm
├── datasets/
│   └── cancer_studies.csv # Clinical trials database
├── requirements.txt      # Python dependencies
└── README.md            # This file

Key Components

MedicalEntityExtractor: Handles NLP with fallback to rule-based extraction
TrialMatcher: Implements weighted scoring and confidence calculation
Streamlit UI: Modern, responsive interface with analytics
Data Pipeline: Efficient CSV loading and caching
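
The data pipeline's load-once behavior can be sketched with functools.lru_cache. This is an assumption about the approach rather than the project's code. In a Streamlit app the st.cache_data decorator is the usual counterpart, and the column names below are placeholders, not the actual schema of cancer_studies.csv.

```python
import csv
import os
import tempfile
from functools import lru_cache


@lru_cache(maxsize=4)
def load_trials(csv_path):
    """Read the trials CSV once per path; later calls return the cached rows."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # A tuple keeps the cached collection itself from being resized by callers.
        return tuple(csv.DictReader(handle))


# Demonstration with a throwaway file and placeholder columns.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "trials.csv")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        handle.write("trial_id,condition\nEXAMPLE-1,Breast Cancer\n")
    first = load_trials(path)
    second = load_trials(path)  # served from the cache, file is not re-read
```

Note that a cache keyed on the path does not notice when the file changes on disk. Calling load_trials.cache_clear() forces a reload.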

Performance & Accuracy

Matching Accuracy

High Confidence Matches: >70% confidence indicates strong alignment
Medium Confidence: 40-70% indicates good potential matches
Low Confidence: <40% indicates partial or weak matches
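
The three bands above map directly onto a small helper. The function name confidence_tier is hypothetical. The boundary handling follows the ranges as written, so exactly 70% and exactly 40% both fall into the medium band.

```python
def confidence_tier(confidence):
    """Classify a confidence percentage (0-100) into High, Medium, or Low."""
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if confidence > 70:
        return "High"
    if confidence >= 40:
        return "Medium"
    return "Low"
```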

Performance Metrics

Entity Extraction: ~2-3 seconds for complex medical descriptions
Trial Matching: ~1-2 seconds for 1000+ trial database
UI Responsiveness: Real-time updates and smooth interactions
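
Timings like these depend on hardware and on whether the model is already loaded, so it can be useful to measure them locally. The helper below is a generic sketch. time_call is a hypothetical name, and the function being timed stands in for whatever extraction or matching call the application exposes.

```python
import time


def time_call(func, *args, repeats=3, **kwargs):
    """Call func several times; return (last_result, best_elapsed_seconds)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return result, best


# Stand-in workload; replace with the extraction or matching call to measure.
total, elapsed = time_call(sum, range(1000))
```

Taking the best of several runs reduces noise from one-time costs such as model loading, which is worth reporting separately.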

Future Roadmap

Completed (v2.0)

Advanced medical NLP with PubMedBERT integration
Intelligent weighted scoring system
Confidence metrics and explanations
Interactive analytics and visualizations
Professional UI with sample cases
Comprehensive error handling

In Development (v2.1)

Streamlit Cloud deployment
Real-time clinical trial data updates
Advanced eligibility criteria parsing
Integration with ClinicalTrials.gov API

Planned (v3.0)

EHR integration via FHIR API
Multi-language support
Mobile-optimized interface
User accounts and saved searches
Advanced reporting and analytics
Integration with hospital systems

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/your-feature)
Commit your changes (git commit -am 'Add new feature')
Push to the branch (git push origin feature/your-feature)
Create a Pull Request

License

MIT License

Acknowledgments

ClinicalTrials.gov for open clinical trial data
Streamlit and Hugging Face for open-source tools
Biomedical NLP research community for language models

Built for healthcare professionals, researchers, and medical institutions. Contributions and feedback welcome.
