Skip to content

Real-time video system for detecting and classifying people and pets, using enhanced MobileNetV2/V3 segmentation. Optimized for consumer hardware and developed in Python with PyTorch.

License

Notifications You must be signed in to change notification settings

dosqas/realtime-entity-classifier

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

18 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ” Real-Time Entity Classification System

A smart computer vision system that identifies and classifies people and pets in real-time using advanced deep learning techniques.

Project Demo
"Watch the app nail the purr-fect prediction β€” correctly spotting me and my friend’s cat, Felix!"


✨ Key Features

  • Four-Class Detection: Accurately identifies owner, pet, other person, and background classes
  • Adaptive Processing: Automatically switches between classification and segmentation for improved accuracy
  • Real-Time Performance: ~33 FPS on consumer hardware (NVIDIA RTX 3050)
  • Privacy-Focused: All processing happens locally on your device
  • Interactive Controls: Toggle segmentation mode and visualize confidence scores
  • Memory Efficient: Optimized for resource-constrained environments

🧠 Technical Overview

This project combines transfer learning with efficient model deployment to create a responsive computer vision system that runs smoothly on mid-range hardware:

  • Base Architecture: MobileNetV2 (finetuned from ImageNet weights)
  • Enhancement: LRASPP MobileNetV3 segmentation model for challenging cases
  • Confidence Threshold: Auto-switching between models at 0.7 confidence level
  • Training Method: Transfer learning with frozen feature extraction layers
  • Performance: 99.4% accuracy in ideal conditions, 84.2% in low light

πŸ“Š Model Specifications

Attribute Value
Architecture MobileNetV2 (finetuned)
Input Resolution 224x224 (resized from 640x480)
Output Classes ['owner', 'pet', 'other person', 'background']
Model Format .pth
Model Size ~10 MB (quantized)
Inference Speed 33 FPS @ 640x480
Hardware Tested NVIDIA RTX 3050, CUDA 11.8
Framework PyTorch 3.13.2

πŸš€ Getting Started

Prerequisites

# Clone the repository
git clone https://github.com/dosqas/Realtime-Entity-Classifier.git
cd Realtime-Entity-Classifier

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Running the Classifier

python src/realtime_classifier.py

Controls

  • Press s to toggle forced segmentation mode
  • Press q to quit

πŸ—οΈ Architecture Details

Classification Model (MobileNetV2)

The system uses a modified MobileNetV2 architecture with:

model.classifier[1] = nn.Sequential(
    nn.Linear(model.classifier[1].in_features, 256),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(256, 4)
)
  • Intermediate Layer (256 neurons): Enhanced representational capacity
  • ReLU Activation: Efficient non-linearity
  • Dropout (0.2): Prevents overfitting
  • Xavier Initialization: Improves convergence speed

Training Configuration

  • Optimizer: Adam with selective training
  • Learning Rate: 5e-5
  • Weight Decay: 1e-5
  • Loss Function: CrossEntropyLoss with label smoothing (0.1)
  • Epochs: 10 (converged early)

Segmentation Enhancement

When classification confidence drops below threshold:

  1. An LRASPP MobileNetV3 segmentation model identifies people and pets
  2. Segmentation mask isolates the subject from the background
  3. Classification is re-run on the masked input
  4. System returns to normal mode after confidence improves

πŸ“ˆ Performance Metrics

Training Progress

Epoch Loss Accuracy Ξ” Accuracy
1 0.1600 95.01% +0%
2 0.0404 98.74% +3.73%
3 0.0293 99.11% +0.37%
4 0.0238 99.17% +0.06%
5 0.0209 99.34% +0.17%
6 0.0217 99.26% -0.08%
7 0.0194 99.35% +0.09%
8 0.0187 99.35% +0.00%
9 0.0158 99.48% +0.13%
10 0.0153 99.44% -0.04%

Dataset Overview

  • Total Samples: 34,575
  • Class Distribution:
    • owner: 8,750 samples (25.3%)
      • Sourced from a 2:20 min video of myself walking around the house in varied lighting conditions, angles and backgrounds.
    • pet: 4,575 samples (13.2%)
      • Extracted from a 30-second video of my friend Bogdan’s cat, Felix.
    • other person: 12,500 samples (36.2%)
    • background: 8,750 samples (25.3%)
      • Captured from a 30-second video of walking around the house with no subject in focus.

πŸ§ͺ Known Limitations

  1. Pet Detection:

    • Accuracy drops when <30% of the pet's body is visible
    • Low lighting reduces confidence by ~40%
  2. Person Identification:

    • Needs β‰₯92% confidence to reliably classify "owner" vs "other person"
    • False positives with reflections (mirrors, glass)
    • May struggle with diverse "other person" examples

πŸ”§ Customization

Setting Custom Confidence Threshold

# In realtime_classifier.py
CONFIDENCE_THRESHOLD = 0.7  # Default

Toggling Pet Detection

# In realtime_classifier.py
PET_MASK_ENABLED = True  # Set to False to disable generic pet detection

πŸ“‚ Project Structure

realtime-entity-classifier/
β”œβ”€β”€ demo/
β”‚   └── project_demo.gif             # Project demo GIF
β”œβ”€β”€ data/                            # Dataset used for training and evaluation
β”‚   β”œβ”€β”€ owner/                       # Images and optional video of the owner
β”‚   β”‚   β”œβ”€β”€ images/                  # Folder containing image samples
β”‚   β”‚   └── owner.mp4 (optional)     # Optional video for data generation
β”‚   β”œβ”€β”€ pet/                         # Images and optional video of pets (e.g., cat, dog)
β”‚   β”‚   β”œβ”€β”€ images/
β”‚   β”‚   └── pet.mp4 (optional)
β”‚   β”œβ”€β”€ other_people/               # Images and optional video of non-owners
β”‚   β”‚   β”œβ”€β”€ images/
β”‚   β”‚   └── other_people.mp4 (optional)
β”‚   └── background/                 # Background-only scenes
β”‚       β”œβ”€β”€ images/
β”‚       └── background.mp4 (optional)
β”œβ”€β”€ models/                          # Trained model weights
β”‚   └── entity_classifier.pth        # Main classifier model
β”œβ”€β”€ notebooks/                       # Jupyter notebooks
β”‚   └── classifier_build_and_train.ipynb
β”œβ”€β”€ reports/                         # Reports and visualizations
β”‚   β”œβ”€β”€ TEST_RESULTS.md              # Full test performance summary
β”‚   └── training_plots/
β”‚       └── mobilenetv2_4class_finetune_20250420.jpg  # Training progress plot
β”œβ”€β”€ src/
β”‚   └── realtime_classifier.py       # Main application script
β”œβ”€β”€ requirements.txt                 # Python dependencies
└── README.md                        # Project overview and usage guide

πŸ™ Acknowledgments

  • PyTorch – for the powerful and flexible deep learning framework
  • TorchVision – for pre-trained models and helpful computer vision utilities
  • OpenCV – for enabling efficient image and video processing
  • Human Faces Dataset (Kaggle) – used for training on diverse human faces for the "other person" class
  • My friend Bogdan and his cat Felix - for helping me with data to train the model for the "pet" class

πŸ“„ License

This project is licensed under the MIT License. See the LICENSE file for details.


πŸ’‘ Contact

Questions, feedback, or ideas? Reach out anytime at [email protected].

About

Real-time video system for detecting and classifying people and pets, using enhanced MobileNetV2/V3 segmentation. Optimized for consumer hardware and developed in Python with PyTorch.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published