ESP32-S3 WebRTC OpenAI Real-Time Assistant

An ESP32-S3/P4 based AI assistant that uses WebRTC for real-time audio/video communication with OpenAI's GPT models. This project transforms your ESP32-S3 camera board into an intelligent assistant with voice interaction and visual analysis capabilities.

Features

Real-Time Voice Interaction: Seamless bidirectional audio communication with OpenAI's real-time API
Vision Capabilities: Camera integration for visual scene analysis and object recognition
WebRTC Protocol: Low-latency, high-quality audio/video streaming using industry-standard WebRTC
Multi-Board Support: Compatible with multiple ESP32-S3/P4 development boards
Audio Feedback: Custom sound effects for system events (mic and player)
Console Interface: Interactive command-line interface for configuration and control
Memory Management: Advanced memory monitoring and optimization for stable operation
Modular Architecture: Clean component-based design for easy customization

Supported Hardware

Development Boards

Board	Flash	PSRAM	Description
FREENOVE ESP32-S3 WROOM	8MB	Octal SPI	High-performance board with fast PSRAM
ESP32-S3-N16R8-CAM	16MB	Quad SPI	Large flash storage for extended functionality

Required Peripherals

OV2640 or OV5640 camera module
I2S audio codec (MAX98357 or similar)
Microphone and speaker (INMP441 or similar)
USB cable for programming and debugging

Quick Start

Prerequisites

ESP-IDF v5.4+: Install the Espressif IoT Development Framework

# Follow instructions at https://docs.espressif.com/projects/esp-idf/en/latest/esp32s3/get-started/

OpenAI API Key: Obtain from OpenAI Platform
Git with Submodules Support: Required for dependencies

Installation

Clone the repository with submodules:

git clone --recursive https://github.com/keryc/esp32-webrtc-openai.git
cd esp32-webrtc-openai

Set up environment variables:

export OPENAI_API_KEY="your-api-key-here"

Configure WiFi credentials:

# Edit the default configuration
# Update CONFIG_AG_WIFI_SSID and CONFIG_AG_WIFI_PASSWORD
nano sdkconfig.defaults

Building and Flashing

This project uses an advanced Makefile for simplified build management:

Basic Commands

# Build for FREENOVE board (default)
make build

# Build for N16R8-CAM board
make build BOARD=n16r8cam

# Flash to device (auto-detects port)
make flash

# Monitor serial output
make monitor

# Complete cycle: Build + Flash + Monitor
make all BOARD=freenove

Production Build

# Build with production optimizations
make prod BOARD=n16r8cam

# Flash production build
make flash

Advanced Usage

# Specify custom port
make flash PORT=/dev/ttyUSB0

# Open configuration menu
make menuconfig

# Clean build files
make clean

# Erase entire flash
make erase

# Show size analysis
make size

# List available ports
make ports

Configuration

Default Settings

The project includes sensible defaults in sdkconfig.defaults:

WiFi: Configure SSID and password
Audio: Volume levels and microphone gain
Vision: Frame rate, quality, and buffer settings
WebRTC: Opus codec support and data channels
OpenAI: Model selection and voice settings

Board-Specific Configuration

Each board has its own configuration file:

sdkconfig.defaults.esp32s3.freenove - FREENOVE board settings
sdkconfig.defaults.esp32s3.n16r8cam - N16R8-CAM board settings

Audio Volume Configuration

Configure audio volume levels in sdkconfig.defaults:

# Speaker/Playback Volume (0-100)
CONFIG_AG_AUDIO_DEFAULT_PLAYBACK_VOL=95

# Microphone Gain (0-100)
CONFIG_AG_AUDIO_DEFAULT_MIC_GAIN=95

You can also adjust these at runtime using console commands:

audio volume <0-100> - Set speaker volume
audio gain <0-100> - Set microphone gain

OpenAI Settings

Configure the AI assistant behavior:

Model: gpt-realtime (default)
Voice: ash (customizable)
Tool Choice: auto (function calling)
Vision: Automatic scene analysis with look_around function

Customizing AI Prompts

You can customize the AI assistant's personality and behavior by editing components/webrtc/prompts.h:

// Audio + Vision mode prompt  
#define INSTRUCTIONS_AUDIO_VISION "You are Jarvis, a AI resident in your wearer's glasses (mode audio+vision). You can see using the function look_around."

// Vision function configuration
#define VISION_FUNCTION_NAME "look_around"
#define VISION_FUNCTION_DESCRIPTION "Gets a detailed description of the user's current field of view."

Modify these prompts to:

Change the AI assistant's personality (e.g., from "Jarvis" to your custom assistant)
Adjust the interaction style and tone
Add specific instructions or constraints
Customize the vision analysis behavior

Project Structure

esp32-webrtc-openai/
├── main/                      # Main application
│   └── main.c                 # Entry point and initialization
├── components/                # Modular components
│   ├── audio/                 # Audio processing and feedback
│   ├── system/                # System utilities and console
│   ├── vision/                # Camera and image processing
│   ├── webrtc/                # WebRTC and OpenAI integration
│   └── wifi/                  # WiFi management
├── third_party/               # External dependencies
│   └── esp-webrtc-solution/  # WebRTC implementation (submodule)
├── spiffs/                    # SPIFFS filesystem
│   └── sounds/                # Audio feedback files
├── Makefile                   # Advanced build system
├── CMakeLists.txt            # CMake configuration
├── partitions.csv            # Flash partition table
└── sdkconfig.defaults*       # Configuration files

Console Commands

The interactive console provides various commands for runtime control:

WiFi Commands

wifi connect <ssid> <password> - Connect to WiFi network
wifi disconnect - Disconnect from WiFi
wifi status - Show connection status
wifi scan - Scan for available networks

WebRTC Commands

webrtc start - Start WebRTC session
webrtc stop - Stop WebRTC session
webrtc status - Show WebRTC status
webrtc send <message> - Send text message

Camera Commands

cam start - Start camera stream
cam stop - Stop camera stream
cam capture - Capture single frame
cam preview - Start HTTP preview server

Audio Commands

audio volume <0-100> - Set speaker volume
audio gain <0-100> - Set microphone gain
audio mute - Mute microphone
audio unmute - Unmute microphone

System Commands

sys info - Show system information
sys mem - Display memory usage
sys tasks - List running tasks
sys restart - Restart the device

Dependencies

This project relies on:

esp-webrtc-solution: Fork of Espressif's WebRTC implementation (included as submodule)
ESP-IDF v5.4+: Espressif IoT Development Framework
esp32-camera: Camera driver component
OpenAI Real-Time API: For AI processing

Adding Support for New Boards

To add support for a new ESP32-S3/P4 board, follow these steps:

1. Create Board Configuration File

Create a new sdkconfig file for your board:

# Create board-specific configuration
nano sdkconfig.defaults.[IDF_TARGET].[YOUR-BOARD]

Add board-specific settings:

# Board Identification
CONFIG_AG_SYSTEM_BOARD_NAME="YOUR_BOARD_NAME"

# Flash Configuration
CONFIG_ESPTOOLPY_FLASHFREQ_80M=y
CONFIG_ESPTOOLPY_FLASHSIZE_16MB=y

# SPIRAM Configuration (Quad or Octal)
CONFIG_SPIRAM_MODE_QUAD=y  # or CONFIG_SPIRAM_MODE_OCT=y
CONFIG_SPIRAM_SPEED_80M=y

2. Update Codec Board Configuration

Edit third_party/esp-webrtc-solution/components/codec_board/board_cfg.txt:

Board: YOUR_BOARD_NAME
i2c: {sda: 17, scl: 18}
i2s: {mclk: 16, bclk: 9, ws: 45, din: 10, dout: 8}
out: {codec: MAX98357, pa: 48, pa_gain: 6, use_mclk: 0}
in: {codec: INMP441}
camera: {type: dvp, xclk: 40, pclk: 11, vsync: 21, href: 38, 
         d0: 13, d1: 47, d2: 14, d3: 3, d4: 12, d5: 42, d6: 41, d7: 39}

Adjust pin assignments according to your board's schematic.

3. Update Makefile

Add your board to the Makefile:

# Add to VALID_BOARDS list (line 30)
VALID_BOARDS := freenove n16r8cam yourboard

# Add board configuration mapping (after line 42)
else ifeq ($(BOARD),yourboard)
    BOARD_CONFIG := yourboard
    BOARD_FULL_NAME := YOUR_BOARD_FULL_NAME

5. Test Your Configuration

# Build for your new board
make build BOARD=yourboard

# Flash and monitor
make all BOARD=yourboard

Memory Optimization

The project includes aggressive memory optimization for stable operation:

Dynamic buffer allocation
Reduced WiFi and LWIP buffers
Optimized task stack sizes
Runtime memory monitoring
Automatic garbage collection

Audio Feedback

Custom audio feedback is provided through SPIFFS:

starting.wav - Played when system initializes
Additional sounds can be added to /spiffs/sounds/

Security Considerations

API Key: Store OpenAI API key as environment variable only
WiFi Credentials: Use NVS encryption for production
HTTPS: All OpenAI communication uses TLS
WebRTC: DTLS-SRTP encryption for media streams

Troubleshooting

Common Issues

Build Fails: Ensure ESP-IDF v5.4+ is installed and activated
Submodule Missing: Run git submodule update --init --recursive
Port Not Found: Use make ports to list available serial ports
Memory Issues: Reduce video quality or frame rate in configuration
WiFi Connection: Check credentials and signal strength

Debug Mode

Enable debug logging:

make menuconfig
# Navigate to Component config → Log output → Default log verbosity → Debug

Contributing

Contributions are welcome! Please follow these guidelines:

Fork the repository
Create a feature branch
Commit your changes
Push to the branch
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Espressif Systems for ESP-IDF and WebRTC implementation
OpenAI for the real-time API
Community Contributors for testing and feedback

Contact

For questions and support, please open an issue on GitHub.

Note: This project depends on the forked esp-webrtc-solution repository as a submodule, which contains necessary WebRTC components not yet available in the ESP-IDF Component Registry.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
components		components
main		main
spiffs/sounds		spiffs/sounds
third_party		third_party
.gitignore		.gitignore
.gitmodules		.gitmodules
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
partitions.csv		partitions.csv
partitions_16mb.csv		partitions_16mb.csv
sdkconfig.defaults		sdkconfig.defaults
sdkconfig.defaults.esp32s3		sdkconfig.defaults.esp32s3
sdkconfig.defaults.esp32s3.freenove		sdkconfig.defaults.esp32s3.freenove
sdkconfig.defaults.esp32s3.n16r8cam		sdkconfig.defaults.esp32s3.n16r8cam
sdkconfig.production		sdkconfig.production

License

keryc/esp32-webrtc-openai

Folders and files

Latest commit

History

Repository files navigation

ESP32-S3 WebRTC OpenAI Real-Time Assistant

Features

Supported Hardware

Development Boards

Required Peripherals

Quick Start

Prerequisites

Installation

Building and Flashing

Basic Commands

Production Build

Advanced Usage

Configuration

Default Settings

Board-Specific Configuration

Audio Volume Configuration

OpenAI Settings

Customizing AI Prompts

Project Structure

Console Commands

WiFi Commands

WebRTC Commands

Camera Commands

Audio Commands

System Commands

Dependencies

Adding Support for New Boards

1. Create Board Configuration File

2. Update Codec Board Configuration

3. Update Makefile

5. Test Your Configuration

Memory Optimization

Audio Feedback

Security Considerations

Troubleshooting

Common Issues

Debug Mode

Contributing

License

Acknowledgments

Contact

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Languages