Enable STT for #315 #330
base: master
Pull Request Overview
This PR adds a new ROS2 audio processing node that enables Speech-to-Text (STT) functionality using OpenVINO. The implementation processes audio files (MP4 and WAV) and converts them to text using a wav2vec2 model.
- Adds complete ROS2 package structure with setup files and linting tests
- Implements audio processor node with OpenVINO-based STT processing
- Supports multiple audio formats with preprocessing pipeline
Reviewed Changes
Copilot reviewed 7 out of 11 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| audio_processor/audio_processor/audio_processor_node.py | Main STT processing node with OpenVINO integration |
| audio_processor/setup.py | Package configuration with dependencies and entry points |
| audio_processor/package.xml | ROS2 package metadata with build and runtime dependencies |
| audio_processor/setup.cfg | Package installation configuration |
| audio_processor/test/test_pep257.py | PEP 257 docstring compliance test |
| audio_processor/test/test_flake8.py | Code style linting test |
| audio_processor/test/test_copyright.py | Copyright header compliance test |
```python
from pydub import AudioSegment
from openvino.runtime import Core


class AudioProcessorNode(Node):
```
Copilot (AI), Oct 7, 2025
The class is missing a docstring. Add a docstring to describe the purpose of this audio processing node and its functionality.
Suggested change:
```diff
 class AudioProcessorNode(Node):
+    """
+    A ROS2 node for audio processing that loads an OpenVINO speech-to-text model,
+    processes audio files (WAV or MP4), performs inference, and publishes the
+    transcribed text to a ROS2 topic.
+    Functionality includes:
+    - Extracting audio from MP4 or reading WAV files.
+    - Preprocessing audio for model input.
+    - Running inference using OpenVINO.
+    - Postprocessing model output to text.
+    - Publishing results to the 'stt_output' topic.
+    """
```
```python
        self.publisher_ = self.create_publisher(String, 'stt_output', 10)
        self.ie = Core()
        # Load the converted OpenVINO model
        # self.model = self.ie.read_model(model='wav2vec2-base/wav2vec2-base.xml')
        self.model = self.ie.read_model(model='/root/ros2_ws/audio_processor/audio_processor/wav2vec2-base/wav2vec2-base.xml')
```
Copilot (AI), Oct 7, 2025
Hard-coded absolute path makes the code non-portable. Consider using a relative path or making this configurable through a ROS parameter.
Suggested change:
```diff
-        self.publisher_ = self.create_publisher(String, 'stt_output', 10)
-        self.ie = Core()
-        # Load the converted OpenVINO model
-        # self.model = self.ie.read_model(model='wav2vec2-base/wav2vec2-base.xml')
-        self.model = self.ie.read_model(model='/root/ros2_ws/audio_processor/audio_processor/wav2vec2-base/wav2vec2-base.xml')
+        self.declare_parameter('model_path', 'wav2vec2-base/wav2vec2-base.xml')
+        model_path = self.get_parameter('model_path').get_parameter_value().string_value
+        self.publisher_ = self.create_publisher(String, 'stt_output', 10)
+        self.ie = Core()
+        self.model = self.ie.read_model(model=model_path)
```
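The suggested default 'wav2vec2-base/wav2vec2-base.xml' is still a relative path, so it is resolved against the working directory of whatever shell launched the node. A minimal sketch of one way to make it robust, assuming the IR files get installed with the package: join relative paths onto a base directory and fail early with a clear error if the `.xml` or its companion `.bin` weights file is missing. `resolve_model_path` is a hypothetical helper, not part of the PR or of OpenVINO, and installing the model via `data_files` in setup.py is an assumption.

```python
import os


def resolve_model_path(model_path: str, base_dir: str) -> str:
    """Return an absolute path to an OpenVINO IR .xml file (hypothetical helper).

    Absolute paths are used as-is; relative paths are joined onto base_dir.
    Raises FileNotFoundError if the .xml or its sibling .bin is missing.
    """
    path = os.path.expanduser(model_path)
    if not os.path.isabs(path):
        path = os.path.join(base_dir, path)
    path = os.path.normpath(path)

    # IR models are split into topology (.xml) and weights (.bin) files.
    weights = os.path.splitext(path)[0] + '.bin'
    for required in (path, weights):
        if not os.path.isfile(required):
            raise FileNotFoundError(f'OpenVINO IR file not found: {required}')
    return path


# Hypothetical use inside AudioProcessorNode.__init__, assuming the IR files
# are installed into the package share directory:
#   from ament_index_python.packages import get_package_share_directory
#   share_dir = get_package_share_directory('audio_processor')
#   self.model = self.ie.read_model(model=resolve_model_path(model_path, share_dir))
```

Checking for the file up front turns a missing model into a readable error at node startup rather than a failure deeper inside the OpenVINO call.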
```python
    def preprocess_audio(self, audio_data):
        # Normalize audio data
        audio_data = audio_data / np.max(np.abs(audio_data))
```
Copilot (AI), Oct 7, 2025
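If audio_data contains only zeros (for example, a silent clip), np.max(np.abs(audio_data)) is 0 and the division produces NaN values. NumPy only emits a RuntimeWarning, so the NaNs are passed on to the model. Add a check to prevent division by zero.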
Suggested change:
```diff
-        audio_data = audio_data / np.max(np.abs(audio_data))
+        max_val = np.max(np.abs(audio_data))
+        if max_val == 0:
+            self.get_logger().warning("Audio data contains only zeros; skipping normalization to avoid division by zero.")
+        else:
+            audio_data = audio_data / max_val
```
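An alternative to the guarded peak normalization, sketched as a hypothetical standalone function: zero-mean, unit-variance normalization with a small epsilon in the denominator. This is the kind of normalization Hugging Face's `Wav2Vec2FeatureExtractor` applies when `do_normalize` is enabled, and the epsilon means a silent clip maps to zeros rather than NaN, so no special case is needed. Whether the converted OpenVINO model expects this input scaling is an assumption to verify against how the model was exported.

```python
import numpy as np


def normalize_for_wav2vec2(audio_data, eps=1e-7):
    """Zero-mean, unit-variance normalization of a 1-D waveform.

    Because eps keeps the denominator positive, a silent (all-zero) clip
    comes out as all zeros instead of NaN.
    """
    x = np.asarray(audio_data, dtype=np.float32)
    if x.size == 0:
        # mean() of an empty array is NaN, so reject empty input explicitly.
        raise ValueError('audio_data is empty')
    return (x - x.mean()) / np.sqrt(x.var() + eps)
```

In preprocess_audio, the normalization line would then become `audio_data = normalize_for_wav2vec2(audio_data)`. The sketch does not resample; wav2vec2 models are trained on 16 kHz mono audio, so that conversion still has to happen before this step.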
```python
    def postprocess_result(self, result):
        # Implement postprocessing logic to convert model output to text
        return "example text"
```
Copilot (AI), Oct 7, 2025
The postprocess_result method returns a placeholder string instead of implementing actual text conversion logic. This should be implemented to properly decode the model output.
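If the converted model is a wav2vec2 checkpoint fine-tuned with CTC (an assumption; a pretrained-only wav2vec2 checkpoint has no CTC head and does not output character scores), its output is a per-frame score matrix over a character vocabulary. The simplest decoder is greedy (best-path) CTC decoding: take the highest-scoring token per frame, collapse consecutive repeats, drop the blank token, and turn the word-delimiter token into spaces. The sketch assumes the blank is token id 0 and the delimiter is `'|'`, as in Hugging Face's wav2vec2 CTC tokenizers; the real `vocab` list must come from the tokenizer used when the model was exported. `ctc_greedy_decode` and `self.vocab` are hypothetical names.

```python
import numpy as np


def ctc_greedy_decode(logits, vocab, blank_id=0, word_delimiter='|'):
    """Best-path (greedy) CTC decoding for a single utterance.

    logits: array of shape (frames, vocab_size) or (1, frames, vocab_size).
    vocab:  sequence mapping token id -> token string.
    """
    scores = np.asarray(logits)
    if scores.ndim == 3:
        # Assume a batch of one utterance and drop the batch axis.
        scores = scores[0]
    best_ids = np.argmax(scores, axis=-1)

    tokens = []
    prev_id = None
    for token_id in best_ids.tolist():
        # Keep a token only when it differs from the previous frame's token
        # (collapsing repeats) and is not the CTC blank.
        if token_id != prev_id and token_id != blank_id:
            tokens.append(vocab[token_id])
        prev_id = token_id

    # The delimiter becomes a space; split/join trims and collapses spaces.
    text = ''.join(tokens).replace(word_delimiter, ' ')
    return ' '.join(text.split())


# Hypothetical use in AudioProcessorNode.postprocess_result:
#     return ctc_greedy_decode(result, self.vocab)
```

Because repeats are only collapsed between adjacent frames, a blank between two identical characters keeps both (for example, the double letter in "HELLO"). Greedy decoding uses no language model; beam search with a language model can improve accuracy but is a larger change.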
```python
    rclpy.init(args=args)
    node = AudioProcessorNode()
    # Example: Process an audio file
    node.process_audio_file('/root/ros2_ws/audio_processor/audio_processor/1089-134686-0001.wav')
```
Copilot (AI), Oct 7, 2025
Hard-coded absolute path in main function makes the code non-portable. Consider making this configurable or removing the hard-coded test call.
```python
    maintainer='yuki',
    maintainer_email='[email protected]',
    description='Audio processing node for STT using OpenVINO',
    license='License declaration',
```
Copilot (AI), Oct 7, 2025
Generic license declaration should be replaced with the actual license name (e.g., 'Apache-2.0' to match package.xml).
Suggested change:
```diff
-    license='License declaration',
+    license='Apache-2.0',
```
No description provided.