Enable STT for #315 #330

Open

ynakaga wants to merge 1 commit into intel:master from jb-balaji:ros2_stt_for_315

ynakaga commented Mar 14, 2025

No description provided.


Enable STT for intel#315

86fa9cc

jb-balaji requested a review from Copilot on October 7, 2025 13:56

Copilot AI reviewed

Copilot AI left a comment

Pull Request Overview

This PR adds a new ROS2 audio processing node that enables Speech-to-Text (STT) functionality using OpenVINO. The implementation processes audio files (MP4 and WAV) and converts them to text using a wav2vec2 model.

Adds complete ROS2 package structure with setup files and linting tests
Implements audio processor node with OpenVINO-based STT processing
Supports multiple audio formats with preprocessing pipeline

Reviewed Changes

Copilot reviewed 7 out of 11 changed files in this pull request and generated 6 comments.

File	Description
audio_processor/audio_processor/audio_processor_node.py	Main STT processing node with OpenVINO integration
audio_processor/setup.py	Package configuration with dependencies and entry points
audio_processor/package.xml	ROS2 package metadata with build and runtime dependencies
audio_processor/setup.cfg	Package installation configuration
audio_processor/test/test_pep257.py	PEP 257 docstring compliance test
audio_processor/test/test_flake8.py	Code style linting test
audio_processor/test/test_copyright.py	Copyright header compliance test

audio_processor/audio_processor/audio_processor_node.py

+              from pydub import AudioSegment
+              from openvino.runtime import Core
+              class AudioProcessorNode(Node):

Copilot AI Oct 7, 2025

The class is missing a docstring. Add a docstring to describe the purpose of this audio processing node and its functionality.

Suggested change

-              class AudioProcessorNode(Node):
+              class AudioProcessorNode(Node):
+                  """
+                  A ROS2 node for audio processing that loads an OpenVINO speech-to-text model,
+                  processes audio files (WAV or MP4), performs inference, and publishes the
+                  transcribed text to a ROS2 topic.
+                  Functionality includes:
+                      - Extracting audio from MP4 or reading WAV files.
+                      - Preprocessing audio for model input.
+                      - Running inference using OpenVINO.
+                      - Postprocessing model output to text.
+                      - Publishing results to the 'stt_output' topic.
+                  """

Copilot uses AI. Check for mistakes.

audio_processor/audio_processor/audio_processor_node.py

Comment on lines +12 to +16

+                      self.publisher_ = self.create_publisher(String, 'stt_output', 10)
+                      self.ie = Core()
+                      # Load the converted OpenVINO model
+                      # self.model = self.ie.read_model(model='wav2vec2-base/wav2vec2-base.xml')
+                      self.model = self.ie.read_model(model='/root/ros2_ws/audio_processor/audio_processor/wav2vec2-base/wav2vec2-base.xml')

Copilot AI Oct 7, 2025

Hard-coded absolute path makes the code non-portable. Consider using a relative path or making this configurable through a ROS parameter.

Suggested change

-                      self.publisher_ = self.create_publisher(String, 'stt_output', 10)
-                      self.ie = Core()
-                      # Load the converted OpenVINO model
-                      # self.model = self.ie.read_model(model='wav2vec2-base/wav2vec2-base.xml')
-                      self.model = self.ie.read_model(model='/root/ros2_ws/audio_processor/audio_processor/wav2vec2-base/wav2vec2-base.xml')
+                      self.declare_parameter('model_path', 'wav2vec2-base/wav2vec2-base.xml')
+                      model_path = self.get_parameter('model_path').get_parameter_value().string_value
+                      self.publisher_ = self.create_publisher(String, 'stt_output', 10)
+                      self.ie = Core()
+                      self.model = self.ie.read_model(model=model_path)
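
The default in the suggestion above is still a relative path, so it would be resolved against whatever working directory the node is started from. A minimal sketch of one way to make that explicit, assuming the caller picks a base directory (for example the package's installed share directory); `resolve_model_path` and `base_dir` are hypothetical names, not part of this PR:

```python
from pathlib import Path


def resolve_model_path(model_path: str, base_dir: Path) -> Path:
    """Resolve a model path, treating relative paths as relative to base_dir.

    Hypothetical helper for illustration only; it is not part of the PR.
    """
    candidate = Path(model_path).expanduser()
    if not candidate.is_absolute():
        # Anchor relative paths to a known directory instead of the CWD.
        candidate = base_dir / candidate
    resolved = candidate.resolve()
    if not resolved.is_file():
        # Fail early with a clear message rather than inside the model loader.
        raise FileNotFoundError(f"Model file not found: {resolved}")
    return resolved
```

The returned path could then be converted with `str(...)` and passed as the `model` argument in place of the raw parameter value.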

audio_processor/audio_processor/audio_processor_node.py

+                  def preprocess_audio(self, audio_data):
+                      # Normalize audio data
+                      audio_data = audio_data / np.max(np.abs(audio_data))

Copilot AI Oct 7, 2025

If audio_data contains only zeros, np.max(np.abs(audio_data)) is 0 and the division produces NaN values (NumPy emits a RuntimeWarning instead of raising). Add a check before normalizing.

Suggested change

-                      audio_data = audio_data / np.max(np.abs(audio_data))
+                      max_val = np.max(np.abs(audio_data))
+                      if max_val == 0:
+                          self.get_logger().warning("Audio data contains only zeros; skipping normalization to avoid division by zero.")
+                      else:
+                          audio_data = audio_data / max_val

audio_processor/audio_processor/audio_processor_node.py

+                  def postprocess_result(self, result):
+                      # Implement postprocessing logic to convert model output to text
+                      return "example text"

Copilot AI Oct 7, 2025

The postprocess_result method returns a placeholder string instead of implementing actual text conversion logic. This should be implemented to properly decode the model output.
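
One common way to decode this kind of output is greedy CTC decoding. The sketch below assumes the converted model emits per-frame CTC logits of shape (time steps, vocabulary size) with the batch dimension already removed, that token id 0 is the CTC blank, and that `|` marks word boundaries. None of this is confirmed by the PR, and `ctc_greedy_decode`, `vocab`, `blank_id`, and `word_delimiter` are hypothetical names:

```python
def ctc_greedy_decode(logits, vocab, blank_id=0, word_delimiter="|"):
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks.

    logits: sequence of frames, each a sequence of per-token scores.
    vocab:  list mapping token id -> character (an assumption; the real
            mapping must come from the model's own vocabulary file).
    """
    # Pick the highest-scoring token id for every time step.
    best_ids = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]

    chars = []
    previous = None
    for token_id in best_ids:
        # Emit a token only when it differs from the previous frame's token
        # and is not the blank symbol.
        if token_id != previous and token_id != blank_id:
            chars.append(vocab[token_id])
        previous = token_id

    # Turn the word delimiter into spaces to get readable text.
    return "".join(chars).replace(word_delimiter, " ").strip()
```

Greedy decoding ignores any language model, so it is a baseline rather than a best-accuracy decoder. It works on nested lists as written and should also accept the rows of a 2-D NumPy array.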

audio_processor/audio_processor/audio_processor_node.py

+                  rclpy.init(args=args)
+                  node = AudioProcessorNode()
+                  # Example: Process an audio file
+                  node.process_audio_file('/root/ros2_ws/audio_processor/audio_processor/1089-134686-0001.wav')

Copilot AI Oct 7, 2025

Hard-coded absolute path in main function makes the code non-portable. Consider making this configurable or removing the hard-coded test call.
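
One possible way to make the input configurable is to take the audio file from the command line. This is a minimal sketch, assuming any ROS-specific arguments have already been stripped from `argv` (rclpy provides `rclpy.utilities.remove_ros_args` for that); `parse_audio_path` is a hypothetical helper, not part of the PR:

```python
import argparse


def parse_audio_path(argv=None):
    """Return the audio file path given as the single positional argument.

    Hypothetical helper for illustration; argv is expected to contain only
    application arguments (no program name, no ROS arguments).
    """
    parser = argparse.ArgumentParser(
        description="Transcribe one audio file with the STT node."
    )
    parser.add_argument("audio_file", help="Path to a WAV or MP4 file")
    args = parser.parse_args(argv)
    return args.audio_file
```

In `main`, the returned value could be passed to `node.process_audio_file(...)` instead of the literal path. A ROS parameter, as suggested for the model path, would be an equally reasonable alternative.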

audio_processor/setup.py

+                  maintainer='yuki',
+                  maintainer_email='[email protected]',
+                  description='Audio processing node for STT using OpenVINO',
+                  license='License declaration',

Copilot AI Oct 7, 2025

Generic license declaration should be replaced with the actual license name (e.g., 'Apache-2.0' to match package.xml).

Suggested change

-                  license='License declaration',
+                  license='Apache-2.0',
