Enable STT for #315 #330
base: master
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,83 @@ | ||||||||||||||||||||||
| import rclpy | ||||||||||||||||||||||
| from rclpy.node import Node | ||||||||||||||||||||||
| from std_msgs.msg import String | ||||||||||||||||||||||
| import ffmpeg | ||||||||||||||||||||||
| import numpy as np | ||||||||||||||||||||||
| from pydub import AudioSegment | ||||||||||||||||||||||
| from openvino.runtime import Core | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| class AudioProcessorNode(Node): | ||||||||||||||||||||||
| def __init__(self): | ||||||||||||||||||||||
| super().__init__('audio_processor_node') | ||||||||||||||||||||||
| self.publisher_ = self.create_publisher(String, 'stt_output', 10) | ||||||||||||||||||||||
| self.ie = Core() | ||||||||||||||||||||||
| # Load the converted OpenVINO model | ||||||||||||||||||||||
| # self.model = self.ie.read_model(model='wav2vec2-base/wav2vec2-base.xml') | ||||||||||||||||||||||
| self.model = self.ie.read_model(model='/root/ros2_ws/audio_processor/audio_processor/wav2vec2-base/wav2vec2-base.xml') | ||||||||||||||||||||||
Comment on lines +12 to +16
| self.publisher_ = self.create_publisher(String, 'stt_output', 10) | |
| self.ie = Core() | |
| # Load the converted OpenVINO model | |
| # self.model = self.ie.read_model(model='wav2vec2-base/wav2vec2-base.xml') | |
| self.model = self.ie.read_model(model='/root/ros2_ws/audio_processor/audio_processor/wav2vec2-base/wav2vec2-base.xml') | |
| self.declare_parameter('model_path', 'wav2vec2-base/wav2vec2-base.xml') | |
| model_path = self.get_parameter('model_path').get_parameter_value().string_value | |
| self.publisher_ = self.create_publisher(String, 'stt_output', 10) | |
| self.ie = Core() | |
| self.model = self.ie.read_model(model=model_path) |
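One caveat with the suggested default: a relative `model_path` is resolved against whatever working directory the node is launched from. A minimal sketch of one way to make that predictable, assuming a hypothetical helper `resolve_model_path` and a caller-supplied base directory (neither is part of this PR):

```python
from pathlib import Path


def resolve_model_path(param_value, base_dir):
    """Return an absolute model path for a parameter value.

    Hypothetical helper: absolute values are kept as-is, relative values
    are joined onto base_dir (for example the package's model directory).
    """
    path = Path(param_value).expanduser()
    if not path.is_absolute():
        path = Path(base_dir) / path
    return path
```

The result could then be passed as `model=str(resolve_model_path(model_path, base_dir))` in place of the raw parameter value.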
Copilot (AI), Oct 7, 2025
If audio_data contains only zeros, np.max(np.abs(audio_data)) is 0 and the normalization divides by zero. Add a check before dividing.
| audio_data = audio_data / np.max(np.abs(audio_data)) | |
| max_val = np.max(np.abs(audio_data)) | |
| if max_val == 0: | |
| self.get_logger().warning("Audio data contains only zeros; skipping normalization to avoid division by zero.") | |
| else: | |
| audio_data = audio_data / max_val |
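The same guard can be pulled into a small standalone function so it is testable without a running node. This is a sketch only: the name `peak_normalize` and the `eps` threshold are assumptions, and it also covers empty input, where `np.max` would otherwise raise.

```python
import numpy as np


def peak_normalize(audio_data, eps=1e-8):
    """Scale samples into [-1, 1] by the peak absolute value.

    Silent (all-zero) or empty input is returned unscaled as float32,
    which avoids the 0/0 division flagged in the review comment.
    """
    audio = np.asarray(audio_data, dtype=np.float32)
    if audio.size == 0:
        return audio  # np.max would raise on an empty array
    max_val = float(np.max(np.abs(audio)))
    if max_val < eps:
        return audio  # silence: nothing to normalize
    return audio / max_val
```

The caller can still log a warning when the input is silent, as in the suggestion above.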
Copilot (AI), Oct 7, 2025
The postprocess_result method returns a placeholder string instead of implementing actual text conversion logic. This should be implemented to properly decode the model output.
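For a wav2vec2-style CTC model, one common way to turn the output into text is greedy decoding: take the argmax per frame, collapse repeats, drop the blank token, and map ids to characters. The sketch below assumes the output is shaped (batch, time, vocab) or (time, vocab), that the blank id is 0, and that `|` marks word boundaries. `TOY_VOCAB` is a made-up mapping for illustration; the real one should be loaded from the vocabulary shipped with the converted model.

```python
import numpy as np

# Hypothetical toy vocabulary; replace with the model's real id-to-token map.
TOY_VOCAB = {0: "<pad>", 1: "|", 2: "H", 3: "I"}


def ctc_greedy_decode(logits, id_to_token, blank_id=0, word_delimiter="|"):
    """Greedy CTC decode of per-frame scores into a string."""
    logits = np.asarray(logits)
    if logits.ndim == 3:
        logits = logits[0]  # use the first item of the batch
    ids = np.argmax(logits, axis=-1)
    tokens = []
    prev = None
    for token_id in ids:
        token_id = int(token_id)
        # Emit a token only when it differs from the previous frame
        # and is not the CTC blank.
        if token_id != prev and token_id != blank_id:
            tokens.append(id_to_token[token_id])
        prev = token_id
    return "".join(tokens).replace(word_delimiter, " ").strip()
```

Other special tokens (for example start, end, or unknown markers) are not filtered here and would need handling if the model's vocabulary contains them.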
Copilot (AI), Oct 7, 2025
Hard-coded absolute path in main function makes the code non-portable. Consider making this configurable or removing the hard-coded test call.
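One possible shape for a configurable test call is an optional command-line flag that defaults to doing nothing. In this sketch the flag name `--test-audio` and the function `parse_args` are hypothetical; `parse_known_args` is used so that unrelated arguments (such as ROS arguments) are ignored rather than rejected.

```python
import argparse


def parse_args(argv=None):
    """Parse an optional test-audio path; unknown arguments are ignored."""
    parser = argparse.ArgumentParser(
        description="Optional one-off transcription of a test file"
    )
    parser.add_argument(
        "--test-audio",
        default=None,
        help="Path to an audio file to transcribe once at startup",
    )
    args, _unknown = parser.parse_known_args(argv)
    return args
```

In `main`, the test transcription would then run only when `parse_args().test_audio` is not `None`.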
The class is missing a docstring. Add a docstring to describe the purpose of this audio processing node and its functionality.