
Conversation

nikhilsinhaparseable (Contributor) commented Mar 25, 2025

fix: log source format in stream info

Renamed as below:
OtelLogs -> otel-logs
OtelMetrics -> otel-metrics
OtelTraces -> otel-traces
Pmeta -> pmeta
Json -> json
Kinesis -> kinesis

Added migration steps to perform the rename for existing streams.
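
Illustratively, the migration rewrites each stream's log_source entries in place (a sketch; the "fields" key is as used in this PR's metadata format):

```rust
// Illustrative only: a log_source entry before and after the rename migration.
let before = serde_json::json!({ "log_source_format": "OtelLogs", "fields": [] });
let after  = serde_json::json!({ "log_source_format": "otel-logs", "fields": [] });
```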

Summary by CodeRabbit

  • New Features

    • Enhanced log data serialization with standardized naming for improved compatibility.
    • Introduced a new function to modify log source format during metadata migration for better accuracy.
  • Tests

    • Added tests to verify the new log source naming and migration functionality.

coderabbitai bot (Contributor) commented Mar 25, 2025

Walkthrough

The changes update the JSON serialization of the LogSource enum by adding #[serde(rename = "...")] attributes to its variants. The migration process is also extended with a new function, rename_log_source_v6, which standardizes log source format keys in stream metadata: it iterates over entries in the log_source array, rewriting each entry's log_source_format field according to a predefined mapping. Corresponding tests have been added to validate the new functionality.

Changes

File(s) Change Summary
src/event/format/mod.rs Added #[serde(rename = "...")] attributes on variants of LogSource to explicitly map variants to JSON keys ("kinesis", "otel-logs", "otel-metrics", "otel-traces", "pmeta", "json").
src/migration/mod.rs Updated migrate_stream_metadata to invoke rename_log_source_v6 after existing version migrations, replacing the no-op default with the new function call to process unrecognized versions.
src/migration/stream_metadata_migration.rs Introduced the new public function rename_log_source_v6 that iterates over the log_source array, updating the log_source_format values using a defined mapping; added a new test (test_rename_log_source_v6) for this functionality.
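
For reference, a minimal sketch of what the renamed serialization looks like (the actual enum in src/event/format/mod.rs may differ in derives, ordering, and additional variants):

```rust
use serde::{Deserialize, Serialize};

// Sketch of the LogSource variants with their standardized JSON names.
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub enum LogSource {
    #[serde(rename = "kinesis")]
    Kinesis,
    #[serde(rename = "otel-logs")]
    OtelLogs,
    #[serde(rename = "otel-metrics")]
    OtelMetrics,
    #[serde(rename = "otel-traces")]
    OtelTraces,
    #[serde(rename = "pmeta")]
    Pmeta,
    #[serde(rename = "json")]
    Json,
}
```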

Sequence Diagram(s)

sequenceDiagram
    participant M as migrate_stream_metadata
    participant V as Version Migration Functions
    participant R as rename_log_source_v6
    participant S as Storage
    M->>V: Run migration functions (v1_v4, v4_v5, v5_v6)
    V-->>M: Return updated metadata
    M->>R: Call rename_log_source_v6 on metadata
    R-->>M: Return metadata with updated log_source_format
    M->>S: Save the final metadata
sequenceDiagram
    participant RS as rename_log_source_v6
    participant MAP as FormatMapping
    participant L as LogSource Element
    RS->>MAP: Create mapping ("Kinesis" -> "kinesis", etc.)
    RS->>RS: Check for log_source field in metadata
    loop For each log_source entry
        RS->>L: Inspect log_source_format field
        alt Field value matches mapping
            RS->>L: Update to new format key
        else
            RS->>L: No changes made
        end
    end
    RS-->>RS: Return updated metadata

Suggested reviewers

  • de-sh

Poem

Hop, hop, the code's set free,
With keys renamed for all to see,
Migrations march with steady beat,
Log sources now so neat and sweet,
This bunny cheers—let's code, let's leap!
🐇✨



coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d9da876 and 4dc304f.

📒 Files selected for processing (3)
  • src/event/format/mod.rs (1 hunks)
  • src/migration/mod.rs (3 hunks)
  • src/migration/stream_metadata_migration.rs (3 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
src/migration/mod.rs (2)
src/migration/stream_metadata_migration.rs (2)
  • rename_log_source_v6 (204-229)
  • v4_v5 (154-181)
src/migration/metadata_migration.rs (1)
  • v4_v5 (137-162)
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: coverage
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
🔇 Additional comments (10)
src/migration/mod.rs (6)

249-250: Appropriate implementation of the log source format renaming in the migration flow.

The addition of the rename_log_source_v6 function call ensures that log source formats are standardized as part of the v1 migration path. This aligns with the PR objective of renaming log source formats consistently.


265-267: LGTM: Consistent application of the renaming function in v2 migration path.

The function call is correctly placed after the existing migration steps for v2 metadata.


281-283: LGTM: Format standardization properly integrated in v3 migration path.

Good placement of the function call after existing v3 migration steps.


291-293: LGTM: Consistent application of format renaming for v4 migration.

The function call maintains consistency with the pattern established in previous version migrations.


300-305: LGTM: Format standardization correctly applied to v5 metadata.

The changes follow the established pattern of applying the rename function after the version-specific migration.


306-312: LGTM: Implemented format standardization for unrecognized versions.

The default case now properly applies the format renaming to maintain consistency across all streams regardless of version.
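
Taken together, the pattern these comments describe looks roughly like this (a hedged sketch; the real match arms in migrate_stream_metadata differ in detail, e.g. v4_v5 takes additional arguments):

```rust
// Sketch: every version arm ends with the same rename pass, and the
// default arm applies it too, so all streams converge on the new names.
stream_metadata = match version {
    Some("v1") => {
        let m = stream_metadata_migration::v1_v4(stream_metadata);
        let m = stream_metadata_migration::v4_v5(m);
        let m = stream_metadata_migration::v5_v6(m);
        stream_metadata_migration::rename_log_source_v6(m)
    }
    // v2..v5 run their remaining version-specific steps, then the same rename pass.
    _ => stream_metadata_migration::rename_log_source_v6(stream_metadata),
};
```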

src/migration/stream_metadata_migration.rs (3)

20-20: LGTM: Appropriate import added for HashMap usage.

The import of HashMap is necessary for the new rename_log_source_v6 function.


204-229: Well-structured migration function for standardizing log source formats.

This function properly implements the renaming of log source formats as specified in the PR objectives. The code:

  1. Creates a mapping between old and new format names
  2. Traverses the log_source array in the metadata
  3. Updates each log_source_format value according to the mapping

The implementation handles potential edge cases gracefully by checking if fields exist at each step.
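
A minimal sketch of those three steps (the real rename_log_source_v6 may differ in structure):

```rust
use std::collections::HashMap;

use serde_json::Value;

pub fn rename_log_source_v6(mut stream_metadata: Value) -> Value {
    // 1. Mapping between old and new format names.
    let mapping = HashMap::from([
        ("Kinesis", "kinesis"),
        ("OtelLogs", "otel-logs"),
        ("OtelMetrics", "otel-metrics"),
        ("OtelTraces", "otel-traces"),
        ("Pmeta", "pmeta"),
        ("Json", "json"),
    ]);
    // 2. Traverse the log_source array, if present.
    if let Some(entries) = stream_metadata
        .get_mut("log_source")
        .and_then(Value::as_array_mut)
    {
        for entry in entries {
            // 3. Update each log_source_format value according to the mapping;
            // entries with no mapping are left untouched.
            let renamed = entry
                .get("log_source_format")
                .and_then(Value::as_str)
                .and_then(|f| mapping.get(f).copied());
            if let Some(renamed) = renamed {
                entry["log_source_format"] = Value::String(renamed.to_owned());
            }
        }
    }
    stream_metadata
}
```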


299-305: LGTM: Comprehensive test for the new migration function.

The test effectively validates that all format names are properly transformed according to the mapping. It uses realistic test data that includes multiple log source entries with different formats.
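
A sketch of what such a test can look like (hypothetical fixture; the PR's own test uses its own data):

```rust
#[test]
fn renames_known_log_source_formats() {
    let metadata = serde_json::json!({
        "log_source": [
            { "log_source_format": "OtelLogs", "fields": [] },
            { "log_source_format": "Kinesis",  "fields": [] }
        ]
    });
    let migrated = rename_log_source_v6(metadata);
    assert_eq!(migrated["log_source"][0]["log_source_format"], "otel-logs");
    assert_eq!(migrated["log_source"][1]["log_source_format"], "kinesis");
}
```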

src/event/format/mod.rs (1)

62-63: LGTM: Added appropriate serialization names for Kinesis, OtelLogs, Pmeta, and Json variants.

The added #[serde(rename = "...")] attributes ensure these enum variants are properly serialized with the standardized names.

Also applies to: 66-67, 77-81

coderabbitai bot previously approved these changes Mar 25, 2025

coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/migration/stream_metadata_migration.rs (3)

192-228: Consider reusing the new rename function to avoid code duplication.
The logic for mapping log-source strings and wrapping them as an array largely duplicates the new rename_log_source_v6 function. Unifying both can reduce repetition and potential maintenance overhead.

Below is an example approach to unify logic (replace lines 192–228). This retains your single-string check, then leverages rename_log_source_v6 for the final mapping to ensure consistent transformations:

   if let Some(log_source_value) = stream_metadata_map.remove("log_source") {
       // If it's a string, convert it into an array of one object, else default to an array of one JSON.
-      if let Some(format_str) = log_source_value.as_str() {
-          let transformed_format = match format_str {
-              "Kinesis" => "kinesis",
-              "OtelLogs" => "otel-logs",
-              "OtelTraces" => "otel-traces",
-              "OtelMetrics" => "otel-metrics",
-              "Pmeta" => "pmeta",
-              "Json" => "json",
-              _ => "json",
-          };
-
-          let log_source_entry = json!({
-              "log_source_format": transformed_format,
-              "fields": []
-          });
-
-          stream_metadata_map.insert("log_source".to_owned(), json!([log_source_entry]));
-      } else {
-          let default_entry = json!({
-              "log_source_format": "json",
-              "fields": []
-          });
-          stream_metadata_map.insert("log_source".to_owned(), json!([default_entry]));
-      }
+      let mut array_entries = vec![];
+      if let Some(format_str) = log_source_value.as_str() {
+          array_entries.push(json!({"log_source_format": format_str, "fields": []}));
+      } else {
+          array_entries.push(json!({"log_source_format": "Json", "fields": []}));
+      }
+      // Insert the array for rename_log_source_v6 to process
+      stream_metadata_map.insert("log_source".to_owned(), Value::Array(array_entries));
   } else {
       let default_entry = json!({
           "log_source_format": "json",
           "fields": []
       });
       stream_metadata_map.insert("log_source".to_owned(), json!([default_entry]));
   }

+ // After setting up the array-based log_source, call rename_log_source_v6 to unify transformations
+ let updated_value = rename_log_source_v6(Value::Object(stream_metadata_map.clone()));
+ *stream_metadata_map = updated_value.as_object().unwrap().clone();

230-255: Centralize the format mapping in a constant or helper function.
The format mapping here is identical to the match statements in v5_v6. Consider abstracting repetitive mappings into a single static or function for maintainability. This way, both v5_v6 and rename_log_source_v6 can use the same logic and remain in sync.


324-331: Expand test coverage for unknown formats.
The test covers standard conversions but does not confirm behavior for custom or unsupported formats. Including a test with a non-mapped format (e.g., "MyCustomLogs") ensures that rename_log_source_v6 behaves as intended when encountering unexpected values.
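
Such a test could look like this (hypothetical, assuming the rename pass leaves non-mapped names untouched, as discussed in a later comment):

```rust
#[test]
fn leaves_unknown_log_source_formats_untouched() {
    // Hypothetical check for a non-mapped format, per the suggestion above.
    let metadata = serde_json::json!({
        "log_source": [{ "log_source_format": "MyCustomLogs", "fields": [] }]
    });
    let migrated = rename_log_source_v6(metadata);
    assert_eq!(migrated["log_source"][0]["log_source_format"], "MyCustomLogs");
}
```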

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0baf20f and ba7c3a7.

📒 Files selected for processing (2)
  • src/migration/mod.rs (4 hunks)
  • src/migration/stream_metadata_migration.rs (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/migration/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: coverage
🔇 Additional comments (2)
src/migration/stream_metadata_migration.rs (2)

20-20: No issues with this import.
This import is required for the HashMap usage in the new function.


25-25: Import addition looks good.
Adding INTERNAL_STREAM_NAME to the imports is consistent with your usage below.

coderabbitai bot previously approved these changes Mar 25, 2025

coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/migration/stream_metadata_migration.rs (2)

216-226: Potential duplication in map_log_source_format.

This direct approach works well for immediate usage, but note that similar logic appears in create_format_mapping(). Consider consolidating them to reduce duplication and future maintenance.


256-266: Refactor suggestion to unify mapping logic.

This create_format_mapping() function repeats the same mapping as map_log_source_format(). Consider unifying them into a single mapping source or function to avoid duplication and reduce divergence risk.

Possible approach:

- fn map_log_source_format(format_str: &str) -> &str {
-     match format_str {
-         "Kinesis" => "kinesis",
-         "OtelLogs" => "otel-logs",
-         ...
-         _ => "json",
-     }
- }
- 
- fn create_format_mapping() -> HashMap<&'static str, &'static str> {
-     HashMap::from([
-         ("Kinesis", "kinesis"),
-         ("OtelLogs", "otel-logs"),
-         ...
-     ])
- }

+ lazy_static! {
+     static ref FORMAT_MAPPING: HashMap<&'static str, &'static str> = {
+         HashMap::from([
+             ("Kinesis", "kinesis"),
+             ("OtelLogs", "otel-logs"),
+             // ...
+         ])
+     };
+ }
+ 
+ fn map_log_source_format(format_str: &str) -> &'static str {
+     FORMAT_MAPPING.get(format_str).copied().unwrap_or("json")
+ }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba7c3a7 and 34e8059.

📒 Files selected for processing (1)
  • src/migration/stream_metadata_migration.rs (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: coverage
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (7)
src/migration/stream_metadata_migration.rs (7)

20-20: No issues with the new import.

This import is needed to support the mappings in subsequent code.


25-25: Including INTERNAL_STREAM_NAME reference.

Adding handlers::http::cluster::INTERNAL_STREAM_NAME ensures the function can correctly assign stream_type for internal streams. This seems aligned with the migration logic.


181-203: Validate unknown log sources defaulting to JSON.

The v5_v6 logic sets the schema version to v6 and ensures that any missing or unrecognized log_source is converted to a default JSON-based entry. If this fallback is intentional, the code looks fine. Ensure that future changes or expansions of supported formats remain consistent with this fallback approach.


204-214: transform_log_source fallback for unknown formats.

This function applies a known mapping for recognized strings and otherwise returns a default JSON entry. Confirm that silently discarding unknown format strings is the desired behavior, as the data will be overwritten by default.

Would you like to verify that dropping unknown formats is correct, or should we preserve the unrecognized format string for debugging?
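
For illustration, the fallback described above amounts to something like this (a sketch; the real transform_log_source may differ in signature and detail):

```rust
use serde_json::{json, Value};

fn transform_log_source(format_str: &str) -> Value {
    let mapped = match format_str {
        "Kinesis" => "kinesis",
        "OtelLogs" => "otel-logs",
        "OtelMetrics" => "otel-metrics",
        "OtelTraces" => "otel-traces",
        "Pmeta" => "pmeta",
        // Unknown formats are silently replaced by the default, which is
        // the behavior this comment asks the author to confirm.
        _ => "json",
    };
    json!({ "log_source_format": mapped, "fields": [] })
}
```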


228-233: Default JSON log source.

Using an explicit default entry is a clean approach for unknown or missing data. It provides clarity about the fallback.


235-253: Check behavior for unmapped log_source formats.

Unlike v5_v6, rename_log_source_v6 will leave formats unchanged if no mapping exists. Verify if this difference is intended:

  • v5_v6 replaces unknown with "json".
  • rename_log_source_v6 leaves unknown formats untouched.

If keeping unknown formats is desired, this is fine; otherwise, unify the behavior with v5_v6.


336-341: Excellent test coverage for rename_log_source_v6.

This test ensures that multiple Otel-based formats are properly renamed. The approach is thorough and verifies the correct transformation, matching PR objectives for standardizing log source formats.

nitisht merged commit a114e14 into parseablehq:main Mar 25, 2025
14 checks passed
coderabbitai bot mentioned this pull request Mar 27, 2025
nikhilsinhaparseable deleted the fix-p-format branch July 12, 2025 08:58