refactor + test: query response serialization #1165
Walkthrough
The changes update the error handling and response formatting in the query handling and response generation logic. A new error variant for Arrow errors is introduced in the `QueryError` enum.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/response.rs (2)
31-59: Potential performance improvements
Writing the entire `RecordBatch` data into a single in-memory buffer with `arrow_json::ArrayWriter` may become memory-intensive for large result sets. Consider a streaming or chunk-based approach if you anticipate very large batches. Also note that setting `fill_null` to `true` will require iterating over all objects to find missing fields, which can increase overhead.
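To make the overhead concrete, here is a std-only sketch of such a fill-null pass. The `Json` enum and `fill_nulls` function are stand-ins invented for illustration; the real code operates on `serde_json::Value` maps.

```rust
use std::collections::BTreeMap;

// Stand-in for serde_json::Value; only the shapes this sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Int(i64),
}

// The fill-null pass: every object must contain every field name, so each
// missing key gets an explicit Null. Cost is O(rows * fields) extra work
// on top of serialization, which is the overhead the comment refers to.
fn fill_nulls(rows: &mut [BTreeMap<String, Json>], fields: &[&str]) {
    for row in rows.iter_mut() {
        for &field in fields {
            row.entry(field.to_string()).or_insert(Json::Null);
        }
    }
}

fn main() {
    let mut rows = vec![
        BTreeMap::from([("a".to_string(), Json::Int(1))]),
        BTreeMap::from([
            ("a".to_string(), Json::Int(2)),
            ("b".to_string(), Json::Int(3)),
        ]),
    ];
    fill_nulls(&mut rows, &["a", "b"]);
    // Row 0 was missing "b", so it now carries an explicit Null.
    assert_eq!(rows[0].get("b"), Some(&Json::Null));
    println!("filled: {:?}", rows[0]);
}
```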
62-219: Great test coverage
These tests thoroughly cover multiple scenarios, including handling of null values, empty batches, and field inclusion. However, there is repeated code for creating record batches. Consider refactoring common setup steps into a helper function to reduce duplication.
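The suggested refactor could take roughly this shape; a std-only sketch where `make_rows` is a hypothetical helper (the real one would build and return an Arrow `RecordBatch` from the given columns):

```rust
// Hypothetical shared test helper: the repeated record-batch setup in each
// test collapses into one call. Plain tuples stand in for a RecordBatch.
fn make_rows(ids: &[i64], names: &[Option<&str>]) -> Vec<(i64, Option<String>)> {
    ids.iter()
        .zip(names.iter())
        .map(|(&id, &name)| (id, name.map(|s| s.to_string())))
        .collect()
}

fn main() {
    // Each test then states only its inputs and expectations.
    let rows = make_rows(&[1, 2, 3], &[Some("a"), None, Some("c")]);
    assert_eq!(rows.len(), 3);
    assert_eq!(rows[1], (2, None)); // nullable field stays None
    println!("{rows:?}");
}
```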
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/handlers/http/query.rs (6 hunks)
- src/response.rs (2 hunks)
- src/utils/arrow/mod.rs (0 hunks)
💤 Files with no reviewable changes (1)
- src/utils/arrow/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: coverage
🔇 Additional comments (8)
src/response.rs (2)
19-19: No issues
The newly added import for `QueryError` looks correct.
21-21: Imports for JSON are appropriate
No concerns with the added `serde_json` imports.
src/handlers/http/query.rs (6)
22-22: Explicit import
Introducing `ArrowError` is consistent with the new `Arrow` error variant.
35-35: New debug logging
No concerns with adding `error` and `debug` from `tracing`.
105-105: Accurate timing logs for quick count queries
Recording and logging elapsed time here is beneficial for performance monitoring.
Also applies to: 126-132
143-143: End-to-end timing for standard queries
Consistent usage of timing metrics here as well ensures comprehensive performance tracking.
Also applies to: 145-145, 150-152
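A minimal std-only sketch of the timing pattern discussed in these comments (names are illustrative; in the actual handler the `println!` would be a `tracing::debug!` call and the work would be query execution):

```rust
use std::time::Instant;

fn main() {
    // Record a start instant, run the work, then log the elapsed duration.
    let start = Instant::now();
    let total: u64 = (0..1_000_000u64).sum(); // stand-in for query execution
    let elapsed = start.elapsed();

    assert_eq!(total, 499_999_500_000);
    println!("query took {:?} (result {})", elapsed, total);
}
```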
318-321: Enhanced error handling
Adding `Arrow` and `Json` variants to `QueryError` clearly distinguishes error origins and improves debuggability.
327-329: Status code alignment
Treating `QueryError::Json` and `QueryError::Arrow` as internal server errors follows the existing pattern for `Execute` errors and keeps error handling consistent.
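The pattern these two comments describe can be sketched std-only. This is an illustration, not the PR's code: `ArrowError` and `JsonError` below are stand-ins for `arrow::error::ArrowError` and `serde_json::Error`, and the real enum presumably derives these conversions via `thiserror`'s `#[from]` rather than hand-written `From` impls.

```rust
use std::fmt;

// Stand-ins for the external error types.
#[derive(Debug)]
struct ArrowError(String);
#[derive(Debug)]
struct JsonError(String);

#[derive(Debug)]
enum QueryError {
    Arrow(ArrowError),
    Json(JsonError),
}

// `?` in to_json relies on these conversions existing.
impl From<ArrowError> for QueryError {
    fn from(e: ArrowError) -> Self {
        QueryError::Arrow(e)
    }
}
impl From<JsonError> for QueryError {
    fn from(e: JsonError) -> Self {
        QueryError::Json(e)
    }
}

impl fmt::Display for QueryError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            QueryError::Arrow(e) => write!(f, "Arrow error: {}", e.0),
            QueryError::Json(e) => write!(f, "JSON error: {}", e.0),
        }
    }
}

// Both new variants map to 500, mirroring the existing Execute arm.
fn status_code(err: &QueryError) -> u16 {
    match err {
        QueryError::Arrow(_) | QueryError::Json(_) => 500,
    }
}

fn main() {
    let err: QueryError = ArrowError("schema mismatch".into()).into();
    assert_eq!(status_code(&err), 500);
    println!("{err}");
}
```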
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/response.rs (2)
31-54: Consider adding error handling documentation
The `to_json` method propagates errors from the `record_batches_to_json` function, but there's no documentation about what specific error types might be returned. Consider adding documentation that explains the potential error scenarios.

```diff
 /// Convert into a JSON response
+///
+/// # Errors
+///
+/// Returns a `QueryError` if:
+/// - There's an error converting the record batches to JSON (Arrow error)
+/// - JSON serialization fails
 pub fn to_json(&self) -> Result<Value, QueryError> {
```
32-54: Consider adding support for custom null value representation
The method currently uses `Value::Null` for missing fields when `fill_null` is true. Consider extending this functionality to allow clients to specify a custom representation for null values (e.g., empty string, zero, etc.), which could be useful for certain data processing pipelines.
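One way the suggested extension could look; a std-only sketch where `fill_missing` and its `fill_with` parameter are hypothetical, and the `Json` enum stands in for `serde_json::Value`:

```rust
use std::collections::BTreeMap;

// Stand-in for serde_json::Value.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Str(String),
}

// Instead of hard-coding Null, the caller supplies the replacement value
// (empty string, zero, ...). Cloning per missing key keeps it simple.
fn fill_missing(rows: &mut [BTreeMap<String, Json>], fields: &[&str], fill_with: &Json) {
    for row in rows.iter_mut() {
        for &field in fields {
            row.entry(field.to_string())
                .or_insert_with(|| fill_with.clone());
        }
    }
}

fn main() {
    let mut rows = vec![BTreeMap::from([("a".to_string(), Json::Str("x".into()))])];
    // Client chose empty string rather than the default Null.
    fill_missing(&mut rows, &["a", "b"], &Json::Str(String::new()));
    assert_eq!(rows[0].get("b"), Some(&Json::Str(String::new())));
    assert_ne!(rows[0].get("b"), Some(&Json::Null));
    println!("{:?}", rows[0]);
}
```

The current `fill_null: bool` flag would then become a special case of this, with `Json::Null` as the default fill value.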
📒 Files selected for processing (2)
- src/response.rs (2 hunks)
- src/utils/arrow/mod.rs (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/utils/arrow/mod.rs
🧰 Additional context used
🧬 Code Definitions (1)
src/response.rs (2)
src/handlers/http/query.rs (1)
- query (70-154)

src/utils/arrow/mod.rs (1)
- record_batches_to_json (52-66)
🔇 Additional comments (11)
src/response.rs (11)
21-22: Updated import statements to align with new implementation
The imports have been updated to directly use `serde_json::Value` and the `record_batches_to_json` utility function, which is appropriate for the new JSON-focused implementation.
32-34: New `to_json` method correctly converts QueryResponse to serializable JSON
The method appropriately handles the conversion of Arrow `RecordBatch` data to JSON through the `record_batches_to_json` utility function. The return type of `Result<Value, QueryError>` is well-suited for JSON serialization, allowing for proper error propagation.
37-40: Null value handling is implemented correctly
When `fill_null` is true, the code properly iterates through each JSON object and ensures that any missing field gets a `Value::Null` entry. This maintains data consistency for clients expecting all fields to be present.
44-51: Response structure correctly implements the configured output format
The logic to conditionally include fields with the records (when `with_fields` is true) or return just the records is well-implemented. The use of the `json!` macro ensures proper serialization of both formats.
57-66: Well-organized test module setup
The test module is properly structured with appropriate imports, including all the necessary Arrow types for comprehensive testing.
67-77: Test for empty record batches is comprehensive
This test correctly verifies that empty record batches convert to an empty array in JSON format, which is an important edge case to handle.
79-108: Thorough test for basic record batch conversion
The test creates a diverse set of data types (integers, floats, and strings) to ensure proper JSON conversion for various data types. The assertions correctly verify that the structure and values match the expected JSON output.
110-142: Test for including fields in the output is well-implemented
This test properly verifies the behavior when `with_fields` is set to true, ensuring that both the fields array and records are included in the response as expected.
144-177: Test for handling record batches with nullable fields correctly verifies behavior
This test thoroughly validates that nullable fields are handled correctly when `fill_null` is false. The test creates records with null values in the string array and verifies that these fields are omitted from the JSON output, which is the expected behavior.
179-211: Test for null value filling is comprehensive
This test validates that when `fill_null` is set to true, null values are explicitly included in the JSON output rather than being omitted. The assertion correctly checks for the presence of explicit null values in the output.
33-35: ❓ Verification inconclusive
Verify error propagation from record_batches_to_json
The code now directly propagates errors from `record_batches_to_json` with the `?` operator, but it's not clear how Arrow errors are mapped to `QueryError`. Ensure the `QueryError` enum has been updated to handle Arrow errors appropriately.
🏁 Script executed:

```shell
#!/bin/bash
# Check if QueryError has been updated to handle Arrow errors
rg -A 20 "pub enum QueryError" --type rust
```

Length of output: 1325
Clarify error mapping for Arrow errors
The updated code now uses the `?` operator to propagate errors from `record_batches_to_json` as-is. Please verify that any errors it may return, particularly those stemming from Arrow operations, are correctly converted into the appropriate `QueryError`. Currently, the `QueryError` enum (defined in src/handlers/http/query.rs) includes a variant for `DataFusionError` (with a `#[from]` conversion), which may already cover Arrow-related errors (if they are wrapped in a `DataFusionError`). However, if `record_batches_to_json` can yield errors that aren't encapsulated by `DataFusionError`, an explicit mapping (or a new `QueryError` variant) might be needed.
- Confirm that `record_batches_to_json` returns errors convertible into `QueryError` (e.g., via `DataFusionError`).
- Update `QueryError` to handle direct Arrow errors if such cases exist.
CLA Assistant Lite bot: I have read the CLA Document and I hereby sign the CLA

Devdutt Shenoi seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
Fixes #XXXX.
Description
Improves the testability of `QueryResponse`.

This PR has: