[SPARK-49673][CONNECT] Increase CONNECT_GRPC_ARROW_MAX_BATCH_SIZE to 0.7 * CONNECT_GRPC_MAX_MESSAGE_SIZE #48122
Conversation
...connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
nija-at left a comment
Any existing tests that use the new limits to ensure that everything still works?
I've added an E2E test that explicitly tests multiple batches and changed the static limit to 10MiB there to not create too much memory pressure.
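A minimal sketch of what such a multi-batch check could look like from the Scala client side, assuming a Spark Connect server reachable at `sc://localhost:15002`; the object name, row count, and payload width are illustrative and this is not the actual `ClientE2ETestSuite` test:

```scala
// Hypothetical sketch, not the test added in this PR: collect a result that is
// clearly larger than a 10MiB Arrow batch limit, so the server has to split it
// across several ExecutePlanResponse.ArrowBatch messages.
import org.apache.spark.sql.SparkSession

object MultiBatchCollectSketch {
  def main(args: Array[String]): Unit = {
    // Assumes a Spark Connect server is reachable at this address.
    val spark = SparkSession.builder().remote("sc://localhost:15002").getOrCreate()

    // ~200k rows of 128-byte strings is roughly 25MB of payload, well above 10MiB,
    // so the collected result cannot arrive as a single Arrow batch under that limit.
    val rows = spark
      .range(0, 200000)
      .selectExpr("id", "repeat('x', 128) AS payload")
      .collect()

    assert(rows.length == 200000)
    spark.stop()
  }
}
```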
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala
Merged to master.
Closes apache#48122 from dillitz/increase-sql-command-batch-size.
Authored-by: Robert Dillitz <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
What changes were proposed in this pull request?
Increases the default `maxBatchSize` from 4MiB * 0.7 to 128MiB (= `CONNECT_GRPC_MAX_MESSAGE_SIZE`) * 0.7. This makes better use of the allowed maximum message size.
This limit is used when creating Arrow batches for the `SqlCommandResult` in the `SparkConnectPlanner` and for `ExecutePlanResponse.ArrowBatch` in `processAsArrowBatches`. This, for example, lets us return much larger `LocalRelation`s in the `SqlCommandResult` (e.g., for the `SHOW PARTITIONS` command) while still staying within the gRPC message size limit.
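To make the relationship concrete, here is a small, self-contained Scala sketch of the arithmetic behind the new default; the object and value names are illustrative, and only the 128MiB and 0.7 figures come from the description above:

```scala
// Illustrative sketch only, not the actual Spark Connect source.
object ArrowBatchSizeSketch {
  // Mirrors the CONNECT_GRPC_MAX_MESSAGE_SIZE default of 128MiB.
  val connectGrpcMaxMessageSize: Long = 128L * 1024 * 1024

  // The Arrow batch limit stays at 70% of the message size limit so that a
  // serialized ExecutePlanResponse (batch data plus protobuf overhead) still
  // fits into a single gRPC message.
  val newMaxBatchSize: Long = (connectGrpcMaxMessageSize * 0.7).toLong

  def main(args: Array[String]): Unit = {
    // Before this change the limit was derived from a 4MiB default instead.
    val oldMaxBatchSize = (4L * 1024 * 1024 * 0.7).toLong
    println(s"old effective batch limit: $oldMaxBatchSize bytes (~2.8MiB)")
    println(s"new effective batch limit: $newMaxBatchSize bytes (~89.6MiB)")
  }
}
```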
Why are the changes needed?
There are `SqlCommandResult`s that exceed 0.7 * 4MiB.
Does this PR introduce any user-facing change?
`SqlCommandResult`s of up to 0.7 * 128MiB (~89.6MiB) are now supported instead of only up to 0.7 * 4MiB (~2.8MiB), and `ExecutePlanResponse`s now make better use of the 128MiB message size limit.
How was this patch tested?
Existing tests.
Was this patch authored or co-authored using generative AI tooling?
No.