[SPARK-40981][CONNECT][PYTHON] Support session.range in Python client #38460
Conversation
python/pyspark/sql/connect/client.py
I think we can use `step: int = 1`.
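A rough sketch of what that signature could look like on the client (a sketch only; everything besides `range` and the `step` default is an assumption, not the final code in this PR):

```python
from typing import Optional

class RemoteSparkSession:
    def range(self, start: int, end: int, step: int = 1,
              num_partitions: Optional[int] = None) -> "DataFrame":
        """Sketch: step defaults to 1 on the client; num_partitions is
        left to the server's default when None."""
        ...
```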
Furthermore, I think we can make `step` a required field in the proto.
Hmm, we are not marking `step` as required because the Scala-side implementation does not treat it as required, so it also has a default value there:
private def transformRange(rel: proto.Range): LogicalPlan = {
  val start = rel.getStart
  val end = rel.getEnd
  // Default step is 1 when the client leaves the field unset.
  val step = if (rel.hasStep) {
    rel.getStep.getStep
  } else {
    1
  }
  // Default to the session's leaf-node parallelism when unset.
  val numPartitions = if (rel.hasNumPartitions) {
    rel.getNumPartitions.getNumPartitions
  } else {
    session.leafNodeDefaultParallelism
  }
  logical.Range(start, end, step, numPartitions)
}
The same applies to `numPartitions`.
What about adding a new test case like `range(start=10, end=20)` and checking that:
1. `step` is set to 1 (if we make the default value 1);
2. `num_partitions` is not set?
I added this test case, but it only checks that `step` and `num_partitions` are not set.
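For illustration, the check could look roughly like this (a sketch only: `session` is an existing `RemoteSparkSession`, and the `_plan.to_proto()` accessor and proto field names are assumptions inferred from the Scala snippet above):

```python
# Sketch of the added test; accessor and field names are assumptions.
plan = session.range(start=10, end=20)._plan.to_proto()
rng = plan.root.range

assert rng.start == 10
assert rng.end == 20
# Optional fields are left unset by the client; the server fills in
# the defaults (step = 1, num_partitions = default parallelism).
assert not rng.HasField("step")
assert not rng.HasField("num_partitions")
```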
Right now there is a division of responsibility between the client and the server:
- The client takes care of required fields, meaning clients must make sure required fields are set.
- The server takes care of default values for optional fields.
This reduces the implementation load on both sides:
- Clients do not need to worry about default values for optional fields unless the default value is already exposed on the DataFrame API.
- The server does not check whether a required field is set (clients enforce it), but it does track the default values for optional fields. This also avoids different clients choosing different default values. The default values are documented in the proto:
// Optional. Default value is assigned by 1) SQL conf "spark.sql.leafNodeDefaultParallelism" if
// it is set, or 2) spark default parallelism.
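To make this concrete, here is a minimal sketch of how a client could build the `Range` plan under this convention (the generated proto classes, import path, and nested field names are assumptions inferred from the Scala snippet above):

```python
from typing import Optional

import pyspark.sql.connect.proto as proto  # generated proto bindings (path assumed)

def build_range(start: int, end: int,
                step: Optional[int] = None,
                num_partitions: Optional[int] = None) -> proto.Relation:
    rel = proto.Relation()
    # Required fields: the client must always set these.
    rel.range.start = start
    rel.range.end = end
    # Optional fields: only set when the caller supplied a value, so the
    # server can apply its documented defaults otherwise.
    if step is not None:
        rel.range.step.step = step
    if num_partitions is not None:
        rel.range.num_partitions.num_partitions = num_partitions
    return rel
```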
Merged into master.
HyukjinKwon left a comment:
LGTM2
### What changes were proposed in this pull request?
This PR adds a `range` API to the Python client's `RemoteSparkSession`, with tests. This PR also updates `start`, `end`, and `step` to `int64` in the Connect proto.

### Why are the changes needed?
Improve API coverage.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
UT

Closes apache#38460 from amaliujia/SPARK-40981.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
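For reference, a hedged usage example of the new API (assuming `session` is a connected `RemoteSparkSession`; session setup details are omitted and are assumptions):

```python
# step defaults to 1 when omitted; num_partitions falls back to the
# server-side default parallelism.
df = session.range(start=0, end=10, step=2)
print(df.collect())  # rows 0, 2, 4, 6, 8
```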