[SPARK-31186][PySpark][SQL][2.4] toPandas should not fail on duplicate column names #28219

viirya · 2020-04-15T00:10:31Z

What changes were proposed in this pull request?

When toPandas is called on a DataFrame with duplicate column names, such as those produced by a join, it fails with an error like:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

This patch fixes the error in toPandas API.

This is the backport of original patch to branch-2.4.
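
For readers unfamiliar with this error, here is a minimal pandas-only sketch of how duplicate column names can trigger it. It is an illustration under assumptions, not the code touched by this patch: the DataFrame `pdf`, the column name `v`, and the null check are hypothetical stand-ins for whatever per-column logic toPandas runs internally.

```python
import pandas as pd

# Hypothetical data: two columns share the name "v", as a join might produce.
pdf = pd.DataFrame([[1, None], [2, 3.0]], columns=["v", "v"])

# Selecting by a duplicated label returns a DataFrame, not a Series.
selected = pdf["v"]

# A per-column null check now yields one flag per duplicate column (a Series).
has_null = selected.isnull().any()

# Using that Series in a boolean context raises the ValueError quoted above.
message = None
try:
    if has_null:
        pass
except ValueError as exc:
    message = str(exc)

# Selecting by position instead of by label is unambiguous.
first = pdf.iloc[:, 0]

print(type(selected).__name__, type(first).__name__)
print(message)
```

Addressing columns by position rather than by name is one way to sidestep the ambiguity; whether the patch takes that approach is not stated in this thread.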

Why are the changes needed?

To make toPandas work on DataFrames with duplicate column names.

Does this PR introduce any user-facing change?

Yes. Previously, calling toPandas on a DataFrame with duplicate column names failed. After this patch, it produces the correct result.

How was this patch tested?

Unit test.

SparkQA · 2020-04-15T00:30:18Z

Test build #121294 has finished for PR 28219 at commit 127f2a3.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-04-15T01:28:53Z

Test build #121295 has finished for PR 28219 at commit 6bdad4f.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-04-15T03:26:51Z

Test build #121298 has finished for PR 28219 at commit 6e317aa.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon

Looks good

HyukjinKwon · 2020-04-15T04:56:57Z

Thanks @viirya. Merged to branch-2.4.

…e column names

### What changes were proposed in this pull request?

When `toPandas` API works on duplicate column names produced from operators like join, we see the error like:

```
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```

This patch fixes the error in `toPandas` API.

This is the backport of original patch to branch-2.4.

### Why are the changes needed?

To make `toPandas` work on dataframe with duplicate column names.

### Does this PR introduce any user-facing change?

Yes. Previously calling `toPandas` API on a dataframe with duplicate column names will fail. After this patch, it will produce correct result.

### How was this patch tested?

Unit test.

Closes #28219 from viirya/SPARK-31186-2.4.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>

HyukjinKwon · 2020-04-15T04:58:49Z

cc @ueshin

probot-autolabeler bot added PYTHON SQL labels Apr 15, 2020

viirya force-pushed the SPARK-31186-2.4 branch from 127f2a3 to 6bdad4f Compare April 15, 2020 01:01

Backport SPARK-31186.

6e317aa

viirya force-pushed the SPARK-31186-2.4 branch from 6bdad4f to 6e317aa Compare April 15, 2020 02:49

HyukjinKwon approved these changes Apr 15, 2020

HyukjinKwon closed this Apr 15, 2020

viirya deleted the SPARK-31186-2.4 branch December 27, 2023 18:23
