@viirya viirya commented Apr 15, 2020

What changes were proposed in this pull request?

When the toPandas API is called on a DataFrame with duplicate column names, such as those produced by operators like join, it fails with an error like:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

This patch fixes the error in the toPandas API.

This is a backport of the original patch to branch-2.4.
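For readers unfamiliar with the error message, here is a minimal pandas-only sketch of how duplicate column names can lead to it. This is an assumption about the general mechanism, not a copy of the code path in toPandas; the column name `v` and the sample values are hypothetical.

```python
import pandas as pd

# A pandas DataFrame with two columns that share the name "v", similar to
# what a join of two Spark DataFrames with a common column name can yield.
pdf = pd.DataFrame([[1, 2], [3, 4]], columns=["v", "v"])

# Label-based selection returns a DataFrame (both "v" columns), not a Series.
selected = pdf["v"]
is_frame = isinstance(selected, pd.DataFrame)

# Reducing that DataFrame with .any() gives one boolean per column (a Series),
# so using the result in a boolean context raises the ValueError quoted above.
has_nulls = selected.isnull().any()
try:
    if has_nulls:
        pass
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Position-based selection always yields a single column as a Series,
# so the same check reduces to one unambiguous boolean.
first_column = pdf.iloc[:, 0]
first_has_nulls = bool(first_column.isnull().any())
```

Selecting by position rather than by label is one way to keep per-column checks unambiguous when names repeat; whether the patch takes exactly this approach is not stated in this conversation.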

Why are the changes needed?

To make toPandas work on DataFrames with duplicate column names.

Does this PR introduce any user-facing change?

Yes. Previously, calling the toPandas API on a DataFrame with duplicate column names failed. After this patch, it produces the correct result.

How was this patch tested?

Unit test.

SparkQA commented Apr 15, 2020

Test build #121294 has finished for PR 28219 at commit 127f2a3.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 15, 2020

Test build #121295 has finished for PR 28219 at commit 6bdad4f.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 15, 2020

Test build #121298 has finished for PR 28219 at commit 6e317aa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon left a comment

Looks good

@HyukjinKwon

Thanks @viirya. Merged to branch-2.4.

HyukjinKwon pushed a commit that referenced this pull request Apr 15, 2020
…e column names

### What changes were proposed in this pull request?

When the `toPandas` API is called on a DataFrame with duplicate column names, such as those produced by operators like join, it fails with an error like:

```
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```

This patch fixes the error in the `toPandas` API.

This is a backport of the original patch to branch-2.4.

### Why are the changes needed?

To make `toPandas` work on DataFrames with duplicate column names.

### Does this PR introduce any user-facing change?

Yes. Previously, calling the `toPandas` API on a DataFrame with duplicate column names failed. After this patch, it produces the correct result.

### How was this patch tested?

Unit test.

Closes #28219 from viirya/SPARK-31186-2.4.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
@HyukjinKwon

cc @ueshin

@viirya viirya deleted the SPARK-31186-2.4 branch December 27, 2023 18:23