
Conversation

@cloud-fan
Contributor

This bug is caused by an incorrect column-existence check in `__getitem__` of the PySpark DataFrame. `DataFrame.apply` accepts not only top-level column names but also nested column names like `a.b`, so we should remove that check from `__getitem__`.
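
To illustrate the failure mode (a minimal sketch, not the patch itself; the sample schema is mine and the modern SparkSession entry point is used for brevity): with a struct column `a`, `df.columns` contains only `'a'`, so an eager membership check rejects the dotted name `a.b` even though the analyzer resolves it fine.

```python
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()

# One struct column "a" with nested fields "b" and "c".
df = spark.createDataFrame([Row(a=Row(b=1, c="x"))])

print(df.columns)            # ['a'] -- a naive `item in df.columns` check rejects "a.b"
df.select(df["a.b"]).show()  # but the analyzer resolves the nested field without trouble

spark.stop()
```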

@cloud-fan
Contributor Author

cc @marmbrus @davies

@cloud-fan
Contributor Author

retest this please.

@SparkQA

SparkQA commented Aug 14, 2015

Test build #40879 has finished for PR 8202 at commit 8988661.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 14, 2015

Test build #40887 has finished for PR 8202 at commit e61dd96.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

Could we still have this check if there is no '.' in item?
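
For illustration, a hypothetical standalone helper (`get_column`, not part of the PySpark API) sketching that suggestion:

```python
from pyspark.sql import Column, DataFrame


def get_column(df: DataFrame, name: str) -> Column:
    # Illustrative only: pre-check plain top-level names, but pass dotted
    # (possibly nested) names straight through to the analyzer.
    if "." not in name and name not in df.columns:
        raise IndexError("no such column: %s" % name)
    return df[name]
```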

Contributor

Or maybe, instead of doing any checks here, is it possible to catch the specific errors thrown by the analyzer and just reformat them nicely?

Contributor

(I assume that was the whole point of doing a check here?)

Contributor

We already capture the AnalysisException. If it's already good enough, we can just remove this check.
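
For reference, a rough sketch of what relying on the analyzer alone looks like (the `resolve_column` helper is hypothetical and only for illustration; the `AnalysisException` capture lives in `pyspark.sql.utils`):

```python
from pyspark.sql import Column, DataFrame
from pyspark.sql.utils import AnalysisException


def resolve_column(df: DataFrame, name: str) -> Column:
    # Illustrative only: no eager membership check. Let Catalyst resolve the
    # name (top-level or nested) and re-raise its error in a friendlier form.
    try:
        return df[name]
    except AnalysisException as e:
        raise KeyError("cannot resolve column %r: %s" % (name, e))
```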

@marmbrus
Contributor

Thanks! Merging to master and 1.5

@asfgit closed this in 1150a19 Aug 14, 2015
asfgit pushed a commit that referenced this pull request Aug 14, 2015
This bug is caused by an incorrect column-existence check in `__getitem__` of the PySpark DataFrame. `DataFrame.apply` accepts not only top-level column names but also nested column names like `a.b`, so we should remove that check from `__getitem__`.

Author: Wenchen Fan <[email protected]>

Closes #8202 from cloud-fan/nested.
CodingCat pushed a commit to CodingCat/spark that referenced this pull request Aug 17, 2015