
Conversation

@cloud-fan
Contributor

This bug is caused by an incorrect column-existence check in `__getitem__` of the PySpark DataFrame. `DataFrame.apply` accepts not only top-level column names but also nested column names like `a.b`, so we should remove that check from `__getitem__`.
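
To illustrate the failure mode (a minimal sketch, not the patch itself; the sample schema is mine and the modern SparkSession entry point is used for brevity): with a struct column `a`, `df.columns` contains only `'a'`, so an eager membership check rejects the dotted name `a.b` even though the analyzer resolves it fine.

```python
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()

# One struct column "a" with nested fields "b" and "c".
df = spark.createDataFrame([Row(a=Row(b=1, c="x"))])

print(df.columns)            # ['a'] -- a naive `item in df.columns` check rejects "a.b"
df.select(df["a.b"]).show()  # but the analyzer resolves the nested field without trouble

spark.stop()
```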

@cloud-fan
Contributor Author

cc @marmbrus @davies

@cloud-fan
Contributor Author

retest this please.

@SparkQA

SparkQA commented Aug 14, 2015

Test build #40879 has finished for PR 8202 at commit 8988661.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 14, 2015

Test build #40887 has finished for PR 8202 at commit e61dd96.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

Could we still have this check if there is no '.' in item?
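
For illustration, a hypothetical standalone helper (`get_column`, not part of the PySpark API) sketching that suggestion:

```python
from pyspark.sql import Column, DataFrame


def get_column(df: DataFrame, name: str) -> Column:
    # Illustrative only: pre-check plain top-level names, but pass dotted
    # (possibly nested) names straight through to the analyzer.
    if "." not in name and name not in df.columns:
        raise IndexError("no such column: %s" % name)
    return df[name]
```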

Contributor

Or maybe, instead of doing any checks here, is it possible to catch the specific errors thrown by the analyzer and just reformat them nicely?

Contributor

(I assume that was the whole point of doing a check here?)

Contributor

We already capture the AnalysisException. If it's already good enough, we can just remove this check.
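
For reference, a rough sketch of what relying on the analyzer alone looks like (the `resolve_column` helper is hypothetical and only for illustration; the `AnalysisException` capture lives in `pyspark.sql.utils`):

```python
from pyspark.sql import Column, DataFrame
from pyspark.sql.utils import AnalysisException


def resolve_column(df: DataFrame, name: str) -> Column:
    # Illustrative only: no eager membership check. Let Catalyst resolve the
    # name (top-level or nested) and re-raise its error in a friendlier form.
    try:
        return df[name]
    except AnalysisException as e:
        raise KeyError("cannot resolve column %r: %s" % (name, e))
```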

@marmbrus
Contributor

Thanks! Merging to master and 1.5

@asfgit closed this in 1150a19 Aug 14, 2015
asfgit pushed a commit that referenced this pull request Aug 14, 2015
This bug is caused by an incorrect column-existence check in `__getitem__` of the PySpark DataFrame. `DataFrame.apply` accepts not only top-level column names but also nested column names like `a.b`, so we should remove that check from `__getitem__`.

Author: Wenchen Fan <[email protected]>

Closes #8202 from cloud-fan/nested.
CodingCat pushed a commit to CodingCat/spark that referenced this pull request Aug 17, 2015