Contributor

@0x0FFF 0x0FFF commented Sep 2, 2015

A pyspark.sql.column.Column object has a __getitem__ method, which makes Python treat it as iterable. The method exists so that, when a column holds a list or a dict, you can access a particular element of it in the DataFrame API. The ability to iterate over a Column is only a side effect, and it can confuse people who are getting familiar with Spark DataFrames (you might iterate this way over a Pandas DataFrame, for instance).

Issue reproduction:

df = sqlContext.jsonRDD(sc.parallelize(['{"name": "El Magnifico"}']))
# df["name"] is a Column expression, not a sequence of values
for i in df["name"]:
    print(i)
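
The mechanism can be shown without Spark. The following is a minimal sketch, assuming a hypothetical stand-in class `FakeColumn` (it is not the real pyspark class). When a class defines `__getitem__` but no `__iter__`, Python falls back to calling `__getitem__` with 0, 1, 2, ... and stops only on `IndexError`. A column-like `__getitem__` returns a new expression for any key, so the loop never ends on its own. `GuardedColumn` illustrates one possible fix: defining `__iter__` so that it raises `TypeError`. The error message is an assumption made for illustration.

```python
import itertools


class FakeColumn(object):
    """Hypothetical stand-in for a column expression; not the pyspark class."""

    def __init__(self, expr):
        self.expr = expr

    def __getitem__(self, key):
        # Builds a new expression for any key and never raises IndexError.
        return FakeColumn("%s[%r]" % (self.expr, key))


class GuardedColumn(FakeColumn):
    """Same class with iteration explicitly disabled (one possible fix)."""

    def __iter__(self):
        raise TypeError("Column is not iterable")


# Legacy sequence protocol: iter() succeeds because __getitem__ exists.
# islice caps the otherwise endless loop at three items.
first_three = [c.expr for c in itertools.islice(iter(FakeColumn("name")), 3)]
print(first_three)

# With __iter__ defined, the attempt to iterate fails immediately.
try:
    iter(GuardedColumn("name"))
    guarded_error = None
except TypeError as exc:
    guarded_error = str(exc)
print(guarded_error)
```

Item access such as `GuardedColumn("name")["key"]` still works in this sketch, because the guard only affects iteration.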

Member

srowen commented Sep 2, 2015

Don't know much about Python myself but that sounds convincing. CC @davies

Contributor

You can use assertRaises to test the exception case.
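
A test along those lines might look like the sketch below. It uses a hypothetical stand-in class `GuardedColumn` instead of the real pyspark Column, and the test and class names are assumptions for illustration only. `assertRaises` is given the exception type, the callable, and its argument, and passes only if the call raises that exception.

```python
import unittest


class GuardedColumn(object):
    """Hypothetical stand-in for a column whose __iter__ raises."""

    def __getitem__(self, key):
        return GuardedColumn()

    def __iter__(self):
        raise TypeError("Column is not iterable")


class ColumnIterationTest(unittest.TestCase):
    def test_iteration_raises(self):
        col = GuardedColumn()
        # assertRaises calls iter(col) and passes only if TypeError is raised.
        self.assertRaises(TypeError, iter, col)
        # Item access is unaffected by the guard.
        self.assertIsInstance(col["key"], GuardedColumn)


# Run the single test case programmatically and keep the result object.
suite = unittest.TestLoader().loadTestsFromTestCase(ColumnIterationTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Passing the callable and its arguments separately matters: writing `self.assertRaises(TypeError, iter(col))` would raise before `assertRaises` is ever entered.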


SparkQA commented Sep 2, 2015

Test build #1712 has finished for PR 8574 at commit ea2e9d4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class JavaTrainValidationSplitExample
    • class KMeans @Since("1.5.0") (
    • class DCT(JavaTransformer, HasInputCol, HasOutputCol):
    • class SQLTransformer(JavaTransformer):
    • class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol):
    • case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode
    • case class UnionNode(children: Seq[LocalNode]) extends LocalNode

Contributor Author

0x0FFF commented Sep 2, 2015

@cloud-fan, I addressed your comments in the last commit.

Contributor Author

0x0FFF commented Sep 2, 2015

It looks like the PR is not being retested after the last commit: Jenkins failed to update the status, and the dashboard shows the build as still running. Am I right?

Contributor Author

0x0FFF commented Sep 2, 2015

Jenkins, retest this please

Contributor

davies commented Sep 2, 2015

LGTM


SparkQA commented Sep 2, 2015

Test build #1714 has finished for PR 8574 at commit f041635.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

davies commented Sep 2, 2015

Merged into master, thanks!
