Skip to content

Conversation

@sun-rui
Copy link
Contributor

@sun-rui sun-rui commented Nov 17, 2015

No description provided.

@shivaram
Copy link
Contributor

Could you describe the problem a bit more ? Was it that raw vectors were being treated as lists ? Just curious how raw vectors differ from int vectors etc.

@SparkQA
Copy link

SparkQA commented Nov 18, 2015

Test build #46148 has finished for PR 9769 at commit 07a0f33.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sun-rui
Copy link
Contributor Author

sun-rui commented Nov 18, 2015

@shivaram, The R raw type is intended to hold raw bytes. while int vector is to hold 32-bit integer values. The R raw type maps to Spark SQL binary type, which is internally represented in Array[Byte].

This PR solves two problems:

  1. Inferring of raw type is incorrect.
    > SparkR:::infer_type(as.raw(c(1, 2 ,3))) [1] "array<binary>"
    This is not correct, it should be "binary".
  2. Collecting a DataFrame fails if there is any column of binary type. The bug lies in the logic that determines whether a collected column can be coerced into a atomic vector or not.

@SparkQA
Copy link

SparkQA commented Nov 18, 2015

Test build #46159 has finished for PR 9769 at commit df9bca7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Copy link
Contributor

Ok thanks for the clarification. It might take me a couple of days to get to this as the change looks a bit involved.

cc @felixcheung

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be returned from org.apache.spark.sql.api.r.SQLUtils.dfToCols for more optimal processing? this seems like potentially a lot of data to go through

@sun-rui
Copy link
Contributor Author

sun-rui commented Nov 23, 2015

@felixcheung, you concern is reasonable. I refactor the code by using schema to determine if a collected column can be coerced into an atomic vector.

@SparkQA
Copy link

SparkQA commented Nov 23, 2015

Test build #46532 has finished for PR 9769 at commit 2af83b7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sun-rui
Copy link
Contributor Author

sun-rui commented Nov 25, 2015

@shivaram, @felixcheung, could you take more look? another PR depends on this one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no space for func call: stopifnot(class(vec) != "list")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@felixcheung
Copy link
Member

looks good.

@SparkQA
Copy link

SparkQA commented Nov 27, 2015

Test build #46792 has finished for PR 9769 at commit f073c3a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment here as well ? Something like NOTE: "binary" columns behave like complex types

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@shivaram
Copy link
Contributor

LGTM but for a minor comment.

@SparkQA
Copy link

SparkQA commented Nov 29, 2015

Test build #46847 has finished for PR 9769 at commit 0b1af63.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Copy link
Contributor

LGTM. Merging this

asfgit pushed a commit that referenced this pull request Nov 29, 2015
Author: Sun Rui <[email protected]>

Closes #9769 from sun-rui/SPARK-11781.

(cherry picked from commit cc7a1bc)
Signed-off-by: Shivaram Venkataraman <[email protected]>
@asfgit asfgit closed this in cc7a1bc Nov 29, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants