[SPARK-11781][SPARKR] SparkR has problem in inferring type of raw type. #9769
Could you describe the problem a bit more? Was it that raw vectors were being treated as lists? Just curious how raw vectors differ from int vectors, etc.
Test build #46148 has finished for PR 9769 at commit
@shivaram, the R raw type is intended to hold raw bytes, while an integer vector holds 32-bit integer values. The R raw type maps to the Spark SQL binary type, which is represented internally as Array[Byte]. This PR solves two problems:
Test build #46159 has finished for PR 9769 at commit
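The raw vs. integer distinction explained in the reply above can be seen directly in base R. This is an illustrative sketch only: the variable names are arbitrary, and the last part assumes, as the reply states, that each Spark SQL binary value is one byte array and therefore corresponds to one R raw vector per row.

```r
# A raw vector holds bytes (0-255); an integer vector holds 32-bit integers.
bytes <- as.raw(c(0x48, 0x69))   # the bytes of "Hi"
ints  <- c(72L, 105L)

stopifnot(typeof(bytes) == "raw", typeof(ints) == "integer")

# Both are atomic vectors and convert into each other element-wise.
stopifnot(is.atomic(bytes), identical(as.integer(bytes), ints))

# Assumption: one binary cell = one byte array = one raw vector, so a binary
# column is naturally a list of raw vectors, one element per row.
binaryColumn <- list(charToRaw("Hi"), charToRaw("Spark"))
print(sapply(binaryColumn, length))   # [1] 2 5

# Flattening it with unlist() merges all rows into a single raw vector of
# 7 bytes, losing the row boundaries.
print(length(unlist(binaryColumn)))   # [1] 7
```

This is why a binary column cannot simply be coerced into one atomic vector the way an int column can.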
OK, thanks for the clarification. It might take me a couple of days to get to this, as the change looks a bit involved. cc @felixcheung
R/pkg/R/DataFrame.R
Should this be returned from org.apache.spark.sql.api.r.SQLUtils.dfToCols for more efficient processing? This seems like potentially a lot of data to go through.
@felixcheung, your concern is reasonable. I refactored the code to use the schema to determine whether a collected column can be coerced into an atomic vector.
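A minimal sketch of the schema-driven idea described above, not the PR's actual code. It assumes column types arrive as Spark SQL type-name strings such as "int", "binary" or "array<int>", and that a collected column is a list with one element per row and no NULL cells. The helper names canCoerceToAtomic and collectColumn are hypothetical, not SparkR functions.

```r
# Hypothetical helpers, not SparkR's actual implementation.
# Assumes sqlType is a Spark SQL type-name string such as "int", "binary",
# "array<int>", "map<string,int>" or "struct<a:int>".
canCoerceToAtomic <- function(sqlType) {
  # Complex types keep one R object per row, so they cannot be flattened.
  isComplex <- any(startsWith(sqlType, c("array", "map", "struct")))
  # "binary" columns behave like complex types: each cell is a raw vector.
  !(isComplex || sqlType == "binary")
}

# values: a collected column as a list with one element per row
# (assumed to contain no NULL cells, which unlist() would silently drop).
collectColumn <- function(values, sqlType) {
  if (canCoerceToAtomic(sqlType)) unlist(values) else values
}

ints <- collectColumn(list(1L, 2L, 3L), "int")
bins <- collectColumn(list(charToRaw("a"), charToRaw("bc")), "binary")
stopifnot(is.integer(ints), length(ints) == 3L)   # flattened to an atomic vector
stopifnot(is.list(bins), length(bins) == 2L)      # kept as a list of raw vectors
```

In this sketch the decision depends only on the schema, not on inspecting every collected value, so its cost does not grow with the number of rows.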
Test build #46532 has finished for PR 9769 at commit
@shivaram, @felixcheung, could you take another look? Another PR depends on this one.
R/pkg/R/DataFrame.R
nit: no space between the function name and the opening parenthesis: stopifnot(class(vec) != "list")
fixed
Looks good.
Test build #46792 has finished for PR 9769 at commit
R/pkg/R/DataFrame.R
Could you add a comment here as well? Something like: NOTE: "binary" columns behave like complex types.
added
LGTM except for a minor comment.
Test build #46847 has finished for PR 9769 at commit
LGTM. Merging this.
Author: Sun Rui <[email protected]>

Closes #9769 from sun-rui/SPARK-11781.

(cherry picked from commit cc7a1bc)
Signed-off-by: Shivaram Venkataraman <[email protected]>