[SPARK-15672][R][DOC] R programming guide update #13660
Test build #60485 has finished for PR 13660 at commit
docs/sparkr.md
It would be good to add an introduction here saying that there are two kinds of user-defined functions we support in SparkR. Something like:
In SparkR we support two kinds of user-defined functions
1. Run a given function on a large dataset using dapply.
2. Run many functions in parallel using spark.lapply.
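For reference, a minimal sketch of the two kinds, assuming a Spark 2.0 SparkR session started with `sparkR.session()`. The `faithful` dataset, the extra `waiting_secs` column, and the squaring function are illustrative choices, not taken from the diff under review:

```r
library(SparkR)
sparkR.session()

# 1. dapply: run one function on each partition of a SparkDataFrame.
df <- createDataFrame(faithful)
schema <- structType(
  structField("eruptions", "double"),
  structField("waiting", "double"),
  structField("waiting_secs", "double")  # hypothetical extra column
)
df1 <- dapply(df, function(x) {
  # x is a local R data.frame holding one partition
  cbind(x, waiting_secs = x$waiting * 60)
}, schema)
head(collect(df1))

# 2. spark.lapply: run a function over a list of elements in parallel.
squares <- spark.lapply(1:4, function(i) i^2)
```

The first form distributes the data, the second distributes the calls; `spark.lapply` returns an ordinary R list on the driver.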
Thanks @vectorijk - I left some comments inline. cc @felixcheung
docs/sparkr.md
Perhaps explain why the schema needs to be passed here?
Ping @vectorijk
063bc8e to 920c975
@jkbradley @shivaram @felixcheung addressed comments.
Test build #60788 has finished for PR 13660 at commit
Test build #60787 has finished for PR 13660 at commit
Jenkins test this again.
great! please see pending PR #13752 on removing
Hi @vectorijk, @felixcheung, @sun-rui, @shivaram, is this on purpose?
@NarineK That is sort of unrelated to this PR since this PR is about the programming guide? But in short, this happens because in the R code both
Yeah we can remove the duplication by having separate rd files or by just removing documentation for the overlapping arguments (I think in this case @NarineK feel free to open a separate JIRA/PR for this
Test build #60863 has finished for PR 13660 at commit
docs/sparkr.md
</div>

##### dapplyCollect
Like `dapply`, apply a function to each partition of `SparkDataFrame` and collect the result back.
I think it's good to say a couple of things here. First, that we don't require any schema to be passed in to dapplyCollect (unlike dapply). Second, it's good to remind users that this should be used only if the output of the UDF run on all the partitions can fit in driver memory.
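A small sketch of that contrast, assuming a running SparkR 2.0 session; the `faithful` dataset and the `waiting_secs` column are hypothetical choices for illustration, not the guide's final text:

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(faithful)

# No schema argument: the result comes back as a local R data.frame,
# so the combined output of all partitions must fit in driver memory.
ldf <- dapplyCollect(df, function(x) {
  cbind(x, waiting_secs = x$waiting * 60)
})
head(ldf)
```

With `dapply` the same function would also need a `structType` schema describing the output columns, because the result stays a distributed `SparkDataFrame`.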
Test build #60918 has finished for PR 13660 at commit
Can you add documentation for gapply() and gapplyCollect() together here? Or will @NarineK do that in another PR?
docs/sparkr.md
</div>

### Applying User-defined Function
In SparkR, we support several kinds for User-defined Functions:
several kinds of?
docs/sparkr.md
</div>

##### dapplyCollect
Like `dapply`, apply a function to each partition of `SparkDataFrame` and collect the result back. The output of function
apply a function to each partition of a SparkDataFrame
Test build #60957 has finished for PR 13660 at commit
@felixcheung @jkbradley any more comments on this?
LGTM. I'll merge this with master and branch-2.0 |
## What changes were proposed in this pull request?

Guide for
- UDFs with dapply, dapplyCollect
- spark.lapply for running parallel R functions

## How was this patch tested?

build locally
<img width="654" alt="screen shot 2016-06-14 at 03 12 56" src="https://cloud.githubusercontent.com/assets/3419881/16039344/12a3b6a0-31de-11e6-8d77-fe23308075c0.png">

Author: Kai Jiang <[email protected]>

Closes #13660 from vectorijk/spark-15672-R-guide-update.

(cherry picked from commit 43b04b7)
Signed-off-by: Joseph K. Bradley <[email protected]>
