[SPARK-15319][SPARKR][DOCS] Fix SparkR doc layout for corr and other DataFrame stats functions #13109
|
LGTM. Thanks @felixcheung |
|
Test build #58596 has finished for PR 13109 at commit
|
|
This looks better, but the roxygen style has drifted a little. The previous is like: Current is like: We may need a consistent roxygen documentation style. At least for two styles: Also, if you type '?corr' in R, only corr() for Column functions is displayed. Since R is function-oriented, I think the two corr() descriptions would be better displayed together on one page? |
|
right - that's why my comment on SPARK-15237 is that we should have a different rd for each of "statsfunctions" instead of having all of them on one rd. To clarify, currently, we have
What I think we should have instead is
I think it is rather confusing to put both DataFrame |
|
My opinion is that since R supports generic functions, and a generic function can have multiple methods, it is natural to put both corr() on the same page. Is there a mechanism by which descriptions can be aggregated even if methods of the same name are spread across different Rd files? |
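For readers following the question above: roxygen2's `@rdname` tag is the usual way to merge several documented objects into one Rd page, and it works even when the roxygen blocks live in different source files. The sketch below is illustrative only. The `DemoFrame` and `DemoColumn` classes are hypothetical stand-ins, not SparkR's real classes, and the method bodies simply call `stats::cor`.

```r
library(methods)

# Hypothetical stand-in classes; these are NOT SparkR's real definitions.
setClass("DemoFrame", representation(data = "data.frame"))
setClass("DemoColumn", representation(values = "numeric"))

#' Correlation
#'
#' Computes the Pearson correlation, either between two named columns of a
#' DemoFrame or between two DemoColumn objects.
#'
#' @param x a DemoFrame or a DemoColumn
#' @param ... further arguments, depending on the method
#' @rdname corr
#' @export
setGeneric("corr", function(x, ...) standardGeneric("corr"))

# Both methods reuse "@rdname corr", so roxygen2 writes a single corr.Rd
# that lists the generic and both methods together.

#' @param col1 name of the first column
#' @param col2 name of the second column
#' @rdname corr
setMethod("corr", signature(x = "DemoFrame"),
          function(x, col1, col2) stats::cor(x@data[[col1]], x@data[[col2]]))

#' @param y a second DemoColumn
#' @rdname corr
setMethod("corr", signature(x = "DemoColumn"),
          function(x, y) stats::cor(x@values, y@values))
```

With this layout, `?corr` would open one page covering both methods. The trade-off discussed in this thread still applies: the more unrelated functions share one `@rdname`, the harder the merged page is to read.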
|
that's fine, I don't know the history of putting stats column function into one rd page though. |
|
Chiming in a little late here -- from my R usage, I've definitely seen two patterns commonly used
Right now my take is the following:
Let me know what you think of this proposal. |
|
@shivaram, I checked your two examples. It seems that the rule is that: I have no strong opinion on this. so +1 |
- these changes might happen in apache#13109
|
@felixcheung Is this PR still relevant? |
|
I think it is. I will try to update this today. |
|
Updated |
|
Test build #60897 has finished for PR 13109 at commit
|
| setOldClass("jobj") | ||
|
|
||
| #' crosstab | ||
| #' @title SparkDataFrame statistic functions |
Hi, @felixcheung.
When I use ./create-docs.sh, this breaks the page.
Should I do something differently to generate the html page like yours?
What error do you see? This works for me, and Jenkins runs create-docs too.
For me, the generated html file, file:///Users/dongjoon/spark/R/pkg/html/statfunctions.html has the following title.
SparkDataFrame statistic functions crosstab - Computes a pair-wise frequency table of the given columns Computes a pair-wise frequency table of the given columns. Also known as a contingency table. The number of distinct values for each column should be less than 1e4. At most 1e6 non-zero pair frequencies will be returned.
Also, index.html shows the above long string for all the stat functions like approxQuantile.
It's a little bit different from your screenshot (After).
I don't see the build error - but the page title doesn't come up right. A screenshot is at https://www.dropbox.com/s/sc1mrd7upr6t7mp/Screenshot%202016-06-20%2021.25.57.png?dl=0
Also, we seem to have some functions like covar_samp, covar_pop that don't have a description?
Fixed, sorry about that. roxygen2 is a bit.. stubborn.
cov, corr shouldn't be there - they are referenced in generics.R. These bugs are also fixed in my other PR #13798 - there are quite a lot of them.
|
Oh, the root cause exists in |
|
It's nasty! 😄 |
|
In line 333 of |
|
That's intentional - covar_pop has a separate page. |
R/pkg/R/generics.R
Outdated
| setGeneric("covar_samp", function(col1, col2) {standardGeneric("covar_samp") }) | ||
|
|
||
| #' @rdname statfunctions | ||
| #' @rdname cov |
If so, shouldn't cov here be covar_pop?
good catch, thx!
|
Yes. Indeed, we had better generally keep each function on its own Rd page. |
|
Test build #60905 has finished for PR 13109 at commit
|
R/pkg/R/stats.R
Outdated
| #' | ||
| #' Calculates the approximate quantiles of a numerical column of a SparkDataFrame. | ||
| #' approxQuantile - Calculates the approximate quantiles of a numerical column of a SparkDataFrame. | ||
| #' |
I think we need to delete this line between the approxQuantile line and the "The result of this algorithm" line. Otherwise the first line doesn't seem to show up in the rendered doc?
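As background for this comment: in roxygen2, blank `#'` lines split the introductory text into paragraphs, where the first paragraph becomes the title, the second the description, and any further paragraphs the details. That convention is a plausible reason why a blank line changes what shows up on a merged page, though the exact cause in this PR is not confirmed here. A minimal sketch, using a hypothetical `demoQuantile` function that is not part of SparkR:

```r
#' Approximate quantiles of a numeric vector (this paragraph is the title)
#'
#' This second paragraph becomes the description section of the Rd page.
#'
#' Any further paragraph, like this one, ends up in the details section.
#' Removing the blank "#'" line above would fold this text into the
#' description instead.
#'
#' @param x a numeric vector
#' @param probs probabilities in the range 0 to 1
#' @return a numeric vector of quantiles, one per entry of probs
demoQuantile <- function(x, probs) {
  # Hypothetical helper for illustration; delegates to base R's quantile().
  stats::quantile(x, probs, names = FALSE)
}
```

The explicit `@title` and `@description` tags can be used instead of relying on paragraph position, which may be easier to keep consistent when several blocks share one `@rdname`.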
|
Test build #60908 has finished for PR 13109 at commit
|
|
I just checked the generated R doc and I felt that we shouldn't group many methods together. For example, in this PR, the |
|
LGTM. This version looks good to me. Thanks for iterating on this. Will wait for Jenkins and then merge. |
|
@mengxr Yes - this is true and in #13798 we are making a few more of the methods into individual Rd files. At a high level there is a tradition in R to group together similar methods (https://stat.ethz.ch/R-manual/R-devel/library/base/html/colSums.html for example) but with roxygen2 that leads to some issues. I think we can even split the |
|
Methods documented in For |
|
Test build #60910 has finished for PR 13109 at commit
|
|
@mengxr @felixcheung Can we open a new issue of the form @mengxr - also let me know if you think this is good to merge. I think for 2.0 RC1 having this PR is better than not having it? |
|
Jenkins, retest this please |
|
Yes, we should merge this PR first and discuss the grouping later. |
|
Created https://issues.apache.org/jira/browse/SPARK-16090 to follow up. |
|
Test build #60912 has finished for PR 13109 at commit
|
|
Cool. Thanks - LGTM. Merging this to master, branch-2.0 |
…DataFrame stats functions

## What changes were proposed in this pull request?

Doc only changes. Please see screenshots.

Before:
http://spark.apache.org/docs/latest/api/R/statfunctions.html


After

(please ignore the style differences - this is due to not having the css in my local copy)

This is still a bit weird. As discussed in SPARK-15237, I think the better approach is to separate out the DataFrame stats function instead of putting everything on one page. At least now it is clearer which description is on which function.

## How was this patch tested?

Build doc

Author: Felix Cheung <[email protected]>
Author: felixcheung <[email protected]>

Closes #13109 from felixcheung/rstatdoc.

(cherry picked from commit 843a1eb)
Signed-off-by: Shivaram Venkataraman <[email protected]>


What changes were proposed in this pull request?
Doc only changes. Please see screenshots.
Before:

http://spark.apache.org/docs/latest/api/R/statfunctions.html
After

(please ignore the style differences - this is due to not having the css in my local copy)
This is still a bit weird. As discussed in SPARK-15237, I think the better approach is to separate out the DataFrame stats function instead of putting everything on one page. At least now it is clearer which description is on which function.
How was this patch tested?
Build doc