[SPARK-15319][SPARKR][DOCS] Fix SparkR doc layout for corr and other DataFrame stats functions #13109
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -19,9 +19,10 @@ | |
|
|
||
| setOldClass("jobj") | ||
|
|
||
| #' crosstab | ||
| #' | ||
| #' Computes a pair-wise frequency table of the given columns. Also known as a contingency | ||
| #' @title SparkDataFrame statistic functions | ||
|
|
||
| #' @description | ||
| #' crosstab - Computes a pair-wise frequency table of the given columns. Also known as a contingency | ||
| #' table. The number of distinct values for each column should be less than 1e4. At most 1e6 | ||
| #' non-zero pair frequencies will be returned. | ||
| #' | ||
|
|
@@ -49,16 +50,14 @@ setMethod("crosstab", | |
| collect(dataFrame(sct)) | ||
| }) | ||
|
|
||
| #' cov | ||
| #' | ||
| #' Calculate the sample covariance of two numerical columns of a SparkDataFrame. | ||
| #' | ||
| #' @param x A SparkDataFrame | ||
| #' @param col1 the name of the first column | ||
| #' @param col2 the name of the second column | ||
| #' @return the covariance of the two columns. | ||
| #' | ||
| #' @rdname statfunctions | ||
| #' @rdname cov | ||
| #' @name cov | ||
| #' @export | ||
| #' @examples | ||
|
|
@@ -75,8 +74,6 @@ setMethod("cov", | |
| callJMethod(statFunctions, "cov", col1, col2) | ||
| }) | ||
|
|
||
| #' corr | ||
| #' | ||
| #' Calculates the correlation of two columns of a SparkDataFrame. | ||
| #' Currently only supports the Pearson Correlation Coefficient. | ||
| #' For Spearman Correlation, consider using RDD methods found in MLlib's Statistics. | ||
|
|
@@ -88,7 +85,7 @@ setMethod("cov", | |
| #' only "pearson" is allowed now. | ||
| #' @return The Pearson Correlation Coefficient as a Double. | ||
| #' | ||
| #' @rdname statfunctions | ||
| #' @rdname corr | ||
| #' @name corr | ||
| #' @export | ||
| #' @examples | ||
|
|
@@ -106,9 +103,8 @@ setMethod("corr", | |
| callJMethod(statFunctions, "corr", col1, col2, method) | ||
| }) | ||
|
|
||
| #' freqItems | ||
| #' | ||
| #' Finding frequent items for columns, possibly with false positives. | ||
| #' @description | ||
| #' freqItems - Finding frequent items for columns, possibly with false positives. | ||
| #' Using the frequent element count algorithm described in | ||
| #' \url{http://dx.doi.org/10.1145/762471.762473}, proposed by Karp, Schenker, and Papadimitriou. | ||
| #' | ||
|
|
@@ -134,10 +130,8 @@ setMethod("freqItems", signature(x = "SparkDataFrame", cols = "character"), | |
| collect(dataFrame(sct)) | ||
| }) | ||
|
|
||
| #' approxQuantile | ||
| #' | ||
| #' Calculates the approximate quantiles of a numerical column of a SparkDataFrame. | ||
| #' | ||
| #' @description | ||
| #' approxQuantile - Calculates the approximate quantiles of a numerical column of a SparkDataFrame. | ||
|
Member
Unfortunately, this line is ignored. We need After adding that, the description depth will look differently. I mean only I'm not sure about balancing them by removing
Member
Except the above, LGTM!
Member
@felixcheung. I found that If we keep the description, please try the above pair.
Member
Author
Fixed, sorry I missed one. |
||
| #' The result of this algorithm has the following deterministic bound: | ||
| #' If the SparkDataFrame has N elements and if we request the quantile at probability `p` up to | ||
| #' error `err`, then the algorithm will return a sample `x` from the SparkDataFrame so that the | ||
|
|
@@ -174,9 +168,9 @@ setMethod("approxQuantile", | |
| as.list(probabilities), relativeError) | ||
| }) | ||
|
|
||
| #' sampleBy | ||
| #' | ||
| #' Returns a stratified sample without replacement based on the fraction given on each stratum. | ||
| #' @description | ||
| #' sampleBy - Returns a stratified sample without replacement based on the fraction given on each | ||
| #' stratum. | ||
| #' | ||
| #' @param x A SparkDataFrame | ||
| #' @param col column that defines strata | ||
|
|
||
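For readers who want to see the functions whose docs are being rearranged here, a minimal usage sketch follows. It assumes a local Spark 2.x install with the SparkR package on the library path; the `faithful` dataset and the `local[1]` master are illustrative choices, not part of this PR.

```r
library(SparkR)

# Assumption: Spark 2.x, where sparkR.session() is the entry point.
sparkR.session(master = "local[1]")

# `faithful` ships with R and has two numeric columns: eruptions, waiting.
df <- createDataFrame(faithful)

# Pearson correlation (the only supported method) and sample covariance.
corr(df, "eruptions", "waiting", method = "pearson")
cov(df, "eruptions", "waiting")

# Approximate median of `waiting` with a relative error of 0.01.
approxQuantile(df, "waiting", c(0.5), 0.01)

sparkR.session.stop()
```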
Hi, @felixcheung. When I use `./create-docs.sh`, this breaks the page. Should I do something differently to generate the html page like yours?
What's the error you see? This works for me, and Jenkins runs create docs too.
For me, the generated html file, `file:///Users/dongjoon/spark/R/pkg/html/statfunctions.html`, has the following title.
Also, `index.html` shows the above long string for all the stat functions like `approxQuantile`.
It's a little bit different from your screenshot (After).
I don't see the build error - but the page title doesn't come up right. A screenshot is at https://www.dropbox.com/s/sc1mrd7upr6t7mp/Screenshot%202016-06-20%2021.25.57.png?dl=0
Also, we seem to have some functions like covar_samp and covar_pop that don't have a description?
Fixed, sorry about that. roxygen2 is a bit.. stubborn.
`cov`, `corr` shouldn't be there - they are referenced in generics.R. These bugs are also fixed in my other PR #13798 - there are quite a lot of them.
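As a rough illustration of the layout discussed in this thread, here is a minimal roxygen2 sketch of two functions sharing one Rd page. The names `statA`, `statB`, and `examplestats` are hypothetical and not from this PR. The assumption, based on how roxygen2 generally merges blocks with the same `@rdname`, is that one block supplies `@title` and each block contributes its own explicit `@description` paragraph; without the explicit tag, the first line of a block is read as a title.

```r
#' @title Example statistic functions
#' @description
#' statA - Returns the mean of a numeric vector.
#'
#' @param x a numeric vector
#' @rdname examplestats
#' @name statA
#' @export
statA <- function(x) mean(x)

#' @description
#' statB - Returns the median of a numeric vector.
#'
#' @rdname examplestats
#' @name statB
#' @export
statB <- function(x) stats::median(x)
```

It is worth checking the generated `examplestats.Rd` and the index page after running the doc build, since the comments above show that the rendered title can still come out wrong.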