
Commit 843a1eb

felixcheung authored and shivaram committed
[SPARK-15319][SPARKR][DOCS] Fix SparkR doc layout for corr and other DataFrame stats functions
## What changes were proposed in this pull request?

Doc only changes. Please see screenshots.

Before: http://spark.apache.org/docs/latest/api/R/statfunctions.html

![image](https://cloud.githubusercontent.com/assets/8969467/15264110/cd458826-1924-11e6-85bd-8ee2e2e1a85f.png)

After:

![image](https://cloud.githubusercontent.com/assets/8969467/16218452/b9e89f08-3732-11e6-969d-a3a1796e7ad0.png)

(Please ignore the style differences; they are due to not having the CSS in my local copy.)

This is still a bit weird. As discussed in SPARK-15237, I think the better approach is to separate out the DataFrame stats functions instead of putting everything on one page. At least now it is clearer which description belongs to which function.

## How was this patch tested?

Built the docs.

Author: Felix Cheung <[email protected]>
Author: felixcheung <[email protected]>

Closes #13109 from felixcheung/rstatdoc.
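For background on why everything previously rendered on one page: roxygen2 collates every documentation block that shares an `@rdname` into a single `.Rd` topic, which is how all of the stats functions tagged `@rdname statfunctions` ended up stacked on one `statfunctions.html`. A minimal, hypothetical sketch of that behavior (the `colMeanOf`/`colSdOf` helpers and the `stats_helpers` topic name below are illustrative stand-ins, not part of SparkR):

```r
# Both blocks share @rdname stats_helpers, so roxygen2::roxygenise() on a
# package containing them writes a single man/stats_helpers.Rd -- and thus a
# single HTML help page -- with the two descriptions stacked together, much
# like the old statfunctions.html. Giving each block its own @rdname (as this
# patch does for cov, corr, and covar_pop) yields one page per function.

#' @title Toy column helpers
#' @description colMeanOf - Mean of one numeric column of a data.frame.
#' @param df a data.frame
#' @param col a column name
#' @rdname stats_helpers
#' @export
colMeanOf <- function(df, col) mean(df[[col]])

#' @description colSdOf - Standard deviation of one numeric column of a data.frame.
#' @rdname stats_helpers
#' @export
colSdOf <- function(df, col) sd(df[[col]])
```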
1 parent 09f4cea commit 843a1eb

File tree

2 files changed: +17 -23 lines changed


R/pkg/R/generics.R

Lines changed: 4 additions & 4 deletions
@@ -434,19 +434,19 @@ setGeneric("coltypes<-", function(x, value) { standardGeneric("coltypes<-") })
 #' @export
 setGeneric("columns", function(x) {standardGeneric("columns") })
 
-#' @rdname statfunctions
+#' @rdname cov
 #' @export
 setGeneric("cov", function(x, ...) {standardGeneric("cov") })
 
-#' @rdname statfunctions
+#' @rdname corr
 #' @export
 setGeneric("corr", function(x, ...) {standardGeneric("corr") })
 
-#' @rdname statfunctions
+#' @rdname cov
 #' @export
 setGeneric("covar_samp", function(col1, col2) {standardGeneric("covar_samp") })
 
-#' @rdname statfunctions
+#' @rdname covar_pop
 #' @export
 setGeneric("covar_pop", function(col1, col2) {standardGeneric("covar_pop") })
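The `@rdname` tags added here pair with the matching tags on the SparkDataFrame methods in `R/pkg/R/stats.R` below, so each generic and its method land on the same per-function help page (note that `covar_samp` is deliberately folded into the `cov` page). A hedged sketch of that pairing; the `SparkDataFrame` stand-in class and the stub method body are assumptions made so the snippet is self-contained, not SparkR's real definitions:

```r
library(methods)

# Stand-in for SparkR's SparkDataFrame class so the sketch runs on its own;
# the real class is defined in R/pkg/R/DataFrame.R.
setClass("SparkDataFrame", representation(sdf = "ANY"))

# generics.R side: the generic only declares which Rd topic it joins.
#' @rdname cov
#' @export
setGeneric("cov", function(x, ...) { standardGeneric("cov") })

# stats.R side: the method carries the full documentation; because it shares
# @rdname cov with the generic above, roxygen2 emits a single cov.Rd that
# covers both.
#' Calculate the sample covariance of two numerical columns of a SparkDataFrame.
#' @rdname cov
#' @name cov
#' @export
setMethod("cov", signature(x = "SparkDataFrame"),
          function(x, col1, col2) {
            # Stub body; the real method delegates to the JVM-side
            # DataFrameStatFunctions via callJMethod().
            NA_real_
          })
```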

R/pkg/R/stats.R

Lines changed: 13 additions & 19 deletions
@@ -19,9 +19,10 @@
 
 setOldClass("jobj")
 
-#' crosstab
-#'
-#' Computes a pair-wise frequency table of the given columns. Also known as a contingency
+#' @title SparkDataFrame statistic functions
+
+#' @description
+#' crosstab - Computes a pair-wise frequency table of the given columns. Also known as a contingency
 #' table. The number of distinct values for each column should be less than 1e4. At most 1e6
 #' non-zero pair frequencies will be returned.
 #'
@@ -49,16 +50,14 @@ setMethod("crosstab",
             collect(dataFrame(sct))
           })
 
-#' cov
-#'
 #' Calculate the sample covariance of two numerical columns of a SparkDataFrame.
 #'
 #' @param x A SparkDataFrame
 #' @param col1 the name of the first column
 #' @param col2 the name of the second column
 #' @return the covariance of the two columns.
 #'
-#' @rdname statfunctions
+#' @rdname cov
 #' @name cov
 #' @export
 #' @examples
@@ -75,8 +74,6 @@ setMethod("cov",
             callJMethod(statFunctions, "cov", col1, col2)
           })
 
-#' corr
-#'
 #' Calculates the correlation of two columns of a SparkDataFrame.
 #' Currently only supports the Pearson Correlation Coefficient.
 #' For Spearman Correlation, consider using RDD methods found in MLlib's Statistics.
@@ -88,7 +85,7 @@ setMethod("cov",
 #' only "pearson" is allowed now.
 #' @return The Pearson Correlation Coefficient as a Double.
 #'
-#' @rdname statfunctions
+#' @rdname corr
 #' @name corr
 #' @export
 #' @examples
@@ -106,9 +103,8 @@ setMethod("corr",
             callJMethod(statFunctions, "corr", col1, col2, method)
           })
 
-#' freqItems
-#'
-#' Finding frequent items for columns, possibly with false positives.
+#' @description
+#' freqItems - Finding frequent items for columns, possibly with false positives.
 #' Using the frequent element count algorithm described in
 #' \url{http://dx.doi.org/10.1145/762471.762473}, proposed by Karp, Schenker, and Papadimitriou.
 #'
@@ -134,10 +130,8 @@ setMethod("freqItems", signature(x = "SparkDataFrame", cols = "character"),
             collect(dataFrame(sct))
           })
 
-#' approxQuantile
-#'
-#' Calculates the approximate quantiles of a numerical column of a SparkDataFrame.
-#'
+#' @description
+#' approxQuantile - Calculates the approximate quantiles of a numerical column of a SparkDataFrame.
 #' The result of this algorithm has the following deterministic bound:
 #' If the SparkDataFrame has N elements and if we request the quantile at probability `p` up to
 #' error `err`, then the algorithm will return a sample `x` from the SparkDataFrame so that the
@@ -174,9 +168,9 @@ setMethod("approxQuantile",
                                      as.list(probabilities), relativeError)
           })
 
-#' sampleBy
-#'
-#' Returns a stratified sample without replacement based on the fraction given on each stratum.
+#' @description
+#' sampleBy - Returns a stratified sample without replacement based on the fraction given on each
+#' stratum.
 #'
 #' @param x A SparkDataFrame
 #' @param col column that defines strata
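The behavior of these functions is unchanged by this patch; for readers arriving from the regenerated pages, here is a small usage sketch of the stats API the rewritten comments describe. It assumes a Spark 2.x-style SparkR session (`sparkR.session()`) with a local Spark installation, and borrows column names from R's built-in `faithful` dataset:

```r
library(SparkR)
sparkR.session()                 # assumes Spark is installed locally

df <- createDataFrame(faithful)  # numeric columns: eruptions, waiting

# Sample covariance and Pearson correlation of two numeric columns,
# now documented on their own cov and corr pages.
cov(df, "eruptions", "waiting")
corr(df, "eruptions", "waiting", method = "pearson")

# Approximate quartiles of one column with 1% relative error.
approxQuantile(df, "waiting", probabilities = c(0.25, 0.5, 0.75),
               relativeError = 0.01)

sparkR.session.stop()
```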
