39 changes: 18 additions & 21 deletions R/pkg/R/DataFrame.R
@@ -463,6 +463,7 @@ setMethod("createOrReplaceTempView",
})

#' (Deprecated) Register Temporary Table
#'
#' Registers a SparkDataFrame as a Temporary Table in the SQLContext
#' @param x A SparkDataFrame
#' @param tableName A character vector containing the name of the table
@@ -606,10 +607,10 @@ setMethod("unpersist",
#'
#' The following options for repartition are possible:
#' \itemize{
#' \item{"Option 1"} {Return a new SparkDataFrame partitioned by
#' \item{1.} {Return a new SparkDataFrame partitioned by
#' the given columns into `numPartitions`.}
#' \item{"Option 2"} {Return a new SparkDataFrame that has exactly `numPartitions`.}
#' \item{"Option 3"} {Return a new SparkDataFrame partitioned by the given column(s),
#' \item{2.} {Return a new SparkDataFrame that has exactly `numPartitions`.}
#' \item{3.} {Return a new SparkDataFrame partitioned by the given column(s),
#' using `spark.sql.shuffle.partitions` as number of partitions.}
Member:
For the ordered itemize, what about the following?

#' \enumerate{
#'  \item Return a new SparkDataFrame partitioned by the given columns into `numPartitions`.
#'  \item Return a new SparkDataFrame that has exactly `numPartitions`.
#'  \item Return a new SparkDataFrame partitioned by the given column(s),
#'                     using `spark.sql.shuffle.partitions` as number of partitions.
#' }

Member Author:
it mentioned "these options" (as in one of these choices) on the line before, so I thought it's nicer to number the options.
in other cases we have unnumbered bullets if they are enum values and so on.

#'}
#' @param x A SparkDataFrame
@@ -1053,7 +1054,7 @@ setMethod("limit",
dataFrame(res)
})

#' Take the first NUM rows of a SparkDataFrame and return a the results as a data.frame
#' Take the first NUM rows of a SparkDataFrame and return a the results as a R data.frame
#'
#' @family SparkDataFrame functions
#' @rdname take
@@ -1076,7 +1077,7 @@ setMethod("take",

#' Head
#'
#' Return the first NUM rows of a SparkDataFrame as a data.frame. If NUM is NULL,
#' Return the first NUM rows of a SparkDataFrame as a R data.frame. If NUM is NULL,
#' then head() returns the first 6 rows in keeping with the current data.frame
#' convention in R.
#'
@@ -1157,7 +1158,6 @@ setMethod("toRDD",
#'
#' @param x a SparkDataFrame
#' @return a GroupedData
#' @seealso GroupedData
#' @family SparkDataFrame functions
#' @rdname groupBy
#' @name groupBy
@@ -1242,9 +1242,9 @@ dapplyInternal <- function(x, func, schema) {
#'
#' @param x A SparkDataFrame
#' @param func A function to be applied to each partition of the SparkDataFrame.
#' func should have only one parameter, to which a data.frame corresponds
#' func should have only one parameter, to which a R data.frame corresponds
#' to each partition will be passed.
#' The output of func should be a data.frame.
#' The output of func should be a R data.frame.
#' @param schema The schema of the resulting SparkDataFrame after the function is applied.
#' It must match the output of func.
#' @family SparkDataFrame functions
@@ -1290,9 +1290,9 @@ setMethod("dapply",
#'
#' @param x A SparkDataFrame
#' @param func A function to be applied to each partition of the SparkDataFrame.
#' func should have only one parameter, to which a data.frame corresponds
#' func should have only one parameter, to which a R data.frame corresponds
#' to each partition will be passed.
#' The output of func should be a data.frame.
#' The output of func should be a R data.frame.
#' @family SparkDataFrame functions
#' @rdname dapply
#' @name dapplyCollect
@@ -1639,7 +1639,6 @@ setMethod("select", signature(x = "SparkDataFrame", col = "character"),
}
})

#' @family SparkDataFrame functions
#' @rdname select
#' @export
#' @note select(SparkDataFrame, Column) since 1.4.0
@@ -1652,7 +1651,6 @@ setMethod("select", signature(x = "SparkDataFrame", col = "Column"),
dataFrame(sdf)
})

#' @family SparkDataFrame functions
#' @rdname select
#' @export
#' @note select(SparkDataFrame, list) since 1.4.0
@@ -1999,7 +1997,6 @@ setMethod("filter",
dataFrame(sdf)
})

#' @family SparkDataFrame functions
#' @rdname filter
#' @name where
#' @note where since 1.4.0
@@ -2220,11 +2217,13 @@ setMethod("merge",
joinRes
})

#' Creates a list of columns by replacing the intersected ones with aliases
#'
#' Creates a list of columns by replacing the intersected ones with aliases.
#' The name of the alias column is formed by concatanating the original column name and a suffix.
#'
#' @param x a SparkDataFrame on which the
#' @param intersectedColNames a list of intersected column names
#' @param x a SparkDataFrame
#' @param intersectedColNames a list of intersected column names of the SparkDataFrame
#' @param suffix a suffix for the column name
#' @return list of columns
#'
@@ -2511,9 +2510,9 @@ setMethod("summary",
})


#' dropna
#' A set of SparkDataFrame functions working with NA values
#'
#' Returns a new SparkDataFrame omitting rows with null values.
#' dropna, na.omit - Returns a new SparkDataFrame omitting rows with null values.
#'
#' @param x A SparkDataFrame.
#' @param how "any" or "all".
@@ -2565,9 +2564,7 @@ setMethod("na.omit",
dropna(object, how, minNonNulls, cols)
})

#' fillna
#'
#' Replace null values.
#' fillna - Replace null values.
#'
#' @param x A SparkDataFrame.
#' @param value Value to replace null values with.
@@ -2638,7 +2635,7 @@ setMethod("fillna",
dataFrame(sdf)
})

#' Download data from a SparkDataFrame into a data.frame
#' Download data from a SparkDataFrame into a R data.frame
#'
#' This function downloads the contents of a SparkDataFrame into an R's data.frame.
#' Since data.frames are held in memory, ensure that you have enough memory
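The DataFrame.R edits above mostly tighten the R data.frame contract: head(), take(), and collect() hand back plain R data.frames, and dapply()'s func receives each partition as an R data.frame and must return one matching the declared schema. A minimal hedged sketch of that contract, assuming an active SparkSession and purely illustrative column names a, b, and sum:

# Illustrative only: func gets one R data.frame per partition and must
# return an R data.frame whose columns match `schema`.
df <- createDataFrame(data.frame(a = 1:3, b = c(2, 4, 6)))
df <- repartition(df, numPartitions = 2)   # option 2 of the repartition docs: exactly numPartitions partitions
schema <- structType(structField("a", "integer"),
                     structField("b", "double"),
                     structField("sum", "double"))
summed <- dapply(df, function(pdf) { pdf$sum <- pdf$a + pdf$b; pdf }, schema)
head(collect(summed))   # head()/collect() return an R data.frame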
6 changes: 3 additions & 3 deletions R/pkg/R/SQLContext.R
@@ -67,7 +67,7 @@ dispatchFunc <- function(newFuncSig, x, ...) {
}

#' return the SparkSession
#' @note getSparkSession since 2.0.0
#' @noRd
getSparkSession <- function() {
if (exists(".sparkRsession", envir = .sparkREnv)) {
get(".sparkRsession", envir = .sparkREnv)
@@ -77,7 +77,7 @@ getSparkSession <- function() {
}

#' infer the SQL type
#' @note infer_type since 1.4.0
#' @noRd
infer_type <- function(x) {
if (is.null(x)) {
stop("can not infer type from NULL")
@@ -451,7 +451,7 @@ sql <- function(x, ...) {
#' Create a SparkDataFrame from a SparkSQL Table
#'
#' Returns the specified Table as a SparkDataFrame. The Table must have already been registered
#' in the SQLContext.
#' in the SparkSession.
#'
#' @param tableName The SparkSQL Table to convert to a SparkDataFrame.
#' @return SparkDataFrame
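The SQLContext.R hunks above mark two internal helpers @noRd and update the table-registration wording from SQLContext to SparkSession. A hedged sketch of that workflow, assuming df is an existing SparkDataFrame with an age column and a SparkSession is already running:

# Register the SparkDataFrame as a temporary view in the current
# SparkSession, then query it back as a new SparkDataFrame via SQL.
createOrReplaceTempView(df, "people")
adults <- sql("SELECT * FROM people WHERE age > 21")
head(adults)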
6 changes: 6 additions & 0 deletions R/pkg/R/column.R
@@ -34,6 +34,11 @@ setOldClass("jobj")
setClass("Column",
slots = list(jc = "jobj"))

#' A set of operations working with SparkDataFrame columns
#' @rdname columnfunctions
#' @name columnfunctions
NULL

setMethod("initialize", "Column", function(.Object, jc) {
.Object@jc <- jc
.Object
@@ -47,6 +52,7 @@ setMethod("column",

#' @rdname show
#' @name show
#' @export
#' @note show(Column) since 1.4.0
setMethod("show", "Column",
function(object) {
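The new columnfunctions block added above gives the Column operators a shared @rdname page. For context, a small hedged sketch of the usage style that page documents, assuming df has a numeric column x (both names are illustrative):

# Column expressions compose lazily and are handed to filter()/select().
big <- filter(df, df$x > 10)
head(select(big, alias(big$x * 2, "doubled")))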
5 changes: 3 additions & 2 deletions R/pkg/R/context.R
@@ -225,9 +225,10 @@ setCheckpointDir <- function(sc, dirName) {
invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(dirName))))
}

#' Run a function over a list of elements, distributing the computations with Spark.
#' Run a function over a list of elements, distributing the computations with Spark
#'
#' Applies a function in a manner that is similar to doParallel or lapply to elements of a list.
#' Run a function over a list of elements, distributing the computations with Spark. Applies a
#' function in a manner that is similar to doParallel or lapply to elements of a list.
#' The computations are distributed using Spark. It is conceptually the same as the following code:
#' lapply(list, func)
#'
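The context.R change above rewords the doc for the distributed lapply-style helper. A minimal sketch of the behaviour it describes, assuming the function being documented is SparkR 2.0's spark.lapply() (the name is not visible in this hunk):

# Conceptually the same as lapply(1:10, function(x) x * x), except each
# element is evaluated by a Spark executor; the result comes back as a list.
squares <- spark.lapply(1:10, function(x) { x * x })
unlist(squares)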
40 changes: 13 additions & 27 deletions R/pkg/R/functions.R
@@ -77,13 +77,14 @@ setMethod("acos",
column(jc)
})

#' approxCountDistinct
#' Returns the approximate number of distinct items in a group
#'
#' Aggregate function: returns the approximate number of distinct items in a group.
#' Returns the approximate number of distinct items in a group. This is a column
#' aggregate function.
#'
#' @rdname approxCountDistinct
#' @name approxCountDistinct
#' @family agg_funcs
#' @return the approximate number of distinct items in a group.
#' @export
#' @examples \dontrun{approxCountDistinct(df$c)}
#' @note approxCountDistinct(Column) since 1.4.0
@@ -234,7 +235,7 @@ setMethod("cbrt",
column(jc)
})

#' ceil
#' Computes the ceiling of the given value
#'
#' Computes the ceiling of the given value.
#'
@@ -254,15 +255,16 @@ setMethod("ceil",
#' Though scala functions has "col" function, we don't expose it in SparkR
#' because we don't want to conflict with the "col" function in the R base
#' package and we also have "column" function exported which is an alias of "col".
#' @noRd
col <- function(x) {
column(callJStatic("org.apache.spark.sql.functions", "col", x))
}

#' column
#' Returns a Column based on the given column name
#'
#' Returns a Column based on the given column name.
#'
#' @rdname col
#' @rdname column
#' @name column
#' @family normal_funcs
#' @export
@@ -385,9 +387,9 @@ setMethod("cosh",
column(jc)
})

#' count
#' Returns the number of items in a group
#'
#' Aggregate function: returns the number of items in a group.
#' Returns the number of items in a group. This is a column aggregate function.
#'
#' @rdname count
#' @name count
@@ -1193,7 +1195,7 @@ setMethod("sha1",
#'
#' Computes the signum of the given value.
#'
#' @rdname signum
#' @rdname sign
#' @name signum
#' @family math_funcs
#' @export
@@ -1717,7 +1719,7 @@ setMethod("datediff", signature(y = "Column"),

#' hypot
#'
#' Computes `sqrt(a^2^ + b^2^)` without intermediate overflow or underflow.
#' Computes "sqrt(a^2 + b^2)" without intermediate overflow or underflow.
#'
#' @rdname hypot
#' @name hypot
@@ -1813,12 +1815,8 @@ setMethod("pmod", signature(y = "Column"),
})


#' Approx Count Distinct
#'
#' @family agg_funcs
#' @rdname approxCountDistinct
#' @name approxCountDistinct
#' @return the approximate number of distinct items in a group.
#' @export
#' @examples \dontrun{approxCountDistinct(df$c, 0.02)}
#' @note approxCountDistinct(Column, numeric) since 1.4.0
@@ -1918,10 +1916,6 @@ setMethod("least",
column(jc)
})

#' ceiling
#'
#' Computes the ceiling of the given value.
#'
#' @rdname ceil
#' @name ceiling
#' @export
@@ -1933,11 +1927,7 @@ setMethod("ceiling",
ceil(x)
})

#' sign
#'
#' Computes the signum of the given value.
#'
#' @rdname signum
#' @rdname sign
#' @name sign
#' @export
#' @examples \dontrun{sign(df$c)}
@@ -1961,10 +1951,6 @@ setMethod("n_distinct", signature(x = "Column"),
countDistinct(x, ...)
})

#' n
#'
#' Aggregate function: returns the number of items in a group.
#'
#' @rdname count
#' @name n
#' @export
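The functions.R hunks above fold several one-line roxygen stubs (approxCountDistinct, count, ceiling, sign, n) into shared @rdname pages and reword their titles. A short hedged sketch of the aggregate usage those pages describe, assuming df is an existing SparkDataFrame with illustrative columns dept and value:

# approxCountDistinct() and count() are column aggregate functions, so they
# are normally used inside agg() on a grouped SparkDataFrame.
perDept <- agg(groupBy(df, df$dept),
               approxCountDistinct(df$value, 0.02),
               count(df$value))
head(perDept)
head(select(df, ceil(df$value), signum(df$value)))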