5 changes: 5 additions & 0 deletions R/pkg/R/DataFrame.R
@@ -2143,6 +2143,11 @@ setMethod("selectExpr",
#' Return a new SparkDataFrame by adding a column or replacing the existing column
#' that has the same name.
#'
#' Note: This method introduces a projection internally. Therefore, calling it multiple times,
#' for instance, via loops in order to add multiple columns can generate big plans which
#' can cause performance issues and even \code{StackOverflowException}. To avoid this,
#' use \code{select} with the multiple columns at once.
#'
#' @param x a SparkDataFrame.
#' @param colName a column name.
#' @param col a Column expression (which must refer only to this SparkDataFrame), or an atomic
5 changes: 5 additions & 0 deletions python/pyspark/sql/dataframe.py
@@ -1974,6 +1974,11 @@ def withColumn(self, colName, col):
:param colName: string, name of the new column.
:param col: a :class:`Column` expression for the new column.

.. note:: This method introduces a projection internally. Therefore, calling it multiple
times, for instance, via loops in order to add multiple columns can generate big
plans which can cause performance issues and even `StackOverflowException`.
To avoid this, use :func:`select` with the multiple columns at once.

>>> df.withColumn('age2', df.age + 2).collect()
[Row(age=2, name=u'Alice', age2=4), Row(age=5, name=u'Bob', age2=7)]

8 changes: 4 additions & 4 deletions sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -2151,10 +2151,10 @@ class Dataset[T] private[sql](
* `column`'s expression must only refer to attributes supplied by this Dataset. It is an
* error to add a column that refers to some other Dataset.
*
- * Please notice that this method introduces a `Project`. This means that using it in loops in
- * order to add several columns can generate very big plans which can cause huge performance
- * issues and even `StackOverflowException`s. A much better alternative use `select` with the
- * list of columns to add.
+ * @note this method introduces a projection internally. Therefore, calling it multiple times,
+ * for instance, via loops in order to add multiple columns can generate big plans which
+ * can cause performance issues and even `StackOverflowException`. To avoid this,
+ * use `select` with the multiple columns at once.
*
* @group untypedrel
* @since 2.0.0
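The note added in all three files says that each call introduces a projection, so adding columns in a loop grows the plan with every iteration. The sketch below illustrates that growth with a toy plan model in plain Python. It is not Spark's implementation: the `Plan` class and its `with_column`, `select`, and `depth` methods are hypothetical names invented here, and the only behavior taken from the note is "one projection per call".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Plan:
    """Toy logical-plan node: a projection over an optional child plan."""

    columns: Tuple[str, ...]
    child: Optional["Plan"] = None

    def with_column(self, name: str) -> "Plan":
        # Assumption taken from the note: every call adds one projection node.
        cols = self.columns if name in self.columns else self.columns + (name,)
        return Plan(cols, child=self)

    def select(self, *names: str) -> "Plan":
        # One projection node, however many columns are requested.
        return Plan(tuple(names), child=self)

    def depth(self) -> int:
        # Iterative walk, so measuring a deep plan cannot itself overflow.
        n, node = 0, self
        while node is not None:
            n, node = n + 1, node.child
        return n


base = Plan(("age", "name"))
new_cols = [f"c{i}" for i in range(500)]

# Loop form: one extra projection per added column.
looped = base
for name in new_cols:
    looped = looped.with_column(name)

# Single-select form: all columns in one projection.
selected = base.select(*base.columns, *new_cols)

print(looped.depth(), selected.depth())  # prints: 501 2
```

Both forms end with the same column list, but the looped plan is 501 nodes deep while the single-select plan is 2. In PySpark the single-projection form would look roughly like `df.select("*", *[expr.alias(name) for name, expr in new_columns.items()])`, where `new_columns` is a hypothetical dict mapping column names to `Column` expressions.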