4 changes: 2 additions & 2 deletions docs/ml-classification-regression.md
@@ -567,7 +567,7 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.classificat

Refer to the [R API docs](api/R/spark.fmClassifier.html) for more details.

-Note: At the moment SparkR doesn't suport feature scaling.
+Note: At the moment SparkR doesn't support feature scaling.

{% include_example r/ml/fmClassifier.R %}
</div>
@@ -1105,7 +1105,7 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.regression.

Refer to the [R API documentation](api/R/spark.fmRegressor.html) for more details.

-Note: At the moment SparkR doesn't suport feature scaling.
+Note: At the moment SparkR doesn't support feature scaling.

{% include_example r/ml/fmRegressor.R %}
</div>
2 changes: 1 addition & 1 deletion docs/spark-standalone.md
@@ -335,7 +335,7 @@ SPARK_WORKER_OPTS supports the following system properties:
overlap with `spark.worker.cleanup.enabled`, as this enables cleanup of non-shuffle files in
local directories of a dead executor, while `spark.worker.cleanup.enabled` enables cleanup of
all files/subdirectories of a stopped and timeout application.
-This only affects Standalone mode, support of other cluster manangers can be added in the future.
+This only affects Standalone mode, support of other cluster managers can be added in the future.
</td>
<td>2.4.0</td>
</tr>
6 changes: 3 additions & 3 deletions docs/sql-migration-guide.md
@@ -42,15 +42,15 @@ license: |

- In Spark 3.0, `CREATE TABLE` without a specific provider uses the value of `spark.sql.sources.default` as its provider. In Spark version 2.4 and below, it was Hive. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.createHiveTableByDefault.enabled` to `true`.
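
A minimal sketch of the flag described above (the table names are hypothetical; assumes a Spark 3.0 session with Hive support):

{% highlight sql %}
-- With the legacy flag on, a plain CREATE TABLE is a Hive serde table, as in 2.4.
SET spark.sql.legacy.createHiveTableByDefault.enabled=true;
CREATE TABLE legacy_tbl (id INT);

-- With the flag off (the 3.0 default), the provider comes from spark.sql.sources.default.
SET spark.sql.legacy.createHiveTableByDefault.enabled=false;
CREATE TABLE default_tbl (id INT);
{% endhighlight %}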

-- In Spark 3.0, when inserting a value into a table column with a different data type, the type coercion is performed as per ANSI SQL standard. Certain unreasonable type conversions such as converting `string` to `int` and `double` to `boolean` are disallowed. A runtime exception is thrown if the value is out-of-range for the data type of the column. In Spark version 2.4 and below, type conversions during table insertion are allowed as long as they are valid `Cast`. When inserting an out-of-range value to a integral field, the low-order bits of the value is inserted(the same as Java/Scala numeric type casting). For example, if 257 is inserted to a field of byte type, the result is 1. The behavior is controlled by the option `spark.sql.storeAssignmentPolicy`, with a default value as "ANSI". Setting the option as "Legacy" restores the previous behavior.
+- In Spark 3.0, when inserting a value into a table column with a different data type, the type coercion is performed as per ANSI SQL standard. Certain unreasonable type conversions such as converting `string` to `int` and `double` to `boolean` are disallowed. A runtime exception is thrown if the value is out-of-range for the data type of the column. In Spark version 2.4 and below, type conversions during table insertion are allowed as long as they are valid `Cast`. When inserting an out-of-range value to an integral field, the low-order bits of the value is inserted(the same as Java/Scala numeric type casting). For example, if 257 is inserted to a field of byte type, the result is 1. The behavior is controlled by the option `spark.sql.storeAssignmentPolicy`, with a default value as "ANSI". Setting the option as "Legacy" restores the previous behavior.
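
A minimal sketch of the store-assignment behavior above (table `t` is hypothetical):

{% highlight sql %}
CREATE TABLE t (b TINYINT) USING parquet;   -- hypothetical example table

-- Spark 3.0 default: spark.sql.storeAssignmentPolicy=ANSI
INSERT INTO t VALUES (257);                 -- fails at runtime: 257 is out of range for TINYINT

-- Restore the 2.4 behavior: only the low-order bits are kept, so 257 becomes 1.
SET spark.sql.storeAssignmentPolicy=LEGACY;
INSERT INTO t VALUES (257);
{% endhighlight %}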

- The `ADD JAR` command previously returned a result set with the single value 0. It now returns an empty result set.

- Spark 2.4 and below: the `SET` command works without any warnings even if the specified key is for `SparkConf` entries and it has no effect because the command does not update `SparkConf`, but the behavior might confuse users. In 3.0, the command fails if a `SparkConf` key is used. You can disable such a check by setting `spark.sql.legacy.setCommandRejectsSparkCoreConfs` to `false`.
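
For illustration, taking `spark.driver.memory` as an example of such a `SparkConf` key:

{% highlight sql %}
SET spark.driver.memory=4g;   -- fails in Spark 3.0 because the key is a SparkConf entry

SET spark.sql.legacy.setCommandRejectsSparkCoreConfs=false;
SET spark.driver.memory=4g;   -- accepted again, but still does not update SparkConf
{% endhighlight %}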

- Refreshing a cached table would trigger a table uncache operation and then a table cache (lazily) operation. In Spark version 2.4 and below, the cache name and storage level are not preserved before the uncache operation. Therefore, the cache name and storage level could be changed unexpectedly. In Spark 3.0, cache name and storage level are first preserved for cache recreation. It helps to maintain a consistent cache behavior upon table refreshing.

-- In Spark 3.0, the properties listing below become reserved; commands fail if you specify reserved properties in places like `CREATE DATABASE ... WITH DBPROPERTIES` and `ALTER TABLE ... SET TBLPROPERTIES`. You need their specific clauses to specify them, for example, `CREATE DATABASE test COMMENT 'any comment' LOCATION 'some path'`. You can set `spark.sql.legacy.notReserveProperties` to `true` to ignore the `ParseException`, in this case, these properties will be silently removed, for example: `SET DBPROTERTIES('location'='/tmp')` will have no effect. In Spark version 2.4 and below, these properties are neither reserved nor have side effects, for example, `SET DBPROTERTIES('location'='/tmp')` do not change the location of the database but only create a headless property just like `'a'='b'`.
+- In Spark 3.0, the properties listing below become reserved; commands fail if you specify reserved properties in places like `CREATE DATABASE ... WITH DBPROPERTIES` and `ALTER TABLE ... SET TBLPROPERTIES`. You need their specific clauses to specify them, for example, `CREATE DATABASE test COMMENT 'any comment' LOCATION 'some path'`. You can set `spark.sql.legacy.notReserveProperties` to `true` to ignore the `ParseException`, in this case, these properties will be silently removed, for example: `SET DBPROPERTIES('location'='/tmp')` will have no effect. In Spark version 2.4 and below, these properties are neither reserved nor have side effects, for example, `SET DBPROPERTIES('location'='/tmp')` do not change the location of the database but only create a headless property just like `'a'='b'`.

| Property (case sensitive) | Database Reserved | Table Reserved | Remarks |
| ------------------------- | ----------------- | -------------- | ------- |
@@ -130,7 +130,7 @@ license: |

- In Spark 3.0, negative scale of decimal is not allowed by default, for example, data type of literal like `1E10BD` is `DecimalType(11, 0)`. In Spark version 2.4 and below, it was `DecimalType(2, -9)`. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.allowNegativeScaleOfDecimal` to `true`.
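
A small sketch comparing the two interpretations of the same literal:

{% highlight sql %}
SELECT 1E10BD;   -- Spark 3.0 default: DecimalType(11, 0)

SET spark.sql.legacy.allowNegativeScaleOfDecimal=true;
SELECT 1E10BD;   -- pre-3.0 interpretation: DecimalType(2, -9)
{% endhighlight %}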

-- In Spark 3.0, the unary arithmetic operator plus(`+`) only accepts string, numeric and interval type values as inputs. Besides, `+` with a integral string representation is coerced to a double value, for example, `+'1'` returns `1.0`. In Spark version 2.4 and below, this operator is ignored. There is no type checking for it, thus, all type values with a `+` prefix are valid, for example, `+ array(1, 2)` is valid and results `[1, 2]`. Besides, there is no type coercion for it at all, for example, in Spark 2.4, the result of `+'1'` is string `1`.
+- In Spark 3.0, the unary arithmetic operator plus(`+`) only accepts string, numeric and interval type values as inputs. Besides, `+` with an integral string representation is coerced to a double value, for example, `+'1'` returns `1.0`. In Spark version 2.4 and below, this operator is ignored. There is no type checking for it, thus, all type values with a `+` prefix are valid, for example, `+ array(1, 2)` is valid and results `[1, 2]`. Besides, there is no type coercion for it at all, for example, in Spark 2.4, the result of `+'1'` is string `1`.
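
A brief sketch of the new type checking and coercion:

{% highlight sql %}
SELECT +'1';           -- Spark 3.0: coerced to double, returns 1.0; Spark 2.4 returned the string '1'
SELECT + array(1, 2);  -- Spark 2.4: the + is ignored and [1, 2] is returned; Spark 3.0: analysis error
{% endhighlight %}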

- In Spark 3.0, Dataset query fails if it contains ambiguous column reference that is caused by self join. A typical example: `val df1 = ...; val df2 = df1.filter(...);`, then `df1.join(df2, df1("a") > df2("a"))` returns an empty result which is quite confusing. This is because Spark cannot resolve Dataset column references that point to tables being self joined, and `df1("a")` is exactly the same as `df2("a")` in Spark. To restore the behavior before Spark 3.0, you can set `spark.sql.analyzer.failAmbiguousSelfJoin` to `false`.

4 changes: 2 additions & 2 deletions docs/sql-ref-functions-udf-hive.md
@@ -30,7 +30,7 @@ An example below uses [GenericUDFAbs](https://github.com/apache/hive/blob/master

{% highlight sql %}
-- Register `GenericUDFAbs` and use it in Spark SQL.
--- Note that, if you use your own programmed one, you need to add a JAR containig it
+-- Note that, if you use your own programmed one, you need to add a JAR containing it
-- into a classpath,
-- e.g., ADD JAR yourHiveUDF.jar;
CREATE TEMPORARY FUNCTION testUDF AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFAbs';
@@ -105,4 +105,4 @@ SELECT key, hiveUDAF(value) FROM t GROUP BY key;
| b| 3|
| a| 3|
+---+---------------+
-{% endhighlight %}
+{% endhighlight %}
2 changes: 1 addition & 1 deletion docs/sql-ref-functions.md
@@ -24,7 +24,7 @@ Built-in functions are commonly used routines that Spark SQL predefines and a co

### Built-in Functions

-Spark SQL has some categories of frequently-used built-in functions for aggregtion, arrays/maps, date/timestamp, and JSON data.
+Spark SQL has some categories of frequently-used built-in functions for aggregation, arrays/maps, date/timestamp, and JSON data.
This subsection presents the usages and descriptions of these functions.
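
As a small illustrative sample (one built-in from each category named above; all four functions are standard Spark SQL built-ins):

{% highlight sql %}
-- arrays/maps, date/timestamp, and JSON
SELECT array_contains(array(1, 2, 3), 2)  AS has_two,
       to_date('2020-06-01')              AS d,
       get_json_object('{"a": 1}', '$.a') AS a;

-- aggregation over an inline table
SELECT count(*) FROM VALUES (1), (2) AS t(x);
{% endhighlight %}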

#### Scalar Functions
2 changes: 1 addition & 1 deletion docs/sql-ref-syntax-aux-describe-query.md
@@ -75,7 +75,7 @@ DESCRIBE QUERY WITH all_names_cte
| name| string| null|
+--------+---------+-------+

--- Returns column metadata information for a inline table.
+-- Returns column metadata information for an inline table.
DESC QUERY VALUES(100, 'John', 10000.20D) AS employee(id, name, salary);
+--------+---------+-------+
|col_name|data_type|comment|
4 changes: 2 additions & 2 deletions docs/web-ui.md
@@ -99,7 +99,7 @@ This page displays the details of a specific job identified by its job ID.
The Stages tab displays a summary page that shows the current state of all stages of all jobs in
the Spark application.

-At the beginning of the page is the summary with the count of all stages by status (active, pending, completed, sikipped, and failed)
+At the beginning of the page is the summary with the count of all stages by status (active, pending, completed, skipped, and failed)

<p style="text-align: center;">
<img src="img/AllStagesPageDetail1.png" title="Stages header" alt="Stages header" width="30%">
@@ -136,7 +136,7 @@ Summary metrics for all task are represented in a table and in a timeline.
* **[Tasks deserialization time](configuration.html#compression-and-serialization)**
* **Duration of tasks**.
* **GC time** is the total JVM garbage collection time.
-* **Result serialization time** is the time spent serializing the task result on a executor before sending it back to the driver.
+* **Result serialization time** is the time spent serializing the task result on an executor before sending it back to the driver.
* **Getting result time** is the time that the driver spends fetching task results from workers.
* **Scheduler delay** is the time the task waits to be scheduled for execution.
* **Peak execution memory** is the maximum memory used by the internal data structures created during shuffles, aggregations and joins.