
Commit e42dbe7

maropu and HyukjinKwon committed
[SPARK-31429][SQL][DOC] Automatically generates a SQL document for built-in functions
### What changes were proposed in this pull request?

This PR intends to add a Python script that generates a SQL document for built-in functions, and the corresponding document in the SQL references.

### Why are the changes needed?

To make the SQL references complete.

### Does this PR introduce any user-facing change?

Yes:

![a](https://user-images.githubusercontent.com/692303/79406712-c39e1b80-7fd2-11ea-8b85-9f9cbb6efed3.png)
![b](https://user-images.githubusercontent.com/692303/79320526-eb46a280-7f44-11ea-8639-90b1fb2b8848.png)
![c](https://user-images.githubusercontent.com/692303/79320707-3365c500-7f45-11ea-9984-69ffe800fb87.png)

### How was this patch tested?

Manually checked and added tests.

Closes #28224 from maropu/SPARK-31429.

Lead-authored-by: Takeshi Yamamuro <[email protected]>
Co-authored-by: HyukjinKwon <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
1 parent: 4f8b03d, commit: e42dbe7

38 files changed: +528 -42 lines
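How the pieces below fit together: each built-in expression now declares a `group` in its `@ExpressionDescription` annotation, and the new Python script groups the registered functions by that key and writes one `generated-<group>-table.html` (plus a `generated-<group>-examples.html`) per category, which the Liquid templates under `docs/` include at build time. The following is a minimal sketch of that grouping step only, assuming pre-collected metadata; the record layout, sample entries, and output paths are illustrative and not the actual script added by this PR.

```python
import os
from collections import defaultdict
from html import escape

# Hypothetical, pre-collected metadata. The real script obtains this from the
# registered ExpressionInfo objects; a literal list is used here only to keep
# the sketch self-contained.
FUNC_INFOS = [
    {"name": "avg", "group": "agg_funcs",
     "usage": "avg(expr) - Returns the mean calculated from values of a group."},
    {"name": "percentile_approx", "group": "agg_funcs",
     "usage": "percentile_approx(col, percentage[, accuracy]) - Returns the approximate percentile."},
]


def generate_tables(infos, out_dir="docs"):
    """Group functions by their `group` key and emit one HTML table per group,
    named to match the includes in docs/sql-ref-functions-builtin.md."""
    os.makedirs(out_dir, exist_ok=True)
    by_group = defaultdict(list)
    for info in infos:
        by_group[info["group"]].append(info)

    for group, funcs in by_group.items():
        rows = "\n".join(
            "    <tr><td>{}</td><td>{}</td></tr>".format(escape(f["name"]), escape(f["usage"]))
            for f in sorted(funcs, key=lambda x: x["name"]))
        table = ("<table class=\"table\">\n"
                 "  <thead><tr><th>Function</th><th>Description</th></tr></thead>\n"
                 "  <tbody>\n" + rows + "\n  </tbody>\n</table>\n")
        # e.g. group "agg_funcs" -> docs/generated-agg-funcs-table.html
        out_file = os.path.join(out_dir, "generated-{}-table.html".format(group.replace("_", "-")))
        with open(out_file, "w") as fp:
            fp.write(table)


if __name__ == "__main__":
    generate_tables(FUNC_INFOS)
```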

docs/.gitignore

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-sql-configs.html
+generated-*.html

docs/_data/menu-sql.yaml

Lines changed: 2 additions & 0 deletions
@@ -246,6 +246,8 @@
 - text: Functions
   url: sql-ref-functions.html
   subitems:
+    - text: Built-in Functions
+      url: sql-ref-functions-builtin.html
     - text: Scalar UDFs (User-Defined Functions)
       url: sql-ref-functions-udf-scalar.html
     - text: UDAFs (User-Defined Aggregate Functions)

docs/configuration.md

Lines changed: 2 additions & 2 deletions
@@ -2623,10 +2623,10 @@ Spark subsystems.
 
 
 {% for static_file in site.static_files %}
-{% if static_file.name == 'sql-configs.html' %}
+{% if static_file.name == 'generated-sql-configuration-table.html' %}
 ### Spark SQL
 
-{% include_relative sql-configs.html %}
+{% include_relative generated-sql-configuration-table.html %}
 {% break %}
 {% endif %}
 {% endfor %}

docs/sql-ref-functions-builtin.md

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+---
+layout: global
+title: Built-in Functions
+displayTitle: Built-in Functions
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements. See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License. You may obtain a copy of the License at
+  http://www.apache.org/licenses/LICENSE-2.0
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+{% for static_file in site.static_files %}
+{% if static_file.name == 'generated-agg-funcs-table.html' %}
+### Aggregate Functions
+{% include_relative generated-agg-funcs-table.html %}
+#### Examples
+{% include_relative generated-agg-funcs-examples.html %}
+{% break %}
+{% endif %}
+{% endfor %}
+
+{% for static_file in site.static_files %}
+{% if static_file.name == 'generated-window-funcs-table.html' %}
+### Window Functions
+{% include_relative generated-window-funcs-table.html %}
+{% break %}
+{% endif %}
+{% endfor %}
+
+{% for static_file in site.static_files %}
+{% if static_file.name == 'generated-array-funcs-table.html' %}
+### Array Functions
+{% include_relative generated-array-funcs-table.html %}
+#### Examples
+{% include_relative generated-array-funcs-examples.html %}
+{% break %}
+{% endif %}
+{% endfor %}
+
+{% for static_file in site.static_files %}
+{% if static_file.name == 'generated-map-funcs-table.html' %}
+### Map Functions
+{% include_relative generated-map-funcs-table.html %}
+#### Examples
+{% include_relative generated-map-funcs-examples.html %}
+{% break %}
+{% endif %}
+{% endfor %}
+
+{% for static_file in site.static_files %}
+{% if static_file.name == 'generated-datetime-funcs-table.html' %}
+### Date and Timestamp Functions
+{% include_relative generated-datetime-funcs-table.html %}
+#### Examples
+{% include_relative generated-datetime-funcs-examples.html %}
+{% break %}
+{% endif %}
+{% endfor %}
+
+{% for static_file in site.static_files %}
+{% if static_file.name == 'generated-json-funcs-table.html' %}
+### JSON Functions
+{% include_relative generated-json-funcs-table.html %}
+#### Examples
+{% include_relative generated-json-funcs-examples.html %}
+{% break %}
+{% endif %}
+{% endfor %}
+

docs/sql-ref-functions.md

Lines changed: 12 additions & 0 deletions
@@ -22,6 +22,18 @@ license: |
 Spark SQL provides two function features to meet a wide range of user needs: built-in functions and user-defined functions (UDFs).
 Built-in functions are commonly used routines that Spark SQL predefines and a complete list of the functions can be found in the [Built-in Functions](api/sql/) API document. UDFs allow users to define their own functions when the system’s built-in functions are not enough to perform the desired task.
 
+### Built-in Functions
+
+Spark SQL has some categories of frequently-used built-in functions for aggregation, arrays/maps, date/timestamp, and JSON data.
+This subsection presents the usages and descriptions of these functions.
+
+* [Aggregate Functions](sql-ref-functions-builtin.html#aggregate-functions)
+* [Window Functions](sql-ref-functions-builtin.html#window-functions)
+* [Array Functions](sql-ref-functions-builtin.html#array-functions)
+* [Map Functions](sql-ref-functions-builtin.html#map-functions)
+* [Date and Timestamp Functions](sql-ref-functions-builtin.html#date-and-timestamp-functions)
+* [JSON Functions](sql-ref-functions-builtin.html#json-functions)
+
 ### UDFs (User-Defined Functions)
 
 User-Defined Functions (UDFs) are a feature of Spark SQL that allows users to define their own functions when the system's built-in functions are not enough to perform the desired task. To use UDFs in Spark SQL, users must first define the function, then register the function with Spark, and finally call the registered function. The User-Defined Functions can act on a single row or act on multiple rows at once. Spark SQL also supports integration of existing Hive implementations of UDFs, UDAFs and UDTFs.

sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionDescription.java

Lines changed: 10 additions & 4 deletions
@@ -31,21 +31,24 @@
  * `usage()` will be used for the function usage in brief way.
  *
  * These below are concatenated and used for the function usage in verbose way, suppose arguments,
- * examples, note, since and deprecated will be provided.
+ * examples, note, group, since and deprecated will be provided.
  *
  * `arguments()` describes arguments for the expression.
  *
  * `examples()` describes examples for the expression.
  *
  * `note()` contains some notes for the expression optionally.
  *
+ * `group()` describes the category that the expression belongs to. The valid value is
+ * "agg_funcs", "array_funcs", "datetime_funcs", "json_funcs", "map_funcs" and "window_funcs".
+ *
  * `since()` contains version information for the expression. Version is specified by,
  * for example, "2.2.0".
  *
  * `deprecated()` contains deprecation information for the expression optionally, for example,
  * "Deprecated since 2.2.0. Use something else instead".
  *
- * The format, in particular for `arguments()`, `examples()`,`note()`, `since()` and
+ * The format, in particular for `arguments()`, `examples()`,`note()`, `group()`, `since()` and
  * `deprecated()`, should strictly be as follows.
  *
  * <pre>
@@ -68,6 +71,7 @@
  *   note = """
  *     ...
  *   """,
+ *   group = "agg_funcs",
  *   since = "3.0.0",
  *   deprecated = """
  *     ...
@@ -78,8 +82,9 @@
  * We can refer the function name by `_FUNC_`, in `usage()`, `arguments()` and `examples()` as
  * it is registered in `FunctionRegistry`.
  *
- * Note that, if `extended()` is defined, `arguments()`, `examples()`, `note()`, `since()` and
- * `deprecated()` should be not defined together. `extended()` exists for backward compatibility.
+ * Note that, if `extended()` is defined, `arguments()`, `examples()`, `note()`, `group()`,
+ * `since()` and `deprecated()` should be not defined together. `extended()` exists
+ * for backward compatibility.
  *
  * Note this contents are used in the SparkSQL documentation for built-in functions. The contents
  * here are considered as a Markdown text and then rendered.
@@ -98,6 +103,7 @@
     String arguments() default "";
     String examples() default "";
     String note() default "";
+    String group() default "";
     String since() default "";
     String deprecated() default "";
 }
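For orientation, the six valid `group()` keys listed in the Javadoc above line up one-to-one with the section titles and generated table files wired into docs/sql-ref-functions-builtin.md. The lookup table below is only an illustration of that correspondence, inferred from this diff; it is not code added by the PR, and the helper name `doc_target` is hypothetical.

```python
# Illustrative mapping (inferred from this PR's diff, not part of the PR itself):
# each `group()` key from @ExpressionDescription corresponds to one section of
# sql-ref-functions-builtin.md and one generated HTML fragment it includes.
GROUP_TO_DOC = {
    "agg_funcs":      ("Aggregate Functions",          "generated-agg-funcs-table.html"),
    "window_funcs":   ("Window Functions",             "generated-window-funcs-table.html"),
    "array_funcs":    ("Array Functions",              "generated-array-funcs-table.html"),
    "map_funcs":      ("Map Functions",                "generated-map-funcs-table.html"),
    "datetime_funcs": ("Date and Timestamp Functions", "generated-datetime-funcs-table.html"),
    "json_funcs":     ("JSON Functions",               "generated-json-funcs-table.html"),
}


def doc_target(group):
    """Return (section title, included file) for a group key; unknown keys are
    rejected, much like ExpressionInfo's validation of the `group` field."""
    if group not in GROUP_TO_DOC:
        raise ValueError(
            "'group' is malformed: expected one of %s, got [%s]"
            % (sorted(GROUP_TO_DOC), group))
    return GROUP_TO_DOC[group]


print(doc_target("agg_funcs"))
# ('Aggregate Functions', 'generated-agg-funcs-table.html')
```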

sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionInfo.java

Lines changed: 24 additions & 3 deletions
@@ -19,6 +19,10 @@
 
 import com.google.common.annotations.VisibleForTesting;
 
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.Set;
+
 /**
  * Expression information, will be used to describe a expression.
  */
@@ -31,9 +35,14 @@ public class ExpressionInfo {
     private String arguments;
     private String examples;
     private String note;
+    private String group;
     private String since;
     private String deprecated;
 
+    private static final Set<String> validGroups =
+        new HashSet<>(Arrays.asList("agg_funcs", "array_funcs", "datetime_funcs",
+            "json_funcs", "map_funcs", "window_funcs"));
+
     public String getClassName() {
         return className;
     }
@@ -75,6 +84,10 @@ public String getDeprecated() {
         return deprecated;
     }
 
+    public String getGroup() {
+        return group;
+    }
+
     public String getDb() {
         return db;
     }
@@ -87,13 +100,15 @@ public ExpressionInfo(
             String arguments,
             String examples,
             String note,
+            String group,
             String since,
             String deprecated) {
         assert name != null;
         assert arguments != null;
         assert examples != null;
         assert examples.isEmpty() || examples.contains(" Examples:");
         assert note != null;
+        assert group != null;
         assert since != null;
         assert deprecated != null;
 
@@ -104,6 +119,7 @@ public ExpressionInfo(
         this.arguments = arguments;
         this.examples = examples;
         this.note = note;
+        this.group = group;
         this.since = since;
         this.deprecated = deprecated;
 
@@ -120,6 +136,11 @@ public ExpressionInfo(
             }
             this.extended += "\n Note:\n " + note.trim() + "\n";
         }
+        if (!group.isEmpty() && !validGroups.contains(group)) {
+            throw new IllegalArgumentException("'group' is malformed in the expression [" +
+                this.name + "]. It should be a value in " + validGroups + "; however, " +
+                "got [" + group + "].");
+        }
         if (!since.isEmpty()) {
             if (Integer.parseInt(since.split("\\.")[0]) < 0) {
                 throw new IllegalArgumentException("'since' is malformed in the expression [" +
@@ -140,11 +161,11 @@ public ExpressionInfo(
     }
 
     public ExpressionInfo(String className, String name) {
-        this(className, null, name, null, "", "", "", "", "");
+        this(className, null, name, null, "", "", "", "", "", "");
    }
 
     public ExpressionInfo(String className, String db, String name) {
-        this(className, db, name, null, "", "", "", "", "");
+        this(className, db, name, null, "", "", "", "", "", "");
     }
 
     /**
@@ -155,7 +176,7 @@ public ExpressionInfo(String className, String db, String name) {
     public ExpressionInfo(String className, String db, String name, String usage, String extended) {
         // `arguments` and `examples` are concatenated for the extended description. So, here
         // simply pass the `extended` as `arguments` and an empty string for `examples`.
-        this(className, db, name, usage, extended, "", "", "", "");
+        this(className, db, name, usage, extended, "", "", "", "", "");
     }
 
     private String replaceFunctionName(String usage) {

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala

Lines changed: 2 additions & 1 deletion
@@ -655,7 +655,7 @@ object FunctionRegistry {
     val clazz = scala.reflect.classTag[Cast].runtimeClass
     val usage = "_FUNC_(expr) - Casts the value `expr` to the target data type `_FUNC_`."
     val expressionInfo =
-      new ExpressionInfo(clazz.getCanonicalName, null, name, usage, "", "", "", "", "")
+      new ExpressionInfo(clazz.getCanonicalName, null, name, usage, "", "", "", "", "", "")
     (name, (expressionInfo, builder))
   }
 
@@ -675,6 +675,7 @@
         df.arguments(),
         df.examples(),
         df.note(),
+        df.group(),
         df.since(),
         df.deprecated())
     } else {

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala

Lines changed: 1 addition & 0 deletions
@@ -65,6 +65,7 @@ import org.apache.spark.sql.types._
       > SELECT _FUNC_(10.0, 0.5, 100);
        10.0
   """,
+  group = "agg_funcs",
   since = "2.1.0")
 case class ApproximatePercentile(
     child: Expression,

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@ import org.apache.spark.sql.types._
       > SELECT _FUNC_(col) FROM VALUES (1), (2), (NULL) AS tab(col);
        1.5
   """,
+  group = "agg_funcs",
   since = "1.0.0")
 case class Average(child: Expression) extends DeclarativeAggregate with ImplicitCastInputTypes {
 