[SPARK-31429][SQL][DOC] Automatically generates a SQL document for built-in functions #28224
|
just like this? @HyukjinKwon @huaxingao |
|
Test build #121314 has finished for PR 28224 at commit
|
|
Test build #121318 has finished for PR 28224 at commit
|
| Spark SQL has some categories of frequently-used built-in functions for aggregation, arrays/maps, date/timestamp, and JSON data. | ||
| This subsection presents the usages and descriptions of these functions. | ||
|
|
||
| * [Aggregate Functions](sql-ref-functions-builtin.html#aggregate-functions) |
Super nit: the order of bullets/words in the above statement is different from that in sql-ref-functions-builtin.html, i.e., the alphabetical order in sql-ref-functions-builtin.html.
I am neutral on the order (current or consistent among them).
Ah, actually, I sorted them by similar functionality, e.g., array and map. On the other hand, in sql-ref-functions-builtin.html, the script sorts them just by group name.... Yea, we can update the script for sorting them in the same order though, I just want to keep it simple.
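For illustration, here is a minimal sketch of how a generator script could emit groups in a functionality-based order instead of alphabetically by group name. The group names and the `PREFERRED_ORDER` list are assumptions made up for this example; they are not taken from the script in this PR.

```python
# Hypothetical group names, listed in the "similar functionality" order
# used by the bullets in the doc page (arrays next to maps).
PREFERRED_ORDER = ["agg_funcs", "array_funcs", "map_funcs", "datetime_funcs", "json_funcs"]


def sort_groups(groups, preferred=PREFERRED_ORDER):
    """Sort group names by their position in `preferred`.

    Groups missing from `preferred` are placed last, in alphabetical order.
    """
    rank = {name: i for i, name in enumerate(preferred)}
    return sorted(groups, key=lambda g: (rank.get(g, len(preferred)), g))


if __name__ == "__main__":
    groups = ["json_funcs", "map_funcs", "agg_funcs", "datetime_funcs", "array_funcs"]
    print(sorted(groups))       # plain alphabetical order by group name
    print(sort_groups(groups))  # functionality-based order
```

Keeping the order in one list means the bullets and the generated page could share it, at the cost of one more thing to maintain when a group is added.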
|
Looks really nice! I love this feature! Thanks a lot! @maropu @HyukjinKwon |
|
A few comments:
|
Yea, we can do. I'll update later. Thanks!
Yea, we can do, too. But, in this PR, I just want to focus on the python code and the basic set of built-in functions. As for the other function groups, I think we need to discuss more about which functions should be listed, or not.
Yea, I noticed that, but the collection group includes array, map, and JSON functions. I personally think splitting it into three groups makes the functions easier for users to search. WDYT?
The original one had multiple document pages for the functions, so it had one more hierarchy. But, the script generates a single page for them. So, I think we don't need the hierarchy. If we need to improve the structure, feel free to do it as follow-up. |
Ur, I noticed now that all the window functions have no example tag, e.g., https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L428 We need to add examples there before generating the doc. |
I guess no window functions examples there because these examples are a little complicated. Is it OK we just generate the doc without examples? |
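As a rough sketch of how missing examples could be detected before generating the doc: the `FuncInfo` record below is a hypothetical stand-in for whatever the script reads from each expression's description, and the sample entries are illustrative only.

```python
from collections import namedtuple

# Hypothetical record shape; the real script may carry different fields.
FuncInfo = namedtuple("FuncInfo", ["name", "group", "examples"])


def missing_examples(infos):
    """Return the sorted names of functions whose examples text is empty or blank."""
    return sorted(i.name for i in infos if not (i.examples or "").strip())


if __name__ == "__main__":
    infos = [
        # Illustrative entries, not copied from the Spark source.
        FuncInfo("count", "agg_funcs", "> SELECT count(col) FROM VALUES (1), (2) AS tab(col);"),
        FuncInfo("rank", "window_funcs", ""),
        FuncInfo("dense_rank", "window_funcs", None),
    ]
    for name in missing_examples(infos):
        print("no examples for:", name)
```

A check like this could either fail the doc build or simply skip the examples section for the affected group.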
|
Test build #121341 has finished for PR 28224 at commit
|
|
Looks fine now. But, adding a simpler example looks good if possible. Updated in b9407ca |
|
Nice, thank you so much @maropu. I have some ideas in mind, for example, not including the generated MD files in the GitHub repo. We can of course include them, but it might be better to do it separately because we're already not including md files in SQL built-in functions. Do you mind if I manually push some changes into your branch (and to your PR)? Of course, feel free to edit as you want after that. My changes might be mostly just my preference or nits I guess. |
Yea, the idea looks reasonable.
Of course not! Either is OK with me. |
|
Test build #121346 has finished for PR 28224 at commit
|
|
Test build #121350 has finished for PR 28224 at commit
|
|
retest this please |
|
Test build #121355 has finished for PR 28224 at commit
|
741c3ae to 7e9ebdf
|
Test build #121416 has finished for PR 28224 at commit
|
|
Test build #121515 has finished for PR 28224 at commit
|
|
Looks pretty nice, thanks for the update, @HyukjinKwon ! LGTM except for two comments. |
|
Silly question: What's the relationship of the SQL function docs being added in this PR to the SQL function docs we already have here? |
|
Have you checked the discussion in #28170 (comment) ? The document in |
|
Updated for my two comments in f302c91. Could you check again for final sign-off? @HyukjinKwon |
Actually, I aim to remove those SQL function docs in the long run. For the short term, #28224 (comment) is true: we don't list all expressions yet. |
HyukjinKwon
left a comment
LGTM
|
I am going to merge this because the documentation generation build passed in GitHub Actions. There's nothing to check with the PR builder. |
|
Merged to master and branch-3.0. |
…ilt-in functions This PR intends to add a Python script to generates a SQL document for built-in functions and the document in SQL references. To make SQL references complete. Yes;    Manually checked and added tests. Closes #28224 from maropu/SPARK-31429. Lead-authored-by: Takeshi Yamamuro <[email protected]> Co-authored-by: HyukjinKwon <[email protected]> Signed-off-by: HyukjinKwon <[email protected]> (cherry picked from commit e42dbe7) Signed-off-by: HyukjinKwon <[email protected]>
|
@huaxingao do you want to separate pages, and address the leftover comments at #28224 (comment)? Should be pretty straightforward to do it now. |
|
Test build #121550 has finished for PR 28224 at commit
|
| ### Aggregate Functions | ||
| {% include_relative generated-agg-funcs-table.html %} | ||
| #### Examples | ||
| {% include_relative generated-agg-funcs-examples.html %} |
So we put all the examples in one section?
Yea, we currently put examples per group in one section like this;
### Aggregate Functions
<A list of aggregate functions>
#### Examples
<All examples of the aggregate functions listed above>
### Window Functions
<A list of window functions>
#### Examples
<All examples of the window functions listed above>
...
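The per-group layout above can be sketched roughly as follows. This is a simplified illustration that emits markdown from hypothetical `(name, usage, examples)` tuples; the script in this PR generates HTML fragments that the page pulls in with `include_relative`, so treat the function name and input shape here as assumptions.

```python
def render_group(title, funcs):
    """Render one group: a heading, a list of functions, then all examples together.

    `funcs` is a list of (name, usage, examples) tuples (hypothetical shape).
    """
    lines = ["### " + title, ""]
    # First the list of every function in the group.
    lines += ["* `{}`: {}".format(name, usage) for name, usage, _ in funcs]
    # Then a single Examples section collecting the examples of all functions.
    lines += ["", "#### Examples", ""]
    lines += [examples for _, _, examples in funcs if examples]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_group("Aggregate Functions", [
        ("count", "counts rows", "-- example for count"),
        ("max", "largest value", "-- example for max"),
    ]))
```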
I think it's better to put everything about one function (signature, desc, examples, etc.) together. Did we decide to do this due to technical difficulties?
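For comparison, a per-function layout (signature, description, and examples kept together) might look like the sketch below. The dictionary keys and the heading levels are assumptions chosen for the illustration, not a decided format.

```python
def render_per_function(funcs):
    """Render each function as its own block: name, usage, then its own examples.

    `funcs` is a list of dicts with 'name', 'usage' and optional 'examples' keys
    (hypothetical shape).
    """
    sections = []
    for f in funcs:
        parts = ["#### " + f["name"], "", f["usage"]]
        if f.get("examples"):
            parts += ["", "**Examples**", "", f["examples"]]
        sections.append("\n".join(parts))
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(render_per_function([
        {"name": "count", "usage": "counts rows", "examples": "-- example for count"},
        {"name": "rank", "usage": "rank within a window"},
    ]))
```

The trade-off is a longer page per group in exchange for not having to scroll between a function's description and its examples.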
We can do it. I was just a bit rushed to merge this, to keep this PR narrow-scoped as base work.
Nothing is completely pinned down at this moment.
As an easy way, we can refer to other DBMSs' documentation and mimic it, I guess.
I think so and we can improve the structure by follow-up activities. But, I'm not sure the improvement should be applied at the 3.0 release.
yea we can improve it in 3.1
ok, thanks, I'll file jira later.
|
How about |
|
Sounds good. I will fix this. |
|
Thanks @huaxingao! |
### What changes were proposed in this pull request? This PR intends to add a new test suite for `ExpressionInfo`. Major changes are as follows; - Added a new test suite named `ExpressionInfoSuite` - To improve test coverage, added a test for error handling in `ExpressionInfoSuite` - Moved the `ExpressionInfo`-related tests from `UDFSuite` to `ExpressionInfoSuite` - Moved the related tests from `SQLQuerySuite` to `ExpressionInfoSuite` - Added a comment in `ExpressionInfoSuite` (followup of #28224) ### Why are the changes needed? To improve test suites/coverage. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Added tests. Closes #28308 from maropu/SPARK-31526. Authored-by: Takeshi Yamamuro <[email protected]> Signed-off-by: HyukjinKwon <[email protected]>
### What changes were proposed in this pull request? This PR intends to add a new test suite for `ExpressionInfo`. Major changes are as follows; - Added a new test suite named `ExpressionInfoSuite` - To improve test coverage, added a test for error handling in `ExpressionInfoSuite` - Moved the `ExpressionInfo`-related tests from `UDFSuite` to `ExpressionInfoSuite` - Moved the related tests from `SQLQuerySuite` to `ExpressionInfoSuite` - Added a comment in `ExpressionInfoSuite` (followup of #28224) ### Why are the changes needed? To improve test suites/coverage. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Added tests. Closes #28308 from maropu/SPARK-31526. Authored-by: Takeshi Yamamuro <[email protected]> Signed-off-by: HyukjinKwon <[email protected]> (cherry picked from commit 42f496f) Signed-off-by: HyukjinKwon <[email protected]>
|
Thank you, this looks great! |
| ### JSON Functions | ||
| {% include_relative generated-json-funcs-table.html %} | ||
| #### Examples | ||
| {% include_relative generated-agg-funcs-examples.html %} |
generated-agg-funcs-examples.html -> generated-json-funcs-examples.html ?
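A copy-paste slip like this could be caught mechanically. The sketch below is only an illustration: it assumes every group section uses a `### ` heading followed by `generated-<group>-funcs-table.html` and `generated-<group>-funcs-examples.html` includes, as in the snippet under review, and it reports sections where the two `<group>` parts disagree. The function name is hypothetical.

```python
import re

# Matches include lines such as:
#   {% include_relative generated-json-funcs-table.html %}
INCLUDE_RE = re.compile(
    r"\{%\s*include_relative\s+generated-([a-z]+)-funcs-(table|examples)\.html\s*%\}"
)


def find_mismatched_includes(text):
    """Return (heading, table_group, examples_group) for sections whose includes disagree."""
    sections = []  # each entry: [heading, {"table": group, "examples": group}]
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("### "):
            # A new group section starts; "#### Examples" does not match this prefix.
            sections.append([stripped[4:], {}])
            continue
        match = INCLUDE_RE.search(stripped)
        if match and sections:
            group, kind = match.group(1), match.group(2)
            sections[-1][1][kind] = group
    return [
        (heading, inc["table"], inc["examples"])
        for heading, inc in sections
        if "table" in inc and "examples" in inc and inc["table"] != inc["examples"]
    ]


if __name__ == "__main__":
    page = "\n".join([
        "### JSON Functions",
        "{% include_relative generated-json-funcs-table.html %}",
        "#### Examples",
        "{% include_relative generated-agg-funcs-examples.html %}",
    ])
    print(find_mismatched_includes(page))
```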
yea... @huaxingao @dilipbiswal Could you include this fix in your open PRs? #28451
or #28433
Sure. I will fix this.
Thanks, @huaxingao
Oops, it was my mistake I guess. Thanks guys,
|
So far, it just covers a subset. Could we add more categories to cover more built-in functions listed in https://spark.apache.org/docs/latest/api/sql/index.html? For your reference, Presto has the complete list https://prestodb.io/docs/current/functions.html cc @huaxingao @maropu |
|
Filed in https://issues.apache.org/jira/browse/SPARK-33124 |
|
Yeah, I can take a look |

What changes were proposed in this pull request?
This PR intends to add a Python script that generates a SQL document for built-in functions, and to add that document to the SQL references.
Why are the changes needed?
To make SQL references complete.
Does this PR introduce any user-facing change?
Yes;
How was this patch tested?
Manually checked and added tests.