-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-29458][SQL][DOCS] Add a paragraph for scalar function in sql getting started #28290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@HyukjinKwon @maropu |
docs/sql-getting-started.md
Outdated
| ## Scalar Functions | ||
| (to be filled soon) | ||
|
|
||
| Scalar functions are functions that return a single value per row, as opposed to aggregation functions, which return a value for a group of rows. Spark SQL supports a variety of built-in scalar functions such as [Array Functions](sql-ref-functions-builtin.html#array-functions), [Map Functions](sql-ref-functions-builtin.html#map-functions), [Date and Timestamp Functions](sql-ref-functions-builtin.html#date-and-timestamp-functions), etc. It also supports [User Defined Scalar Functions](sql-ref-functions-udf-scalar.html). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to describe aggregate/window functions? Also, I think its better not to describe each function group name here because we might forget to update this page every time we add a new group.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used Scalar function instead of built-in function (because the next paragraph is for aggregations). I don't want to talk about window function because I think window function is still sort of aggregation, even thought it returns result for each row.
I can't link it to builtin function because builtin function contains aggregation function, so I link to each group of the builtin scalar functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused; the next section describes the DataFrame aggregates in SQL documents?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems the next paragraph talks about builtin aggregation functions and user defined aggregation functions, so I guess it's better for our paragraph to talk about scalar function than built-in function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, we need to organize both paragraphs more. For example, how about this?
Adds two categories for built-in functions in sql-ref-functions.html#built-in-functions;
### Built-in Functions
Spark SQL has some ...
#### Scalar Functions
Array Functions
Map Functions
Date and Timestamp Functions
JSON Functions
#### Aggregate-like Functions
Aggregate Functions
Window Functions
Then,
## Scalar Functions
A short description about scalar built-in functions and UDFs, then we add two links to their scalar sections.
## Aggregate Functions
A short description about aggregate built-in functions and UDFs, then we add two links to their aggregate sections.
Anyway, I think both categories had better to have same granularity description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK... Seems Window functions still should be put in the Scalar function category. I will double check tomorrow.
|
Test build #121607 has finished for PR 28290 at commit
|
|
Test build #121656 has finished for PR 28290 at commit
|
|
retest this please |
docs/sql-ref-functions.md
Outdated
| * [Map Functions](sql-ref-functions-builtin.html#map-functions) | ||
| * [Date and Timestamp Functions](sql-ref-functions-builtin.html#date-and-timestamp-functions) | ||
| * [JSON Functions](sql-ref-functions-builtin.html#json-functions) | ||
| * [Window Functions](sql-ref-functions-builtin.html#window-functions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK... Seems Window functions still should be put in the Scalar function category. I will double check tomorrow.
Why do you think so? Putting this in the scalar category still looks weird to me.
The other parts look ok. Also, could you file a jira, just in case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this https://issues.apache.org/jira/browse/SPARK-29458. I will use this jira.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By definition, scalar function returns a single value per row, while aggregate function returns a single value on a group of rows. Since window function returns a value per row, seems more suitable to put into scalar function category.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think vector input & output rather than scalar input & output. WDYT? @HyukjinKwon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how snowflake categorizes the functions: https://docs.snowflake.com/en/sql-reference-functions.html.
Snowflake lists window function in parallel with scalar function and aggregate function, but it also says "Window Functions — subset of aggregate functions that can operate on a subset of rows."
I guess it's safe to use your proposal
#### Aggregate-like Functions
Aggregate Functions
Window Functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, the last one looks fine. I think we can improve the structure later if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I am a little confused... you prefer to put window function in scalar function, or aggregate-like function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aggregate-like function one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK... let me change it now
|
Test build #121663 has finished for PR 28290 at commit
|
|
Test build #121706 has finished for PR 28290 at commit
|
|
btw, I just noticed now that we don't have any statement about a LIKE caluse? Probably, we need to document it in |
|
I will add LIKE clause. quick syc-up:
Things remaining:
I will leave sub-query one to Dilip for 3.1.0 |
|
Looks nice, thanks! cc: @dilipbiswal @gatorsmile |
|
I roughly checked if the keywords are included in the documents: https://gist.github.com/maropu/d48733b7683dbb608452ee38833e2dab It might be worth adding items below in the docs, too;
|
|
Is this much OK as-is or does it need more documentation? |
…etting started ### What changes were proposed in this pull request? Add a paragraph for scalar function in sql getting started ### Why are the changes needed? To make 3.0 doc complete. ### Does this PR introduce any user-facing change? before: <img width="870" alt="Screen Shot 2020-04-21 at 10 11 12 PM" src="https://user-images.githubusercontent.com/13592258/79943182-16d1fd00-841d-11ea-9744-9cdd58d83f81.png"> after: <img width="865" alt="Screen Shot 2020-04-22 at 11 49 59 PM" src="https://user-images.githubusercontent.com/13592258/80068256-26704500-84f4-11ea-9845-c835927c027e.png"> <img width="1033" alt="Screen Shot 2020-04-23 at 6 22 53 PM" src="https://user-images.githubusercontent.com/13592258/80165100-82d47280-858f-11ea-8c84-1ef702cc1bff.png"> ### How was this patch tested? Closes #28290 from huaxingao/scalar. Authored-by: Huaxin Gao <[email protected]> Signed-off-by: Sean Owen <[email protected]> (cherry picked from commit dcc0902) Signed-off-by: Sean Owen <[email protected]>
|
Merged to master/3.0 |
|
note: I've filed jira so that we wouldn't forget to add them in the documents: https://issues.apache.org/jira/browse/SPARK-31753 |
What changes were proposed in this pull request?
Add a paragraph for scalar function in sql getting started
Why are the changes needed?
To make 3.0 doc complete.
Does this PR introduce any user-facing change?
before:

after:

How was this patch tested?