[MINOR][SQL][DOCS] Remove two leading spaces from sql tables #28348

huaxingao · 2020-04-26T01:32:12Z

What changes were proposed in this pull request?

Remove two leading spaces from sql tables.

Why are the changes needed?

Follow the format of other references such as https://docs.snowflake.com/en/sql-reference/constructs/join.html, https://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_10002.htm, https://www.postgresql.org/docs/10/sql-select.html.

Does this PR introduce any user-facing change?

before

SELECT * FROM  test;
  +-+
  ...
  +-+

after

SELECT * FROM  test;
+-+
...
+-+

How was this patch tested?

Manually build and check

huaxingao · 2020-04-26T01:42:53Z

cc @HyukjinKwon @maropu

SparkQA · 2020-04-26T01:47:39Z

Test build #121824 has finished for PR 28348 at commit d484f7d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2020-04-26T02:15:20Z

I think its better to fix this issue in a final clean-up with other minor issues, e.g., typo. The current open documentation PRs are last ones for the 3.0 release?

huaxingao · 2020-04-26T04:11:17Z

I plan to do one more sql ref PR for 3.0: modify sql-ref.html page to add more descriptions and also add links to each subsections. Then final check and clean-up. For the remaining keywords (such as EXPLAIN LOGICAL, PIVOT, etc) we need to document, I want to defer to 3.1. I have a few ML stuff that are half way done and need to go back to take care of those.

dongjoon-hyun

I'm not against this one, but I thought this was for separating the command and result in md environment. In other language environment in Spark, we use commenting result for that purpose.

BTW, this will be enforced for all SQL documents for consistency eventually, right?

huaxingao · 2020-04-26T22:58:35Z

@dongjoon-hyun
Do you mean to comment out the result like this?

val sqlDF = spark.sql("SELECT * FROM people")
sqlDF.show()
// +----+-------+
// | age|   name|
// +----+-------+
// |null|Michael|
// |  30|   Andy|
// |  19| Justin|
// +----+-------+

In scala/java/python/R examples, we need to comment out the results because these results are not part of the file. We just want to show the users what results the sql statement will generate. However, in sql examples, because it is a REPL, the sql cmd is evaluated and the result is be shown in the cmd, so it makes sense to show the result. Also, other sql references (e.g. Oracle, Snowflake) show the result after the sql statement without commenting out. We probably want to be consistent with these sql references.

BTW, this will be enforced for all SQL documents for consistency eventually, right?

Yes, I am trying to make all the SQL documents consistent :)

dongjoon-hyun · 2020-04-26T23:16:25Z

Apache Spark is different from SQL-only engines' documents. As you know, Apache Spark has four REPL environments. The code is also runnable in shell environments.

spark-shell for Scala
pyspark for Python
spark for R
spark-sql for SQL

Also, we can keep SQL statements in the SQL source file and execute with spark-sql's -f option (in addition to -i option). I believe that the only reason why SQL sources are not on the files is that the documents arrived relatively recently.

huaxingao · 2020-04-27T00:12:57Z

@dongjoon-hyun Thanks for the quick reply.
I guess what I am trying to say is that the scala/java/python/R examples are directly from the examples files. The results in the files have to be commented out.
When writing the SQL reference, we simply use spark-sql to run the sql statement and show users what the result is. It seems to me that it makes more sense not to comment out the result, and I also think we should be consistent with the majority of the sql reference format.

Anyway, is everybody OK with not commenting out the sql result? :)

dongjoon-hyun · 2020-04-27T00:18:14Z

Ur, I'm not saying that we need commenting in SQL. The existing two spaces was considered as a separation like the other languages.

I thought this was for separating the command and result in md environment.

Also, I'm not against this approach as I stated in my first sentence. :)

I'm not against this one, ...

We can be consistent in this way and that way. It's great that we can be in any way.

HyukjinKwon · 2020-04-27T02:41:16Z

Similar discussions happened internally and externally here and there. In short, I first thought we should comment the output and use the direct output from spark-sql to be consistent with other examples. Seems like we're using show() (or beeline-like format).

After the discussions, I think it's just better to stick to the references in other DBMSs. This way can be inconsistent across the different language APIs but maybe here is where we should think about the language specific convention to make it friendly to each user group. For example, users dedicated to SQL might likely expect SQL-friendly docs.

HyukjinKwon · 2020-04-27T02:53:46Z

We could move all to separate example files and comment the output out too to be consistent. I am good with that approach too. We just need to choose one approach with reasonable logics, and stick to it.

gatorsmile · 2020-04-27T03:26:12Z

Let us follow the SQL reference by the other mainstream database vendors?

@huaxingao Since you are trying to changing all the query example, it would be nice if the queries in our SQL Reference docs can be highlighted using SQL syntax. Could you check how to enable it in Markdown files?

huaxingao · 2020-04-27T04:03:47Z

@gatorsmile
We actually highlight SQL keywords. We have

{% highlight sql %}
......
-- Correlated Subquery in `WHERE` clause.
SELECT * FROM person AS parent
    WHERE EXISTS (
        SELECT 1 FROM person AS child
        WHERE parent.id = child.id AND child.age IS NULL
    );
......
{% endhighlight %}

and it looks like this

The only exception I am aware of is the identifier doc I added recently. The backtick totally messed up the format.

huaxingao · 2020-05-01T04:16:21Z

If there are no further comments, could one of you please merge this PR for me? Thanks a lot!

SparkQA · 2020-05-01T04:32:33Z

Test build #122156 has finished for PR 28348 at commit f357d36.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2020-05-01T14:20:15Z

This is for 3.0 right?

huaxingao · 2020-05-01T14:44:14Z

@srowen Yes, it's for 3.0. Thanks!

gatorsmile

LGTM

### What changes were proposed in this pull request? Remove two leading spaces from sql tables. ### Why are the changes needed? Follow the format of other references such as https://docs.snowflake.com/en/sql-reference/constructs/join.html, https://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_10002.htm, https://www.postgresql.org/docs/10/sql-select.html. ### Does this PR introduce any user-facing change? before ``` SELECT * FROM test; +-+ ... +-+ ``` after ``` SELECT * FROM test; +-+ ... +-+ ``` ### How was this patch tested? Manually build and check Closes #28348 from huaxingao/sql-format. Authored-by: Huaxin Gao <[email protected]> Signed-off-by: gatorsmile <[email protected]> (cherry picked from commit 75da050) Signed-off-by: gatorsmile <[email protected]>

gatorsmile · 2020-05-01T17:12:21Z

Thanks! Merged to master/3.0

huaxingao · 2020-05-01T17:20:34Z

Thank you all!

probot-autolabeler bot added the DOCS label Apr 26, 2020

srowen approved these changes Apr 26, 2020

View reviewed changes

dongjoon-hyun reviewed Apr 26, 2020

View reviewed changes

huaxingao added 2 commits April 30, 2020 21:08

[MINOR][SQL][DOCS] Remove two leading spaces from sql tables

6a2c1e2

fix

2807784

huaxingao force-pushed the sql-format branch from d484f7d to 2807784 Compare May 1, 2020 04:15

rebase

f357d36

gatorsmile approved these changes May 1, 2020

View reviewed changes

gatorsmile closed this in 75da050 May 1, 2020

huaxingao deleted the sql-format branch May 1, 2020 17:20

[MINOR][SQL][DOCS] Remove two leading spaces from sql tables #28348

[MINOR][SQL][DOCS] Remove two leading spaces from sql tables #28348

Uh oh!

Conversation

huaxingao commented Apr 26, 2020

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Uh oh!

huaxingao commented Apr 26, 2020

Uh oh!

SparkQA commented Apr 26, 2020

Uh oh!

maropu commented Apr 26, 2020

Uh oh!

huaxingao commented Apr 26, 2020

Uh oh!

dongjoon-hyun left a comment

Choose a reason for hiding this comment

Uh oh!

huaxingao commented Apr 26, 2020

Uh oh!

dongjoon-hyun commented Apr 26, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

huaxingao commented Apr 27, 2020

Uh oh!

dongjoon-hyun commented Apr 27, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

HyukjinKwon commented Apr 27, 2020

Uh oh!

HyukjinKwon commented Apr 27, 2020

Uh oh!

gatorsmile commented Apr 27, 2020

Uh oh!

huaxingao commented Apr 27, 2020

Uh oh!

huaxingao commented May 1, 2020

Uh oh!

SparkQA commented May 1, 2020

Uh oh!

srowen commented May 1, 2020

Uh oh!

huaxingao commented May 1, 2020

Uh oh!

gatorsmile left a comment

Choose a reason for hiding this comment

Uh oh!

gatorsmile commented May 1, 2020

Uh oh!

huaxingao commented May 1, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

dongjoon-hyun commented Apr 26, 2020 •

edited

Loading

dongjoon-hyun commented Apr 27, 2020 •

edited

Loading