Skip to content

Conversation

@huaxingao
Copy link
Contributor

What changes were proposed in this pull request?

Remove two leading spaces from sql tables.

Why are the changes needed?

Follow the format of other references such as https://docs.snowflake.com/en/sql-reference/constructs/join.html, https://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_10002.htm, https://www.postgresql.org/docs/10/sql-select.html.

Does this PR introduce any user-facing change?

before

SELECT * FROM  test;
  +-+
  ...
  +-+

after

SELECT * FROM  test;
+-+
...
+-+

How was this patch tested?

Manually build and check

@huaxingao
Copy link
Contributor Author

cc @HyukjinKwon @maropu

@SparkQA
Copy link

SparkQA commented Apr 26, 2020

Test build #121824 has finished for PR 28348 at commit d484f7d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Apr 26, 2020

I think its better to fix this issue in a final clean-up with other minor issues, e.g., typo. The current open documentation PRs are last ones for the 3.0 release?

@huaxingao
Copy link
Contributor Author

I plan to do one more sql ref PR for 3.0: modify sql-ref.html page to add more descriptions and also add links to each subsections. Then final check and clean-up. For the remaining keywords (such as EXPLAIN LOGICAL, PIVOT, etc) we need to document, I want to defer to 3.1. I have a few ML stuff that are half way done and need to go back to take care of those.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not against this one, but I thought this was for separating the command and result in md environment. In other language environment in Spark, we use commenting result for that purpose.

BTW, this will be enforced for all SQL documents for consistency eventually, right?

@huaxingao
Copy link
Contributor Author

@dongjoon-hyun
Do you mean to comment out the result like this?

val sqlDF = spark.sql("SELECT * FROM people")
sqlDF.show()
// +----+-------+
// | age|   name|
// +----+-------+
// |null|Michael|
// |  30|   Andy|
// |  19| Justin|
// +----+-------+

In scala/java/python/R examples, we need to comment out the results because these results are not part of the file. We just want to show the users what results the sql statement will generate. However, in sql examples, because it is a REPL, the sql cmd is evaluated and the result is be shown in the cmd, so it makes sense to show the result. Also, other sql references (e.g. Oracle, Snowflake) show the result after the sql statement without commenting out. We probably want to be consistent with these sql references.

BTW, this will be enforced for all SQL documents for consistency eventually, right?

Yes, I am trying to make all the SQL documents consistent :)

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Apr 26, 2020

Apache Spark is different from SQL-only engines' documents. As you know, Apache Spark has four REPL environments. The code is also runnable in shell environments.

  1. spark-shell for Scala
  2. pyspark for Python
  3. spark for R
  4. spark-sql for SQL

Also, we can keep SQL statements in the SQL source file and execute with spark-sql's -f option (in addition to -i option). I believe that the only reason why SQL sources are not on the files is that the documents arrived relatively recently.

@huaxingao
Copy link
Contributor Author

@dongjoon-hyun Thanks for the quick reply.
I guess what I am trying to say is that the scala/java/python/R examples are directly from the examples files. The results in the files have to be commented out.
When writing the SQL reference, we simply use spark-sql to run the sql statement and show users what the result is. It seems to me that it makes more sense not to comment out the result, and I also think we should be consistent with the majority of the sql reference format.

Anyway, is everybody OK with not commenting out the sql result? :)

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Apr 27, 2020

Ur, I'm not saying that we need commenting in SQL. The existing two spaces was considered as a separation like the other languages.

I thought this was for separating the command and result in md environment.

Also, I'm not against this approach as I stated in my first sentence. :)

I'm not against this one, ...

We can be consistent in this way and that way. It's great that we can be in any way.

@HyukjinKwon
Copy link
Member

Similar discussions happened internally and externally here and there. In short, I first thought we should comment the output and use the direct output from spark-sql to be consistent with other examples. Seems like we're using show() (or beeline-like format).

After the discussions, I think it's just better to stick to the references in other DBMSs. This way can be inconsistent across the different language APIs but maybe here is where we should think about the language specific convention to make it friendly to each user group. For example, users dedicated to SQL might likely expect SQL-friendly docs.

@HyukjinKwon
Copy link
Member

We could move all to separate example files and comment the output out too to be consistent. I am good with that approach too. We just need to choose one approach with reasonable logics, and stick to it.

@gatorsmile
Copy link
Member

Let us follow the SQL reference by the other mainstream database vendors?

@huaxingao Since you are trying to changing all the query example, it would be nice if the queries in our SQL Reference docs can be highlighted using SQL syntax. Could you check how to enable it in Markdown files?

@huaxingao
Copy link
Contributor Author

@gatorsmile
We actually highlight SQL keywords. We have

{% highlight sql %}
......
-- Correlated Subquery in `WHERE` clause.
SELECT * FROM person AS parent
    WHERE EXISTS (
        SELECT 1 FROM person AS child
        WHERE parent.id = child.id AND child.age IS NULL
    );
......
{% endhighlight %}

and it looks like this

Screen Shot 2020-04-26 at 8 52 56 PM

The only exception I am aware of is the identifier doc I added recently. The backtick totally messed up the format.
Screen Shot 2020-04-26 at 9 00 21 PM

@huaxingao
Copy link
Contributor Author

If there are no further comments, could one of you please merge this PR for me? Thanks a lot!

@SparkQA
Copy link

SparkQA commented May 1, 2020

Test build #122156 has finished for PR 28348 at commit f357d36.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented May 1, 2020

This is for 3.0 right?

@huaxingao
Copy link
Contributor Author

@srowen Yes, it's for 3.0. Thanks!

Copy link
Member

@gatorsmile gatorsmile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gatorsmile gatorsmile closed this in 75da050 May 1, 2020
gatorsmile pushed a commit that referenced this pull request May 1, 2020
### What changes were proposed in this pull request?
Remove two leading spaces from sql tables.

### Why are the changes needed?

Follow the format of other references such as https://docs.snowflake.com/en/sql-reference/constructs/join.html, https://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_10002.htm, https://www.postgresql.org/docs/10/sql-select.html.

### Does this PR introduce any user-facing change?

before
```
SELECT * FROM  test;
  +-+
  ...
  +-+
```
after
```
SELECT * FROM  test;
+-+
...
+-+
```

### How was this patch tested?
Manually build and check

Closes #28348 from huaxingao/sql-format.

Authored-by: Huaxin Gao <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
(cherry picked from commit 75da050)
Signed-off-by: gatorsmile <[email protected]>
@gatorsmile
Copy link
Member

Thanks! Merged to master/3.0

@huaxingao
Copy link
Contributor Author

Thank you all!

@huaxingao huaxingao deleted the sql-format branch May 1, 2020 17:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants