@HyukjinKwon
Member

@AmplabJenkins

Can one of the admins verify this patch?

Contributor

just to double check - is this standard sql?

Contributor

(standard as in -- do most database systems support this?)

Member Author

I thought the comparison operator was official, standard SQL (I am used to MySQL and assumed that Spark and Hive follow standard SQL syntax). However, it looks like it is a MySQL dialect feature.

In detail:
First, I looked through several documents. I assume the standard documents are the ones listed in "Where can I get a copy of the SQL standards?" (https://wiki.postgresql.org/wiki/Developer_FAQ#Where_can_I_get_a_copy_of_the_SQL_standards.3F):

  1. SQL-92: http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
  2. SQL:1999: http://web.cs.ualberta.ca/~yuan/courses/db_readings/ansi-iso-9075-2-1999.pdf
  3. SQL:2003: http://www.wiscorp.com/sql_2003_standard.zip
  4. SQL:201x (preliminary): http://www.wiscorp.com/sql20nn.zip

Though I can't guarantee it, null-safe equality comparison (<=>) does not appear to be standard: I found no mention of it in these documents.

Second, I took a list of the top 10 databases (http://www.improgrammer.net/top-10-databases-should-learn-2015/) and checked whether each one supports such an operator:

  1. Oracle - not supported (http://docs.oracle.com/html/A95915_01/sqopr.htm)
  2. MySQL - supported (https://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html)
  3. Microsoft SQL Server - not supported (https://msdn.microsoft.com/en-us/library/ms188074.aspx)
  4. PostgreSQL - not supported (http://www.postgresql.org/docs/9.2/static/functions-comparison.html)
  5. MongoDB - N/A
  6. DB2 - not supported (http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_72/sqlp/rbafycompop.htm)
  7. Microsoft Access - not supported (https://support.office.com/en-za/article/Table-of-operators-e1bc04d5-8b76-429f-a252-e9223117d6bd)
  8. SQLite - not supported (http://www.tutorialspoint.com/sqlite/sqlite_comparison_operators.htm)
  9. Cassandra - N/A
  10. Redis - N/A

Member Author

Do you think this should be handled per dialect (maybe only for MySQL), or should this PR be closed?

Contributor

Since this is only in MySQL, I think the only options are:

  1. don't support it
  2. rewrite it using isnull and normal equality.
  3. add it to dialect.

Don't have a strong preference here. If 2 is not much slower, maybe just do 2?
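To make option 2 concrete, here is a minimal Scala sketch of the semantics such a rewrite has to preserve, with SQL NULL modelled as `None`. The object `NullSafeEq` and its method are hypothetical illustrations, not Spark code; they express `a <=> b` using only null checks and ordinary equality.

```scala
// Hypothetical illustration (not Spark code): null-safe equality `a <=> b`
// built only from null checks and ordinary equality, i.e. the idea behind
// option 2. SQL NULL is modelled as None.
object NullSafeEq {
  def nullSafeEq[A](a: Option[A], b: Option[A]): Boolean =
    (a.isEmpty && b.isEmpty) ||                    // both NULL -> true
      (a.nonEmpty && b.nonEmpty && a.get == b.get) // both non-NULL -> ordinary equality
}
```

Note that Scala's `==` never returns an "unknown" result, so this sketch sidesteps SQL's three-valued logic: in SQL, `attr = value` itself evaluates to NULL when either side is NULL, and a string-level rewrite has to account for that.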

Member Author

Yes, I will go with the second option and add some comments. Thanks :)

@HyukjinKwon
Member Author

Hm.. one thing to note: it looks like there is no test code for JdbcRelation, so I tested this with separately copied functions. It is essentially just building the SQL query text, though.

@rxin
Contributor

rxin commented Aug 25, 2015

There is this: #8101, which uses Docker to test this end-to-end.

Contributor

could this be null if attr is null but value is not null?

Member Author

Sorry for the late reply.
Yes, it looks so. This can be a problem if the query uses the result of the <=> comparison.

I will correct this soon to the following, which evaluates to TRUE or FALSE but never NULL:

(NOT ($attr <> ${compileValue(value)} OR $attr IS NULL OR ${compileValue(value)} IS NULL) OR ($attr IS NULL AND ${compileValue(value)} IS NULL))
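The NULL issue raised in the review can be checked mechanically. The sketch below models SQL three-valued logic in Scala (NULL as `None`) and evaluates two expressions over every NULL/non-NULL combination. The "naive" form `attr = v OR (attr IS NULL AND v IS NULL)` is an assumption chosen for illustration, not necessarily the PR's first version; `corrected` transcribes the expression above. All names here are hypothetical.

```scala
// Minimal model of SQL three-valued logic: NULL is modelled as None.
object NullSafeCheck {
  type SqlBool = Option[Boolean]

  // AND: FALSE dominates, otherwise NULL if any operand is NULL
  def and(a: SqlBool, b: SqlBool): SqlBool = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  // OR: TRUE dominates, otherwise NULL if any operand is NULL
  def or(a: SqlBool, b: SqlBool): SqlBool = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  def not(a: SqlBool): SqlBool = a.map(b => !b)

  // `x = y` and `x <> y` are NULL if either operand is NULL
  def eq(x: Option[Int], y: Option[Int]): SqlBool  = for (a <- x; b <- y) yield a == b
  def neq(x: Option[Int], y: Option[Int]): SqlBool = for (a <- x; b <- y) yield a != b

  // `x IS NULL` is always TRUE or FALSE, never NULL
  def isNull(x: Option[Int]): SqlBool = Some(x.isEmpty)

  // Assumed naive rewrite (illustration only):
  //   attr = v OR (attr IS NULL AND v IS NULL)
  def naive(attr: Option[Int], v: Option[Int]): SqlBool =
    or(eq(attr, v), and(isNull(attr), isNull(v)))

  // Corrected rewrite:
  //   NOT (attr <> v OR attr IS NULL OR v IS NULL) OR (attr IS NULL AND v IS NULL)
  def corrected(attr: Option[Int], v: Option[Int]): SqlBool =
    or(not(or(or(neq(attr, v), isNull(attr)), isNull(v))), and(isNull(attr), isNull(v)))

  // Reference semantics of `<=>`: both NULL, or both non-NULL and equal
  def expected(attr: Option[Int], v: Option[Int]): Boolean = attr == v

  val cases: Seq[(Option[Int], Option[Int])] =
    Seq((None, None), (None, Some(1)), (Some(1), None), (Some(1), Some(1)), (Some(1), Some(2)))
}
```

With these definitions, `naive(None, Some(1))` evaluates to `None` (SQL NULL), which is exactly the case asked about in the review, while `corrected` returns `Some(expected(attr, v))` for every entry in `cases`.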

(BTW, I found that <=> is actually standard, as IS NOT DISTINCT FROM in SQL:1999, but not many databases seem to support it.)
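Given that some databases support the standard predicate and others do not, a generator could pick between the two forms. The Scala sketch below is hypothetical, not Spark's actual code: `compileValue` is a simplified stand-in for a value-to-literal helper, and the `supportsIsNotDistinctFrom` flag is an assumed per-dialect setting. It emits either the SQL:1999 predicate or the portable rewrite discussed in this thread.

```scala
// Hypothetical sketch (not Spark's implementation): compile `attr <=> value`
// into a WHERE-clause fragment for a given dialect capability.
object NullSafeSql {
  // Simplified stand-in for a literal renderer: quotes strings (doubling
  // embedded single quotes), renders null as NULL, otherwise uses toString.
  def compileValue(value: Any): String = value match {
    case null      => "NULL"
    case s: String => "'" + s.replace("'", "''") + "'"
    case other     => other.toString
  }

  def compileNullSafeEq(attr: String, value: Any, supportsIsNotDistinctFrom: Boolean): String = {
    val v = compileValue(value)
    if (supportsIsNotDistinctFrom)
      // Standard SQL:1999 distinct predicate
      s"$attr IS NOT DISTINCT FROM $v"
    else
      // Portable rewrite that never evaluates to NULL
      s"(NOT ($attr <> $v OR $attr IS NULL OR $v IS NULL) OR ($attr IS NULL AND $v IS NULL))"
  }
}
```

For example, `compileNullSafeEq("id", 1, false)` yields `(NOT (id <> 1 OR id IS NULL OR 1 IS NULL) OR (id IS NULL AND 1 IS NULL))`. Emitting the portable form unconditionally, as the thread settled on, avoids tracking which dialects support the standard predicate.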

asfgit pushed a commit that referenced this pull request Jan 2, 2016
…ilter

This PR is a follow-up to #8391.
The previous PR fixed JDBCRDD to support null-safe equality comparison for the JDBC data source. This PR fixes the problem that the comparison can actually return null, which causes an error when the result of that comparison is used.

Author: hyukjinkwon
Author: HyukjinKwon

Closes #8743 from HyukjinKwon/SPARK-10180.