[SPARK-10180] [SQL] JDBCRDD does not process EqualNullSafe filter. #8391
Can one of the admins verify this patch?
just to double check - is this standard sql?
(standard as in -- do most database systems support this?)
I thought this comparison operator was an official, standard one (because I am used to MySQL and assumed that both Spark and Hive use standard SQL syntax). However, it looks like it is a MySQL dialect feature.
In detail:
First, I looked through several documents.
I assume these are the relevant standard documents, as listed under "Where can I get a copy of the SQL standards?" (https://wiki.postgresql.org/wiki/Developer_FAQ#Where_can_I_get_a_copy_of_the_SQL_standards.3F):
SQL-92 http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
SQL:1999 http://web.cs.ualberta.ca/~yuan/courses/db_readings/ansi-iso-9075-2-1999.pdf
SQL:2003 http://www.wiscorp.com/sql_2003_standard.zip
SQL:201x (preliminary) http://www.wiscorp.com/sql20nn.zip
Though I can't guarantee it, it looks like the null-safe equality comparison is not standard. I could not find any mention of it.
Second, I took a list of the top 10 databases (http://www.improgrammer.net/top-10-databases-should-learn-2015/)
and checked whether each one has such an operator.
- Oracle - not supported (http://docs.oracle.com/html/A95915_01/sqopr.htm)
- MySQL - supported (https://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html)
- Microsoft SQL Server - not supported (https://msdn.microsoft.com/en-us/library/ms188074.aspx)
- PostgreSQL - not supported (http://www.postgresql.org/docs/9.2/static/functions-comparison.html)
- MongoDB - N/A
- DB2 - not supported (http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_72/sqlp/rbafycompop.htm)
- Microsoft Access - not supported (https://support.office.com/en-za/article/Table-of-operators-e1bc04d5-8b76-429f-a252-e9223117d6bd)
- SQLite - not supported (http://www.tutorialspoint.com/sqlite/sqlite_comparison_operators.htm)
- Cassandra - N/A
- Redis - N/A
Do you think this should be handled separately for each dialect (maybe only for MySQL), or should this be closed?
Since this is only in MySQL, I think the only options are:
1. don't support it
2. rewrite it using IS NULL and normal equality
3. add it to the dialect
Don't have a strong preference here. If 2 is not much slower, maybe just do 2?
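For illustration, option 2 could look roughly like the sketch below. It uses simplified, hypothetical stand-ins (`Filter`, `EqualNullSafe`, `compileValue`, `compileFilter`) rather than the actual JDBCRDD code, and it assumes the generated text is only ever used as a WHERE predicate. Under that assumption, a NULL attribute compared with a non-NULL value evaluates to NULL rather than FALSE, which a WHERE clause rejects just the same.

```scala
object NullSafeRewrite {
  // Hypothetical stand-ins for Spark's data source filter classes.
  sealed trait Filter
  final case class EqualTo(attribute: String, value: Any) extends Filter
  final case class EqualNullSafe(attribute: String, value: Any) extends Filter

  // Render a Scala value as a SQL literal (assumed, simplified quoting rules).
  def compileValue(value: Any): String = value match {
    case null      => "NULL"
    case s: String => "'" + s.replace("'", "''") + "'"
    case other     => other.toString
  }

  // Turn a filter into SQL text for a WHERE clause.
  def compileFilter(filter: Filter): String = filter match {
    case EqualTo(attr, value) =>
      s"$attr = ${compileValue(value)}"
    case EqualNullSafe(attr, value) =>
      // Plain equality, plus an explicit branch for "both sides are NULL".
      val v = compileValue(value)
      s"(($attr = $v) OR ($attr IS NULL AND $v IS NULL))"
  }
}
```

With these assumed helpers, `NullSafeRewrite.compileFilter(NullSafeRewrite.EqualNullSafe("name", null))` would produce a predicate whose second branch matches rows where `name` is NULL, using only operators that every database in the list above accepts.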
Yes, I will go for the second one, with some comments. Thanks :)
Hm, one thing I want to mention: it looks like there is no test code for JdbcRelation, so I tested this with separately copied functions. It looks like this is only about text parsing (the SQL query), though.
There is this: #8101, which uses Docker to test this end-to-end.
could this be null if attr is null but value is not null?
Sorry for my late comment.
Yes, it looks so. There can be a problem if the query uses the result of the <=> comparison.
Let me correct this soon to: (NOT ($attr <> ${compileValue(value)} OR $attr IS NULL OR ${compileValue(value)} IS NULL) OR ($attr IS NULL AND ${compileValue(value)} IS NULL)).
(BTW, I found that <=> is actually standard, but under the name IS NOT DISTINCT FROM, introduced in SQL:1999. Not many DBs seem to support it, though.)
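To see why the corrected expression never yields NULL, here is a small self-contained check that models SQL's three-valued logic with `Option[Boolean]` (`None` standing for NULL). This is only a sketch for reasoning about the expression, not code from the patch; all names in it (`NullSafeCheck`, `rewritten`, `nullSafeEq`, and so on) are made up for the example, and it uses integer operands for simplicity.

```scala
object NullSafeCheck {
  // SQL three-valued logic: Some(true), Some(false), or None for NULL.
  type Tri = Option[Boolean]

  def not(a: Tri): Tri = a.map(!_)

  def or(a: Tri, b: Tri): Tri = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  def and(a: Tri, b: Tri): Tri = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  // `x <> y` is NULL as soon as either operand is NULL.
  def notEqual(x: Option[Int], y: Option[Int]): Tri =
    for (a <- x; b <- y) yield a != b

  // `x IS NULL` always yields TRUE or FALSE, never NULL.
  def isNull(x: Option[Int]): Tri = Some(x.isEmpty)

  // (NOT (attr <> value OR attr IS NULL OR value IS NULL)
  //   OR (attr IS NULL AND value IS NULL))
  def rewritten(attr: Option[Int], value: Option[Int]): Tri =
    or(
      not(or(or(notEqual(attr, value), isNull(attr)), isNull(value))),
      and(isNull(attr), isNull(value)))

  // Reference semantics of null-safe equality: always TRUE or FALSE.
  def nullSafeEq(attr: Option[Int], value: Option[Int]): Tri =
    Some(attr == value)

  def main(args: Array[String]): Unit = {
    val samples = Seq(None, Some(1), Some(2))
    // Compare the rewritten expression with the reference on every pair.
    for (a <- samples; v <- samples)
      assert(rewritten(a, v) == nullSafeEq(a, v), s"mismatch for $a, $v")
  }
}
```

The key point is that whenever `attr <> value` is NULL, at least one of the two IS NULL terms is TRUE, so the inner OR is TRUE and the NOT turns it into a definite FALSE.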
…ilter
This PR follows up on #8391. The previous PR fixed JDBCRDD to support null-safe equality comparison for the JDBC data source. This PR fixes the problem that the comparison can actually return null, which causes an error when the value of that comparison is used.
Author: hyukjinkwon <[email protected]>
Author: HyukjinKwon <[email protected]>
Closes #8743 from HyukjinKwon/SPARK-10180.
https://issues.apache.org/jira/browse/SPARK-10180