@kevinyu98
Contributor

kevinyu98 commented Feb 13, 2017

What changes were proposed in this pull request?

This is the 4th batch of test cases for IN/NOT IN subqueries. This PR adds the following test files:

in-set-operations.sql
in-with-cte.sql
not-in-joins.sql

Here are the queries and results from running on DB2.

in-set-operations DB2 version
Output of in-set-operations
in-with-cte DB2 version
Output of in-with-cte
not-in-joins DB2 version
Output of not-in-joins

How was this patch tested?

This PR adds new test cases. We compare the results from Spark with the results from another RDBMS (we used DB2 LUW). If the results are the same, we assume they are correct.
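The comparison step described above could be scripted roughly as follows. This is a minimal sketch, not the harness actually used for this PR: it assumes, hypothetically, that both the Spark output and the DB2 output have been dumped as one row per line with tab-separated columns, and the helper names `parse_rows` and `same_results` are illustrative only. Rows are compared as a multiset, because a query without ORDER BY may return its rows in any order while duplicate rows still matter.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Row = Tuple[str, ...]


def parse_rows(text: str) -> List[Row]:
    """Turn a result dump into row tuples.

    Assumption: one row per line, columns separated by tabs; blank lines are skipped.
    """
    return [tuple(line.split("\t")) for line in text.splitlines() if line.strip()]


def same_results(spark_rows: Iterable[Row], db2_rows: Iterable[Row]) -> bool:
    """Order-insensitive comparison with bag semantics.

    Both sides must contain the same rows with the same multiplicities.
    """
    return Counter(spark_rows) == Counter(db2_rows)


# Illustrative input only (rows taken from one of the Spark outputs in this PR,
# with the second dump simply reordered); this is not actual DB2 output.
spark_out = "1\t10\tNULL\t2014-08-04\n1\t10\tNULL\t2014-09-04\n"
db2_out = "1\t10\tNULL\t2014-09-04\n1\t10\tNULL\t2014-08-04\n"

print(same_results(parse_rows(spark_out), parse_rows(db2_out)))  # True
```

Using `Counter` rather than `set` keeps a missing or extra duplicate row from being silently ignored. A real comparison would likely also need to normalize formatting differences between the two systems (for example, how timestamps or NULLs are printed), which this sketch does not attempt.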

get latest code from upstream
adding trim characters support
ON t1c = t2c
LEFT JOIN t3
ON t2d = t3d ) AND
t1a = "val1b")
Member

The style looks strange. Could you adjust it?

Contributor Author

Sure, I have adjusted the style and resubmitted. Thanks.

@gatorsmile
Member

ok to test

@SparkQA

SparkQA commented Feb 14, 2017

Test build #72846 has finished for PR 16915 at commit 3dd57fd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Contributor

@kevinyu98 @nsyca @dilipbiswal could someone confirm that these results match DB2?

I also think that this PR is almost too large.

1 10 NULL 2014-08-04
1 10 NULL 2014-09-04
1 10 NULL 2015-05-04
1 10 NULL 2014-05-04
Contributor

All the results are equivalent with the ones from DB2.

struct<t1a:string,t1b:smallint,t1c:int,t1d:bigint,t1h:timestamp>
-- !query 12 output
val1b 8 16 19 2014-05-04 01:01:00
val1c 8 16 19 2014-05-04 01:02:00.001
Contributor

All the results are equivalent with the ones from DB2.

-- !query 8 schema
struct<count(DISTINCT t1a):bigint,t1b:smallint,t1c:int,t1d:bigint>
-- !query 8 output
1 6 8 10
Contributor

All the results are equivalent with the ones from DB2.

@nsyca
Contributor

nsyca commented Feb 15, 2017

It's larger than the typical test PRs we submitted for the subquery JIRA, but since it's the last test PR, we wanted to avoid an additional round of administrative work.

@gatorsmile
Member

retest this please

@SparkQA

SparkQA commented Feb 15, 2017

Test build #72952 has finished for PR 16915 at commit 3dd57fd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

LGTM. Merging to master.

@asfgit asfgit closed this in 8487902 Feb 16, 2017
@kevinyu98
Contributor Author

@gatorsmile thanks a lot.

cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 16, 2017
…atch

## What changes were proposed in this pull request?

This is the 4th batch of test cases for IN/NOT IN subqueries. This PR adds the following test files:

`in-set-operations.sql`
`in-with-cte.sql`
`not-in-joins.sql`

Here are the queries and results from running on DB2.

[in-set-operations DB2 version](https://github.com/apache/spark/files/772846/in-set-operations.sql.db2.txt)
[Output of in-set-operations](https://github.com/apache/spark/files/772848/in-set-operations.sql.db2.out.txt)
[in-with-cte DB2 version](https://github.com/apache/spark/files/772849/in-with-cte.sql.db2.txt)
[Output of in-with-cte](https://github.com/apache/spark/files/772856/in-with-cte.sql.db2.out.txt)
[not-in-joins DB2 version](https://github.com/apache/spark/files/772851/not-in-joins.sql.db2.txt)
[Output of not-in-joins](https://github.com/apache/spark/files/772852/not-in-joins.sql.db2.out.txt)

## How was this patch tested?

This PR adds new test cases. We compare the results from Spark with the results from another RDBMS (we used DB2 LUW). If the results are the same, we assume they are correct.

Author: Kevin Yu <[email protected]>

Closes apache#16915 from kevinyu98/spark-18871-44.
asfgit pushed a commit that referenced this pull request Mar 14, 2017
…ll up to Optimizer phase

## What changes were proposed in this pull request?
Currently the Analyzer, as part of ResolveSubquery, pulls up the correlated predicates to their
originating SubqueryExpression. The subquery plan is then transformed to remove the correlated
predicates after they are moved up to the outer plan. In this PR, the task of pulling up
correlated predicates is deferred to the Optimizer. This is the initial work that will allow us to
support forms of correlated subqueries that are not supported today. The design document
from nsyca can be found at the following link:
[DesignDoc](https://docs.google.com/document/d/1QDZ8JwU63RwGFS6KVF54Rjj9ZJyK33d49ZWbjFBaIgU/edit#)

A brief description of the code changes (hopefully to aid with code review) can be found at the
following link:
[CodeChanges](https://docs.google.com/document/d/18mqjhL9V1An-tNta7aVE13HkALRZ5GZ24AATA-Vqqf0/edit#)

## How was this patch tested?
The test cases were submitted earlier in the following PRs:
[16337](#16337) [16759](#16759) [16841](#16841) [16915](#16915) [16798](#16798) [16712](#16712) [16710](#16710) [16760](#16760) [16802](#16802)

Author: Dilip Biswal <[email protected]>

Closes #16954 from dilipbiswal/SPARK-18874.