[SPARK-10327][SQL] Cache Table is not working while subquery has alias in its project list #8494
Conversation
cc @marmbrus
Test build #41720 has finished for PR 8494 at commit
Why does this have `()`? It also needs Scaladoc. However, since it's only ever used once, I'd consider just inlining it.
bfd40d9 to fc63b89
Test build #41812 has finished for PR 8494 at commit
Test build #41852 has finished for PR 8494 at commit
Thanks, merging to master.
Author: Michael Armbrust <[email protected]>
Closes #8659 from marmbrus/testBuildBreak.
* apache/master: (65 commits)
  [SPARK-10065] [SQL] avoid the extra copy when generate unsafe array
  [SPARK-10497] [BUILD] [TRIVIAL] Handle both locations for JIRAError with python-jira
  [MINOR] [MLLIB] [ML] [DOC] fixed typo: label for negative result should be 0.0 (original: 1.0)
  [SPARK-9772] [PYSPARK] [ML] Add Python API for ml.feature.VectorSlicer
  [SPARK-9730] [SQL] Add Full Outer Join support for SortMergeJoin
  [SPARK-10461] [SQL] make sure `input.primitive` is always variable name not code at `GenerateUnsafeProjection`
  [SPARK-10481] [YARN] SPARK_PREPEND_CLASSES make spark-yarn related jar could n…
  [SPARK-10117] [MLLIB] Implement SQL data source API for reading LIBSVM data
  [SPARK-10227] fatal warnings with sbt on Scala 2.11
  [SPARK-10249] [ML] [DOC] Add Python Code Example to StopWordsRemover User Guide
  [SPARK-9654] [ML] [PYSPARK] Add IndexToString to PySpark
  [SPARK-10094] Pyspark ML Feature transformers marked as experimental
  [SPARK-10373] [PYSPARK] move @SInCE into pyspark from sql
  [SPARK-10464] [MLLIB] Add WeibullGenerator for RandomDataGenerator
  [SPARK-9834] [MLLIB] implement weighted least squares via normal equation
  [SPARK-10071] [STREAMING] Output a warning when writing QueueInputDStream and throw a better exception when reading QueueInputDStream
  [RELEASE] Add more contributors & only show names in release notes.
  [HOTFIX] Fix build break caused by apache#8494
  [SPARK-10327] [SQL] Cache Table is not working while subquery has alias in its project list
  [SPARK-10492] [STREAMING] [DOCUMENTATION] Update Streaming documentation about rate limiting and backpressure
  ...
[SPARK-10327] [SQL] Cache Table is not working while subquery has alias in its project list
```scala
import org.apache.spark.sql.columnar.InMemoryColumnarTableScan // location in Spark 1.5
import org.apache.spark.sql.hive.execution.HiveTableScan

sql("select key, value, key + 1 from src").registerTempTable("abc")
cacheTable("abc")

val sparkPlan = sql(
  """select a.key, b.key, c.key from
    |abc a join abc b on a.key=b.key
    |join abc c on a.key=c.key""".stripMargin).queryExecution.sparkPlan

assert(sparkPlan.collect { case e: InMemoryColumnarTableScan => e }.size === 3) // fails: only 1
assert(sparkPlan.collect { case e: HiveTableScan => e }.size === 0) // fails: 2 Hive scans remain
```
The actual plan is:
```
== Parsed Logical Plan ==
'Project [unresolvedalias('a.key),unresolvedalias('b.key),unresolvedalias('c.key)]
 'Join Inner, Some(('a.key = 'c.key))
  'Join Inner, Some(('a.key = 'b.key))
   'UnresolvedRelation [abc], Some(a)
   'UnresolvedRelation [abc], Some(b)
  'UnresolvedRelation [abc], Some(c)

== Analyzed Logical Plan ==
key: int, key: int, key: int
Project [key#14,key#61,key#66]
 Join Inner, Some((key#14 = key#66))
  Join Inner, Some((key#14 = key#61))
   Subquery a
    Subquery abc
     Project [key#14,value#15,(key#14 + 1) AS _c2#16]
      MetastoreRelation default, src, None
   Subquery b
    Subquery abc
     Project [key#61,value#62,(key#61 + 1) AS _c2#58]
      MetastoreRelation default, src, None
  Subquery c
   Subquery abc
    Project [key#66,value#67,(key#66 + 1) AS _c2#63]
     MetastoreRelation default, src, None

== Optimized Logical Plan ==
Project [key#14,key#61,key#66]
 Join Inner, Some((key#14 = key#66))
  Project [key#14,key#61]
   Join Inner, Some((key#14 = key#61))
    Project [key#14]
     InMemoryRelation [key#14,value#15,_c2#16], true, 10000, StorageLevel(true, true, false, true, 1), (Project [key#14,value#15,(key#14 + 1) AS _c2#16]), Some(abc)
    Project [key#61]
     MetastoreRelation default, src, None
  Project [key#66]
   MetastoreRelation default, src, None

== Physical Plan ==
TungstenProject [key#14,key#61,key#66]
 BroadcastHashJoin [key#14], [key#66], BuildRight
  TungstenProject [key#14,key#61]
   BroadcastHashJoin [key#14], [key#61], BuildRight
    ConvertToUnsafe
     InMemoryColumnarTableScan [key#14], (InMemoryRelation [key#14,value#15,_c2#16], true, 10000, StorageLevel(true, true, false, true, 1), (Project [key#14,value#15,(key#14 + 1) AS _c2#16]), Some(abc))
    ConvertToUnsafe
     HiveTableScan [key#61], (MetastoreRelation default, src, None)
  ConvertToUnsafe
   HiveTableScan [key#66], (MetastoreRelation default, src, None)
```
Author: Cheng Hao <[email protected]>
Closes apache#8494 from chenghao-intel/weird_cache.
Would it make sense to use transformAllExpressions here instead of enumerating all the ways expressions can occur in the logical plan, or would that produce a different result?
Hmmm, yeah that would probably work.