Skip to content

Commit 8a926e4

Browse files
committed
[SPARK-26736][SQL] Partition pruning through nondeterministic expressions in Hive tables
### What changes were proposed in this pull request? This PR intends to improve partition pruning for nondeterministic expressions in Hive tables: Before this PR: ``` scala> sql("""create table test(id int) partitioned by (dt string)""") scala> sql("""select * from test where dt='20190101' and rand() < 0.5""").explain() == Physical Plan == *(1) Filter ((isnotnull(dt#19) AND (dt#19 = 20190101)) AND (rand(6515336563966543616) < 0.5)) +- Scan hive default.test [id#18, dt#19], HiveTableRelation `default`.`test`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#18], [dt#19], Statistics(sizeInBytes=8.0 EiB) ``` After this PR: ``` == Physical Plan == *(1) Filter (rand(-9163956883277176328) < 0.5) +- Scan hive default.test [id#0, dt#1], HiveTableRelation `default`.`test`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#0], [dt#1], Statistics(sizeInBytes=8.0 EiB), [isnotnull(dt#1), (dt#1 = 20190101)] ``` This PR is the rework of #24118. ### Why are the changes needed? For better performance. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Unit tests added. Closes #27219 from maropu/SPARK-26736. Authored-by: Takeshi Yamamuro <[email protected]> Signed-off-by: Takeshi Yamamuro <[email protected]>
1 parent d42cf45 commit 8a926e4

File tree

3 files changed

+508
-1
lines changed

3 files changed

+508
-1
lines changed

sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -252,7 +252,7 @@ private[hive] trait HiveStrategies {
252252
*/
253253
object HiveTableScans extends Strategy {
254254
def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
255-
case PhysicalOperation(projectList, predicates, relation: HiveTableRelation) =>
255+
case ScanOperation(projectList, predicates, relation: HiveTableRelation) =>
256256
// Filter out all predicates that only deal with partition keys, these are given to the
257257
// hive table scan operator to be used for partition pruning.
258258
val partitionKeyIds = AttributeSet(relation.partitionCols)

0 commit comments

Comments
 (0)