Commit aa78c05

viirya authored and HyukjinKwon committed
[SPARK-33427][SQL][FOLLOWUP] Put key and value into IdentityHashMap sequentially
### What changes were proposed in this pull request?

This follow-up fixes an issue when inserting key/value pairs into `IdentityHashMap` in `SubExprEvaluationRuntime`.

### Why are the changes needed?

The last commits to #30341 followed a review comment to use `IdentityHashMap`. Because we rely on `IdentityHashMap` to compare keys by reference, we should not convert the expression pairs to a Scala map before inserting them. A Scala map compares keys by equality, so we would lose keys that are equal but have different references.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Run benchmark to verify.

Closes #30459 from viirya/SPARK-33427-map.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
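
The key loss described above can be reproduced outside Spark. The following is a minimal sketch, not code from this patch: `Expr` is a hypothetical case class standing in for a Catalyst expression (so `==` is structural), and the proxy is just a string. It contrasts going through a Scala `Map` with inserting directly into an `IdentityHashMap`.

```scala
import java.util.IdentityHashMap

// Hypothetical stand-in for an expression type: a case class, so `==` is structural.
final case class Expr(repr: String)

object KeyLossDemo {
  val a = Expr("x * y")
  val b = Expr("x * y") // equal to `a`, but a distinct object
  val proxy = "proxy-0" // hypothetical proxy value shared by the group

  // Going through a Scala Map first: equal keys collapse into one entry.
  val viaScalaMap: Map[Expr, String] = Seq(a, b).map(_ -> proxy).toMap

  // Inserting one by one: IdentityHashMap keeps each reference as its own key.
  val byReference = new IdentityHashMap[Expr, String]()
  Seq(a, b).foreach(byReference.put(_, proxy))

  def main(args: Array[String]): Unit = {
    println(s"Scala Map entries:       ${viaScalaMap.size}")
    println(s"IdentityHashMap entries: ${byReference.size}")
  }
}
```

In this sketch the intermediate Scala `Map` holds a single entry, so a later `putAll` would register only one of the two references, while the direct inserts register both.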
1 parent a459238 commit aa78c05

2 files changed: 28 additions, 3 deletions

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala

Lines changed: 6 additions & 3 deletions
@@ -18,8 +18,6 @@ package org.apache.spark.sql.catalyst.expressions
 
 import java.util.IdentityHashMap
 
-import scala.collection.JavaConverters._
-
 import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
 import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
 
@@ -98,7 +96,12 @@ class SubExprEvaluationRuntime(cacheMaxEntries: Int) {
       val proxy = ExpressionProxy(expr, proxyExpressionCurrentId, this)
       proxyExpressionCurrentId += 1
 
-      proxyMap.putAll(e.map(_ -> proxy).toMap.asJava)
+      // We leverage `IdentityHashMap` so we compare expression keys by reference here.
+      // So for example if there are one group of common exprs like Seq(common expr 1,
+      // common expr2, ..., common expr n), we will insert into `proxyMap` some key/value
+      // pairs like Map(common expr 1 -> proxy(common expr 1), ...,
+      // common expr n -> proxy(common expr 1)).
+      e.map(proxyMap.put(_, proxy))
     }
 
     // Only adding proxy if we find subexpressions.
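
The insertion pattern in this hunk (one proxy per group of common expressions, registered under every member of the group) can be sketched in isolation. This is an illustration under assumptions, not the Spark implementation: `Node`, `Proxy`, and `register` are hypothetical names, and the sketch uses `foreach` instead of `map` because only the side effect of `put` is needed.

```scala
import java.util.IdentityHashMap

// Hypothetical stand-ins for an expression and its proxy (not Spark classes).
final case class Node(repr: String)
final case class Proxy(child: Node, id: Int)

object ProxyRegistration {
  // Registers one shared proxy per group, keyed by each member's reference.
  def register(groups: Seq[Seq[Node]]): IdentityHashMap[Node, Proxy] = {
    val proxyMap = new IdentityHashMap[Node, Proxy]()
    var nextId = 0
    groups.foreach { group =>
      // The proxy wraps the first expression of the group.
      val proxy = Proxy(group.head, nextId)
      nextId += 1
      // Every member, including equal-but-distinct instances, maps to that proxy.
      group.foreach(proxyMap.put(_, proxy))
    }
    proxyMap
  }
}
```

With two equal but distinct `Node` instances in one group, both references resolve to the same `Proxy` object, while a freshly built equal `Node` that was never registered is not found, since lookups are by reference.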

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntimeSuite.scala

Lines changed: 22 additions & 0 deletions
@@ -95,4 +95,26 @@ class SubExprEvaluationRuntimeSuite extends SparkFunSuite {
     })
     assert(proxys.isEmpty)
   }
+
+  test("SubExprEvaluationRuntime should wrap semantically equal exprs") {
+    val runtime = new SubExprEvaluationRuntime(1)
+
+    val one = Literal(1)
+    val two = Literal(2)
+    def mul: (Literal, Literal) => Expression =
+      (left: Literal, right: Literal) => Multiply(left, right)
+
+    val mul2_1 = Multiply(mul(one, two), mul(one, two))
+    val mul2_2 = Multiply(mul(one, two), mul(one, two))
+
+    val sqrt = Sqrt(mul2_1)
+    val sum = Add(mul2_2, sqrt)
+    val proxyExpressions = runtime.proxyExpressions(Seq(sum))
+    val proxys = proxyExpressions.flatMap(_.collect {
+      case p: ExpressionProxy => p
+    })
+    // ( (one * two) * (one * two) )
+    assert(proxys.size == 2)
+    assert(proxys.forall(_.child.semanticEquals(mul2_1)))
+  }
 }
