
Commit 0818618

[SPARK-24076][SQL] Use different seed in HashAggregate to avoid hash conflict
1 parent 5fea17b commit 0818618


1 file changed (+4 -1 lines)

sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala

Lines changed: 4 additions & 1 deletion
```diff
@@ -755,7 +755,10 @@ case class HashAggregateExec(
     }
 
     // generate hash code for key
-    val hashExpr = Murmur3Hash(groupingExpressions, 42)
+    // SPARK-24076: HashAggregate uses the same hash algorithm on the same expressions
+    // as ShuffleExchange, it may lead to bad hash conflict when shuffle.partitions=8192*n,
+    // pick a different seed to avoid this conflict
+    val hashExpr = Murmur3Hash(groupingExpressions, 48)
     val hashEval = BindReferences.bindReference(hashExpr, child.output).genCode(ctx)
 
     val (checkFallbackForGeneratedHashMap, checkFallbackForBytesToBytesMap, resetCounter,
```
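The conflict described in the code comment can be reproduced with a small, self-contained Scala sketch. This is not Spark code, and it rests on several assumptions. There is a single integer grouping key. That key is hashed with a hand-written 32-bit MurmurHash3 that we believe matches what Spark's `Murmur3_x86_32.hashInt` computes. Rows are assigned to 8192 shuffle partitions by `pmod(hash(key, 42), numPartitions)`. The aggregation side uses a hypothetical hash map with a power-of-two capacity (1024 here) that picks a bucket as `hash & (capacity - 1)`. Because 8192 divides the partition count, every key that lands in the same partition shares the low 13 bits of its seed-42 hash. If the aggregate reuses seed 42, those keys therefore all mask to one bucket. A different seed such as 48 decorrelates the two hashes.

```scala
object SeedConflictDemo {
  // MurmurHash3 x86_32 constants (written as Long literals, then truncated to Int).
  private val C1: Int = 0xcc9e2d51L.toInt
  private val C2: Int = 0x1b873593
  private val N1: Int = 0xe6546b64L.toInt
  private val F1: Int = 0x85ebca6bL.toInt
  private val F2: Int = 0xc2b2ae35L.toInt

  // 32-bit MurmurHash3 of a single Int (one 4-byte block, no tail).
  // Assumption: mirrors Spark's Murmur3_x86_32.hashInt(input, seed).
  def murmur3Int(input: Int, seed: Int): Int = {
    var k1 = input * C1
    k1 = Integer.rotateLeft(k1, 15)
    k1 *= C2

    var h1 = seed ^ k1
    h1 = Integer.rotateLeft(h1, 13)
    h1 = h1 * 5 + N1

    // Finalization mix; 4 is the input length in bytes.
    h1 ^= 4
    h1 ^= h1 >>> 16
    h1 *= F1
    h1 ^= h1 >>> 13
    h1 *= F2
    h1 ^= h1 >>> 16
    h1
  }

  // Non-negative modulus, used here to assign a key to a shuffle partition.
  def pmod(a: Int, n: Int): Int = ((a % n) + n) % n

  /** Distinct buckets used by the keys of shuffle partition 0 in a hypothetical
    * aggregation map that indexes buckets with `hash & (mapCapacity - 1)`. */
  def bucketsUsed(aggSeed: Int, numPartitions: Int = 8192, mapCapacity: Int = 1024): Int = {
    val shuffleSeed = 42 // seed used for shuffle partitioning in this sketch
    val keysInPartition0 =
      (0 until 1000000).filter(k => pmod(murmur3Int(k, shuffleSeed), numPartitions) == 0)
    keysInPartition0.map(k => murmur3Int(k, aggSeed) & (mapCapacity - 1)).distinct.size
  }

  def main(args: Array[String]): Unit = {
    println(s"aggregate seed 42: ${bucketsUsed(42)} bucket(s) used")
    println(s"aggregate seed 48: ${bucketsUsed(48)} bucket(s) used")
  }
}
```

With seed 42 on both sides, every key in the partition falls into a single bucket, so lookups degrade into long probe chains. With seed 48 the same keys spread across many buckets. The exact count depends on the key range and capacity chosen in this sketch.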
