Commit 2cdd92a

liancheng authored and cloud-fan committed
[SPARK-17182][SQL] Mark Collect as non-deterministic
## What changes were proposed in this pull request?

This PR marks the abstract class `Collect` as non-deterministic since the results of `CollectList` and `CollectSet` depend on the actual order of input rows.

## How was this patch tested?

Existing test cases should be enough.

Author: Cheng Lian <[email protected]>

Closes #14749 from liancheng/spark-17182-non-deterministic-collect.
1 parent 920806a commit 2cdd92a

1 file changed: +4 -0 lines changed
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala

Lines changed: 4 additions & 0 deletions
@@ -54,6 +54,10 @@ abstract class Collect extends ImperativeAggregate {
 
   override def inputAggBufferAttributes: Seq[AttributeReference] = Nil
 
+  // Both `CollectList` and `CollectSet` are non-deterministic since their results depend on the
+  // actual order of input rows.
+  override def deterministic: Boolean = false
+
   protected[this] val buffer: Growable[Any] with Iterable[Any]
 
   override def initialize(b: MutableRow): Unit = {
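
The order dependence described in the commit message can be illustrated with a minimal sketch in plain Scala collections. This is not Spark's implementation: `CollectOrderDemo`, `collectList`, and `collectSet` are hypothetical helpers introduced only for illustration, and the set variant is simplified to keep the first occurrence of each value in arrival order.

```scala
// Illustrative sketch only: plain Scala sequences stand in for input rows.
// `collectList` and `collectSet` are hypothetical helpers, not Spark APIs.
object CollectOrderDemo {
  // Append every input value to the result in arrival order.
  def collectList[A](rows: Seq[A]): List[A] =
    rows.foldLeft(Vector.empty[A])(_ :+ _).toList

  // Simplified set variant (an assumption): drop duplicates, keeping the
  // first occurrence of each value in arrival order.
  def collectSet[A](rows: Seq[A]): List[A] =
    rows.distinct.toList

  def main(args: Array[String]): Unit = {
    val rows     = Seq(3, 1, 2, 1)
    val shuffled = Seq(1, 2, 1, 3) // same rows, different arrival order

    // Same multiset of input rows...
    println(rows.sorted == shuffled.sorted) // true

    // ...but the collected results differ with the arrival order.
    println(collectList(rows))     // List(3, 1, 2, 1)
    println(collectList(shuffled)) // List(1, 2, 1, 3)
    println(collectSet(rows))      // List(3, 1, 2)
    println(collectSet(shuffled))  // List(1, 2, 3)
  }
}
```

Because the same set of rows can yield different arrays depending on how the rows arrive, two evaluations of such an expression cannot be assumed to return equal values, which is the property the `deterministic` flag describes.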
