Commit 91de0dc

yhuai authored and marmbrus committed
[SQL] Set outputPartitioning of BroadcastHashJoin correctly.
I do not think we currently generate a plan that triggers this bug, but let me explain it.

Right now, we use `left.outputPartitioning` as the `outputPartitioning` of a `BroadcastHashJoin`. This can produce a wrong physical plan for cases like:

```sql
SELECT l.key, count(*)
FROM (SELECT key, count(*) AS cnt
      FROM src
      GROUP BY key) l -- This is the buildPlan
JOIN r -- This is the streamedPlan
ON (l.cnt = r.value)
GROUP BY l.key
```

Let's say we have a `BroadcastHashJoin` on `l` and `r`. In this case we pick `l`'s `outputPartitioning` as the `outputPartitioning` of the `BroadcastHashJoin` on `l` and `r`. Because the last `GROUP BY` uses `l.key` as the key, we will not introduce an `Exchange` for this aggregation. However, the join's output follows the partitioning of the streamed side `r`, and `r`'s `outputPartitioning` may not match the required distribution of the last `GROUP BY`, so we fail to group data correctly.

JIRA is being reindexed. I will create a JIRA ticket once it is back online.

Author: Yin Huai <[email protected]>

Closes #1735 from yhuai/BroadcastHashJoin and squashes the following commits:

96d9cb3 [Yin Huai] Set outputPartitioning correctly.

(cherry picked from commit 67bd8e3)
Signed-off-by: Michael Armbrust <[email protected]>
1 parent 8d6ac2b commit 91de0dc

1 file changed, +1 -2 lines changed
  • sql/core/src/main/scala/org/apache/spark/sql/execution

sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala

Lines changed: 1 addition & 2 deletions
@@ -405,8 +405,7 @@ case class BroadcastHashJoin(
     left: SparkPlan,
     right: SparkPlan) extends BinaryNode with HashJoin {
 
-
-  override def outputPartitioning: Partitioning = left.outputPartitioning
+  override def outputPartitioning: Partitioning = streamedPlan.outputPartitioning
 
   override def requiredChildDistribution =
     UnspecifiedDistribution :: UnspecifiedDistribution :: Nil
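
The reasoning in the commit message can be illustrated with a small standalone model. This is only a sketch: `Plan`, `BroadcastJoin`, `needsExchange`, and the simplified `Partitioning` types below are hypothetical stand-ins, not Spark's actual classes, and the model assumes the build side is `l`, as in the query from the commit message.

```scala
// Hypothetical, simplified stand-ins for the planner types; not Spark's real API.
object PartitioningSketch {
  sealed trait Partitioning
  case object UnknownPartitioning extends Partitioning
  final case class HashPartitioning(columns: Seq[String]) extends Partitioning

  sealed trait BuildSide
  case object BuildLeft extends BuildSide
  case object BuildRight extends BuildSide

  final case class Plan(name: String, outputPartitioning: Partitioning)

  final case class BroadcastJoin(buildSide: BuildSide, left: Plan, right: Plan) {
    // The build side is broadcast in full; rows of the other side are streamed
    // through, so the join output keeps the streamed side's partitioning.
    val (buildPlan, streamedPlan) = buildSide match {
      case BuildLeft  => (left, right)
      case BuildRight => (right, left)
    }

    // Behaviour before the fix: always report the left child's partitioning.
    def oldOutputPartitioning: Partitioning = left.outputPartitioning

    // Behaviour after the fix: report the streamed child's partitioning.
    def newOutputPartitioning: Partitioning = streamedPlan.outputPartitioning
  }

  // Toy rule: a GROUP BY needs an Exchange unless its input is already
  // hash-partitioned on exactly the grouping columns.
  def needsExchange(input: Partitioning, groupBy: Seq[String]): Boolean =
    input match {
      case HashPartitioning(cols) => cols != groupBy
      case UnknownPartitioning    => true
    }

  // `l` is the aggregated subquery (partitioned by key); `r` has no known partitioning.
  val l = Plan("l", HashPartitioning(Seq("key")))
  val r = Plan("r", UnknownPartitioning)
  val join = BroadcastJoin(BuildLeft, l, r)

  // Whether the final GROUP BY l.key gets an Exchange under each behaviour.
  val exchangeBeforeFix = needsExchange(join.oldOutputPartitioning, Seq("key"))
  val exchangeAfterFix  = needsExchange(join.newOutputPartitioning, Seq("key"))
}
```

In this model, reporting the left child's partitioning makes the final aggregation skip the `Exchange` even though the data actually arrives in `r`'s layout, while reporting the streamed child's partitioning forces the `Exchange`.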
