[SPARK-21977] SinglePartition optimizations break certain Streaming Stateful Aggregation requirements #19196
Conversation
Test build #81652 has finished for PR 19196 at commit
Test build #81656 has finished for PR 19196 at commit
Test build #81657 has finished for PR 19196 at commit
```scala
override def apply(plan: SparkPlan): SparkPlan = plan transformUp {
  case ss: StatefulOperator =>
    val numPartitions = plan.sqlContext.sessionState.conf.numShufflePartitions
    val keys = ss.keyExpressions
```
Another option is to not expose keyExpressions in StatefulOperator, but instead use the requiredChildDistribution field to get the required key expressions and partitioning.
I think that is a better idea.
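For illustration, a hedged sketch of what that alternative could look like, pieced together only from the APIs mentioned in this thread. The object name, the equality checks, and the ShuffleExchange usage are assumptions, not the merged implementation:

```scala
import org.apache.spark.sql.catalyst.plans.physical.{AllTuples, ClusteredDistribution, HashPartitioning, SinglePartition}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.exchange.ShuffleExchange
import org.apache.spark.sql.execution.streaming.StatefulOperator

// Sketch: derive the keys from requiredChildDistribution instead of exposing
// keyExpressions, inserting a shuffle only for children that don't already
// have the partitioning a stateful operator needs across restarts.
object EnsureStatefulOpPartitioningSketch extends Rule[SparkPlan] {
  override def apply(plan: SparkPlan): SparkPlan = plan transformUp {
    case so: StatefulOperator =>
      val numPartitions = plan.sqlContext.sessionState.conf.numShufflePartitions
      val newChildren = so.requiredChildDistribution.zip(so.children).map {
        case (AllTuples, child) if child.outputPartitioning != SinglePartition =>
          ShuffleExchange(SinglePartition, child)
        case (ClusteredDistribution(keys), child)
            if child.outputPartitioning != HashPartitioning(keys, numPartitions) =>
          ShuffleExchange(HashPartitioning(keys, numPartitions), child)
        case (_, child) => child
      }
      so.withNewChildren(newChildren)
  }
}
```

A real rule would compare partitionings via `satisfies` plus partition counts rather than plain equality; the crude comparison here is just to keep the sketch short.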
```scala
trait StatefulOperator extends SparkPlan {
  def stateInfo: Option[StatefulOperatorStateInfo]

  def keyExpressions: Seq[Attribute]
```
we don't need to expose this if we don't want to
Test build #81690 has finished for PR 19196 at commit
```diff
  }

- override def preparations: Seq[Rule[SparkPlan]] = state +: super.preparations
+ override def preparations: Seq[Rule[SparkPlan]] = Seq(
```
This is an odd break-up of the line. How about:

```scala
override def preparations: Seq[Rule[SparkPlan]] =
  Seq(state, EnsureStatefulOpPartitioning) ++ super.preparations
```
tdas left a comment
Some cleanup and code deduping required, but overall it's in the right direction.
```scala
object EnsureStatefulOpPartitioning extends Rule[SparkPlan] {
  // Needs to be transformUp to avoid extra shuffles
  override def apply(plan: SparkPlan): SparkPlan = plan transformUp {
    case ss: StatefulOperator =>
```
nit: why ss? how about so or op or stateOp
```scala
if (streamDeathCause != null) {
  throw streamDeathCause
}
if (!isActive) return
```
+1 good catch
```scala
numOutputRows += 1
row +: Option(savedState).toSeq
val hasInput = iter.hasNext
if (!hasInput && keyExpressions.isEmpty) {
```
add docs on why we are doing this. similar to the docs in other places related to batch aggregation.
there weren't any docs in batch :)
```scala
 * `coalesce(1)`, which has several optimizations regarding [[SinglePartition]], and a 0 partition
 * parent RDD.
 */
class NonLocalRelationSource(spark: SparkSession) extends Source {
```
The point of this source is basically to create empty batches; local/non-local are just internal details, so it should be named accordingly.
The docs should explain what it does, not why it does it the way it does. It really does not matter that a local relation is not the right thing to use.
```scala
  CheckLastBatch((0, 0, 2), (1, 1, 3)))
}

private def checkAggregationChain(
```
what does it check about the aggregation chain? add docs for any such complex functions
```scala
  spark.table("agg_test").as[Long],
  1L)

inputSource.addData(2, 3)
```
This is a lot of duplicate code. I am sure you can create shortcuts like AddData and AddFileData for this source, and then use testStream(). All the checkAggregationChain calls can be put inside an AssertOnQuery.
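As a sketch of that suggestion, assuming this lives inside a StreamTest suite (where the AddData trait is in scope); `BlockRDDBackedSource` and its `addData`/`getOffset` members are assumed names for this PR's test source:

```scala
import org.apache.spark.sql.execution.streaming.{Offset, Source, StreamExecution}

// Hypothetical testStream() action modeled on StreamTest's AddData trait.
case class AddBlockData(source: BlockRDDBackedSource, data: Int*) extends AddData {
  override def addData(query: Option[StreamExecution]): (Source, Offset) = {
    source.addData(data: _*)        // push the new rows into the source
    (source, source.getOffset.get)  // the offset testStream() should wait to reach
  }
}
```

With that, each step becomes `AddBlockData(inputSource, 2, 3)` followed by a `CheckLastBatch(...)`, and the plan assertions can move into `AssertOnQuery`.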
```scala
  }
}

test("SPARK-21977: coalesce(1) with 0 partition RDD should be repartitioned accordingly") {
```
what does "accordingly" mean? this test name can be improved.
```scala
  .outputMode("complete")
  .queryName("agg_test")
  .option("checkpointLocation", tempDir.getAbsolutePath)
  .start()
```
this query code can be deduped into a function
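A sketch of the kind of helper being suggested; the name, the `Long` element type, and the memory-sink choice are assumptions based on the snippets visible in this thread:

```scala
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.streaming.StreamingQuery

// Hypothetical helper that dedupes the repeated writeStream boilerplate.
private def startAggQuery(df: Dataset[Long], checkpointDir: String): StreamingQuery =
  df.writeStream
    .format("memory")
    .outputMode("complete")
    .queryName("agg_test")
    .option("checkpointLocation", checkpointDir)
    .start()
```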
```scala
  assert(e.getMessage === "The output mode of function should be append or update")
}

test("SPARK-21977: coalesce(1) should still be repartitioned when it has keyExpressions") {
```
What changed code paths does this test cover that are not already covered by the other tests you added?
Test build #81812 has finished for PR 19196 at commit
```scala
def addData(data: Int*): Unit = {
  if (streamLock.getCount == 0) {
    streamLock = new CountDownLatch(1)
```
This is complicated. See how AddFileData is implemented. It's much simpler.
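For comparison, a hedged sketch of the simpler shape being suggested; `blocks`, `currentOffset`, and `storeAsBlock` are assumed members of the test source, not code from this PR:

```scala
// Append under the source's own lock and advance the offset; getOffset then
// reports the new batch, with no CountDownLatch handshake required.
def addData(data: Int*): Unit = synchronized {
  blocks ++= data.map(storeAsBlock)  // storeAsBlock: assumed helper that
                                     // materializes the ints as block IDs
  currentOffset = LongOffset(currentOffset.offset + 1)
}
```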
```scala
.coalesce(1)
.groupBy()
.count()
.as[Long]
```
nit: collapse this query, and at other places; it's making the tests look unnecessarily long.
```scala
  }
}

test("SPARK-21977: coalesce(1) should still be repartitioned when it has keyExpressions") {
```
what is keyExpressions? how about non-empty grouping keys.
```scala
  }
}

test("SPARK-21977: coalesce(1) with 0 partition RDD should be repartitioned to 1") {
```
Can you add more docs to explain this test? This is testing a complicated edge case, so more docs are necessary.
```scala
override def getBatch(start: Option[Offset], end: Offset): DataFrame = synchronized {
  val rdd = new BlockRDD[Int](spark.sparkContext, blocks.toArray)
    .map(i => InternalRow(i)) // we don't really care about the values in this test
```
we do care about the values in the test that has .groupBy('a % 1)
not really, since the grouping key is constant
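A one-liner makes the point: with `'a % 1`, the grouping key evaluates to the same value (0) for every row, so the concrete input values cannot change which group a row lands in:

```scala
Seq(1, 2, 3).map(_ % 1)  // List(0, 0, 0): every row falls into a single group
```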
Test build #81860 has finished for PR 19196 at commit
retest this please
Test build #81887 has finished for PR 19196 at commit
```scala
import org.apache.spark.sql.execution.streaming.{IncrementalExecution, OffsetSeqMetadata, StatefulOperator, StatefulOperatorStateInfo}
import org.apache.spark.sql.test.SharedSQLContext

class IncrementalExecutionRulesSuite extends SparkPlanTest with SharedSQLContext {
```
How about making this the EnsureStatefulOpPartitioningSuite?
This pattern is followed by many other optimization rules (PropagateEmptyRelationSuite, CollapseProjectSuite...)
```scala
  }
}

case class TestOperator(
```
This can be within the above class; then it won't pollute the general namespace.
Name it TestStatefulOperator to make it more specific than just "test operator". And add docs saying what it is used for.
```scala
  keys => SinglePartition,
  expectShuffle = false)

private def testEnsureStatefulOpPartitioning(
```
nit: add some docs specifying what it tests.
```scala
testEnsureStatefulOpPartitioning(
  "AllTuples with coalesce(1) doesn't need Exchange",
  baseDf.coalesce(1).queryExecution.sparkPlan,
  keys => AllTuples,
```
nit: add requiredDistribution = and expectedPartitioning = for greater readability
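Applied to the call above, with the parameter names the reviewer suggests (the names are the suggestion, not the code as merged):

```scala
testEnsureStatefulOpPartitioning(
  "AllTuples with coalesce(1) doesn't need Exchange",
  baseDf.coalesce(1).queryExecution.sparkPlan,
  requiredDistribution = keys => AllTuples,
  expectedPartitioning = keys => SinglePartition,
  expectShuffle = false)
```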
@tdas Addressed
Test build #81938 has finished for PR 19196 at commit
retest this please
Test build #81942 has finished for PR 19196 at commit
retest this please
Test build #81946 has finished for PR 19196 at commit
Test build #81947 has finished for PR 19196 at commit
Test build #81950 has finished for PR 19196 at commit
```scala
}

/** Used to emulate a `StatefulOperator` with the given requiredDistribution. */
case class TestStatefulOperator(
```
this has to live outside the test suite, otherwise we get weird failed to makeCopy failures as seen here:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81950/testReport/org.apache.spark.sql.streaming/EnsureStatefulOpPartitioningSuite/ClusteredDistribution_generates_Exchange_with_HashPartitioning/
LGTM.
Test build #81954 has finished for PR 19196 at commit
Thanks! Merging to master
What changes were proposed in this pull request?
This is a bit hard to explain as there are several issues here; I'll try my best. Here are the requirements:
1. A streaming Source that can produce an input RDD with 0 partitions
2. A streaming query that requires state (mapGroupsWithState, groupBy.count, ...), and coalesces to 1 partition
The crux of the problem is that when a dataset has a `coalesce(1)` call, it receives a `SinglePartition` partitioning scheme. This scheme satisfies most required distributions used for aggregations such as HashAggregateExec. This causes a world of problems:

Symptom 1. If the input RDD has 0 partitions, the whole lineage will receive 0 partitions, nothing will be executed, and the state store will not create any delta files. When this happens, the next trigger fails, because the StateStore fails to load the delta file for the previous trigger.

Symptom 2. Let's say that there was data. Then in this case, if you stop your stream, replace `coalesce(1)` with `coalesce(2)`, and restart your stream, the stream will fail, because `spark.sql.shuffle.partitions - 1` StateStores will fail to find their delta files. A minimal sketch of a query shape that hits these paths is shown below.
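For concreteness, this sketch is pieced together from the test snippets in this thread; the source and checkpoint names are assumptions:

```scala
// A streaming aggregation over a coalesce(1)'d child: SinglePartition
// satisfies the aggregation's required distribution, so no Exchange is
// inserted before the stateful operators.
val counts = inputStream   // some streaming Dataset[Int]
  .coalesce(1)
  .groupBy()
  .count()

counts.writeStream
  .format("memory")
  .queryName("agg_test")
  .outputMode("complete")
  .option("checkpointLocation", checkpointDir)
  .start()
```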
To fix the issues above, we must check that the partitioning of the child of a `StatefulOperator` satisfies:

If the grouping expressions are empty:
a) AllTuples distribution
b) a single physical partition

If the grouping expressions are non-empty:
a) Clustered distribution
b) `spark.sql.shuffle.partitions` number of partitions

This must hold whether or not `coalesce(1)` exists in the plan, and whether or not the input RDD for the trigger has any data.
Once you fix the above problem by adding an Exchange to the plan, you come across the following bug: if you call `coalesce(1).groupBy().count()` on a streaming DataFrame and you have a trigger with no data, `StateStoreRestoreExec` doesn't return the prior state. However, for this specific aggregation, `HashAggregateExec` after the restore returns a (0, 0) row, since we're performing a count and there is no data. This row then gets stored in `StateStoreSaveExec`, overwriting and losing the previous counts.

How was this patch tested?
Regression tests