[SPARK-19499][SS] Add more notes in the comments of Sink.addBatch()

CodingCat · zsxwing · commit d4cd97571871 · 2017-02-07T20:25:18.000-08:00
## What changes were proposed in this pull request?

The `addBatch` method in the `Sink` trait is supposed to be synchronous (unlike the `compute()` method in DStream) so that it coordinates with the fault-tolerance design in `StreamExecution`, which treats a batch as complete as soon as `addBatch` returns. This PR adds more notes to the method's comments to remind developers of this.

## How was this patch tested?

Existing tests.

Author: CodingCat <zhunansjtu@gmail.com>

Closes #16840 from CodingCat/SPARK-19499.
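To see why returning early is a problem, here is a toy model in plain Scala with no Spark dependency. It assumes a simplified version of the recovery rule used by `StreamExecution` around the time of this change: each batch is recorded in a durable offset log before it reaches the sink, and on restart only the last logged batch is re-run, because every earlier batch is assumed to be in the sink already. `ToySink`, `ToyEngine`, `SyncSink`, `AsyncSink` and `Scenario` are hypothetical names for this sketch, and `Seq[String]` stands in for `DataFrame`; this is not Spark's actual code.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's Sink; Seq[String] replaces DataFrame.
trait ToySink {
  def addBatch(batchId: Long, data: Seq[String]): Unit
}

// Toy engine (simplified assumption): a batch is logged durably before it is
// handed to the sink; on restart only the last logged batch is re-run.
final class ToyEngine(sink: ToySink) {
  private val offsetLog = mutable.ArrayBuffer.empty[(Long, Seq[String])]

  def runBatch(batchId: Long, data: Seq[String]): Unit = {
    offsetLog += (batchId -> data) // survives a crash
    sink.addBatch(batchId, data)   // once this returns, the engine moves on
  }

  def recover(): Unit =
    offsetLog.lastOption.foreach { case (id, data) => sink.addBatch(id, data) }
}

// Synchronous and idempotent: the batch is stored before addBatch returns,
// and a replayed batchId is ignored.
final class SyncSink extends ToySink {
  val stored = mutable.LinkedHashMap.empty[Long, Seq[String]]

  def addBatch(batchId: Long, data: Seq[String]): Unit =
    if (!stored.contains(batchId)) stored(batchId) = data
}

// Violates the synchronous contract: addBatch only queues the batch and returns.
// Anything still queued when the process dies is lost.
final class AsyncSink extends ToySink {
  val stored = mutable.LinkedHashMap.empty[Long, Seq[String]]
  private val pending = mutable.Queue.empty[(Long, Seq[String])]

  def addBatch(batchId: Long, data: Seq[String]): Unit = pending.enqueue(batchId -> data)

  def crash(): Unit = pending.clear() // the in-memory queue does not survive a crash

  def drainAll(): Unit =
    while (pending.nonEmpty) {
      val (id, data) = pending.dequeue()
      if (!stored.contains(id)) stored(id) = data
    }
}

object Scenario {
  // Run batches 0 and 1, crash before any queued write completes, then restart.
  def run(sink: ToySink, onCrash: () => Unit, afterRestart: () => Unit): Unit = {
    val engine = new ToyEngine(sink)
    engine.runBatch(0L, Seq("a", "b"))
    engine.runBatch(1L, Seq("c"))
    onCrash()
    engine.recover() // only batch 1 is replayed
    afterRestart()
  }
}
```

Running `Scenario.run` with a `SyncSink` leaves both batches stored, and the replay of batch 1 is a no-op. With an `AsyncSink` (passing `crash` and `drainAll` as the callbacks), batch 0 was acknowledged, since its `addBatch` call returned, but never written; recovery only replays batch 1, so batch 0 is silently lost.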
```diff
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Sink.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Sink.scala
@@ -31,8 +31,11 @@ trait Sink {
    * this method is called more than once with the same batchId (which will happen in the case of
    * failures), then `data` should only be added once.
    *
-   * Note: You cannot apply any operators on `data` except consuming it (e.g., `collect/foreach`).
+   * Note 1: You cannot apply any operators on `data` except consuming it (e.g., `collect/foreach`).
    * Otherwise, you may get a wrong result.
+   *
+   * Note 2: The method is supposed to be executed synchronously, i.e. the method should only return
+   * after data is consumed by sink successfully.
    */
   def addBatch(batchId: Long, data: DataFrame): Unit
 }
```
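The two notes, together with the requirement that a repeated `batchId` be added only once, can be sketched as a sink that writes each batch to a file named after its `batchId`. This is a minimal illustration, not code from Spark: `RowSink` and `PerBatchFileSink` are hypothetical names, `Seq[String]` stands in for the `DataFrame` parameter, and it assumes a replayed batch carries the same rows as the original attempt, so overwriting that batch's file is enough to add the data only once.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical stand-in for Spark's Sink; Seq[String] replaces DataFrame so the
// sketch runs without Spark on the classpath.
trait RowSink {
  def addBatch(batchId: Long, data: Seq[String]): Unit
}

// Writes every batch to its own file, named deterministically from the batchId.
final class PerBatchFileSink(dir: Path) extends RowSink {
  private def fileFor(batchId: Long): Path = dir.resolve(f"batch-$batchId%05d.txt")

  override def addBatch(batchId: Long, data: Seq[String]): Unit = {
    // Consume the rows once; a real Sink would likewise only collect/foreach its DataFrame.
    val bytes = data.mkString("", "\n", "\n").getBytes(StandardCharsets.UTF_8)
    // Files.write defaults to CREATE, TRUNCATE_EXISTING and WRITE, so replaying a
    // batchId overwrites the same file instead of appending a duplicate (idempotence).
    // The call blocks until the bytes are written, so addBatch returns only after the
    // batch has been handed to the file system (synchronous). It does not fsync.
    Files.write(fileFor(batchId), bytes)
  }

  // Rows from all batch files, in batchId order.
  def readAll(): Seq[String] =
    dir.toFile.listFiles().toSeq
      .filter(_.getName.startsWith("batch-"))
      .sortBy(_.getName)
      .flatMap(f => new String(Files.readAllBytes(f.toPath), StandardCharsets.UTF_8).split("\n").toSeq)
}
```

Overwriting a deterministic per-batch location is one way to make replays harmless; recording the last committed `batchId` and skipping anything at or below it is another. A production sink would also need to force data to stable storage if it must survive an OS crash, which `Files.write` alone does not do.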
