Commit d4cd975

CodingCat authored and zsxwing committed
[SPARK-19499][SS] Add more notes in the comments of Sink.addBatch()
## What changes were proposed in this pull request?

The `addBatch` method in the `Sink` trait is supposed to be synchronous so that it coordinates with the fault-tolerance design in `StreamingExecution` (unlike the `compute()` method in DStream). This change adds notes to the method's comments to remind developers of that requirement.

## How was this patch tested?

Existing tests.

Author: CodingCat <[email protected]>

Closes #16840 from CodingCat/SPARK-19499.

1 parent aeb8034 commit d4cd975

1 file changed: 4 additions and 1 deletion
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Sink.scala

Lines changed: 4 additions & 1 deletion
@@ -31,8 +31,11 @@ trait Sink {
    * this method is called more than once with the same batchId (which will happen in the case of
    * failures), then `data` should only be added once.
    *
-   * Note: You cannot apply any operators on `data` except consuming it (e.g., `collect/foreach`).
+   * Note 1: You cannot apply any operators on `data` except consuming it (e.g., `collect/foreach`).
    * Otherwise, you may get a wrong result.
+   *
+   * Note 2: The method is supposed to be executed synchronously, i.e. the method should only return
+   * after data is consumed by sink successfully.
    */
   def addBatch(batchId: Long, data: DataFrame): Unit
 }
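
The contract described in the comment above (add a replayed batch only once, and return only after the data has been consumed) can be illustrated with a minimal sketch. This is not Spark code: `SimpleSink`, `InMemorySink`, and `SinkDemo` are hypothetical names, and a plain `Seq[String]` stands in for the `DataFrame` so that the example runs without a Spark dependency.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's Sink trait: rows are plain strings
// instead of a DataFrame so the sketch needs no Spark dependency.
trait SimpleSink {
  def addBatch(batchId: Long, data: Seq[String]): Unit
}

// Assumed illustration: writes synchronously and ignores replayed batches.
class InMemorySink extends SimpleSink {
  private val committed = mutable.LinkedHashMap.empty[Long, Seq[String]]

  override def addBatch(batchId: Long, data: Seq[String]): Unit = synchronized {
    if (!committed.contains(batchId)) {
      // The write completes before the method returns (Note 2), so the
      // caller may treat the batch as done once addBatch comes back.
      committed(batchId) = data.toVector
    }
    // A replayed batchId falls through without writing a second time.
  }

  // All rows written so far, in the order the batches were first added.
  def allRows: Seq[String] = synchronized { committed.values.flatten.toVector }
}

object SinkDemo {
  def run(): Seq[String] = {
    val sink = new InMemorySink
    sink.addBatch(0L, Seq("a", "b"))
    sink.addBatch(1L, Seq("c"))
    sink.addBatch(1L, Seq("c")) // replay of batch 1 after a simulated failure
    sink.allRows
  }
}
```

Handing the write to a background task (for example a `Future`) and returning immediately would break this contract, because the caller could record the batch as finished before the data is actually stored.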
