-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK][STREAMING] Invoke onBatchCompletion() only when all jobs in the JobSet are Success #19824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…obset from JobScheduler.jobSets" This reverts commit f8db894.
…he JobSet are Success
|
Can one of the admins verify this patch? |
|
cc @CodingCat, WDYT about this revert? |
|
cc @zsxwing |
|
Btw, I think this does not only revert the commit, seems it also proposes its fix. It is better to modify the title. |
|
did I miss anything? @Victor-Wong , you are describing what #16542 does in your description, but you are reverting it in your changes? |
|
@viirya Sorry for the misleading title, I have changed it now. |
|
@CodingCat Yes, this PR wants to solve the same issue in #16542, but I think this is a better way to solve it. |
|
|
|
@CodingCat please checkout the difference between the two PR.
|
|
#16542 has guaranteed that the failed batch can be re-executed, and I didn’t check if reverting the change in #16542 plus your new change can guarantee the same thing... Suppose it also guarantees, the remaining discussion becomes “what does complete mean in English?” which is not interesting to me to discuss |
|
One thing to note is that mute an event is a behavior change, if a user has introduced some customized listener to capture all completed batches and also extract failed job info, he/she will see a broken scenario with the change in this PR |
I agree with that, so we should be careful about changing the current behavior. I will close the PR later.
Maybe the remaining discussion should be how to let the user know that he will get a StreamingListenerBatchCompleted event even if the batch failed.
As I was using the StreamingListenerBatchCompleted to do some metadata checkpointing stuff, which should be done only when the batch succeeded. |
|
if you just worry about
If this is your concern, you should handle whether complete is with a failure in your own listener, check: https://github.com/Azure/azure-event-hubs-spark/blob/fecc34de8a238049806d033d8a85d888cad75901/core/src/main/scala/org/apache/spark/streaming/eventhubs/checkpoint/ProgressTrackingListener.scala#L38 IMHO, |
|
@CodingCat |
What changes were proposed in this pull request?
The code changes in PR(#16542) make me very confusing:
spark/streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobScheduler.scala
Line 203 in 5a02e3a
private def handleJobCompletion(job: Job, completedTime: Long) { val jobSet = jobSets.get(job.time) jobSet.handleJobCompletion(job) job.setEndTime(completedTime) listenerBus.post(StreamingListenerOutputOperationCompleted(job.toOutputOperationInfo)) logInfo("Finished job " + job.id + " from job set of time " + jobSet.time) if (jobSet.hasCompleted) { listenerBus.post(StreamingListenerBatchCompleted(jobSet.toBatchInfo)) } job.result match { case Failure(e) => reportError("Error running job " + job, e) case _ => if (jobSet.hasCompleted) { jobSets.remove(jobSet.time) jobGenerator.onBatchCompletion(jobSet.time) logInfo("Total delay: %.3f s for time %s (execution: %.3f s)".format( jobSet.totalDelay / 1000.0, jobSet.time.toString, jobSet.processingDelay / 1000.0 )) } } }If a Job failed and the JobSet containing it has completed, listenerBus will post a StreamingListenerBatchCompleted, while jobGenerator will not invoke onBatchCompletion. So the batch is completed or not ?
The key point is if a Job in a Batch failed, whether or not we consider the Batch as completed.
I think if someone register a listener on StreamingListenerBatchCompleted, he just wants to get notified only when the batch finishes with no error. So if a Job is failed, we should not remove it from its JobSet, thus the JobSet has not completed.
How was this patch tested?
existing tests
Please review http://spark.apache.org/contributing.html before opening a pull request.