Commit 958c797

testsgmr authored and Ngone51 committed
[SPARK-37060][CORE] Handle driver status response from backup masters
### What changes were proposed in this pull request?
After the improvement in SPARK-31486, the client uses the `asyncSendToMasterAndForwardReply` method instead of `activeMasterEndpoint.askSync` to get the driver's status. The driver's status is only available on the active master, but `asyncSendToMasterAndForwardReply` iterates over all of the masters, so the client has to handle the responses from the backup masters. The SPARK-31486 change did not account for this. As a result, drivers running in cluster mode on a cluster with multiple masters are affected by this bug.

### Why are the changes needed?
The client needs to detect whether a response was received from a backup master and, if so, ignore it.

### Does this PR introduce _any_ user-facing change?
No. It only fixes a bug and restores the ability to deploy in cluster mode on multi-master clusters.

### How was this patch tested?

Closes #34331 from mohamadrezarostami/fix-bug-in-report-driver-status.

Authored-by: Mohamadreza Rostami <[email protected]>
Signed-off-by: yi.wu <[email protected]>
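To make the failure mode concrete, here is a minimal Scala sketch of the pre-fix behavior. It is a model, not Spark's RPC code: `Master`, `StatusReply`, `ask`, `askAll`, and `firstFatalReply` are hypothetical names that only capture the idea that the status request reaches every master, every reply is forwarded to the client, and any "not found" reply is treated as fatal. The driver ID is made up for illustration.

```scala
// Hypothetical model of the pre-fix status polling; not Spark's actual API.
object PreFixFanOutSketch {
  final case class Master(name: String, active: Boolean, drivers: Map[String, String])

  // Reply forwarded to the client: Some(state) if the master knows the driver.
  final case class StatusReply(from: String, state: Option[String])

  // Only the active master tracks drivers; a standby always answers "not found".
  def ask(master: Master, driverId: String): StatusReply =
    if (master.active) StatusReply(master.name, master.drivers.get(driverId))
    else StatusReply(master.name, None)

  // Mimics sending the request to every master and forwarding every reply.
  def askAll(masters: Seq[Master], driverId: String): Seq[StatusReply] =
    masters.map(m => ask(m, driverId))

  // Pre-fix client: the first reply without a state triggers the exit path.
  // Returns the name of the master whose reply would cause the exit, if any.
  def firstFatalReply(replies: Seq[StatusReply]): Option[String] =
    replies.find(_.state.isEmpty).map(_.from)

  def main(args: Array[String]): Unit = {
    val driverId = "driver-example-0000" // hypothetical ID
    val masters = Seq(
      Master("master-a", active = true, drivers = Map(driverId -> "RUNNING")),
      Master("master-b", active = false, drivers = Map.empty))
    println(firstFatalReply(askAll(masters, driverId))) // prints Some(master-b)
  }
}
```

With one active and one standby master, the standby's "not found" reply alone is enough to reach the exit path, even though the active master knows the driver.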
1 parent 898cf76 commit 958c797


1 file changed (+6, -4 lines)


core/src/main/scala/org/apache/spark/deploy/Client.scala

Lines changed: 6 additions & 4 deletions
```diff
@@ -190,13 +190,15 @@ private class ClientEndpoint(
             logDebug(s"State of driver $submittedDriverID is ${state.get}, " +
               s"continue monitoring driver status.")
           }
-        }
-      }
-    } else {
+        }
+      }
+    } else if (exception.exists(e => Utils.responseFromBackup(e.getMessage))) {
+      logDebug(s"The status response is reported from a backup spark instance. So, ignored.")
+    } else {
       logError(s"ERROR: Cluster master did not recognize $submittedDriverID")
       System.exit(-1)
-    }
     }
+  }
   override def receive: PartialFunction[Any, Unit] = {
 
     case SubmitDriverResponse(master, success, driverId, message) =>
```
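The branch added by this commit amounts to a three-way decision on each forwarded status reply. The sketch below is a minimal, self-contained approximation under stated assumptions: `BackupPrefix` and `isFromBackup` are hypothetical stand-ins for `Utils.responseFromBackup`, whose exact matching rule is not shown in this diff, and the outcome names are illustrative.

```scala
// Approximation of the patched decision in reportDriverStatus.
// BackupPrefix/isFromBackup are hypothetical stand-ins for Utils.responseFromBackup.
object PostFixStatusSketch {
  val BackupPrefix = "Current state is not alive" // assumed marker text

  def isFromBackup(msg: String): Boolean =
    msg != null && msg.startsWith(BackupPrefix)

  sealed trait Outcome
  case object HandleState extends Outcome // found: existing state handling applies
  case object IgnoredBackup extends Outcome // logDebug, keep waiting for the active master
  case object Exit extends Outcome // logError + System.exit(-1)

  // Option.exists returns false for None, so a "not found" reply
  // that carries no exception still falls through to Exit.
  def decide(found: Boolean, exception: Option[Exception]): Outcome =
    if (found) HandleState
    else if (exception.exists(e => isFromBackup(e.getMessage))) IgnoredBackup
    else Exit

  def main(args: Array[String]): Unit = {
    println(decide(found = true, exception = None)) // HandleState
    println(decide(found = false, exception = Some(new Exception(s"$BackupPrefix: STANDBY")))) // IgnoredBackup
    println(decide(found = false, exception = None)) // Exit
  }
}
```

Because the check is opt-in (only replies positively identified as coming from a backup are skipped), a genuine "not found" answer from the active master still ends the client with an error, as before.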
