Commit 571acc8
[SPARK-34939][CORE] Throw fetch failure exception when unable to deserialize broadcasted map statuses
### What changes were proposed in this pull request?
This patch catches `IOException`, which is possibly thrown due to unable to deserialize map statuses (e.g., broadcasted value is destroyed), when deserilizing map statuses. Once `IOException` is caught, `MetadataFetchFailedException` is thrown to let Spark handle it.
### Why are the changes needed?
One customer encountered application error. From the log, it is caused by accessing non-existing broadcasted value. The broadcasted value is map statuses. E.g.,
```
[info] Cause: java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0
[info] at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1410)
[info] at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:226)
[info] at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:103)
[info] at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
[info] at org.apache.spark.MapOutputTracker$.$anonfun$deserializeMapStatuses$3(MapOutputTracker.scala:967)
[info] at org.apache.spark.internal.Logging.logInfo(Logging.scala:57)
[info] at org.apache.spark.internal.Logging.logInfo$(Logging.scala:56)
[info] at org.apache.spark.MapOutputTracker$.logInfo(MapOutputTracker.scala:887)
[info] at org.apache.spark.MapOutputTracker$.deserializeMapStatuses(MapOutputTracker.scala:967)
```
There is a race-condition. After map statuses are broadcasted and the executors obtain serialized broadcasted map statuses. If any fetch failure happens after, Spark scheduler invalidates cached map statuses and destroy broadcasted value of the map statuses. Then any executor trying to deserialize serialized broadcasted map statuses and access broadcasted value, `IOException` will be thrown. Currently we don't catch it in `MapOutputTrackerWorker` and above exception will fail the application.
Normally we should throw a fetch failure exception for such case. Spark scheduler will handle this.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit test.
Closes #32033 from viirya/fix-broadcast-master.
Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>1 parent ebf01ec commit 571acc8
File tree
2 files changed
+64
-10
lines changed- core/src
- main/scala/org/apache/spark
- test/scala/org/apache/spark
2 files changed
+64
-10
lines changedLines changed: 23 additions & 10 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
17 | 17 | | |
18 | 18 | | |
19 | 19 | | |
20 | | - | |
| 20 | + | |
21 | 21 | | |
22 | 22 | | |
23 | 23 | | |
| |||
100 | 100 | | |
101 | 101 | | |
102 | 102 | | |
103 | | - | |
| 103 | + | |
104 | 104 | | |
105 | 105 | | |
106 | 106 | | |
| |||
843 | 843 | | |
844 | 844 | | |
845 | 845 | | |
846 | | - | |
| 846 | + | |
| 847 | + | |
| 848 | + | |
| 849 | + | |
| 850 | + | |
| 851 | + | |
| 852 | + | |
| 853 | + | |
847 | 854 | | |
848 | 855 | | |
849 | 856 | | |
| |||
953 | 960 | | |
954 | 961 | | |
955 | 962 | | |
956 | | - | |
957 | | - | |
958 | | - | |
959 | | - | |
960 | | - | |
961 | | - | |
962 | | - | |
| 963 | + | |
| 964 | + | |
| 965 | + | |
| 966 | + | |
| 967 | + | |
| 968 | + | |
| 969 | + | |
| 970 | + | |
| 971 | + | |
| 972 | + | |
| 973 | + | |
| 974 | + | |
| 975 | + | |
963 | 976 | | |
964 | 977 | | |
965 | 978 | | |
| |||
Lines changed: 41 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
332 | 332 | | |
333 | 333 | | |
334 | 334 | | |
| 335 | + | |
| 336 | + | |
| 337 | + | |
| 338 | + | |
| 339 | + | |
| 340 | + | |
| 341 | + | |
| 342 | + | |
| 343 | + | |
| 344 | + | |
| 345 | + | |
| 346 | + | |
| 347 | + | |
| 348 | + | |
| 349 | + | |
| 350 | + | |
| 351 | + | |
| 352 | + | |
| 353 | + | |
| 354 | + | |
| 355 | + | |
| 356 | + | |
| 357 | + | |
| 358 | + | |
| 359 | + | |
| 360 | + | |
| 361 | + | |
| 362 | + | |
| 363 | + | |
| 364 | + | |
| 365 | + | |
| 366 | + | |
| 367 | + | |
| 368 | + | |
| 369 | + | |
| 370 | + | |
| 371 | + | |
| 372 | + | |
| 373 | + | |
| 374 | + | |
| 375 | + | |
335 | 376 | | |
0 commit comments