Conversation

@caneGuy
Contributor

@caneGuy caneGuy commented Sep 9, 2017

What changes were proposed in this pull request?

As the log below shows, the original exception is hidden when removeBlockInternal itself throws an exception.
2017-08-31,10:26:57,733 WARN org.apache.spark.storage.BlockManager: Putting block broadcast_110 failed due to an exception
2017-08-31,10:26:57,734 WARN org.apache.spark.broadcast.BroadcastManager: Failed to create a new broadcast in 1 attempts
java.io.IOException: Failed to create local dir in /tmp/blockmgr-5bb5ac1e-c494-434a-ab89-bd1808c6b9ed/2e.
    at org.apache.spark.storage.DiskBlockManager.getFile(DiskBlockManager.scala:70)
    at org.apache.spark.storage.DiskStore.remove(DiskStore.scala:115)
    at org.apache.spark.storage.BlockManager.removeBlockInternal(BlockManager.scala:1339)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:910)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
    at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:726)
    at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:1233)
    at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:122)
    at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:88)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
    at org.apache.spark.broadcast.BroadcastManager$$anonfun$newBroadcast$1.apply$mcVI$sp(BroadcastManager.scala:60)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:58)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1415)
    at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1002)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:924)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$submitWaitingChildStages$6.apply(DAGScheduler.scala:771)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$submitWaitingChildStages$6.apply(DAGScheduler.scala:770)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at org.apache.spark.scheduler.DAGScheduler.submitWaitingChildStages(DAGScheduler.scala:770)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1235)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1662)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1620)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1609)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

In this PR, I log the original exception first, which makes troubleshooting more convenient.
PS:
This PR was split out from PR-19133.
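
The failure mode described above can be reproduced in a small, standalone sketch. This is not Spark's BlockManager code: `cleanup`, `putWithCleanup`, the `logFirst` flag, and the in-memory `log` buffer are hypothetical stand-ins chosen for illustration. It only shows that when a cleanup step throws inside a catch block, the cleanup's exception is what reaches the caller, so the root cause survives only if it was logged beforehand.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

object HiddenExceptionSketch {
  // Hypothetical stand-in for a logger; collects messages so they can be inspected.
  val log = ArrayBuffer.empty[String]

  // Hypothetical cleanup step that itself fails, playing the role that
  // removeBlockInternal plays in the stack trace above.
  def cleanup(): Unit = throw new java.io.IOException("Failed to create local dir")

  // Runs `body`. On failure, optionally logs the root cause, then attempts cleanup.
  def putWithCleanup(logFirst: Boolean)(body: => Unit): Unit = {
    try {
      body
    } catch {
      case NonFatal(e) =>
        // Logging before cleanup keeps the root cause visible.
        if (logFirst) log += s"Putting block failed due to exception $e."
        cleanup() // throws here, so this exception replaces `e` for the caller
        throw e   // never reached in this sketch
    }
  }

  def main(args: Array[String]): Unit = {
    val seen =
      try {
        putWithCleanup(logFirst = true) { throw new IllegalStateException("root cause") }
        "none"
      } catch {
        case NonFatal(e) => e.getClass.getSimpleName
      }
    println(s"Exception seen by caller: $seen")
    log.foreach(println)
  }
}
```

With `logFirst = false` the caller sees only the `IOException` from cleanup and the log stays empty, which mirrors the situation in the log excerpt.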

How was this patch tested?

Existing unit tests.

@caneGuy
Contributor Author

caneGuy commented Sep 9, 2017

@kiszk I updated this and split it out from PR-19133.

// Since removeBlockInternal may throw exception,
// we should print exception first to show root cause.
case e: Throwable =>
  logWarning(s"Putting block $blockId failed due to exception $e.")
Member

Will we see the message Putting block $blockId failed due to exception twice in a log file?

Contributor Author

Updated.

Contributor

@jerryshao jerryshao Sep 15, 2017

Would you please change to case NonFatal(e) =>?

Contributor Author

Done @jerryshao
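
For readers unfamiliar with the `NonFatal` suggestion in this thread, here is a minimal sketch of how `scala.util.control.NonFatal` differs from matching on `Throwable`. The `classify` helper and its labels are hypothetical and exist only for illustration. `NonFatal` does not match errors such as `OutOfMemoryError` or `InterruptedException`, so a `case NonFatal(e)` clause lets those propagate instead of being swallowed by a logging handler.

```scala
import scala.util.control.NonFatal

object NonFatalSketch {
  // Hypothetical helper: reports which catch clause handled the throwable.
  // Real code would normally let fatal errors propagate rather than label them.
  def classify(t: Throwable): String =
    try {
      throw t
    } catch {
      case NonFatal(e) =>
        s"handled: ${e.getClass.getSimpleName}"
      case fatal: Throwable =>
        s"fatal, not matched by NonFatal: ${fatal.getClass.getSimpleName}"
    }

  def main(args: Array[String]): Unit = {
    println(classify(new java.io.IOException("disk")))   // ordinary failure
    println(classify(new InterruptedException("stop")))  // excluded by NonFatal
    println(classify(new OutOfMemoryError("heap")))      // VirtualMachineError, excluded
  }
}
```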

@caneGuy
Contributor Author

caneGuy commented Sep 15, 2017

Ping @kiszk Could you help take a look at this? Thanks very much.

@jerryshao
Contributor

ok to test.

@SparkQA

SparkQA commented Sep 15, 2017

Test build #81807 has finished for PR 19171 at commit 86525f7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Contributor

OK, it seems the tests passed; let me merge this to the master branch.

Please note that such a trivial fix usually doesn't require a JIRA. Also, please think carefully about whether such a fix is necessary.

@asfgit asfgit closed this in 22b111e Sep 15, 2017
@caneGuy
Contributor Author

caneGuy commented Sep 15, 2017

Ok, thanks for the notice @jerryshao

@SparkQA

SparkQA commented Sep 15, 2017

Test build #81808 has finished for PR 19171 at commit a3ed8b3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@caneGuy caneGuy deleted the zhoukang/print-rootcause branch September 25, 2017 12:07

4 participants