[SPARK-47833][SQL][CORE] Supply caller stackstrace for checkAndGlobPathIfNecessary AnalysisException #46028

pan3793 · 2024-04-12T10:04:48Z

What changes were proposed in this pull request?

SPARK-29089 parallelized checkAndGlobPathIfNecessary by leveraging ForkJoinPool. As a side effect, when something goes wrong, the reported error loses the caller-side stack trace.

For example, I hit the following error in a Spark job. Without the caller stack trace, I could not tell which code path triggered it.

2024-04-12 14:31:21 CST ApplicationMaster INFO - Final app status: FAILED, exitCode: 15, (reason: User class threw exception: org.apache.spark.sql.AnalysisException: Path does not exist: hdfs://xyz-cluster/user/abc/hive_db/tmp.db/tmp_lskkh_1
	at org.apache.spark.sql.errors.QueryCompilationErrors$.dataPathNotExistError(QueryCompilationErrors.scala:1011)
	at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$4(DataSource.scala:785)
	at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$4$adapted(DataSource.scala:782)
	at org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:372)
	at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
	at scala.util.Success.$anonfun$map$1(Try.scala:255)
	at scala.util.Success.map(Try.scala:213)
	at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
	at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
	at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
	at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
)

Why are the changes needed?

Improve error message.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

A new unit test is added. The exception stack traces before and after the change are shown below.

raw stacktrace

java.lang.RuntimeException: Error occurred on Thread-9
        at org.apache.spark.util.ThreadUtilsSuite$$anon$3.internalMethod(ThreadUtilsSuite.scala:141)
        at org.apache.spark.util.ThreadUtilsSuite$$anon$3.run(ThreadUtilsSuite.scala:138)

enhanced exception stacktrace

java.lang.RuntimeException: Error occurred on Thread-9
        at org.apache.spark.util.ThreadUtilsSuite$$anon$3.internalMethod(ThreadUtilsSuite.scala:141)
        at org.apache.spark.util.ThreadUtilsSuite$$anon$3.run(ThreadUtilsSuite.scala:138)
        at ... run in separate thread: Thread-9 ... ()
        at org.apache.spark.util.ThreadUtilsSuite.$anonfun$new$16(ThreadUtilsSuite.scala:151)
        at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127)
        at org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282)
        at org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231)
        at org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230)
        at org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69)
        at org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155)
        at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
        at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
        (... other scalatest callsites)
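The difference between the two traces can be reproduced with a minimal sketch. It is not the code of ThreadUtils.wrapCallerStacktrace; the object name CallerStackSketch and the helpers appendCallerFrames and runInWorker are hypothetical, and the sketch only assumes the standard Throwable.getStackTrace / setStackTrace API. The idea is to keep the worker-thread frames, add a placeholder frame, and then append the frames of the thread that handles the failure.

```scala
object CallerStackSketch {
  // Hypothetical helper (an assumption for illustration, not Spark's API):
  // appends the frames of the calling thread to an exception raised elsewhere.
  def appendCallerFrames[T <: Throwable](error: T, message: String): T = {
    // A fresh Throwable captures the current thread's stack; drop(1) removes
    // this helper's own frame so the caller's frame comes first.
    val callerFrames = new Throwable().getStackTrace.drop(1)
    // Placeholder frame separating worker-thread frames from caller frames.
    val marker = new StackTraceElement(s"... $message ...", "", "", -1)
    error.setStackTrace(error.getStackTrace ++ Array(marker) ++ callerFrames)
    error
  }

  // Raises an exception on a separate thread and hands it back to the caller.
  def runInWorker(): Throwable = {
    var failure: Throwable = null
    val worker = new Thread(() => {
      try throw new RuntimeException("Error occurred on worker thread")
      catch { case t: Throwable => failure = t }
    })
    worker.start()
    worker.join() // join() makes the worker's write to `failure` visible here
    failure
  }

  def main(args: Array[String]): Unit = {
    val enhanced = appendCallerFrames(runInWorker(), "run in separate thread")
    // The printed trace now lists worker frames, the marker, then caller frames.
    enhanced.printStackTrace()
  }
}
```

Mutating the stack trace in place, rather than wrapping the exception, keeps the original exception type visible to whoever catches it.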

Was this patch authored or co-authored using generative AI tooling?

No

pan3793 · 2024-04-12T10:09:28Z

cc @srowen @viirya @dongjoon-hyun @yaooqinn

pan3793 · 2024-04-16T02:45:04Z

cc @gengliangwang @LuciferYang @mridulm WDYT of this approach for stacktrace enhancement? Or do you have other suggestions?

mridulm

Looks reasonable to me, but I will let @viirya and @dongjoon-hyun comment better.

Btw, are there other call sites where this might be helpful?

core/src/main/scala/org/apache/spark/util/ThreadUtils.scala

pan3793 · 2024-04-16T08:09:42Z

@mridulm In this case, ThreadUtils.parmap catches the internal AnalysisException and wraps it in a SparkException. If we threw the SparkException directly, the call sites would be retained, but propagating SparkException instead of the real AnalysisException might not be desirable. I believe the original author also wanted to keep the AnalysisException and therefore unwrapped it in place. I checked the other callers of ThreadUtils.parmap; they throw SparkException directly, so they have no such issue.
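A rough sketch of the wrap-then-unwrap pattern described above follows. Everything in it is an assumption made for illustration: parmapSketch is a hypothetical stand-in for ThreadUtils.parmap, and WrapperException and PathMissingException stand in for SparkException and AnalysisException. It only shows why rethrowing the cause in place preserves the exception type while leaving it with worker-thread frames only.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration
import scala.util.control.NonFatal

object UnwrapSketch {
  // Hypothetical stand-ins for SparkException and AnalysisException.
  final class WrapperException(msg: String, cause: Throwable)
    extends RuntimeException(msg, cause)
  final class PathMissingException(msg: String) extends RuntimeException(msg)

  // Hypothetical parallel map: runs f on a pool and wraps any worker failure.
  def parmapSketch[A, B](in: Seq[A], threads: Int)(f: A => B): Seq[B] = {
    val pool = Executors.newFixedThreadPool(threads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val futures = in.map(x => Future(f(x)))
      Await.result(Future.sequence(futures), Duration.Inf)
    } catch {
      // The wrapper is created on the calling thread, so it has caller frames.
      case NonFatal(t) => throw new WrapperException("Exception thrown in worker thread", t)
    } finally {
      pool.shutdown()
    }
  }

  // Caller that unwraps in place so users still see the real exception type.
  def checkPaths(paths: Seq[String]): Seq[String] = {
    try {
      parmapSketch(paths, 2) { p =>
        if (p.startsWith("missing")) throw new PathMissingException(s"Path does not exist: $p")
        p
      }
    } catch {
      // The cause was created on a pool thread: its trace ends in the pool's
      // run loop and does not include the frames that called checkPaths.
      case w: WrapperException if w.getCause.isInstanceOf[PathMissingException] =>
        throw w.getCause
    }
  }
}
```

In this sketch the rethrown cause is the point where a stack-trace enhancement would be applied, before the exception leaves checkPaths.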

core/src/main/scala/org/apache/spark/util/ThreadUtils.scala

yaooqinn · 2024-04-18T04:01:53Z

core/src/test/scala/org/apache/spark/util/ThreadUtilsSuite.scala

+
+    ThreadUtils.wrapCallerStacktrace(exception, s"run in separate thread: $runnerThreadName")
+
+    assert(exception.getStackTrace.mkString("\n").contains(


nit: val stStr = exception.getStackTrace.mkString("\n")

changed as suggested

yaooqinn · 2024-04-18T04:02:10Z

core/src/test/scala/org/apache/spark/util/ThreadUtilsSuite.scala

+      "org.scalatest.Suite.run"),
+      "stack trace does not contain caller stack trace"
+    )
+    assert(exception.getStackTrace.mkString("\n").contains("ThreadUtils.scala") === false,


nit: !exception.getStackTrace.mkString("\n").contains("ThreadUtils.scala")

changed as suggested

yaooqinn · 2024-04-18T04:02:28Z

core/src/test/scala/org/apache/spark/util/ThreadUtilsSuite.scala

    )
  }

+  test("wrapCallerStacktrace") {


attach jira id

pan3793 · 2024-04-18T04:56:21Z

@yaooqinn thanks for reviewing, I addressed all comments

yaooqinn

LGTM, thank you for updating the desc

yaooqinn · 2024-04-19T07:57:01Z

Merged to master. Thank you @pan3793

Closes apache#46028 from pan3793/SPARK-47833.

Authored-by: Cheng Pan
Signed-off-by: Kent Yao

[SPARK-47833][SQL][CORE] Supply caller stackstrace for checkAndGlobPathIfNecessary AnalysisException (3213b22)

github-actions bot added SQL CORE labels Apr 12, 2024

pan3793 mentioned this pull request Apr 12, 2024

[SPARK-29089][SQL] Parallelize blocking FileSystem calls in DataSource#checkAndGlobPathIfNecessary #25899

Closed

mridulm reviewed Apr 16, 2024

LuciferYang reviewed Apr 16, 2024

core/src/main/scala/org/apache/spark/util/ThreadUtils.scala (outdated)

LuciferYang reviewed Apr 16, 2024

core/src/main/scala/org/apache/spark/util/ThreadUtils.scala

address comment

40d2ea1

yaooqinn reviewed Apr 18, 2024

address comments

4580a03

pan3793 requested a review from yaooqinn April 19, 2024 02:50

yaooqinn approved these changes Apr 19, 2024

yaooqinn closed this in 2bf4346 Apr 19, 2024

