[SPARK-47833][SQL][CORE] Supply caller stacktrace for checkAndGlobPathIfNecessary AnalysisException #46028
cc @gengliangwang @LuciferYang @mridulm WDYT of this approach for stacktrace enhancement? Or do you have other suggestions?
Looks reasonable to me, but I'll let @viirya and @dongjoon-hyun weigh in.
Btw, are there other call sites where this might be helpful?
@mridulm In this case, `ThreadUtils.wrapCallerStacktrace(exception, s"run in separate thread: $runnerThreadName")`
`assert(exception.getStackTrace.mkString("\n").contains(`
nit: `val stStr = exception.getStackTrace.mkString("\n")`
changed as suggested
`"org.scalatest.Suite.run"),`
`"stack trace does not contain caller stack trace"`
`)`
`assert(exception.getStackTrace.mkString("\n").contains("ThreadUtils.scala") === false,`
nit: `!exception.getStackTrace.mkString("\n").contains("ThreadUtils.scala")`
changed as suggested
`)`
`}`

`test("wrapCallerStacktrace") {`
attach jira id
added
@yaooqinn thanks for reviewing; I've addressed all comments.
LGTM, thank you for updating the desc
Merged to master. Thank you @pan3793
### What changes were proposed in this pull request?
SPARK-29089 parallelized `checkAndGlobPathIfNecessary` by leveraging a ForkJoinPool. This introduced a side effect: if something goes wrong, the reported error loses the caller-side stack trace.
For example, I hit the following error in a Spark job; without the caller stack trace, I had no idea what happened.
```
2024-04-12 14:31:21 CST ApplicationMaster INFO - Final app status: FAILED, exitCode: 15, (reason: User class threw exception: org.apache.spark.sql.AnalysisException: Path does not exist: hdfs://xyz-cluster/user/abc/hive_db/tmp.db/tmp_lskkh_1
at org.apache.spark.sql.errors.QueryCompilationErrors$.dataPathNotExistError(QueryCompilationErrors.scala:1011)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$4(DataSource.scala:785)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$4$adapted(DataSource.scala:782)
at org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:372)
at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
at scala.util.Success.$anonfun$map$1(Try.scala:255)
at scala.util.Success.map(Try.scala:213)
at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
)
```
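The effect can be reproduced without Spark. The sketch below is a minimal illustration, assuming Scala's global `ExecutionContext` (which is backed by a `ForkJoinPool`) and a hypothetical `checkPath` helper standing in for the per-path existence check; it is not the code in `DataSource.scala`. The exception reaches the caller unchanged through `Await.result`, but its frames were recorded on the pool thread, so none of the calling thread's frames are in it.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object LostCallerTraceDemo {
  // Hypothetical stand-in for a per-path existence check; fails for one input.
  def checkPath(path: String): String =
    if (path.endsWith("missing")) throw new IllegalArgumentException(s"Path does not exist: $path")
    else path

  // Checks every path in parallel on the global ForkJoinPool-backed
  // ExecutionContext and waits for all results on the calling thread.
  def checkAll(paths: Seq[String]): Seq[String] = {
    val futures = paths.map(p => Future(checkPath(p)))
    Await.result(Future.sequence(futures), 10.seconds)
  }

  // Returns the stack trace of the exception that Await.result rethrows.
  def failingStackTrace(): Array[StackTraceElement] =
    try { checkAll(Seq("/data/a", "/data/missing")); Array.empty[StackTraceElement] }
    catch { case e: IllegalArgumentException => e.getStackTrace }

  def main(args: Array[String]): Unit =
    // Only pool-thread frames are printed (checkPath, the Future machinery and
    // the pool's worker loop). No frame recorded on the calling thread appears:
    // checkAll's own frame, failingStackTrace and main are all missing.
    failingStackTrace().foreach(frame => println(s"  at $frame"))
}
```

This is the same gap visible in the log above: the trace stops at the pool's worker loop, so the code that triggered the path check cannot be identified from the error alone.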
### Why are the changes needed?
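Improve the error message by restoring the caller-side stack trace, so the code path that triggered the failure can be identified.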
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
A new UT is added. The exception stack traces before and after this change are as follows.

Raw stack trace:
```
java.lang.RuntimeException: Error occurred on Thread-9
at org.apache.spark.util.ThreadUtilsSuite$$anon$3.internalMethod(ThreadUtilsSuite.scala:141)
at org.apache.spark.util.ThreadUtilsSuite$$anon$3.run(ThreadUtilsSuite.scala:138)
```
Enhanced stack trace:
```
java.lang.RuntimeException: Error occurred on Thread-9
at org.apache.spark.util.ThreadUtilsSuite$$anon$3.internalMethod(ThreadUtilsSuite.scala:141)
at org.apache.spark.util.ThreadUtilsSuite$$anon$3.run(ThreadUtilsSuite.scala:138)
at ... run in separate thread: Thread-9 ... ()
at org.apache.spark.util.ThreadUtilsSuite.$anonfun$new$16(ThreadUtilsSuite.scala:151)
at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127)
at org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282)
at org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231)
at org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230)
at org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69)
at org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
(... other scalatest callsites)
```
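The enhanced trace above keeps the worker-thread frames, inserts one placeholder frame, and then appends the caller's frames. A minimal sketch of that splicing idea follows, assuming plain `java.lang.Thread` and a hypothetical `CallerStacktraceSketch` object; the names and frame-filtering rules are illustrative only and are not Spark's actual `ThreadUtils` implementation.

```scala
object CallerStacktraceSketch {
  // Runtime class name of this object, e.g. "CallerStacktraceSketch$".
  // Anonymous classes defined inside it (such as the Thread below) share
  // this prefix, which lets the helper recognise its own frames.
  private val HelperClass: String = getClass.getName

  // Hypothetical helper in the spirit of ThreadUtils.wrapCallerStacktrace:
  // keeps the worker-side frames, appends a placeholder frame, then appends
  // the caller's frames, dropping every frame that belongs to this object.
  def wrapCallerStacktrace[T <: Throwable](e: T, combineMessage: String): T = {
    // Worker frames up to (not including) the first frame of this helper.
    val workerFrames = e.getStackTrace.takeWhile(!_.getClassName.startsWith(HelperClass))
    // Caller frames: skip java.lang.Thread.getStackTrace, then this helper.
    val callerFrames = Thread.currentThread().getStackTrace
      .dropWhile(!_.getClassName.startsWith(HelperClass))
      .dropWhile(_.getClassName.startsWith(HelperClass))
    // An empty file name with line -1 makes this print as "... <message> ... ()".
    val placeholder = new StackTraceElement(s"... $combineMessage ..", " ", "", -1)
    e.setStackTrace(workerFrames ++ Array(placeholder) ++ callerFrames)
    e
  }

  // Runs `body` on a new thread and rethrows its failure on the calling thread.
  def runInNewThread[T](threadName: String)(body: => T): T = {
    var result: Option[T] = None
    var failure: Option[Throwable] = None
    val worker = new Thread(threadName) {
      override def run(): Unit =
        try { result = Some(body) }
        catch { case t: Throwable => failure = Some(t) }
    }
    worker.start()
    worker.join() // join() also makes the worker's writes visible here
    failure match {
      case Some(t) => throw wrapCallerStacktrace(t, s"run in separate thread: $threadName")
      case None    => result.get
    }
  }
}
```

In a test in the style of the new UT, with the review nits applied, one would bind `val stStr = e.getStackTrace.mkString("\n")` for an exception thrown inside `runInNewThread("worker-1")` (a hypothetical thread name) and assert both `stStr.contains("... run in separate thread: worker-1 ... ()")` and that no frame's class name starts with the helper's class name. Filtering by class-name prefix rather than dropping a fixed number of frames keeps the result stable if the helper gains another internal call layer.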
### Was this patch authored or co-authored using generative AI tooling?
No
Closes apache#46028 from pan3793/SPARK-47833.
Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Kent Yao <[email protected]>