-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-24851][UI] Map a Stage ID to it's Associated Job ID #21809
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Added a field in Stage UI to display the corresponding job id for that particular stage.
|
ok to test |
|
Test build #93247 has finished for PR 21809 at commit
|
| }.toSeq | ||
| } | ||
|
|
||
| def getJobIdsAssociatedWithStage(stageId: Int): Seq[Set[Int]] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to fetch all the stage attempts, just the first of it is enough to get all the jobIds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Have taken care of it.
| {if (!stageJobIds.isEmpty) { | ||
| <li> | ||
| <strong>Associated Job Ids: </strong> | ||
| {stageJobIds} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make it href link?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem here is that the stage could also have multiple job ids, in that case, we get a bunch of them. Do you want a generic link instead that will take you to the jobs page? Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion is to map each job id as a href link.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did that and updated the screenshot. Thank you.
| } | ||
|
|
||
| def getJobIdsAssociatedWithStage(stageId: Int): Seq[Set[Int]] = { | ||
| store.view(classOf[StageDataWrapper]).index("stageId").first(stageId).last(stageId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we avoid the store look up here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really did not get the question, perhaps you are suggesting to try an alternative way but is there any other alternative way to do this? Let me know your thoughts. Thank you.
…h the job ids for the particular stage
|
Test build #93458 has finished for PR 21809 at commit
|
| return UIUtils.headerSparkPage(request, stageHeader, content, parent) | ||
| } | ||
|
|
||
| val stageJobIds = parent.store.getJobIdsAssociatedWithStage(stageId, stageAttemptId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala#L109 there is a query for the stage data already. We can reduce the query to the store here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E.g. we can add a function to return the whole StageDataWrapper in AppStatusStore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I have fixed it. Request you to have a look.
|
@pgandhi999 could you update the title to |
|
Test build #93561 has finished for PR 21809 at commit
|
|
Test build #93563 has finished for PR 21809 at commit
|
|
test this please |
|
Test build #93607 has finished for PR 21809 at commit
|
…Exception on querying an invalid stage id
|
Test build #93631 has finished for PR 21809 at commit
|
| .asOption(stageDataWrapper.info) | ||
| .get | ||
| } else { | ||
| stageData = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code branch is unreachable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unreachable by the IDE but during runtime, the code does work, I have confirmed that.
| {if (!stageJobIds.isEmpty) { | ||
| <li> | ||
| <strong>Associated Job Ids: </strong> | ||
| {for (jobId <- stageJobIds) yield {val detailUrl = "%s/jobs/job/?id=%s".format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using map is more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| } | ||
|
|
||
| def stageAttempt(stageId: Int, stageAttemptId: Int, details: Boolean = false): v1.StageData = { | ||
| def stageAttempt(stageId: Int, stageAttemptId: Int, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the return type to (StageData, jobIds) might be simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Using map instead of for..yield and returning tuple instead of object
|
Test build #93689 has finished for PR 21809 at commit
|
| val stageKey = Array(stageId, stageAttemptId) | ||
| val stage = store.read(classOf[StageDataWrapper], stageKey).info | ||
| if (details) stageWithDetails(stage) else stage | ||
| val stageDataWrapper: StageDataWrapper = store.read(classOf[StageDataWrapper], stageKey) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't really need the type here (: StageDataWrapper)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
|
|
||
| def stageAttempt(stageId: Int, stageAttemptId: Int, details: Boolean = false): v1.StageData = { | ||
| def stageAttempt(stageId: Int, stageAttemptId: Int, | ||
| details: Boolean = false): (v1.StageData, Seq[Int]) = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indent , you might make it look like taskSummary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
| .asOption(parent.store.stageAttempt(stageId, stageAttemptId, details = false)) | ||
| .getOrElse { | ||
| var stageDataTuple: Tuple2[StageData, Seq[Int]] = null | ||
| try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could use options here instead of try and null checks to make this cleaner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
|
Test build #94500 has finished for PR 21809 at commit
|
abellina
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pgandhi999 had a comment, otherwise LGTM.
| {Utils.bytesToString(stageData.diskBytesSpilled)} | ||
| </li> | ||
| }} | ||
| {if (!stageJobIds.isEmpty) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could throw an NPE if stageDataTuple is None
scala> var x:Seq[Int] = null
x: Seq[Int] = null
scala> x.isEmpty
java.lang.NullPointerException
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since, we are returning from the code if it is None, it should never reach there as far as I can tell.
| </div> | ||
| return UIUtils.headerSparkPage(request, stageHeader, content, parent) | ||
| } | ||
| var stageDataTuple: Option[Tuple2[StageData, Seq[Int]]] = try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we just simplify this to be close to what is was before but return the tuple:
val (stageData, stageJodIds) = parent.store
.asOption(parent.store.stageAttempt(stageId, stageAttemptId, details = false))
.getOrElse {
val content =
<div id="no-info">
<p>No information to display for Stage {stageId} (Attempt {stageAttemptId})</p>
</div>
return UIUtils.headerSparkPage(stageHeader, content, parent)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, simplified the code and tested it. Looks good. Thank you.
Simplifying code block in StagePage
|
Test build #96762 has finished for PR 21809 at commit
|
| if (details) stageWithDetails(stage) else stage | ||
| val stageDataWrapper = store.read(classOf[StageDataWrapper], stageKey) | ||
| val stage = if (details) stageWithDetails(stageDataWrapper.info) else stageDataWrapper.info | ||
| val jobIds = stageDataWrapper.jobIds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of having separate val just put this in the return:
(stage, stageDataWrapper.jobIds.toSeq)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
Test build #97008 has finished for PR 21809 at commit
|
|
test this please |
|
add to whitelist |
|
Test build #97120 has finished for PR 21809 at commit
|
|
test this please |
|
Test build #97130 has finished for PR 21809 at commit
|
|
+1 |
|
merged to master, thanks @pgandhi999 |
It would be nice to have a field in Stage Page UI which would show mapping of the current stage id to the job id's to which that stage belongs to. ## What changes were proposed in this pull request? Added a field in Stage UI to display the corresponding job id for that particular stage. ## How was this patch tested? <img width="448" alt="screen shot 2018-07-25 at 1 33 07 pm" src="https://user-images.githubusercontent.com/22228190/43220447-a8e94f80-900f-11e8-8a20-a235bbd5a369.png"> Closes apache#21809 from pgandhi999/SPARK-24851. Authored-by: pgandhi <[email protected]> Signed-off-by: Thomas Graves <[email protected]>
It would be nice to have a field in Stage Page UI which would show mapping of the current stage id to the job id's to which that stage belongs to.
What changes were proposed in this pull request?
Added a field in Stage UI to display the corresponding job id for that particular stage.
How was this patch tested?