SHS-NG M9: Stage page speed up. #41
Is this branch stable enough now?

I just ran the unit tests and found that some tests failed:

@djvulee there's a couple of things I need to fix in this last patch... if you just reset the branch to the previous commit things should be more stable.

Unit tests should be fixed in this patch too, now.

Ok, thanks! I found that the current branch does not handle failed stages well; it produces the following error:

Another issue is that the SQL tab page leads to a NullPointerException (M8 branch).

@djvulee do you have some code that can reproduce the failed stage you're having trouble with? I can't see any issues in my local build. The SQL tab and individual executions also render fine for me.
Detect the deletion of event log files from storage, and remove data about the related application attempt in the SHS.
There are two main changes to speed up rendering of the tasks list when rendering the stage page.

The first one makes the code only load the tasks being shown in the current page of the tasks table, and information related to only those tasks. One side-effect of this change is that the graph that shows task-related events now only shows events for the tasks in the current page, instead of the previously hardcoded limit of "events for the first 1000 tasks". That ends up helping with readability, though.

To make sorting efficient when using a disk store, the task wrapper was extended to include many new indices, one for each of the sortable columns in the UI, and metrics for which quantiles are calculated.

The second changes the way metric quantiles are calculated for stages. Instead of using the "Distribution" class to process data for all task metrics, which requires scanning all tasks of a stage, the code now uses the KVStore "skip()" functionality to only read tasks that contain interesting information for the quantiles that are desired.

This is still not cheap; because there are many metrics that the UI and API track, the code needs to scan the index for each metric to gather the information. Savings come mainly from skipping deserialization when using the disk store, but the in-memory code also seems to be faster than before (most probably because of other changes in this patch).

To make subsequent calls faster, some quantiles are cached in the status store. This makes the UI much faster after the first time a stage has been loaded.

With the above changes, a lot of code in the UI layer could be simplified.
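To illustrate the first change (loading only the tasks on the current page through a per-column index), here is a minimal Python sketch. It is not the code from this patch: `Task`, `TaskStore`, and the field names are hypothetical stand-ins, and an in-memory sorted list plays the role of a store index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical task record; the real task wrapper has many more fields.
    task_id: int
    duration_ms: int
    gc_time_ms: int


class TaskStore:
    """Toy store that keeps one sorted index per sortable column."""

    def __init__(self, tasks, indexed_fields):
        self._tasks = {t.task_id: t for t in tasks}
        # Each index is a sorted list of (column value, task id) pairs.
        self._indices = {
            field: sorted((getattr(t, field), t.task_id) for t in tasks)
            for field in indexed_fields
        }

    def page(self, sort_by, page, page_size, descending=False):
        """Return only the tasks of one page (1-based), in index order."""
        index = self._indices[sort_by]
        n = len(index)
        start = (page - 1) * page_size
        stop = min(start + page_size, n)
        if start >= n:
            return []
        if descending:
            # Walk the index from the end instead of re-sorting it.
            entries = [index[n - 1 - i] for i in range(start, stop)]
        else:
            entries = index[start:stop]
        # Only the tasks on the requested page are looked up.
        return [self._tasks[task_id] for _, task_id in entries]


tasks = [
    Task(0, 50, 5),
    Task(1, 10, 0),
    Task(2, 30, 9),
    Task(3, 20, 1),
    Task(4, 40, 2),
]
store = TaskStore(tasks, ["duration_ms", "gc_time_ms"])
second_page = store.page("duration_ms", page=2, page_size=2)
```

Because each sortable column has its own index, changing the sort column or direction only changes which index is walked; the number of task records fetched stays at one page.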
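The quantile lookup described above can be sketched in the same spirit. This is an illustration under assumptions, not the patch's implementation: `IndexIterator` is a hypothetical stand-in for an iterator over a sorted metric index whose `skip()` advances without deserializing entries, JSON stands in for the on-disk serialization, and the position formula `min(int(q * count), count - 1)` is one reasonable choice. The quantiles are assumed to be given in ascending order.

```python
import json


class IndexIterator:
    """Toy iterator over serialized entries of an index sorted by metric value."""

    def __init__(self, serialized_entries):
        self._entries = serialized_entries
        self._pos = 0
        self.deserialized = 0  # counts how many entries were actually decoded

    def skip(self, n):
        # Advance past n entries without decoding them.
        self._pos = min(self._pos + n, len(self._entries))
        return self._pos < len(self._entries)

    def next(self):
        raw = self._entries[self._pos]
        self._pos += 1
        self.deserialized += 1
        return json.loads(raw)


def quantiles_from_index(it, count, quantiles):
    """Read one entry per distinct quantile position; quantiles must be ascending."""
    if count == 0:
        return []
    results = []
    current = 0        # position of the next entry the iterator would return
    last_pos = None
    last_value = None
    for q in quantiles:
        pos = min(int(q * count), count - 1)
        if pos != last_pos:
            it.skip(pos - current)
            last_value = it.next()["value"]
            last_pos = pos
            current = pos + 1
        results.append(last_value)
    return results


# 100 entries already sorted by value, as an index on that metric would be.
entries = [json.dumps({"task_id": i, "value": i * 10}) for i in range(100)]
iterator = IndexIterator(entries)
summary = quantiles_from_index(iterator, len(entries), [0.0, 0.25, 0.5, 0.75, 1.0])
```

In this sketch only five of the hundred entries are decoded. The index still has to be walked once per metric, which matches the note above that the approach is cheaper but not free.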