-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[WIP]: create attributes from live log urls to form a permalink past YARN log aggregation #23720
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP]: create attributes from live log urls to form a permalink past YARN log aggregation #23720
Conversation
|
Can one of the admins verify this patch? |
|
Not sure we would want to have feature for just enabling old apps to also enjoy the feature. Might be ideal to craft a tool that migrates event logs? We know the format of YARN default log url, so it is possible to parse log url via regex or some rules, and instrument event log to add attribute. |
|
With our order of magnitude this tool will be more like a large an ETL job, also require a replay for the logs already replayed to the kvstore. |
|
Hmm... got it. I don't have strong preference so let's see others' voice. Btw I think we should have preference to attributes and regex group (which is less stable than attributes), or maybe more ideally merge two attributes. Your patch looks to use attributes as a fail-back which could effectively disable the feature for newer apps. |
vanzin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems fine to me, but as usual, needs a bug, tests, and removing the WIP from the title.
| .get("stdout") | ||
| .flatMap(regEx.findFirstMatchIn) | ||
| .map { urlMatch => | ||
| STDOUT_URL_GROUPS.foldLeft(exec.attributes) { (attribs, groupName) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this block should be executed at all when exec.attributes is not empty; IMO this should only apply to legacy event logs that don't record that information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 Let's just use attributes if it exists. I can see the case this helps even if attributes are presented: the number of attributes are updated which new attributes can be extracted from URL, but at least with current YARN URL we have extracted almost everything so it will unlikely happen.
|
I'll close this for now since it hasn't been updated. |
What changes were proposed in this pull request?
Provide ability to recover attributes for log url rewrite in #23260 from event logs produced by older spark versions using the log urls of live containers
The current solution in #23260 currently works only with the apps run on the to-be-released version of Spark. However, deployments with a wealth of historic event logs are left out. More importantly SHS is less critical and can be upgraded much faster than production apps.
Another improvement in this PR is going beyond a fixed set of supported variables for wider set of deployments. This PR proposes to use configurable named capturing groups for a regex to extract building blocks for a permalink. Below is an example for YARN.
How was this patch tested?
Just locally so far. With configuration like
I was able to use this PR with old event logs from EMR stored on S3