8282306: os::is_first_C_frame(frame*) crashes on invalid link access #7591
Prevents segmentation faults in os::is_first_C_frame if passed a thread.
|
👋 Welcome back parttimenerd! A progress list of the required criteria for merging this PR into |
|
@parttimenerd The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
tstuefe
left a comment
Hi Johannes,
Thanks for doing this, solving this makes sense.
But I'm not sure yours is the right approach. I think it would be better to use SafeFetch to check the addresses in the relevant registers.
Using SafeFetch would mean that we don't depend on the existence of Thread (which may be NULL, especially in signal contexts). It would work if the registers erroneously point into unmapped or guarded portions of the stack, or if Thread is corrupted or outdated. And it would be way simpler, since it would not require a new version of is_first_C_frame.
I also find the interface - passing Thread* to the function just for it to then do error checking - slightly off. Without any comment on the prototype explaining what this argument is for, this causes head scratching. And semantically, there is only one instance of Thread this can ever be called for.
A function like this:
// check if frame is valid within the Thread's stack
bool Thread::is_valid_frame(const frame*)
would actually be clearer.
And if this error check is necessary, why do we then need two variants of is_first_C_frame? Should the error check not always happen?
But bottom line, I think SafeFetch would be a simpler and more robust approach.
Cheers, Thomas
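To make the `Thread::is_valid_frame` idea from the comment above concrete, here is a minimal standalone sketch of a stack-bounds check. It is not HotSpot code and not the approach recommended in the comment (which favors SafeFetch): `StackBounds`, `SimpleFrame`, and this free-function `is_valid_frame` are hypothetical names, and the sketch assumes a stack that grows downwards from `base` with the frame pointer at or above the stack pointer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a thread's stack extent. The stack is assumed
// to grow downwards and to occupy the addresses [base - size, base).
struct StackBounds {
    uintptr_t base;  // highest address of the stack (exclusive)
    size_t    size;  // stack size in bytes

    bool contains(uintptr_t addr) const {
        // Written as a distance check so that base - size cannot underflow.
        return addr < base && (base - addr) <= size;
    }
};

// Hypothetical, simplified frame: just the two register values.
struct SimpleFrame {
    uintptr_t sp;
    uintptr_t fp;
};

// Returns true if both pointers are word-aligned, lie inside the stack,
// and the frame pointer is not below the stack pointer.
bool is_valid_frame(const StackBounds& stack, const SimpleFrame& fr) {
    constexpr uintptr_t align_mask = sizeof(void*) - 1;
    if ((fr.sp & align_mask) != 0 || (fr.fp & align_mask) != 0) return false;
    if (!stack.contains(fr.sp) || !stack.contains(fr.fp)) return false;
    return fr.fp >= fr.sp;
}
```

A check like this only works when trustworthy stack bounds are available, which is exactly the dependency on Thread that the comment argues against.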
dholmes-ora
left a comment
I'm struggling to understand the motivation for this change and what problem is being solved.
Do all these extra checks need to be done in product bits or would debug-only work? What kind of errors are we trying to guard against by doing this?
Thanks,
David
  // is_first_C_frame() does only simple checks for frame pointer,
  // it will pass if java compiled code has a pointer in EBP.
- if (os::is_first_C_frame(&fr)) return invalid;
+ if (os::is_first_C_frame(&fr, t)) return invalid;
Is the comment still accurate?
I think so? But maybe removing the second line would be helpful.
But are the checks still "simple"?
After the change proposed by Thomas: I think so, it still only checks the pointer value and safefetches the value of the stack pointer, ... to check whether they are valid.
They currently do not affect production code, but I forgot that the
The main motivation is to prevent crashes in native stack walking in cases where just calling
And to @tstuefe:
Thanks for the comment. I missed that SafeFetch does exactly what I want (and hopefully without a large performance penalty?).
|
The last commit rewrites it to something that might resemble Thomas' ideas. |
dholmes-ora
left a comment
This approach looks much better/cleaner - thanks.
Do we have any crash tests we can use to verify this?
Thanks,
David
|
Hi Johannes, thanks for taking my suggestion. This is better, and helps beyond your AsyncGetCallTrace scenario (e.g. in NMT).
safefetch works as an unconditional sub routine call to a prolog-free piece of code which does a single load. Basically:
and the signal handler knows how to handle things if a segfault happens at (2). So, for the standard case, if no fault happens, you pay for a subroutine call and a load. This is as cheap as it gets, but still not as cheap as a single inline load would be.
Still, I'm not sure I would add this to such a low-level function as frame::link(), at least not without analyzing the callers. Most of the callers of frame::link don't seem to be that performance-sensitive that a sub-routine call would throw them off. But I'm not sure here.
Moreover, even though your solution is beautifully simple, I don't like "lying" at this level. There may be cases where we rather have an honest crash when dereferencing an invalid frame, because we may want to analyze the root cause.
What I actually had in mind - sorry I was not too clear in my first review - was to use SafeFetch inside is_first_C_frame to check the validity of the link before dereferencing it. Note that we have
@dholmes-ora : the motivation is to harden a piece of code which may run in unsafe situations in production scenarios. Examples: AsyncGetCallTrace, stack printing in error reports, stack printing in NMT... Error handling has its secondary crash guards, but the other scenarios are "naked". And we have downstream additional facilities which use VM stack printing.
About a test, I agree, that would be nice. But one would have to "fake" an invalid stack. Maybe a new error reporting test where one deliberately overwrites portions of the stack and then tries to print the stack. However, I imagine things could be brittle, because the OS may catch a stack overwrite first. It's not totally trivial, maybe something for a separate RFE?
Cheers, Thomas
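The mechanism described in the comment above (one load, with the signal handler absorbing a fault at that load) can be approximated outside HotSpot. The following is a rough sketch only, assuming a POSIX system and a single thread; it uses `sigsetjmp`/`siglongjmp` to recover, whereas per the comment HotSpot's own signal handler knows how to resume after the faulting load. The names `safe_fetch_sketch`, `on_fault`, and `g_recover` are hypothetical, and installing handlers on every call makes this far more expensive than the "subroutine call and a load" cost quoted above.

```cpp
#include <setjmp.h>
#include <signal.h>
#include <cstdint>

// Recovery point shared with the signal handler. This global state is why
// the sketch is single-threaded only.
static sigjmp_buf g_recover;

static void on_fault(int) {
    // Resume at the sigsetjmp() call below, which then returns 1.
    siglongjmp(g_recover, 1);
}

// Hypothetical helper: returns *addr, or error_value if the load faults.
intptr_t safe_fetch_sketch(const intptr_t* addr, intptr_t error_value) {
    struct sigaction handler = {};
    struct sigaction old_segv = {};
    struct sigaction old_bus = {};
    handler.sa_handler = on_fault;
    sigemptyset(&handler.sa_mask);
    sigaction(SIGSEGV, &handler, &old_segv);
    sigaction(SIGBUS, &handler, &old_bus);

    // volatile so the value is well defined after a siglongjmp.
    volatile intptr_t result = error_value;
    if (sigsetjmp(g_recover, 1) == 0) {
        // The single load that may fault.
        result = *static_cast<const volatile intptr_t*>(addr);
    }

    // Restore whatever handlers were installed before the call.
    sigaction(SIGSEGV, &old_segv, nullptr);
    sigaction(SIGBUS, &old_bus, nullptr);
    return result;
}
```

A caller cannot distinguish a fault from a slot that genuinely holds `error_value`, so code that needs certainty would fetch twice with two different error values.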
|
Now I understand. A simple test would be to just allocate an area of zeroes and then create a frame for it. The proposed changes should prevent it from crashing. |
|
I changed it again, introducing "frame::link_or_null()" that is the safe version of "frame::link()".
I think tests would be nice but also quite difficult. A simple test would be to allocate a frame with zero values for all entries and check that |
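As an illustration of the `link_or_null()` idea and the zero-filled-frame test mentioned above, here is a small standalone sketch. It is not the code from this PR: `SketchFrame`, `FetchFn`, `null_checking_fetch`, and `is_first_frame` are hypothetical names, the checks are a simplified guess at what a first-frame test looks at, and the fetch function is passed in so the sketch does not depend on how a fault-tolerant load is implemented.

```cpp
#include <cstdint>

// Hypothetical fetch signature: returns *addr, or error_value if the address
// cannot be read. A real implementation would tolerate faults.
using FetchFn = uintptr_t (*)(const uintptr_t* addr, uintptr_t error_value);

// Stand-in for illustration only: it guards against null but would still
// crash on any other unreadable address.
uintptr_t null_checking_fetch(const uintptr_t* addr, uintptr_t error_value) {
    return addr == nullptr ? error_value : *addr;
}

struct SketchFrame {
    const uintptr_t* fp;  // slot that holds the caller's frame pointer

    // Unchecked variant: crashes if fp is not readable.
    uintptr_t link() const { return *fp; }

    // Checked variant: yields 0 instead of faulting.
    uintptr_t link_or_null(FetchFn fetch) const { return fetch(fp, 0); }
};

// Simplified first-frame test on a downward-growing stack.
bool is_first_frame(const SketchFrame& fr, FetchFn fetch) {
    const uintptr_t ufp = reinterpret_cast<uintptr_t>(fr.fp);
    if ((ufp & (sizeof(uintptr_t) - 1)) != 0) return true;  // misaligned fp
    const uintptr_t old_fp = fr.link_or_null(fetch);
    if (old_fp == 0 || old_fp == UINTPTR_MAX) return true;  // no usable link
    return old_fp <= ufp;  // the caller's frame must lie above this one
}
```

With a zero-filled buffer, `uintptr_t zeroes[8] = {};`, the call `is_first_frame(SketchFrame{&zeroes[0]}, null_checking_fetch)` reads a null link and reports a first frame instead of following it.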
tstuefe
left a comment
Looks almost good now. Small remarks remain.
tstuefe
left a comment
Hi Johannes,
Getting closer. More remarks inline.
Cheers, Thomas
The problem is that registering a thread for NMT uses the os::is_first_C_frame method which calls Thread::enable_wx internally. But enable_wx requires that the init_wx method has been called before, not after. Swapping two lines therefore fixes the problem.
|
The failing tests are related to https://bugs.openjdk.java.net/browse/JDK-8282475, fixed in #7727 |
|
The SafeFetch PR is more work. I modified the CanUseSafeFetch methods. This should fix the tests. |
tstuefe
left a comment
Looks good to me. If you add the comment about JDK-8282475 (see inline remark) this is fine.
Cheers, Thomas
|
@parttimenerd This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 271 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@RealCLanger, @tstuefe, @dholmes-ora, @TheRealMDoerr) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
TheRealMDoerr
left a comment
Looks good except minor nits.
TheRealMDoerr
left a comment
LGTM. Thanks!
|
/integrate |
|
@parttimenerd |
|
/sponsor |
|
Going to push as commit 999da9b.
Your commit was automatically rebased without conflicts. |
|
@tstuefe @parttimenerd Pushed as commit 999da9b. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
This PR introduces a new method can_access_link into the frame class to check the accessibility of the link information. It furthermore adds a new os::is_first_C_frame(frame*, Thread*) that uses the can_access_link method and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. This should reduce the possibilities for crashes.
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/7591/head:pull/7591
$ git checkout pull/7591
Update a local copy of the PR:
$ git checkout pull/7591
$ git pull https://git.openjdk.java.net/jdk pull/7591/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 7591
View PR using the GUI difftool:
$ git pr show -t 7591
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/7591.diff