Skip to content

Conversation

@parttimenerd
Copy link
Contributor

@parttimenerd parttimenerd commented Feb 23, 2022

This PR introduces a new method can_access_link into the frame class to check the accessibility of the link information. It furthermore adds a new os::is_first_C_frame(frame*, Thread*) that uses the can_access_link method
and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. This should reduce the possibilities for crashes.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8282306: os::is_first_C_frame(frame*) crashes on invalid link access

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/7591/head:pull/7591
$ git checkout pull/7591

Update a local copy of the PR:
$ git checkout pull/7591
$ git pull https://git.openjdk.java.net/jdk pull/7591/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 7591

View PR using the GUI difftool:
$ git pr show -t 7591

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/7591.diff

Prevents segmentation faults in os::is_first_C_frame
if passed a thread.
@bridgekeeper
Copy link

bridgekeeper bot commented Feb 23, 2022

👋 Welcome back parttimenerd! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot changed the title 8282306: os::is_first_C_frame(frame*) crashes on invalid link access 8282306: os::is_first_C_frame(frame*) crashes on invalid link access Feb 23, 2022
@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 23, 2022
@openjdk
Copy link

openjdk bot commented Feb 23, 2022

@parttimenerd The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@mlbridge
Copy link

mlbridge bot commented Feb 23, 2022

Copy link
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Johannes,

Thanks for doing this, solving this makes sense.

But I'm not sure yours is the right approach. I think it would better to use SafeFetch to check the addresses in the relevant registers.

Using Safefetch would mean that we don't depend on the existence of Thread (which may be NULL, especially in signal contexts). It would work if the registers erroneously point into unmapped or guarded portions of the stack, or if Thread is corrupted or outdated. And it would be way simpler, since it would not require a new version of is_first_C_frame.

I also find the interface - passing Thread* to the function just for it to then do error checking - slightly off. Without any comment on the prototype explaining what this argument is for, this causes head scratching. And semantically, there is only one instance of Thread this can ever be called for.

A function like this:

// check if frame is valid within the Thread's stack
bool Thread::is_valid_frame(const frame*)

would actually be clearer.

And if this error check is necessary, why do we then need two variants of is_first_c_frame? Should the error check not always happen?

But bottom line, I think safefetch would be a simpler and more robust approach.

Cheers, Thomas

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm struggling to understand the motivation for this change and what problem is being solved.

Do all these extra checks need to be done in product bits or would debug-only work? What kind of errors are we trying to guard against by doing this?

Thanks,
David

Comment on lines 336 to 338
// is_first_C_frame() does only simple checks for frame pointer,
// it will pass if java compiled code has a pointer in EBP.
if (os::is_first_C_frame(&fr)) return invalid;
if (os::is_first_C_frame(&fr, t)) return invalid;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the comment still accurate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so? But maybe removing the second line would be helpful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But are the checks still "simple"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the change proposed by Thomas: I think so, it still only checks the pointer value and safefetches the value of the stack pointer, ... to check whether they are valid.

@parttimenerd
Copy link
Contributor Author

Do all these extra checks need to be done in product bits or would debug-only work? What kind of errors are we trying to guard against by doing this?

They currently do not affect production code, but I forgot that the NativeCallStack class exists that can make use of it (especially when using the simpler API as @tstuefe correctly proposed).

The main motivation is to prevent crashes in native stack walking in cases where just calling frame.is_safe_for_sender would return false, but a walk is still possible (typically on the bottom of the native call stack). I currently observe these crashes while walking on AsyncGetCallTrace modifications.

And to @tstuefe:

But bottom line, I think safefetch would be a simpler and more robust approach.

Thanks for the comment. I missed that safefetch does exactly what I want,and hopefully without a large performance penalty?).

@parttimenerd
Copy link
Contributor Author

The last commit rewrites it to something that might resemble Thomas' ideas.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach looks much better/cleaner - thanks.

Do we have any crash tests we can use to verify this?

Thanks,
David

@tstuefe
Copy link
Member

tstuefe commented Feb 24, 2022

Hi Johannes,

thanks for taking my suggestion. This is better, and helps beyond your AsyncGetCallTrace scenario (e.g. in NMT).

safefetch works as an unconditional sub routine call to a prolog-free piece of code which does a single load. Basically:

(1) jump <safefetch pc> -> (2) load from questionable address -> (3) return

and the signal handler knows how to handle things if a segfault happens at (2).

So, for the standard case, if no fault happens, you pay for a subroutine call and a load. This is as cheap as it gets, but still not as cheap as a single inline load would be.


Still, I'm not sure I would add this to such a low-level function as frame::link(), at least not without analyzing the callers. Most of the callers of frame::link don't seem to be that performance-sensitive that a sub-routine call would throw them off. But I'm not sure here.

Moreover, even though your solution is beautifully simple, I don't like "lying" at this level. There may be cases where we rather have an honest crash when dereferencing an invalid frame, because we may want to analyze the root cause.

What I actually had in mind - sorry I was not too clear in my first review - was to use SafeFetch inside is_first_C_frame to check the validity of the link before dereferencing it. is_first_C_frame() is not super performace-critical, so it should be fine to use safefetch here.

Note that we have os::is_readable_pointer() which encapsulates SafeFetch for checking pointer validity. So I imagine something like this:

bool frame::link_is_valid() {
	return os::is_readable_pointer(link);
}

...

bool os::is_first_C_frame(frame* fr) {
...
  // If the link address is invalid we are not walkable beyond this point.
  if (!fr->link_is_valid()) return true;
}

@dholmes-ora : the motivation is to harden a piece of code which may run in unsafe situations in production scenarios. Examples: AsyncGetCallTrace, stack printing in error reports, stack printing in NMT... Error handling has its secondary crash guards, but the other scenarios are "naked". And we have downstream additional facilities which use VM stack printing.

About a test, I agree, that would be nice. But one would have to "fake" an invalid stack. Maybe a new error reporting test where one deliberately overwrites portions of the stack and then tries to print the stack. However, I imagine things could be brittle, because the OS may catch a stack overwrite first. It's not totally trivial, maybe something for a separate RFE?

Cheers, Thomas

@parttimenerd
Copy link
Contributor Author

Know I understand.

I simple test would be to just allocate an area of zeroes and then create a frame for it. The proposed changes should prevent it from crashing.

@parttimenerd
Copy link
Contributor Author

I changed it again, introducing "frame::link_or_null()" that is the safe version of "frame::link()".

About a test, I agree, that would be nice. But one would have to "fake" an invalid stack. Maybe a new error reporting test where one deliberately overwrites portions of the stack and then tries to print the stack. However, I imagine things could be brittle, because the OS may catch a stack overwrite first. It's not totally trivial, maybe something for a separate RFE?

I think tests would be nice but also quite difficult. A simple test would be to allocate a frame with zero values for all entries and check that os::is_first_C_frame returns true and that frame::link_or_null() returns also null. Then the same with a good frame (pointing to sensible values).

Copy link
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks almost good now. Small remarks remain.

@openjdk openjdk bot removed the rfr Pull request is ready for review label Feb 25, 2022
@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 25, 2022
Copy link
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Johannes,

Getting closer. More remarks inline.

Cheers, Thomas

The problem is that registering a thread for NMT uses
the os::is_first_C_frame method which calls Thread::enable_wx
internally. But enable_wx requires that the init_wx method
has been called before, not after.
Swapping two lines therefore fixes the problem.
@openjdk openjdk bot removed the rfr Pull request is ready for review label Feb 28, 2022
@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 28, 2022
@parttimenerd
Copy link
Contributor Author

The failing tests are related to https://bugs.openjdk.java.net/browse/JDK-8282475, fixed in #7727

@parttimenerd
Copy link
Contributor Author

The SafeFetch PR is more work. I modified the CanUseSafeFetch methods. This should fix the tests.

Copy link
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. If you add the comment about JDK-8282475 (see inline remark) this is fine.

Cheers, Thomas

@openjdk
Copy link

openjdk bot commented Mar 15, 2022

@parttimenerd This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8282306: os::is_first_C_frame(frame*) crashes on invalid link access

Reviewed-by: stuefe, mdoerr

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 271 new commits pushed to the master branch:

  • cb576da: 8283379: Memory leak in FileHeaderHelper
  • 909986c: 8283217: Leak FcObjectSet in getFontConfigLocations() in fontpath.c
  • b617f1d: 8283447: Remove unused LIR_Assembler::_bs
  • eb4849e: 8283327: Add methods to save/restore registers when calling into the VM from C1/interpreter barrier code
  • fd93015: 8283332: G1: Stricter assertion in G1BlockOffsetTablePart::forward_to_block_containing_addr
  • ec62d90: 8283365: G1: Remove duplicate assertions in HeapRegion::oops_on_memregion_seq_iterate_careful
  • e709cb0: 8283186: Explicitly pass a third temp register to MacroAssembler::store_heap_oop
  • 83a1c90: 8282789: Create a regression test for the JTree usecase of JDK-4618767
  • b451273: 8282548: Create a regression test for JDK-4330998
  • 8a2d5ab: 8282270: java/awt/Robot Screen Capture tests fail after 8280861
  • ... and 261 more: https://git.openjdk.java.net/jdk/compare/e1c98bd1f2f57ddf47e4660038059117af87f938...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@RealCLanger, @tstuefe, @dholmes-ora, @TheRealMDoerr) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 15, 2022
Copy link
Contributor

@TheRealMDoerr TheRealMDoerr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good except minor nits.

Copy link
Contributor

@TheRealMDoerr TheRealMDoerr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@parttimenerd
Copy link
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Mar 21, 2022
@openjdk
Copy link

openjdk bot commented Mar 21, 2022

@parttimenerd
Your change (at version 5cb5668) is now ready to be sponsored by a Committer.

@tstuefe
Copy link
Member

tstuefe commented Mar 21, 2022

/sponsor

@openjdk
Copy link

openjdk bot commented Mar 21, 2022

Going to push as commit 999da9b.
Since your change was applied there have been 272 commits pushed to the master branch:

  • c4dc58e: 8283277: ISO 4217 Amendment 171 Update
  • cb576da: 8283379: Memory leak in FileHeaderHelper
  • 909986c: 8283217: Leak FcObjectSet in getFontConfigLocations() in fontpath.c
  • b617f1d: 8283447: Remove unused LIR_Assembler::_bs
  • eb4849e: 8283327: Add methods to save/restore registers when calling into the VM from C1/interpreter barrier code
  • fd93015: 8283332: G1: Stricter assertion in G1BlockOffsetTablePart::forward_to_block_containing_addr
  • ec62d90: 8283365: G1: Remove duplicate assertions in HeapRegion::oops_on_memregion_seq_iterate_careful
  • e709cb0: 8283186: Explicitly pass a third temp register to MacroAssembler::store_heap_oop
  • 83a1c90: 8282789: Create a regression test for the JTree usecase of JDK-4618767
  • b451273: 8282548: Create a regression test for JDK-4330998
  • ... and 262 more: https://git.openjdk.java.net/jdk/compare/e1c98bd1f2f57ddf47e4660038059117af87f938...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Mar 21, 2022
@openjdk openjdk bot closed this Mar 21, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Mar 21, 2022
@openjdk
Copy link

openjdk bot commented Mar 21, 2022

@tstuefe @parttimenerd Pushed as commit 999da9b.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hotspot [email protected] integrated Pull request has been integrated

Development

Successfully merging this pull request may close these issues.

5 participants