-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8302369: Reduce the stack size of the C1 compiler #12548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Welcome back sunny868! A progress list of the required criteria for merging this PR into |
Webrevs
|
|
On the LoongArch64 architecture, |
vnkozlov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the issue is that C1 always allocate stack space for 4 arguments even if a method don't have arguments.
|
This code is very old (JDK-6320351). I will run our testing to make sure this change does not break anything and I will let you know results. |
| assert(offset_from_rsp_in_words >= 0, "invalid offset from rsp"); | ||
| int offset_from_rsp_in_bytes = offset_from_rsp_in_words * BytesPerWord; | ||
| assert(offset_from_rsp_in_bytes < frame_map()->reserved_argument_area_size(), "invalid offset"); | ||
| assert(offset_from_rsp_in_bytes <= frame_map()->reserved_argument_area_size(), "invalid offset"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this change. Doesn't the byte range go from 0 to frame_map()->reserved_argument_area_size() - 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's really confusing here, I'm sure reserved_argument_area_size contains extra stubs for jsr292, but not contains other stub slots (i.e. CounterOverflowStub), so it should not be used here for comparison.
| { | ||
| PhaseTraceTime timeit(_t_emit_lir); | ||
|
|
||
| _frame_map = new FrameMap(method(), hir()->number_of_locks(), MAX2(4, hir()->max_stack())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there needs to be a minimum size (maybe 2 instead of 4?) because of code like CounterOverflowStub, RangeCheckStub, and generate_c1_load_barrier_stub() that call store_parameter() with small values (0, 1).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to reduce the minimum size lower than 2, C1 would need to determine (in advance?) if any stubs might be called. Maybe there is a way to calculate actual frame size required based on actual stores emitted, but that seems tricky if the prologue has already been emitted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// src/hotspot/share/c1/c1_FrameMap.cpp
// bool FrameMap::finalize_frame(int nof_slots) {
191 _framesize = align_up(in_bytes(sp_offset_for_monitor_base(0)) +
192 _num_monitors * (int)sizeof(BasicObjectLock) +
193 (int)sizeof(intptr_t) + // offset of deopt orig pc
194 frame_pad_in_bytes,
195 StackAlignmentInBytes) / 4;
Here the value of (int)sizeof(intptr_t) is 8 and the value of StackAlignmentInBytes is 16, so the minimum stack size of a method is guaranteed to be 16, which should ensure that CounterOverflowStub (or other stub) needs two slots(0, 1).
But I don't understand what the addition of (int)sizeof(intptr_t) does here and what does // offset of deopt orig pc mean, do you know?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be to support nmethod::orig_pc_offset().
|
I got SEGV in almost all |
dean-long
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect this change is too simplistic and the change to use <= instead of < only works because stubs only use slots 0 and 1 and max_stack() has an extra slot added for JSR292.
|
Thank you all for the review and @dean-long explanation, I don't understand the compiler well, I'll check it again. |
|
@vnkozlov After I ran |
|
I was not able to reproduce locally too. I start build and testing with -Xcomp without these changes to see if it is existing issue. It failed with |
|
Thanks @vnkozlov, I can't reproduce the problem locally using |
I had do a simple verification locally to confirm |
| assert(monitors >= 0, "not set"); | ||
| _num_monitors = monitors; | ||
| assert(reserved_argument_area_size >= 0, "not set"); | ||
| _reserved_argument_area_size = MAX2(4, reserved_argument_area_size) * BytesPerWord; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line probably deserves a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
My -Xcomp run without changes passed without issues. So something in these changes causing problem on machines we use (AMD and old MacOs x86 which have only AVX2). |
|
Thanks @vnkozlov a lot. The machines with the fatal SEGV error is 32-bit architecture, right? Or can you list the hardware information of the machine to me? |
|
commit 2 (4125dbb) does the following two things:
|
|
I tested only 64-bit fastdebug VM. |
I submitted new testing. But you need approval from @dean-long |
|
@sunny868 this pull request can not be integrated into git checkout 8302369
git fetch https://git.openjdk.org/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push |
|
-Xcomp testing passed with latest version. I am running other tiers now. |
| #else | ||
| f->update_reserved_argument_area_size(2 * BytesPerWord); | ||
| #endif | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about moving the CPU-specific value to CPU-specific code? Maybe c1_Defs_<cpu>.hpp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
dean-long
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good now.
|
|
@sunny868 This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been no new commits pushed to the As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@vnkozlov, @dean-long) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
|
/integrate |
|
How can I not see the details of GHA test failure https://github.com/sunny868/jdk/actions/runs/4209080257/jobs/7307234271 . Now just show "This job failed" |
|
I ran testing for version 03 and it passed. |
|
/sponsor |
|
Going to push as commit 36a0822.
Your commit was automatically rebased without conflicts. |
|
Thanks @vnkozlov @dean-long @theRealAph for review. |
The current fact is that C1 uses more stack space than the C2 compiler, taking method
java.lang.Object::<init>as an example on the x86_64 platform , the stack size used is 48 bytes for C1 compiler, while only 16 bytes is used for C2 compiler.========== C1-compiled nmethod =====
0x00007f93311cc747: push %rbp
0x00007f93311cc748: sub $0x30,%rsp // stack sizes is 48 bytes
========== C2-compiled nmethod =======
pushq rbp # Save rbp
subq rsp, #16 # Create frame //stack sizes is 16 bytes
After this patch, the C1 compiler will use less stack space. also taking method
java.lang.Object::<init>as an example on the x86_64 platform , the stack size used is 16 bytes on L1 and 32 bytes on L3.========== C1-compiled nmethod =====
Compiled method (c1) 264 24 1 java.lang.Object:: (1 bytes)
0x00007f80491ce647: push %rbp
0x00007f80491ce648: sub $0x10,%rsp //stack sizes is 16 bytes
========== C1-compiled nmethod =====
Compiled method (c1) 283 24 3 java.lang.Object:: (1 bytes)
0x00007f93711d01c7: push %rbp
0x00007f93711d01c8: sub $0x20,%rsp // stack sizes is 32 bytes
Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/12548/head:pull/12548$ git checkout pull/12548Update a local copy of the PR:
$ git checkout pull/12548$ git pull https://git.openjdk.org/jdk pull/12548/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 12548View PR using the GUI difftool:
$ git pr show -t 12548Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12548.diff