Conversation

@sunny868
Contributor

@sunny868 sunny868 commented Feb 14, 2023

Currently, C1 uses more stack space than C2. Taking the method java.lang.Object::<init> on x86_64 as an example, the C1-compiled frame uses 48 bytes of stack, while the C2-compiled frame uses only 16 bytes.

========== C1-compiled nmethod =====
0x00007f93311cc747: push %rbp
0x00007f93311cc748: sub $0x30,%rsp // stack size is 48 bytes

========== C2-compiled nmethod =======
pushq rbp # Save rbp
subq rsp, #16 # Create frame // stack size is 16 bytes

After this patch, C1 uses less stack space. Again taking java.lang.Object::<init> on x86_64 as the example, the frame uses 16 bytes at compilation level 1 and 32 bytes at compilation level 3.

========== C1-compiled nmethod =====
Compiled method (c1) 264 24 1 java.lang.Object::<init> (1 bytes)
0x00007f80491ce647: push %rbp
0x00007f80491ce648: sub $0x10,%rsp // stack size is 16 bytes

========== C1-compiled nmethod =====
Compiled method (c1) 283 24 3 java.lang.Object::<init> (1 bytes)
0x00007f93711d01c7: push %rbp
0x00007f93711d01c8: sub $0x20,%rsp // stack size is 32 bytes


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8302369: Reduce the stack size of the C1 compiler

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/12548/head:pull/12548
$ git checkout pull/12548

Update a local copy of the PR:
$ git checkout pull/12548
$ git pull https://git.openjdk.org/jdk pull/12548/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 12548

View PR using the GUI difftool:
$ git pr show -t 12548

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12548.diff

@bridgekeeper

bridgekeeper bot commented Feb 14, 2023

👋 Welcome back sunny868! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 14, 2023
@openjdk

openjdk bot commented Feb 14, 2023

@sunny868 The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@mlbridge

mlbridge bot commented Feb 14, 2023

Webrevs

@sunny868
Contributor Author

On the LoongArch64 architecture, the test tools/javac/lambda/T8031967.java fails with -XX:TieredStopAtLevel=3 (see #9934).
This patch fixes that failure.

Contributor

@vnkozlov vnkozlov left a comment

So the issue is that C1 always allocates stack space for 4 arguments, even if a method has no arguments.
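To make the effect of that minimum concrete, here is a minimal sketch of the clamp, assuming a 64-bit target where one stack word is 8 bytes. The helper name `reserved_argument_area_bytes` and the constant `kBytesPerWord` are hypothetical and only mirror the `MAX2(4, ...) * BytesPerWord` expression quoted elsewhere in this PR; this is not the HotSpot code itself.

```cpp
#include <algorithm>
#include <cassert>

// Assumption: 64-bit target, so one stack word is 8 bytes.
constexpr int kBytesPerWord = 8;

// Hypothetical helper: reserve at least `min_slots` outgoing-argument slots,
// regardless of how many the method actually needs.
constexpr int reserved_argument_area_bytes(int max_outgoing_slots, int min_slots) {
  return std::max(min_slots, max_outgoing_slots) * kBytesPerWord;
}

// A method with no outgoing arguments still reserves 4 words under a minimum of 4.
static_assert(reserved_argument_area_bytes(0, 4) == 32, "minimum of 4 slots");
// With no minimum, such a method reserves nothing.
static_assert(reserved_argument_area_bytes(0, 0) == 0, "no minimum");
// A method that needs more than the minimum is unaffected by the clamp.
static_assert(reserved_argument_area_bytes(6, 4) == 48, "above the minimum");
```

Under these assumptions, a trivial method such as an empty constructor pays for 32 bytes of argument area it never uses, which is the overhead this PR sets out to remove.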

@vnkozlov
Contributor

This code is very old (JDK-6320351). I will run our testing to make sure this change does not break anything, and I will let you know the results.
The good news is that GHA testing passed.

  assert(offset_from_rsp_in_words >= 0, "invalid offset from rsp");
  int offset_from_rsp_in_bytes = offset_from_rsp_in_words * BytesPerWord;
- assert(offset_from_rsp_in_bytes < frame_map()->reserved_argument_area_size(), "invalid offset");
+ assert(offset_from_rsp_in_bytes <= frame_map()->reserved_argument_area_size(), "invalid offset");
Member

I don't understand this change. Doesn't the byte range go from 0 to frame_map()->reserved_argument_area_size() - 1?
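The boundary in question can be sketched as follows, assuming a 64-bit target with 8-byte words and a reserved area whose size is a multiple of the word size. The predicate names (`store_fits`, `strict_check`, `relaxed_check`) are hypothetical; they only restate the two assert conditions from the diff so they can be compared.

```cpp
#include <cassert>

constexpr int kBytesPerWord = 8;  // assumption: 64-bit target

// A word-sized store at slot `offset_in_words` touches the byte range
// [offset, offset + kBytesPerWord). It fits only if that range lies inside the area.
constexpr bool store_fits(int offset_in_words, int area_bytes) {
  return offset_in_words >= 0 &&
         (offset_in_words + 1) * kBytesPerWord <= area_bytes;
}

// The original assert condition: offset_in_bytes < area size.
constexpr bool strict_check(int offset_in_words, int area_bytes) {
  return offset_in_words * kBytesPerWord < area_bytes;
}

// The changed assert condition: offset_in_bytes <= area size.
constexpr bool relaxed_check(int offset_in_words, int area_bytes) {
  return offset_in_words * kBytesPerWord <= area_bytes;
}

// With a 16-byte area (two slots), slots 0 and 1 fit and pass the strict check.
static_assert(store_fits(0, 16) && strict_check(0, 16), "slot 0 is in range");
static_assert(store_fits(1, 16) && strict_check(1, 16), "slot 1 is in range");
// Slot 2 starts at byte 16, just past the area: the strict check rejects it...
static_assert(!store_fits(2, 16) && !strict_check(2, 16), "slot 2 is out of range");
// ...but the relaxed check accepts it.
static_assert(relaxed_check(2, 16), "relaxed form admits the out-of-range slot");
```

In this simplified model the `<` form agrees with the valid byte range 0 to size - 1, while the `<=` form also admits a store that begins exactly at the end of the area.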

Contributor Author

@sunny868 sunny868 Feb 15, 2023

This is confusing. I believe reserved_argument_area_size includes the extra space for JSR 292 but does not include the slots used by other stubs (e.g. CounterOverflowStub), so it should not be used for the comparison here.

{
PhaseTraceTime timeit(_t_emit_lir);

_frame_map = new FrameMap(method(), hir()->number_of_locks(), MAX2(4, hir()->max_stack()));
Member

I think there needs to be a minimum size (maybe 2 instead of 4?) because of code like CounterOverflowStub, RangeCheckStub, and generate_c1_load_barrier_stub() that call store_parameter() with small values (0, 1).

Member

In order to reduce the minimum size lower than 2, C1 would need to determine (in advance?) if any stubs might be called. Maybe there is a way to calculate actual frame size required based on actual stores emitted, but that seems tricky if the prologue has already been emitted.

Contributor Author

// src/hotspot/share/c1/c1_FrameMap.cpp
// bool FrameMap::finalize_frame(int nof_slots) {

  _framesize = align_up(in_bytes(sp_offset_for_monitor_base(0)) +
                        _num_monitors * (int)sizeof(BasicObjectLock) +
                        (int)sizeof(intptr_t) +                        // offset of deopt orig pc
                        frame_pad_in_bytes,
                        StackAlignmentInBytes) / 4;

Here (int)sizeof(intptr_t) is 8 and StackAlignmentInBytes is 16, so the minimum frame size of a method is guaranteed to be 16 bytes, which should be enough for the two slots (0 and 1) that CounterOverflowStub (or another stub) needs.
However, I don't understand why (int)sizeof(intptr_t) is added here, or what the comment // offset of deopt orig pc means. Do you know?
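The arithmetic behind that 16-byte minimum can be sketched as below. This is a simplified model, not the HotSpot implementation: `frame_bytes` is a hypothetical helper, its three parameters stand in for the monitor base offset, the monitor storage, and frame_pad_in_bytes from the quoted expression, and the final division by 4 (conversion to 4-byte slots) is left out so the result stays in bytes.

```cpp
#include <cassert>

// Round `value` up to the next multiple of `alignment` (a power of two).
constexpr int align_up(int value, int alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical model of the quoted frame size expression, in bytes.
// Assumptions: sizeof(intptr_t) == 8 and StackAlignmentInBytes == 16.
constexpr int frame_bytes(int monitor_base_offset, int monitor_bytes, int frame_pad) {
  constexpr int kOrigPcSlotBytes = 8;  // the (int)sizeof(intptr_t) term
  constexpr int kStackAlignment = 16;  // StackAlignmentInBytes
  return align_up(monitor_base_offset + monitor_bytes + kOrigPcSlotBytes + frame_pad,
                  kStackAlignment);
}

// With every other term at zero, the 8-byte slot alone rounds up to 16 bytes.
static_assert(frame_bytes(0, 0, 0) == 16, "minimum frame in this model");
// Adding two 8-byte stub parameter slots below the monitor base gives 32 bytes.
static_assert(frame_bytes(16, 0, 0) == 32, "two reserved slots");
// Adding four 8-byte slots gives 48 bytes.
static_assert(frame_bytes(32, 0, 0) == 48, "four reserved slots");
```

Note that in this model the 16-byte minimum comes from alignment padding around the 8-byte slot, so only 8 of those bytes are actually spare; the inputs here are illustrative and real frames include further terms.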

Member

That would be to support nmethod::orig_pc_offset().

@vnkozlov
Contributor

vnkozlov commented Feb 15, 2023

I got a SEGV in almost all serviceability/sa/ tests when running with -Xcomp:

V  [libjvm.so+0xa8c950]  ClassLoaderData::is_alive() const+0x0  (classLoaderData.cpp:644)
V  [libjvm.so+0x16a3b43]  nmethod::metadata_do(MetadataClosure*)+0x303  (nmethod.cpp:1614)
V  [libjvm.so+0xb44761]  CompiledMethod::unload_nmethod_caches(bool)+0x101  (compiledMethod.cpp:557)
V  [libjvm.so+0x16a299b]  nmethod::do_unloading(bool)+0xdb  (nmethod.cpp:1742)
V  [libjvm.so+0x177577b]  CodeCacheUnloadingTask::work(unsigned int)+0x9b  (parallelCleaning.cpp:95)
V  [libjvm.so+0xe77f77]  G1ParallelCleaningTask::work(unsigned int)+0x27  (g1ParallelCleaning.cpp:70)
V  [libjvm.so+0x1bf6d10]  WorkerThread::run()+0x80  (workerThread.cpp:69)
V  [libjvm.so+0x1a93a10]  Thread::call_run()+0x100  (thread.cpp:224)
V  [libjvm.so+0x17341a3]  thread_native_entry(Thread*)+0x103  (os_linux.cpp:737)

Member

@dean-long dean-long left a comment

I suspect this change is too simplistic and the change to use <= instead of < only works because stubs only use slots 0 and 1 and max_stack() has an extra slot added for JSR292.

@sunny868
Contributor Author

Thank you all for the review, and @dean-long for the explanation. I don't understand the compiler well, so I'll check it again.

@sunny868
Contributor Author

sunny868 commented Feb 15, 2023

@vnkozlov After I ran make test CONF=release JTREG="VM_OPTIONS=-Xcomp" TEST=serviceability/sa/* on x86_64 linux, all tests passed. How can I reproduce the SEGV problem you mentioned above?

@vnkozlov
Contributor

vnkozlov commented Feb 15, 2023

I was not able to reproduce it locally either. I have started a build and test run with -Xcomp without these changes to see whether it is an existing issue. It failed with a fastdebug VM.

@sunny868
Contributor Author

Thanks @vnkozlov. I can't reproduce the problem locally using make test CONF=fastdebug JTREG="VM_OPTIONS=-Xcomp" TEST=serviceability/sa/* either.

@sunny868
Contributor Author

In order to reduce the minimum size lower than 2, C1 would need to determine (in advance?) if any stubs might be called. Maybe there is a way to calculate actual frame size required based on actual stores emitted, but that seems tricky if the prologue has already been emitted.

I did a simple verification locally and confirmed that the stubs are created before the prologue is emitted, so this scheme is feasible. Thanks @dean-long for your suggestion. I will make more detailed changes, test them, and submit a new patch later.

assert(monitors >= 0, "not set");
_num_monitors = monitors;
assert(reserved_argument_area_size >= 0, "not set");
_reserved_argument_area_size = MAX2(4, reserved_argument_area_size) * BytesPerWord;
Contributor

This line probably deserves a comment.

Contributor Author

Done

@vnkozlov
Contributor

My -Xcomp run without the changes passed without issues. So something in these changes is causing a problem on the machines we use (AMD and old macOS x86, which have only AVX2).

@sunny868
Contributor Author

Thanks a lot, @vnkozlov. The machines that hit the fatal SEGV are 32-bit, right? If not, could you share the hardware details of those machines with me?

@sunny868
Contributor Author

sunny868 commented Feb 16, 2023

Commit 2 (4125dbb) does the following two things:

  1. Updates reserved_argument_area_size whenever a stub that actually emits stores (i.e. calls store_parameter()) is created.
  2. Moves the constructors for RangeCheckStub and MonitorEnterStub into the shared implementation, instead of keeping a copy in each
    cpu/arch/ directory.
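The first point can be illustrated with a small sketch. The class `FrameMapSketch` and the hook `on_stub_created` are hypothetical stand-ins, and the grow-only (maximum) behaviour is an assumption about how such an update would work; only the method name `update_reserved_argument_area_size` and the byte-sized argument come from the code quoted in this PR.

```cpp
#include <algorithm>
#include <cassert>

constexpr int kBytesPerWord = 8;  // assumption: 64-bit target

// Hypothetical stand-in for the part of the frame map discussed here.
class FrameMapSketch {
 public:
  explicit FrameMapSketch(int reserved_bytes) : reserved_bytes_(reserved_bytes) {
    assert(reserved_bytes >= 0);
  }

  // Assumed behaviour: grow the reserved outgoing-argument area if needed, never shrink it.
  void update_reserved_argument_area_size(int bytes) {
    assert(bytes >= 0);
    reserved_bytes_ = std::max(reserved_bytes_, bytes);
  }

  int reserved_argument_area_size() const { return reserved_bytes_; }

 private:
  int reserved_bytes_;
};

// Hypothetical hook run when a stub is created: a stub that will store
// `param_slots` parameters makes sure the frame has room for them.
inline void on_stub_created(FrameMapSketch& frame_map, int param_slots) {
  frame_map.update_reserved_argument_area_size(param_slots * kBytesPerWord);
}
```

For example, starting from `FrameMapSketch fm(0)`, calling `on_stub_created(fm, 2)` raises the reserved area to 16 bytes, and a later `on_stub_created(fm, 1)` leaves it at 16. A method that creates no such stubs keeps a reserved area of 0, which is how the frame can shrink without breaking stubs that do store parameters.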

@vnkozlov
Contributor

I tested only 64-bit fastdebug VM.
MacMini i7 12 threads (avx2), 32Gb
Windows server 2019, AMD Epyc VM 12 threads (sse4), 24.00 GB
SA jtreg tests were run with additional options: -concurrency:6 -timeoutFactor:4 -vmoption:-XX:MaxRAMPercentage=4.16667

@sunny868
Contributor Author

@vnkozlov Can you help me test the new commit 2 (4125dbb) again? I've tested tier1 tier2 locally for X86_64 and AArch64 and tier1-3 for LoongArch64, and the results are all PASSED (some tests failed but appear to be unrelated to the patch).

@vnkozlov
Contributor

@vnkozlov Can you help me test the new commit 2 (4125dbb) again? I've tested tier1 tier2 locally for X86_64 and AArch64 and tier1-3 for LoongArch64, and the results are all PASSED (some tests failed but appear to be unrelated to the patch).

I submitted new testing. But you need approval from @dean-long

@openjdk

openjdk bot commented Feb 17, 2023

@sunny868 this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout 8302369
git fetch https://git.openjdk.org/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Feb 17, 2023
@vnkozlov
Contributor

-Xcomp testing passed with latest version. I am running other tiers now.

#else
f->update_reserved_argument_area_size(2 * BytesPerWord);
#endif
}
Member

How about moving the CPU-specific value to CPU-specific code? Maybe c1_Defs_<cpu>.hpp?

Contributor Author

Done

@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Feb 18, 2023
Member

@dean-long dean-long left a comment

Looks good now.

@openjdk

openjdk bot commented Feb 18, 2023

⚠️ @sunny868 the full name on your profile does not match the author name in this pull requests' HEAD commit. If this pull request gets integrated then the author name from this pull requests' HEAD commit will be used for the resulting commit. If you wish to push a new commit with a different author name, then please run the following commands in a local repository of your personal fork:

$ git checkout 8302369
$ git commit --author='Preferred Full Name <[email protected]>' --allow-empty -m 'Update full name'
$ git push

@openjdk

openjdk bot commented Feb 18, 2023

@sunny868 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8302369: Reduce the stack size of the C1 compiler

Reviewed-by: dlong

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@vnkozlov, @dean-long) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 18, 2023
@sunny868
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Feb 18, 2023
@openjdk

openjdk bot commented Feb 18, 2023

@sunny868
Your change (at version 4c9df06) is now ready to be sponsored by a Committer.

@sunny868
Contributor Author

Why can't I see the details of the GHA test failure at https://github.com/sunny868/jdk/actions/runs/4209080257/jobs/7307234271 ? It now only shows "This job failed".

@vnkozlov
Contributor

I ran testing for version 03 and it passed.

@vnkozlov
Contributor

/sponsor

@openjdk

openjdk bot commented Feb 20, 2023

Going to push as commit 36a0822.
Since your change was applied there have been 25 commits pushed to the master branch:

  • 0bf3a53: 8302599: Extend ASan support to Microsoft Visual C++
  • c7517b3: 8302525: Write a test to check various components send Events while mouse and key are used simultaneously
  • 9a79722: 8299234: JMX Repository.query performance
  • e47e9ec: 8300658: memory_and_swap_limit() reporting wrong values on systems with swapaccount=0
  • 7cf7e0a: 8302070: Factor null-check into load_klass() calls
  • e731695: 8302882: Newly added test javax/swing/JFileChooser/JFileChooserFontReset.java fails with HeadlessException
  • b5a7426: 8301749: Tracking malloc pooled memory size
  • 6ac5e05: 8302068: Serial: Refactor oop closures used in Young GC
  • 71cf7c4: 8302518: Add missing Op_RoundDoubleMode in VectorNode::vector_operands()
  • 98716e2: 8302709: Remove explicit remembered set verification in G1
  • ... and 15 more: https://git.openjdk.org/jdk/compare/43cf8b3d8067bc7128c98f86d5f8b6fa8bbed80e...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Feb 20, 2023
@openjdk openjdk bot closed this Feb 20, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Feb 20, 2023
@openjdk

openjdk bot commented Feb 20, 2023

@vnkozlov @sunny868 Pushed as commit 36a0822.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@sunny868
Contributor Author

sunny868 commented Feb 21, 2023

Thanks @vnkozlov, @dean-long, and @theRealAph for the review.

@sunny868 sunny868 deleted the 8302369 branch February 21, 2023 01:03
Labels

hotspot-compiler [email protected] integrated Pull request has been integrated

4 participants