Conversation

@graalvmbot
Collaborator

With this change, the libsvm_container is initialized eagerly, just after parsing isolate arguments. This allows us to

  • get rid of the checks of ContainerLibrary#isInitialized,
  • initialize PhysicalMemory eagerly, and thus
  • remove calls to PhysicalMemory#isInitialized() and the need to deal with the uninitialized case (see the sketch below).
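
The gist of the change, as a rough sketch (class and method names here are hypothetical and do not match the actual SVM code):

public final class IsolateStartupSketch {

  public static void initializeIsolate(String[] isolateArgs) {
    parseIsolateArguments(isolateArgs);

    // Eagerly query the container limits (libsvm_container) right after argument parsing...
    long containerMemoryLimit = queryContainerMemoryLimit();

    // ...so that PhysicalMemory is fully set up before the GC ever runs and
    // callers no longer need an isInitialized() check.
    PhysicalMemorySketch.initialize(containerMemoryLimit);
  }

  private static void parseIsolateArguments(String[] args) { /* ... */ }

  private static long queryContainerMemoryLimit() { /* hypothetical libsvm_container query */ return 100L * 1024 * 1024; }

  static final class PhysicalMemorySketch {
    private static long size;

    static void initialize(long limit) { size = limit; }

    // After the change, callers can assume initialization already happened.
    static long size() { return size; }
  }
}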

As a result, the GC works with the right values from the beginning. This can be seen in action with a small test case:

public class Main {
  private static byte[] array;

  public static void main(String[] args) {
    try {
      array = new byte[500 * 1024 * 1024];
    } catch (OutOfMemoryError e) {
      System.out.println("Saw expected OutOfMemoryError");
      return;
    }
    throw new RuntimeException("Didn't see the OutOfMemoryError");
  }
}

Compile and build the native image:

javac Main.java
native-image Main

Running unrestricted (on a system with more than 500 MB of RAM):

./main
Exception in thread "main" java.lang.RuntimeException: Didn't see the OutOfMemoryError
        at Main.main(Main.java:12)
        at java.base@24/java.lang.invoke.LambdaForm$DMH/sa346b79c.invokeStaticInit(LambdaForm$DMH)

Running in a cgroup restricted to 100 MB (e.g., via systemd).

Before:

systemd-run --scope -p MemoryMax=100M -p MemorySwapMax=0 --user ./main
Running scope as unit: run-re1846651fff346f28ce9b87f003afa1d.scope
Killed

Note that the OutOfMemoryError was not thrown, but the process was killed by the system due to exceeding its assigned memory. This is because the GC ran before the container support was fully initialized and thus worked with incorrect heap size assumptions.

After:

systemd-run --scope -p MemoryMax=100M -p MemorySwapMax=0 --user /home/zapster/graal/mx-graalvm/graal/substratevm/main
Running scope as unit: run-r18f0be6bc6974a57ae0df6739c695655.scope
Saw expected OutOfMemoryError

With this change, the GC knows the right memory limits up front and can correctly throw the OutOfMemoryError.

oracle-contributor-agreement bot added the "OCA Verified" label (all contributors have signed the Oracle Contributor Agreement) on Aug 13, 2024
zapster self-assigned this on Aug 13, 2024
@zapster
Member

zapster commented Aug 13, 2024

FYI @jerboaa

@jerboaa
Collaborator

jerboaa commented Aug 13, 2024

Running in a cgroup restricted to 100 MB (e.g., via systemd).

@zapster The systemd mention piqued my interest. Note that the HotSpot code doesn't yet work correctly on systemd slices on cgroups v2 (JDK-8322420). So depending on where you test this, you might get wrong results. I'd suggest running in a container with volume mounts instead.
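
For example, something along these lines sets the limit at the cgroup leaf the way container runtimes do (the image, mount options, and paths are illustrative assumptions, not taken from this PR):

podman run --rm --memory=100m --memory-swap=100m -v "$PWD":/work:Z fedora /work/main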

@zapster
Member

zapster commented Aug 13, 2024

Note that the HotSpot code doesn't yet work correctly on systemd slices on cgroups v2 (JDK-8322420).

Thanks a lot for the pointer! That could indeed lead to some surprises. :)

From the ticket it seems that all hierarchical cgroup v2 setups are affected, e.g., also those created via cgcreate, correct?

I'd suggest running in a container with volume mounts instead.

Just curious, why are containers not affected? Are they not also using the same cgroup hierarchies in the background?

@zapster
Member

zapster commented Aug 13, 2024

Update: I guess https://mail.openjdk.org/pipermail/container-discuss/2023-November/000001.html answers my question. :)

@jerboaa
Collaborator

jerboaa commented Aug 13, 2024

Note that the HotSpot code doesn't yet work correctly on systemd slices on cgroups v2 (JDK-8322420).

Thanks a lot for the pointer! That could indeed lead to some surprises. :)

From the ticket it seems that all hierarchical cgroup v2 setups are affected, e.g., also those created via cgcreate, correct?

It depends on the way the cgroup hierarchy is set up. When using docker/podman, the limits are set at the leaf nodes, so they are not affected. Systemd slices do this slightly differently, and for cgroup v2 the workaround added with JDK-8217338 doesn't work; therefore, JDK-8322420 is needed.

I'd suggest running in a container with volume mounts instead.

Just curious, why are containers not affected? Are they not also using the same cgroup hierarchies in the background?

Consider this hierarchy (where a/b is the controller's path):

  a -> b -> memory.max

Then HotSpot detects the limit by looking at a/b/memory.max only; it doesn't look at a/memory.max if a/b/memory.max is max (i.e., unlimited). Systemd slices, on the other hand, use a different setup where the limit is usually set one level up (e.g., in a/memory.max). Since container runtimes set the limits at the leaf nodes, detection works for them; for systemd slices on cgroup v2 it doesn't (before JDK-8322420).
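
To make that concrete, an illustrative picture (values assume the 100 MB limit from above, i.e., 104857600 bytes):

  limit set by a container runtime (at the leaf, which HotSpot reads):
    a/memory.max   = max
    a/b/memory.max = 104857600

  limit set via a systemd slice on cgroup v2 (one level up, missed before JDK-8322420):
    a/memory.max   = 104857600
    a/b/memory.max = max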

@zapster
Member

zapster commented Aug 13, 2024

@jerboaa thanks again!

graalvmbot closed this on Aug 14, 2024
graalvmbot deleted the je/svm-libcontainer-eager-initialization-GR-53451 branch on August 14, 2024, 09:05
graalvmbot merged commit bb0af71 into master on Aug 14, 2024