@rjernst rjernst commented Jul 17, 2023

This commit rewrites the DieWithDignity test to use the new test infra. A side effect of this change is that it no longer relies on jps, which appears to have issues on Windows.

closes #77282

@rjernst rjernst added >test Issues or PRs that are addressing/adding tests :Core/Infra/Core Core issues without another label v8.10.0 labels Jul 17, 2023
@rjernst rjernst requested a review from mark-vieira July 17, 2023 21:41
@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label Jul 17, 2023
@elasticsearchmachine

Pinging @elastic/es-core-infra (Team:Core/Infra)


@mark-vieira mark-vieira left a comment

Couple minor comments, otherwise LGTM. Have we done any testing on Windows here? This seems like the kind of thing that would be sensitive to platform differences and I don't see anything expressly forbidding Windows execution.

GradleUtils.extendSourceSet(project, "main", "javaRestTest", tasks.named("javaRestTest"))

tasks.named("javaRestTest").configure {
it.onlyIf("snapshot build") { BuildParams.isSnapshotBuild() }
Contributor

Why can't these tests run against release builds?

Member Author

The test uses a test external module, and those are only bundled with snapshots. This is also why we must use the default distribution. We could probably rework this, but I'd rather not change that here.

Member Author

Never mind, figured it out.

// disable exit on out of memory error to let DieWithDignityIT verify that OOM handling without that works (including OOMs that are not caused by
// memory, like native threads). We leave it to the JVM to test that exit on OOM works via the flag.
jvmArgs '-XX:-ExitOnOutOfMemoryError'
usesDefaultDistribution()
Contributor

Is it strictly necessary for these tests to use the default distribution? Could they not use the integ-test distro?


@ClassRule
public static ElasticsearchCluster cluster = ElasticsearchCluster.local()
.nodes(1)
Contributor

FYI, this is superfluous since by default, unless explicitly specified, it'll create a single-node cluster.

@rjernst rjernst merged commit 1d995eb into elastic:main Jul 17, 2023
@rjernst rjernst deleted the test/die-with-dignity-pid4 branch July 17, 2023 23:49
@jakelandis

@rjernst - any hints as to why this test fails when run with FIPS (-Dtests.fips.enabled=true)?



[2023-07-19T07:59:52,206][INFO ][o.e.q.d.DieWithDignityIT ] [testDieWithDignity] before test	
[2023-07-19T07:59:52,213][INFO ][o.e.h.n.s.HealthNodeTaskExecutor] [test-cluster-0] Node [{test-cluster-0}{-Eadd_T3SUiaIIamFHEE3g}] is selected as the current health node.	
[2023-07-19T07:59:52,244][INFO ][o.e.t.c.l.WaitForHttpResource] [testDieWithDignity] Got successful response [200] from URL [http://127.0.0.1:38835/_cluster/health?wait_for_nodes=>=1&wait_for_status=yellow]	
[2023-07-19T07:59:52,247][INFO ][o.e.q.d.DieWithDignityIT ] [testDieWithDignity] initializing REST clients against [http://127.0.0.1:38835/]	
Jul 19, 2023 7:59:52 AM org.bouncycastle.jsse.provider.ProvKeyManagerFactorySpi getDefaultKeyStore	
INFO: Initializing empty key store	
Jul 19, 2023 7:59:52 AM org.bouncycastle.jsse.provider.ProvTrustManagerFactorySpi getDefaultTrustStore	
INFO: Initializing with trust store at path: /dev/shm/elastic+elasticsearch+pull-request+part-1-fips/test/external-modules/die-with-dignity/build/fips-resources/cacerts.bcfks	
[2023-07-19T07:59:52,323][INFO ][o.e.l.ClusterStateLicenseService] [test-cluster-0] license [667be6cc-d634-4ec6-b521-e31ff038e859] mode [trial] - valid	
java.lang.OutOfMemoryError: Requested array size exceeds VM limit	
Dumping heap to /dev/shm/elastic+elasticsearch+pull-request+part-1-fips/test/external-modules/die-with-dignity/build/testrun/javaRestTest/temp/test-cluster10166251179603076161/test-cluster-0/logs/java_pid313092.hprof ...	
Heap dump file created [154368866 bytes in 0.541 secs]	
Terminating due to java.lang.OutOfMemoryError: Requested array size exceeds VM limit	
Enter password for the elasticsearch keystore : 	
ERROR: Elasticsearch exited unexpectedly, with exit code 3	
[2023-07-19T07:59:53,793][INFO ][o.e.q.d.DieWithDignityIT ] [testDieWithDignity] after test	
REPRODUCE WITH: ./gradlew ':test:external-modules:test-die-with-dignity:javaRestTest' --tests "org.elasticsearch.qa.die_with_dignity.DieWithDignityIT.testDieWithDignity" -Dtests.seed=AD752CA150BF7783 -Dtests.locale=id -Dtests.timezone=Australia/Lord_Howe -Druntime.java=20 -Dtests.fips.enabled=true

@rjernst

rjernst commented Jul 19, 2023

The exit code is not what I would expect:

1> Terminating due to java.lang.OutOfMemoryError: Requested array size exceeds VM limit
2> ERROR: Elasticsearch exited unexpectedly, with exit code 3

But our uncaught exception handler (which handles termination from Errors) halts with code 127 on OOM. The test fails because it does not see the log messages the uncaught exception handler emits.

Is FIPS replacing our uncaught exception handler somehow? Or using some other trickiness to intercept the OOM and not allow our uncaught exception handler to proceed?

@tvernum @ChrisHegarty any ideas on what FIPS might be doing, or on ways our uncaught exception handler might be overshadowed?
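
For readers following along, the handler behaviour described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Elasticsearch handler: the class and method names are hypothetical, the status 127 for OutOfMemoryError is taken from the comment above, and the fallback status of 1 and the log wording are assumptions.

```java
// Hypothetical sketch of an uncaught exception handler that halts the JVM
// with a distinct status when a fatal Error escapes a thread.
final class ExitStatusSketch {

    // 127 for OutOfMemoryError comes from the discussion above.
    // The fallback value of 1 is an assumption made for this sketch.
    static int statusFor(Throwable t) {
        if (t instanceof OutOfMemoryError) {
            return 127;
        }
        return 1;
    }

    // Installs the handler for all threads that have no handler of their own.
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, t) -> {
            if (t instanceof Error) {
                // Log first, so that a test can look for this message afterwards.
                System.err.println("fatal error in thread [" + thread.getName() + "], exiting");
                // halt() skips shutdown hooks, which may not run reliably after an Error.
                Runtime.getRuntime().halt(statusFor(t));
            } else {
                System.err.println("uncaught exception in thread [" + thread.getName() + "]");
            }
        });
    }

    public static void main(String[] args) {
        install();
        // Only print the mapping here; throwing a real Error would halt the JVM.
        System.out.println(statusFor(new OutOfMemoryError("demo")));
    }
}
```

An exit code of 3 together with "Terminating due to java.lang.OutOfMemoryError" is, as far as I know, what HotSpot itself reports when -XX:+ExitOnOutOfMemoryError is in effect. If so, the JVM exited before any Java-level handler like the one sketched here had a chance to run.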

@tvernum

tvernum commented Jul 19, 2023

Well, that was fun to debug...

It turns out that it has nothing to do with the uncaught exception handler. It was entirely because (on FIPS) this line was ineffective:

The cause is a minor bug in the way the test cluster framework puts together the environment properties.
The FIPS build has additional system properties (to configure BCFIPS), and combining them with the other JVM args was broken.

I opened #97776
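
To illustrate how extra system properties can break an otherwise working flag, here is a purely hypothetical sketch of this class of bug. It is not the code from the test cluster framework or from #97776: the class, the method names, and the property `-Dexample.extra.property=value` are invented for illustration, and the missing-separator mistake is only an assumption about one way such a combination can go wrong.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: building a single JVM options string from two lists.
final class JvmOptionsSketch {

    // Buggy variant (assumed for illustration): each list is joined on its own
    // and the two results are concatenated without a separator between them.
    static String joinBroken(List<String> systemProperties, List<String> jvmArgs) {
        return String.join(" ", systemProperties) + String.join(" ", jvmArgs);
    }

    // Fixed variant: merge the lists first, then join once.
    static String joinFixed(List<String> systemProperties, List<String> jvmArgs) {
        List<String> all = new ArrayList<>(systemProperties);
        all.addAll(jvmArgs);
        return String.join(" ", all);
    }

    public static void main(String[] args) {
        List<String> jvmArgs = List.of("-XX:-ExitOnOutOfMemoryError");
        List<String> none = List.of();
        List<String> extra = List.of("-Dexample.extra.property=value");

        // Without extra properties the bug is invisible.
        System.out.println(joinBroken(none, jvmArgs));
        // With extra properties the flag is glued onto the property value and lost.
        System.out.println(joinBroken(extra, jvmArgs));
        System.out.println(joinFixed(extra, jvmArgs));
    }
}
```

In this sketch the broken variant only misbehaves when the first list is non-empty, which would be consistent with a failure that shows up only on builds that add system properties.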

Labels

:Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team >test Issues or PRs that are addressing/adding tests v8.10.0

Development

Successfully merging this pull request may close these issues.

DieWithDignityIT.testDieWithDignity failures on CI

5 participants