Add test for dying with dignity #28987

jasontedor · 2018-03-12T09:29:52Z

I have long wanted an actual test that dying with dignity works. It is tricky because if dying with dignity works, the Elasticsearch JVM under test dies, which is usually an abnormal condition. And anyway, how does one force a fatal error to be thrown? I was motivated to investigate this again because I missed a backport to one branch, which led to an issue where Elasticsearch would not successfully die with dignity. And now we have a solution: we install a plugin that throws an out-of-memory error when it receives a request. We hack the standalone test infrastructure to prevent this from failing the test. To do this, we bypass the security manager and remove the PID file for the node; this tricks the test infrastructure into thinking that it does not need to stop the node. We also bypass seccomp so that we can fork jps to make sure that Elasticsearch really died. And to be extra paranoid, we parse the logs of the dead Elasticsearch process to make sure it died with dignity. Never forget.

Relates #19272
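
The two post-mortem checks described above (confirm the node process is really gone, then confirm its log shows a fatal-error exit) can be sketched in plain Java. This is a minimal sketch, not the code from this PR: it swaps forking jps (a JDK tool) for the Java 9+ ProcessHandle API, the class and method names are hypothetical, and the log substrings "fatal error in thread" and "java.lang.OutOfMemoryError" are assumptions about what the dying node writes, not confirmed Elasticsearch messages.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper class; not code from the PR.
final class DieWithDignityChecks {

    // True if no live process with this PID exists. Uses ProcessHandle
    // (Java 9+) in place of forking jps, which is what the PR does.
    static boolean processIsGone(long pid) {
        Optional<ProcessHandle> handle = ProcessHandle.of(pid);
        return handle.map(h -> !h.isAlive()).orElse(true);
    }

    // True if the log contains both a fatal-error exit line and the
    // OutOfMemoryError that caused it. Both substrings are assumptions
    // about the log format, not confirmed Elasticsearch messages.
    static boolean diedWithDignity(List<String> logLines) {
        boolean sawFatalExit = false;
        boolean sawOutOfMemory = false;
        for (String line : logLines) {
            if (line.contains("fatal error in thread")) {
                sawFatalExit = true;
            }
            if (line.contains("java.lang.OutOfMemoryError")) {
                sawOutOfMemory = true;
            }
        }
        return sawFatalExit && sawOutOfMemory;
    }

    public static void main(String[] args) {
        // The current JVM is still running, so this prints false.
        System.out.println(processIsGone(ProcessHandle.current().pid()));

        // Made-up log lines for illustration only.
        List<String> log = List.of(
                "[ERROR] fatal error in thread [Thread-2], exiting",
                "java.lang.OutOfMemoryError: die with dignity");
        System.out.println(diedWithDignity(log)); // prints true
    }
}
```

Requiring both lines, rather than either one, mirrors the "extra paranoid" intent: a process that vanished for some other reason, or that logged an OutOfMemoryError but kept running, would not pass.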

elasticmachine · 2018-03-12T09:29:54Z

Pinging @elastic/es-core-infra

hub-cap

LGTM but I would like another human to review

martijnvg

LGTM, nice to see how it is still possible to test this kind of stuff!

Too bad the PR build failed, because a JRE is installed instead of a JDK :(

martijnvg · 2018-03-12T14:23:33Z

qa/die-with-dignity/build.gradle

+    classname 'org.elasticsearch.DieWithDignityPlugin'
+}
+
+integTestCluster {


is this empty block needed?

I pushed 881a8d2.

nik9000 · 2018-03-12T14:41:48Z

qa/die-with-dignity/src/test/java/org/elasticsearch/qa/die_with_dignity/DieWithDignityIT.java

+        final int pid = Integer.parseInt(pidFileLines.get(0));
+        Files.delete(pidFile);
+        final CountDownLatch latch = new CountDownLatch(1);
+        client().performRequestAsync("GET", "/_die_with_dignity", new ResponseListener() {


Why not performRequest?

How it ended up this way is a long and uninteresting story; it's a leftover from a previous iteration and is no longer needed. I pushed 14c720e.
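
The point behind the question can be illustrated without Elasticsearch: issuing an asynchronous call and then immediately blocking on a CountDownLatch until its listener fires behaves like a plain blocking call, just with more moving parts. In this minimal sketch CompletableFuture stands in for the REST client, and callAsync/callSync are hypothetical names, not the low-level REST client API. In the actual test the request presumably never returns normally, since the node dies while handling it; the sketch ignores that failure path and only shows the blocking equivalence.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical stand-ins for a client; not the Elasticsearch REST client API.
final class AsyncVersusSync {

    // Async style: the response is delivered to a callback on another thread.
    static void callAsync(String path, Consumer<String> onResponse) {
        CompletableFuture.supplyAsync(() -> "response for " + path)
                .thenAccept(onResponse);
    }

    // Blocking style: the caller simply waits for the return value.
    static String callSync(String path) {
        return "response for " + path;
    }

    // Async call followed immediately by a latch wait: the caller blocks
    // anyway, so this is a roundabout version of callSync.
    static String callAsyncAndWait(String path) {
        CountDownLatch latch = new CountDownLatch(1);
        AtomicReference<String> result = new AtomicReference<>();
        callAsync(path, response -> {
            result.set(response);
            latch.countDown();
        });
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return result.get();
    }

    public static void main(String[] args) {
        System.out.println(callAsyncAndWait("/_die_with_dignity"));
        System.out.println(callSync("/_die_with_dignity"));
    }
}
```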

rjernst

The test looks fine; I just have one comment about the name. I know we have used this phrase internally to mean "let the node die when an error occurs", but thus far we have only used it in PR titles, AFAIK. For actual code, I think we should use a plainer name that will not be lost in translation or be known only to those who worked on ES during the 2.x-6.x range. Something like "qa/error-exit" or "qa/exit-on-error" would be much clearer to anyone browsing the qa tests who does not know this "insider" phrase.

jasontedor · 2018-03-12T21:15:12Z

I disagree, because we refer to this as "die with dignity" everywhere. It has appeared in release notes too. And if ever we were to write a blog post about why we do this, it would definitely refer to this as "dying with dignity". It's a fun name.

jasontedor added >test, review, v7.0.0, v6.3.0, and :Core/Infra/Resiliency labels Mar 12, 2018

hub-cap reviewed Mar 12, 2018

martijnvg approved these changes Mar 12, 2018

nik9000 approved these changes Mar 12, 2018

jasontedor added 2 commits March 12, 2018 11:03

Use performRequest

14c720e

Remove unnecessary empty block

881a8d2

nik9000 approved these changes Mar 12, 2018

Push test JVM to test so we can run jps

2ae89e1

rjernst reviewed Mar 12, 2018

jasontedor merged commit 8b6fbe2 into elastic:master Mar 13, 2018

jasontedor deleted the never-forget branch March 13, 2018 03:22

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

7 participants
