[ML] Improve hard_limit audit message #42086

edsavage · 2019-05-10T16:09:01Z

Improve the hard_limit memory audit message by reporting how many bytes
over the configured memory limit the job was at the point of the last
allocation failure.

Previously the model memory usage was reported. That figure was
inaccurate and hence of limited use, primarily because the total
memory used by the model can decrease significantly after the model's
status is changed to hard_limit but before the model size stats are
reported from autodetect to ES.

This PR changes the format of the hard_limit audit message, but it
depends on modifications to the ml-cpp backend to send additional data
fields in the model size stats message. Those changes will follow in a
subsequent PR. This PR must be merged before the ml-cpp one to keep CI
tests passing.
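As a rough illustration of the intended message, here is a minimal sketch of how the text could be built from the two new values. The class name, method name and message wording are hypothetical and are not taken from the Elasticsearch code. The fallback branch reflects the later commit in this PR that uses a different message when the model size stats originate from a version prior to 7.2.

```java
import java.util.Locale;

// Hypothetical sketch: names and wording are illustrative only.
final class HardLimitMessageSketch {

    // Both arguments may be null when the stats came from a sender
    // that does not supply the new fields.
    static String hardLimitMessage(Long memoryLimitBytes, Long bytesExceeded) {
        if (memoryLimitBytes == null || bytesExceeded == null) {
            // No size information available, so make no claim about sizes.
            return "Job memory status changed to hard_limit";
        }
        return String.format(Locale.ROOT,
            "Job memory status changed to hard_limit; memory limit of %d bytes exceeded by %d bytes",
            memoryLimitBytes, bytesExceeded);
    }

    public static void main(String[] args) {
        System.out.println(hardLimitMessage(1048576L, 2048L));
        System.out.println(hardLimitMessage(null, null));
    }
}
```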

Relates #38034

elasticmachine · 2019-05-10T16:09:04Z

Pinging @elastic/ml-core

droberts195 · 2019-05-10T16:23:04Z

...nt/rest-high-level/src/main/java/org/elasticsearch/client/ml/job/process/ModelSizeStats.java

        return modelBytes;
    }

+    public long getModelBytesExceeded() {


The return value needs to be Long, otherwise a user could get an NPE.
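To show the failure mode this comment refers to, here is a small sketch that assumes the backing field is a nullable `Long`. The class and method names are hypothetical; only the unboxing behaviour is standard Java.

```java
// Hypothetical sketch of the unboxing problem; not the actual ModelSizeStats class.
final class UnboxingSketch {
    // Nullable: the value may be absent in documents written by older versions.
    private final Long modelBytesExceeded;

    UnboxingSketch(Long modelBytesExceeded) {
        this.modelBytesExceeded = modelBytesExceeded;
    }

    // Primitive return type: auto-unboxing a null field throws NullPointerException.
    long getPrimitive() {
        return modelBytesExceeded;
    }

    // Boxed return type: null is passed through to the caller unchanged.
    Long getBoxed() {
        return modelBytesExceeded;
    }
}
```

Calling `getPrimitive()` on an instance built with `null` throws, whereas `getBoxed()` returns `null` and lets the caller decide how to handle the missing value.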

droberts195 · 2019-05-10T16:23:11Z

...nt/rest-high-level/src/main/java/org/elasticsearch/client/ml/job/process/ModelSizeStats.java

+        return modelBytesExceeded;
+    }
+
+    public long getModelBytesMemoryLimit() {


The return value needs to be Long, otherwise a user could get an NPE.

droberts195 · 2019-05-10T16:24:53Z

...nt/rest-high-level/src/main/java/org/elasticsearch/client/ml/job/process/ModelSizeStats.java

        private final String jobId;
        private long modelBytes;
+        private long modelBytesExceeded;
+        private long modelBytesMemoryLimit;


These two need to be Long, otherwise they'll default to 0 and zeroes will then propagate into objects that should really contain nulls.
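The difference in defaults can be seen in a minimal sketch. The builder below is hypothetical and much smaller than the real one; it only contrasts an uninitialised primitive field with an uninitialised boxed field.

```java
// Hypothetical sketch: contrasts default values of primitive and boxed fields.
final class BuilderDefaultsSketch {

    static final class Builder {
        long primitiveLimit; // never assigned: defaults to 0
        Long boxedLimit;     // never assigned: defaults to null
    }

    // Renders a value the way a reader of the stats might interpret it.
    static String describe(Long value) {
        return value == null ? "absent" : value + " bytes";
    }

    public static void main(String[] args) {
        Builder b = new Builder();
        // The primitive is boxed to 0 here, so it looks like a real measurement.
        System.out.println(describe(b.primitiveLimit));
        // The boxed field stays null, so the absence is preserved.
        System.out.println(describe(b.boxedLimit));
    }
}
```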

droberts195 · 2019-05-10T16:35:02Z

...st-high-level/src/test/java/org/elasticsearch/client/ml/job/process/ModelSizeStatsTests.java

        ModelSizeStats stats = new ModelSizeStats.Builder("foo").build();
        assertEquals(0, stats.getModelBytes());
+        assertEquals(0, stats.getModelBytesExceeded());
+        assertEquals(0, stats.getModelBytesMemoryLimit());


These two will change to null if the other changes I recommended are made. So that makes the values set by the default constructed builder inconsistent. But this is better than filling in values that didn't really exist in a JSON document.

droberts195 · 2019-05-10T16:35:54Z

...c/main/java/org/elasticsearch/xpack/core/ml/job/process/autodetect/state/ModelSizeStats.java

+        if (in.getVersion().onOrAfter(Version.V_7_2_0)) {
+            modelBytesMemoryLimit = in.readOptionalLong();
+        } else {
+            modelBytesMemoryLimit = 0L;


I think it should be null, so we remember that the field didn't exist.
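A version-gated read that keeps `null` for older senders might look like the sketch below. It uses plain `java.io` streams and an integer version constant as stand-ins; the real code uses Elasticsearch's own stream and `Version` types, which are not reproduced here, and every name in the sketch is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: java.io stand-ins for a version-gated optional field.
final class VersionGatedReadSketch {
    static final int V_7_2_0 = 70200; // illustrative version id, not the real constant

    // Optional long encoding used by this sketch: presence flag, then the value.
    static void writeOptionalLong(DataOutputStream out, Long value) throws IOException {
        out.writeBoolean(value != null);
        if (value != null) {
            out.writeLong(value);
        }
    }

    static Long readOptionalLong(DataInputStream in) throws IOException {
        if (in.readBoolean()) {
            return in.readLong();
        }
        return null;
    }

    // Older senders never wrote the field, so record that fact as null rather than 0.
    static Long readMemoryLimit(DataInputStream in, int senderVersion) throws IOException {
        if (senderVersion >= V_7_2_0) {
            return readOptionalLong(in);
        }
        return null;
    }

    // Writes the field as a sender of the given version would, then reads it back.
    static Long roundTrip(Long value, int version) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                if (version >= V_7_2_0) {
                    writeOptionalLong(out, value);
                }
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return readMemoryLimit(in, version);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape, a consumer can tell "limit was 0" apart from "sender did not know the limit", which is what allows a different audit message for pre-7.2 stats.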

droberts195 · 2019-05-10T16:36:43Z

...c/main/java/org/elasticsearch/xpack/core/ml/job/process/autodetect/state/ModelSizeStats.java

        return modelBytes;
    }

+    public long getModelBytesExceeded() {


The return value needs to be Long, otherwise a user could get an NPE.

droberts195 · 2019-05-10T16:36:49Z

...c/main/java/org/elasticsearch/xpack/core/ml/job/process/autodetect/state/ModelSizeStats.java

+        return modelBytesExceeded;
+    }
+
+    public long getModelBytesMemoryLimit() {


The return value needs to be Long, otherwise a user could get an NPE.

droberts195 · 2019-05-10T16:37:34Z

...c/main/java/org/elasticsearch/xpack/core/ml/job/process/autodetect/state/ModelSizeStats.java

        private final String jobId;
        private long modelBytes;
+        private long modelBytesExceeded;
+        private long modelBytesMemoryLimit;


These two need to be Long, otherwise they'll default to 0 and zeroes will then propagate into objects that should really contain nulls.

droberts195 · 2019-05-10T16:38:19Z

...t/java/org/elasticsearch/xpack/core/ml/job/process/autodetect/state/ModelSizeStatsTests.java

        ModelSizeStats stats = new ModelSizeStats.Builder("foo").build();
        assertEquals(0, stats.getModelBytes());
+        assertEquals(0, stats.getModelBytesExceeded());
+        assertEquals(0, stats.getModelBytesMemoryLimit());


These two will change to null if the other changes I recommended are made. So that makes the values set by the default constructed builder inconsistent. But this is better than filling in values that didn't really exist in a JSON document.

droberts195 · 2019-05-17T12:09:07Z

...st-high-level/src/test/java/org/elasticsearch/client/ml/job/process/ModelSizeStatsTests.java

        ModelSizeStats stats = new ModelSizeStats.Builder("foo").build();
        assertEquals(0, stats.getModelBytes());
+        assertEquals(null, stats.getModelBytesExceeded());
+        assertEquals(null, stats.getModelBytesMemoryLimit());


You can use assertNull for these two.

droberts195

LGTM

Add the current model memory limit and the number of bytes in excess of that at the point of the last allocation failure to the model size stats. These will be used to construct a (hopefully) more informative hard_limit audit message. The reported memory usage is also scaled to take into account the byte limit margin, which is in play in the initial period of a job's lifetime and is used to scale down the high memory limit. This should give a more accurate representation of how close the memory usage is to the high limit. Relates elastic/elasticsearch#42086. Closes elastic/elasticsearch#38034.

Fix MINIMUM_SCALE, MAXIMUM_SCALE and SQL_DATETIME_SUB ODBC metadata for the DATE & TIME data types. Fixes: elastic#42086 (cherry picked from commit c23677c)

edsavage added >enhancement :ml Machine learning v8.0.0 v7.2.0 labels May 10, 2019

edsavage assigned droberts195 May 10, 2019

droberts195 reviewed May 10, 2019

edsavage added 2 commits May 16, 2019 13:42

Attending to code review comments

5ea05bc

Merge branch 'master' of github.com:elastic/elasticsearch into improv…

684bca5

…e_hard_limit_audit_message

droberts195 reviewed May 17, 2019

edsavage added 2 commits May 17, 2019 08:10

Backwards compatibility fix.

bae1774

Use an appropriate hard_limit audit message when model size stats originate from a version prior to 7.2

Further tidy up of test code

d5f096f

droberts195 approved these changes May 17, 2019

edsavage merged commit 8c01a8d into elastic:master May 17, 2019

edsavage mentioned this pull request May 17, 2019

[ML] Improve hard_limit audit message elastic/ml-cpp#486

Merged

edsavage mentioned this pull request May 18, 2019

[7.2][ML] Improve hard_limit audit message (#486) elastic/ml-cpp#487

Merged

edsavage added a commit to elastic/ml-cpp that referenced this pull request May 18, 2019

[7.2][ML] Improve hard_limit audit message (#486) (#487)

7a782a9

relates elastic/elasticsearch#42086 Backports #486

edsavage mentioned this pull request May 19, 2019

[ML] Failures in ML tests in AutodetectMemoryLimitIT #42207

Closed

edsavage added a commit that referenced this pull request May 19, 2019

[ML] Temporarily muting failing tests

840af87

Muting a number of AutoDetectMemoryLimitIT tests to give CI a chance to settle before easing in required backend changes. relates elastic/ml-cpp#486 relates #42086

edsavage added a commit that referenced this pull request May 19, 2019

[ML] Temporarily muting failing tests

3dbfe03

Muting a number of AutoDetectMemoryLimitIT tests to give CI a chance to settle before easing in required backend changes. relates elastic/ml-cpp#486 relates #42086

edsavage deleted the improve_hard_limit_audit_message branch May 22, 2019 09:17

matriv added a commit to matriv/elasticsearch that referenced this pull request Apr 16, 2020

SQL: Fix ODBC metadata for DATE & TIME data types

b5b3c49

Fix MINIMUM_SCALE, MAXIMUM_SCALE and SQL_DATETIME_SUB ODBC metadata for the DATE & TIME data types. Fixes: elastic#42086

matriv mentioned this pull request Apr 16, 2020

SQL: Fix ODBC metadata for DATE & TIME data types #55316

Merged

matriv added a commit that referenced this pull request Apr 16, 2020

SQL: Fix ODBC metadata for DATE & TIME data types (#55316)

c23677c

Fix MINIMUM_SCALE, MAXIMUM_SCALE and SQL_DATETIME_SUB ODBC metadata for the DATE & TIME data types. Fixes: #42086

This was referenced Apr 16, 2020

SQL: Fix ODBC metadata for DATE & TIME data types (#55316) #55345

Merged

SQL: Fix ODBC metadata for DATE & TIME data types (#55316) #55346

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
