@edsavage
Contributor

Improve the hard_limit memory audit message by reporting how many bytes
over the configured memory limit the job was at the point of the last
allocation failure.

Previously the model memory usage was reported. However, this was
inaccurate and hence of limited use, primarily because the total
memory used by the model can decrease significantly after the model's
status is changed to hard_limit but before the model size stats are
reported from autodetect to ES.

While this PR contains the changes to the format of the hard_limit audit
message, it depends on modifications to the ml-cpp backend to
send additional data fields in the model size stats message. These
changes will follow in a subsequent PR. Note that this PR
must be merged before the ml-cpp one to keep CI tests happy.

Relates #38034
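
As a rough illustration of the idea described above, the sketch below builds an audit message from the configured memory limit and the number of bytes by which it was exceeded. The class name `HardLimitAuditMessage`, its methods, and the message wording are all hypothetical; this is not the code or the text used in this PR.

```java
// Hypothetical helper for illustration only; not the Elasticsearch implementation.
final class HardLimitAuditMessage {
    private HardLimitAuditMessage() {}

    /** Formats a byte count using binary units, for example 1536 becomes "1.5kb". */
    static String humanReadable(long bytes) {
        String[] units = {"b", "kb", "mb", "gb", "tb"};
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < units.length - 1) {
            value /= 1024;
            unit++;
        }
        if (unit == 0) {
            return bytes + "b";
        }
        return String.format(java.util.Locale.ROOT, "%.1f%s", value, units[unit]);
    }

    /** Builds an illustrative message; the wording here is an assumption. */
    static String format(long memoryLimitBytes, long bytesExceeded) {
        return "Job memory status changed to hard_limit; model memory limit "
            + humanReadable(memoryLimitBytes) + " was exceeded by "
            + humanReadable(bytesExceeded) + " at the last allocation failure.";
    }
}
```

Reporting the excess over the limit, rather than the current model size, avoids the problem that the model may already have shrunk by the time the stats reach ES.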

@elasticmachine
Collaborator

Pinging @elastic/ml-core

return modelBytes;
}

public long getModelBytesExceeded() {

The return value needs to be Long, otherwise a user could get an NPE.

return modelBytesExceeded;
}

public long getModelBytesMemoryLimit() {

The return value needs to be Long, otherwise a user could get an NPE.
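
To make the NPE concern concrete, here is a minimal sketch assuming the backing field is a nullable `Long`. The class `SizeStatsSketch` and its method names are hypothetical stand-ins, not the real `ModelSizeStats`.

```java
// Hypothetical stand-in for a stats object with an optional field.
final class SizeStatsSketch {
    // null means the value was never reported
    private final Long modelBytesExceeded;

    SizeStatsSketch(Long modelBytesExceeded) {
        this.modelBytesExceeded = modelBytesExceeded;
    }

    // Unsafe: returning a null Long as a primitive long forces auto-unboxing,
    // which throws NullPointerException.
    long getModelBytesExceededUnboxed() {
        return modelBytesExceeded;
    }

    // Safe: the caller receives null and can decide how to handle it.
    Long getModelBytesExceeded() {
        return modelBytesExceeded;
    }
}
```

With a `Long` return type, the absence of a value stays visible to the caller instead of surfacing as an exception inside the getter.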

private final String jobId;
private long modelBytes;
private long modelBytesExceeded;
private long modelBytesMemoryLimit;

These two need to be Long, otherwise they'll default to 0 and zeroes will then propagate into objects that should really contain nulls.

ModelSizeStats stats = new ModelSizeStats.Builder("foo").build();
assertEquals(0, stats.getModelBytes());
assertEquals(0, stats.getModelBytesExceeded());
assertEquals(0, stats.getModelBytesMemoryLimit());

These two will change to null if the other changes I recommended are made. That makes the values set by the default-constructed builder inconsistent, but it is better than filling in values that never existed in the JSON document.

if (in.getVersion().onOrAfter(Version.V_7_2_0)) {
modelBytesMemoryLimit = in.readOptionalLong();
} else {
modelBytesMemoryLimit = 0L;

I think it should be null, so we remember that the field didn't exist.
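
The suggestion can be sketched with a self-contained stand-in. The example below uses `java.nio.ByteBuffer` and a numeric version constant in place of the Elasticsearch stream and `Version` classes, so every name in it (`OptionalLongWire`, `VERSION_WITH_LIMIT_FIELD`, the presence-byte encoding) is an assumption made for illustration.

```java
// Hypothetical wire helper; it only mimics the idea of a version-gated optional field.
final class OptionalLongWire {
    // Hypothetical numeric id standing in for the first version that sends the field.
    static final int VERSION_WITH_LIMIT_FIELD = 7_020_000;

    private OptionalLongWire() {}

    // Writes a presence byte, then the value if it is present.
    static void writeOptionalLong(java.nio.ByteBuffer out, Long value) {
        out.put((byte) (value != null ? 1 : 0));
        if (value != null) {
            out.putLong(value);
        }
    }

    // Reads the presence byte and returns null when no value was written.
    static Long readOptionalLong(java.nio.ByteBuffer in) {
        if (in.get() == 0) {
            return null;
        }
        return in.getLong();
    }

    // Older senders never wrote the field, so the result is null rather than 0.
    static Long readMemoryLimit(java.nio.ByteBuffer in, int senderVersion) {
        if (senderVersion >= VERSION_WITH_LIMIT_FIELD) {
            return readOptionalLong(in);
        }
        return null;
    }
}
```

Returning null for older senders preserves the distinction between "the limit was 0" and "the field did not exist".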

ModelSizeStats stats = new ModelSizeStats.Builder("foo").build();
assertEquals(0, stats.getModelBytes());
assertEquals(null, stats.getModelBytesExceeded());
assertEquals(null, stats.getModelBytesMemoryLimit());

You can use assertNull for these two.

edsavage added 2 commits May 17, 2019 08:10
Use an appropriate hard_limit audit message when model size stats
originate from a version prior to 7.2
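
One way to picture this commit is a message chooser that falls back to the older style when the new fields are absent. The sketch below is hypothetical: the class name `HardLimitMessageChooser`, the method signature, and both message texts are assumptions rather than the code in this PR.

```java
// Hypothetical sketch of choosing an audit message based on which fields are present.
final class HardLimitMessageChooser {
    private HardLimitMessageChooser() {}

    static String choose(long modelBytes, Long memoryLimitBytes, Long bytesExceeded) {
        // Stats from an older node carry no limit or excess, so only the model size is reported.
        if (memoryLimitBytes == null || bytesExceeded == null) {
            return "Job memory status changed to hard_limit at " + modelBytes + " bytes.";
        }
        // Newer stats allow reporting how far over the limit the job was.
        return "Job memory status changed to hard_limit; memory limit of "
            + memoryLimitBytes + " bytes was exceeded by " + bytesExceeded + " bytes.";
    }
}
```

Keeping the optional values as `Long` is what makes this branch possible, since a default of 0 could not be told apart from a missing field.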

@droberts195 droberts195 left a comment

LGTM

@edsavage edsavage merged commit 8c01a8d into elastic:master May 17, 2019
edsavage added a commit that referenced this pull request May 17, 2019
edsavage added a commit to edsavage/ml-cpp that referenced this pull request May 18, 2019
Add the current model memory limit and the number of bytes in
excess of that at the point of the last allocation failure to the model
size stats. These will be used to construct a (hopefully) more
informative hard_limit audit message.

The reported memory usage is also scaled to take into account the byte
limit margin, which is in play in the initial period of a job's lifetime
and is used to scale down the high memory limit. This should give a more
accurate representation of how close the memory usage is to the high
limit.

relates elastic/elasticsearch#42086

closes elastic/elasticsearch#38034
edsavage added a commit to elastic/ml-cpp that referenced this pull request May 18, 2019
edsavage added a commit that referenced this pull request May 19, 2019
Muting a number of AutoDetectMemoryLimitIT tests to give CI a chance to
settle before easing in required backend changes.

relates elastic/ml-cpp#486
relates #42086
@edsavage edsavage deleted the improve_hard_limit_audit_message branch May 22, 2019 09:17
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
Muting a number of AutoDetectMemoryLimitIT tests to give CI a chance to
settle before easing in required backend changes.

relates elastic/ml-cpp#486
relates elastic#42086
matriv added a commit to matriv/elasticsearch that referenced this pull request Apr 16, 2020
Fix MINIMUM_SCALE, MAXIMUM_SCALE and SQL_DATETIME_SUB
ODBC metadata for the DATE & TIME data types.

Fixes: elastic#42086
matriv added a commit that referenced this pull request Apr 16, 2020
matriv added a commit to matriv/elasticsearch that referenced this pull request Apr 16, 2020
(cherry picked from commit c23677c)