-
Notifications
You must be signed in to change notification settings - Fork 25.6k
[ML] Explain data frame analytics API #49455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Explain data frame analytics API #49455
Conversation
|
Pinging @elastic/ml-core (:ml) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll come back to add this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have now added this in
|
@alvarezmelissa87 pinging you as this is the PR that adds field selection explanation |
|
@szabosteve I'm updating documentation too in this PR, could you please take a look at the docs changes? |
benwtrent
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff! Some minor comments.
Also, I think you might need to black list the failure focused yaml tests for the ml with security tests
...high-level/src/test/java/org/elasticsearch/client/documentation/MlClientDocumentationIT.java
Outdated
Show resolved
Hide resolved
...high-level/src/test/java/org/elasticsearch/client/documentation/MlClientDocumentationIT.java
Outdated
Show resolved
Hide resolved
...l/src/main/java/org/elasticsearch/xpack/ml/action/TransportDataFrameAnalyticsInfoAction.java
Outdated
Show resolved
Hide resolved
...ml/src/main/java/org/elasticsearch/xpack/ml/dataframe/extractor/ExtractedFieldsDetector.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/xpack/ml/rest/dataframe/RestDataFrameAnalyticsInfoAction.java
Outdated
Show resolved
Hide resolved
szabosteve
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @dimitris-athanasiou for documenting this! I left a couple of minor comments, but it LGTM.
x-pack/plugin/src/test/resources/rest-api-spec/test/ml/data_frame_analytics_info.yml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these be kept in sync with log-levels in AnalyticsProcessManager.java (which are currently info)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this API is not part of the life-cycle of the job, I'd rather it stays quiet in the info level unless something goes wrong.
...rc/main/java/org/elasticsearch/xpack/ml/rest/dataframe/RestDataFrameAnalyticsInfoAction.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/xpack/ml/rest/dataframe/RestDataFrameAnalyticsInfoAction.java
Outdated
Show resolved
Hide resolved
...l/src/main/java/org/elasticsearch/xpack/ml/action/TransportDataFrameAnalyticsInfoAction.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Objects.requireNonNull(mappingTypes)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Collections.unmodifiableSet throws if it's passed null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return new FieldSelection(randomAlphaOfLength(10), | |
| return new FieldSelection( | |
| randomAlphaOfLength(10), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Objects.requireNonNull?
Also, should we use ExceptionHelper.requireNonNull instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExceptionsHelper.requireNonNull makes sense for user facing code. We thus use it when we parse request objects. In this case we create this ourselves Objects.requireNonNull should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return new FieldSelection(randomAlphaOfLength(10), | |
| return new FieldSelection( | |
| randomAlphaOfLength(10), |
This commit replaces the _estimate_memory_usage API with a new API, the _info API. The API consolidates information that is useful before creating a data frame analytics job. It includes: - memory estimation - field selection explanation Memory estimation is moved here from what was previously calculated in the _estimate_memory_usage API. Field selection is a new feature that explains to the user whether each available field was selected to be included or not in the analysis. In the case it was not included, it also explains the reason why.
…documentation/MlClientDocumentationIT.java Co-Authored-By: Benjamin Trent <[email protected]>
…documentation/MlClientDocumentationIT.java Co-Authored-By: Benjamin Trent <[email protected]>
…frame/extractor/ExtractedFieldsDetector.java Co-Authored-By: Benjamin Trent <[email protected]>
…/dataframe/RestDataFrameAnalyticsInfoAction.java Co-Authored-By: Benjamin Trent <[email protected]>
Co-Authored-By: István Zoltán Szabó <[email protected]>
Co-Authored-By: István Zoltán Szabó <[email protected]>
Co-Authored-By: István Zoltán Szabó <[email protected]>
Co-Authored-By: István Zoltán Szabó <[email protected]>
666e764 to
0096136
Compare
|
run elasticsearch-ci/1 |
This commit replaces the _estimate_memory_usage API with a new API, the _explain API. The API consolidates information that is useful before creating a data frame analytics job. It includes: - memory estimation - field selection explanation Memory estimation is moved here from what was previously calculated in the _estimate_memory_usage API. Field selection is a new feature that explains to the user whether each available field was selected to be included or not in the analysis. In the case it was not included, it also explains the reason why. Backport of elastic#49455
This commit replaces the _estimate_memory_usage API with a new API, the _explain API. The API consolidates information that is useful before creating a data frame analytics job. It includes: - memory estimation - field selection explanation Memory estimation is moved here from what was previously calculated in the _estimate_memory_usage API. Field selection is a new feature that explains to the user whether each available field was selected to be included or not in the analysis. In the case it was not included, it also explains the reason why. Backport of #49455
przemekwitek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This commit replaces the _estimate_memory_usage API with
a new API, the _explain API.
The API consolidates information that is useful before
creating a data frame analytics job.
It includes:
Memory estimation is moved here from what was previously
calculated in the _estimate_memory_usage API.
Field selection is a new feature that explains to the user
whether each available field was selected to be included or
not in the analysis. In the case it was not included, it also
explains the reason why.