-
Notifications
You must be signed in to change notification settings - Fork 25.6k
[ML][Data Frame] Add optional defer_validation param to PUT #44455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML][Data Frame] Add optional defer_validation param to PUT #44455
Conversation
|
Pinging @elastic/ml-core |
...src/main/java/org/elasticsearch/xpack/core/dataframe/action/PutDataFrameTransformAction.java
Show resolved
Hide resolved
...a/org/elasticsearch/xpack/core/dataframe/action/PutDataFrameTransformActionRequestTests.java
Show resolved
Hide resolved
...src/main/java/org/elasticsearch/xpack/core/dataframe/action/PutDataFrameTransformAction.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/elasticsearch/xpack/core/dataframe/action/PutDataFrameTransformAction.java
Outdated
Show resolved
Hide resolved
...main/java/org/elasticsearch/xpack/dataframe/action/TransportPutDataFrameTransformAction.java
Outdated
Show resolved
Hide resolved
.../rest-high-level/src/test/java/org/elasticsearch/client/DataFrameRequestConvertersTests.java
Outdated
Show resolved
Hide resolved
.../rest-high-level/src/test/java/org/elasticsearch/client/DataFrameRequestConvertersTests.java
Outdated
Show resolved
Hide resolved
...igh-level/src/main/java/org/elasticsearch/client/dataframe/PutDataFrameTransformRequest.java
Outdated
Show resolved
Hide resolved
...igh-level/src/main/java/org/elasticsearch/client/dataframe/PutDataFrameTransformRequest.java
Outdated
Show resolved
Hide resolved
...igh-level/src/main/java/org/elasticsearch/client/dataframe/PutDataFrameTransformRequest.java
Outdated
Show resolved
Hide resolved
...main/java/org/elasticsearch/xpack/dataframe/action/TransportPutDataFrameTransformAction.java
Outdated
Show resolved
Hide resolved
...igh-level/src/main/java/org/elasticsearch/client/dataframe/PutDataFrameTransformRequest.java
Outdated
Show resolved
Hide resolved
| public final class SourceDestValidator { | ||
|
|
||
| private SourceDestValidator() {} | ||
| interface SourceDestValidation { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
hendrikmuhs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added some comments.
I think we should document this new parameter but make very clear what the usecase is. Nit: It would be good to add some logging if the parameter is used (debug level).
| public Request(StreamInput in) throws IOException { | ||
| super(in); | ||
| this.config = new DataFrameTransformConfig(in); | ||
| if (in.getVersion().onOrAfter(Version.CURRENT)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as for check in BWC code like this: you can alternatively set the version, disable BWC in build.gradle, do the backports and then re-enable BWC. You need the same number of commits in the end, but using the "Version.CURRENT approach" still broke CI, disabling BWC seems to be the better approach now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hendrikmuhs I usually mute the tests before I backport in a separate PR. This way BWC can actually run against my changes and I can find any issues earlier rather than later.
...main/java/org/elasticsearch/xpack/dataframe/action/TransportPutDataFrameTransformAction.java
Outdated
Show resolved
Hide resolved
| private SourceDestValidator() {} | ||
| interface SourceDestValidation { | ||
| boolean isDeferrable(); | ||
| void check(DataFrameTransformConfig config, ClusterState clusterState, IndexNameExpressionResolver indexNameExpressionResolver); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I still find it confusing that I have to call check and not validate on a "validator". (Vice versa it could also be a SourceDestChecker if you prefer to keep check)
Yes, the source is open after all we should document and explain when it should be use. Remember it only defers so any error will be hit when validations are run on start anyway. Also if the has privilege validation is deferrable it should checked on start |
I don't think this is, because the rule is that the transform accesses the source and destination indices with the security privileges of the user who put it, not the user who starts it. The has-privileges API can only return the privileges of the current user, so could only check if the user calling start had the necessary privileges, not the user who called put. This doesn't create a security issue - in the end if the user who put the transform doesn't have permission to read the source index and write to the destination then it won't do these things. But it does make the error reporting less friendly. Maybe this can just be documented as a caveat of deferring validation: if you defer validation we assume you've thought about the security aspects of the transform and the transform will run but fail if you haven't. I think this has actually opened up a can of worms though relating to the security aspects of the envisaged use case for this functionality: none of the built-in Beats roles have permission to put a data frame transform nor create an index other than those currently used by Beats. If Beats is going to use data frame transforms then the user that Beats connects as needs to have permission to manage data frame transforms and create and write to the destination indices configured in those transforms. Presumably to restrict abuse we'd have to set aside a pattern for those destination indices and only give the Beats user index permissions on that pattern. Then we'd have to document what that pattern is and warn users not to let indices storing unrelated confidential data match it. |
|
👍 makes sense. Assuming insufficient privileges causes a detectable exception we should handle that as an irrecoverable failure without retry. That change is not necessary for this PR Line 803 in 92709f5
|
| new RuntimeException(DataFrameMessages.REST_PUT_DATA_FRAME_FAILED_TO_VALIDATE_DATA_FRAME_CONFIGURATION, | ||
| validationException)) | ||
| validationException -> { | ||
| if (validationException instanceof Pivot.TestQueryException && request.isDeferValidation()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pivot.validateQuery() is called only when isDeferValidation() == false
This can be simplified by removing the if...else leaving the call to listener.onFailure
| * @param indexNameExpressionResolver A valid IndexNameExpressionResolver object | ||
| * @throws ElasticsearchStatusException when a validation fails | ||
| */ | ||
| public static void check(DataFrameTransformConfig config, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per Hendrik's comment this should also be validate()
| ==== {api-query-parms-title} | ||
|
|
||
| `defer_validations`:: | ||
| (Optional, boolean) When `true`, this will cause deferrable validations to not run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should include more detail about deferred validations before the second sentence.
| (Optional, boolean) When `true`, this will cause deferrable validations to not run. | |
| (Optional, boolean) When `true`, this will cause deferrable validations to not run. | |
| Deferrable validations include checks for the existence of the source index and that the | |
| user has sufficient privileges, skipping these validations at creation is useful for | |
| situations where the source index does not exist but will be created later. Deferred | |
| validations are always run when the {dataframe-transform} is started. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the very last sentence of that suggested change should be changed to: "Deferred validations are always run when the {dataframe-transform} is started, with the exception of privilege checks. If the user who created the transform does not have the required privileges on the source and destination indices then the transform will start but then fail when it attempts the unauthorized operation."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These details are great, thanks!
przemekwitek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| Request request = DataFrameRequestConverters.putDataFrameTransform(putRequest); | ||
|
|
||
| assertThat(request.getParameters(), not(hasKey("defer_validation"))); | ||
| assertEquals(HttpPut.METHOD_NAME, request.getMethod()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you are, could you change "assertEquals" to "assertThat(..., equalTo(...)) so that the usage of assertions is consistent in the whole method.
| include-tagged::{doc-tests-file}[{api}-request] | ||
| -------------------------------------------------- | ||
| <1> The configuration of the {dataframe-transform} to create | ||
| <2> Whether or not to wait to run deferrable validations until `_start` is called. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] "Whether or not to wait to run deferrable validations" sounds a bit complicated, but I guess you wanted to avoid "Whether or not to defer deferrable validations until ...". Anyway, I would try to somehow simplify it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe Postpone deferrable validations until _start is called.
| [[put-data-frame-transform-query-parms]] | ||
| ==== {api-query-parms-title} | ||
|
|
||
| `defer_validations`:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is now singular everywhere.
hendrikmuhs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
davidkyle
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…44455) * [ML][Data Frame] Add optional defer_validation param to PUT * addressing PR comments * reverting bad replace * addressing pr comments * Update put-transform.asciidoc * Update put-transform.asciidoc * Update put-transform.asciidoc
…4455) (#44697) * [ML][Data Frame] Add optional defer_validation param to PUT (#44455) * [ML][Data Frame] Add optional defer_validation param to PUT * addressing PR comments * reverting bad replace * addressing pr comments * Update put-transform.asciidoc * Update put-transform.asciidoc * Update put-transform.asciidoc * adjusting for backport * fixing imports * [DOCS] Fixes formatting in create data frame transform API
This adds the optional boolean parameter
defer_validationto thePUTtransform API. This parameter will defer any validations that are considered addressable AFTER the configuration has been created.We do many validations eagerly against the current state of the cluster on PUT. This is so any issues can be captured early to prevent failures later when the data frame transform starts running. But, certain validations could be deferred until
_startas the overall cluster state could have mutated betweenPUTand_start.The default behavior is to run the validations eagerly.
I did not add any reference or HLRC documentation yet. But will if desired.
@droberts195
Do we want to document this parameter? It seems like a super user parameter and should almost never be used.
closes #43439