Skip to content

Conversation

@lcawl
Copy link
Contributor

@lcawl lcawl commented Dec 18, 2020

@lcawl lcawl marked this pull request as ready for review December 18, 2020 18:14
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

@lcawl lcawl requested a review from droberts195 December 18, 2020 18:15
Copy link

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should spell out a couple of things more explicitly:

  1. Adding remote_cluster_client to a dedicated ML or transform node still keeps it "dedicated"
  2. We strongly recommend to do this unless you have a very good reason not to, because we have seen through several support cases that it's completely baffling to end users when their searches work perfectly from dev console but fail when used in ML jobs

You probably don't want to accept my suggestions verbatim as I don't think they're worded very well, but hopefully you can come up with some words that say the same but sound better.

<1> The `xpack.ml.enabled` setting is enabled by default.

If you want the node to use {ccs}, add `remote_cluster_client` to the list of
roles. See <<remote-node>>.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need some clarification on whether a node with ml and remote_cluster_client is still considered a dedicated ML node. I think it should still be:

Suggested change
roles. See <<remote-node>>.
roles. See <<remote-node>>. A node with both the `ml` and `remote_cluster_client` roles
but no others is still considered a dedicated {ml} node.

----

If you want the node to use {ccs}, add `remote_cluster_client` to the list of
roles. See <<remote-node>>.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
roles. See <<remote-node>>.
roles. See <<remote-node>>. A node with both the `transform` and `remote_cluster_client` roles
but no others is still considered a dedicated {transform} node.


If you want the node to use {ccs}, add `remote_cluster_client` to the list of
roles. See <<remote-node>>.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If you are using {ccs} in any way it is strongly recommended to allow {ml} nodes to do remote searches unless you have a very good reason not to. It will likely lead to confusion and frustration if cross cluster searches usually work perfectly but fail when used in {ml} job or {dfeed} configs.


If you want the node to use {ccs}, add `remote_cluster_client` to the list of
roles. See <<remote-node>>.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If you are using {ccs} in any way it is strongly recommended to allow {transform} nodes to do remote searches unless you have a very good reason not to. It will likely lead to confusion and frustration if cross cluster searches usually work perfectly but fail when used in {transform} configs.

@lcawl
Copy link
Contributor Author

lcawl commented Dec 18, 2020

Thanks for the feedback @droberts195 ! I've updated the node page, as well as the ML settings and transform settings pages. I also carried over a clarification that data nodes are by default transform nodes, since that seemed to be missing from the transform settings page. If that detail is no longer correct, however, I can remove it from both locations.

Comment on lines 20 to 21
node, it does not have the `transform` role. However, by default all generic
data nodes are also {transform} nodes.
Copy link

@droberts195 droberts195 Dec 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now very complicated. I don't think you should use the phrase "generic data nodes", because what does that mean?

If you specify node.data: true, don't specify node.roles at all and don't specify node.transform at all then the node is a transform node. (node.data and node.transform are old deprecated settings that will not work at all in 8.x. So the "by default" statement used to make sense before node.roles was introduced.)

If you specify node.roles: [ data ] then the node is not a transform node.

If you don't specify any node role settings (old or new) then the node is both a data node and a transform node (and every other type of node). So the "by default" bit makes sense in this case, but for production it's an edge case.

A data/transform node that specifies node.roles needs to explicitly say both, for example, node.roles: [ data, transform ] or node.roles: [ data_hot, transform, remote_cluster_client ].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation! I've removed "All data nodes are also transform nodes" from the node settings page and re-worked the details in the transform and ML settings pages to try to make this clearer.

Copy link

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The remote_cluster_client role is also on by default if you don't have any node-related settings in your config.

But LGTM apart from that so feel free to merge without further review.

If you use the `node.roles` setting, then all required roles must be explicitly
set. Consult <<modules-node>> to learn more.
By default, every node is a {ml} node. If you set `node.roles`, however,
you must explicitly specify all the required roles for the node. To learn more,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this sentence:

If you set node.roles, however, you must explicitly specify all the required roles for the node.

Would be better in docs/reference/modules/node.asciidoc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's already covered in the following sentence on that page: "If you set node.roles, the node is assigned only the roles you specify."

@lcawl
Copy link
Contributor Author

lcawl commented Dec 22, 2020

The remote_cluster_client role is also on by default if you don't have any node-related settings in your config.

On closer examination, the list of default roles is duplicated in this section: https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-node.html#node-roles, so I've added the transform role there and removed the earlier (redundant) paragraph.

@lcawl lcawl merged commit 6b463a7 into elastic:master Dec 22, 2020
@lcawl lcawl deleted the ml-nodes-ccs branch December 22, 2020 18:11
lcawl added a commit to lcawl/elasticsearch that referenced this pull request Dec 22, 2020
lcawl added a commit to lcawl/elasticsearch that referenced this pull request Dec 22, 2020
@pugnascotia pugnascotia added the >docs General docs changes label Jan 15, 2021
@elasticmachine elasticmachine added the Team:Docs Meta label for docs team label Jan 15, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-docs (Team:Docs)

@sophiec20
Copy link
Contributor

sophiec20 commented Jan 18, 2021

By default, every node is a {ml} node.

Can we please drop this sentence (and its transform twin)?

I believe this adds confusion. In reality it means by default every node is an ML node providing you have not set node.roles to anything, even an empty array.. It gives too much prominence to the default settings of the dev env in the past.

Imagine the user who has orchestration in place to deploy their cluster. It is likely, going forwards, that this orchestration would deploy with node.roles set. For this person, when they come to read this in the docs, in their frame of reference their default includes node.roles. Therefore we should tell users how to set node.roles in the most effective way for the production use case. We should not lead with the legacy or dev behaviour.

@lcawl Sorry about being late to the party.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>docs General docs changes :ml Machine learning Team:Docs Meta label for docs team v7.11.0 v7.12.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants