Description
Original comment by @bleskes:
Since the feature was originally implemented (LINK REDACTED, LINK REDACTED), things have changed slightly. I recently spoke to @jaymode, @pickypg and @spinscale to document how things currently work. Here's a summary we can use for future reference, with follow-ups at the end of each section.
@jaymode, @pickypg and @spinscale - please read carefully and correct if needed.
Security
Native Realm (.security index)
- ES 5.6 introduced a new field in the .security index mapping, and writes are not possible until that field has been added (see below for when this happens). Until then, the native realm is read-only on new nodes, because dynamic fields are disabled.
- Once the cluster master moves to version 5.6, xpack.security.support.NativeRealmMigrator does the following:
  - The security code issues a PUT template request to update the .security index template. It also updates the mapping of the existing index, if one exists.
  - Some internal users are updated, depending on the ES version.
- MetaDataUpgrader is also used to upgrade the index template.
- Users need to manually call the _xpack/migration/upgrade API to reindex and remove types.
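The migration flow above can be sketched as a small state machine. This is a minimal, hypothetical model, not the actual NativeRealmMigrator code: the class name, the integer version encoding (major * 100 + minor) and the step labels are all assumptions. It captures the two properties described above: the migration runs once, only after the master reaches 5.6, and the native realm stays read-only until the new field exists in the mapping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the 5.6 .security migration; NOT the real
// xpack.security.support.NativeRealmMigrator implementation.
class SecurityMigrationSketch {
    // Assumed version encoding: major * 100 + minor, so 5.6 -> 506.
    static final int V_5_6 = 506;

    // True once the new field is available: either the existing index mapping
    // was updated, or no index exists yet and the updated template will apply.
    private boolean newFieldAvailable = false;
    private final List<String> executedSteps = new ArrayList<>();

    // Dynamic fields are disabled, so writes need the new field to exist first.
    boolean nativeRealmWritable() {
        return newFieldAvailable;
    }

    // Called on every cluster state change with the elected master's version.
    void onClusterChanged(int masterVersion, boolean securityIndexExists) {
        if (masterVersion < V_5_6 || newFieldAvailable) {
            return; // master still on an old version, or migration already done
        }
        executedSteps.add("put .security template");
        if (securityIndexExists) {
            executedSteps.add("update .security mapping");
        }
        // Which internal users change depends on the version migrated from;
        // that detail is not modelled here.
        executedSteps.add("update internal users");
        newFieldAvailable = true;
    }

    List<String> steps() {
        return executedSteps;
    }
}
```

In a mixed cluster, calls with a pre-5.6 master version leave the realm read-only; the first call with a 5.6 master performs the steps, and later calls are no-ops.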
.audit-* indices
- The mapping has not changed, so there is no need to upgrade the template.
- We rely on indices that are not 6.x-compatible aging out under the retention policy, so there is no upgrade scheme for them.
TODOs
- Document that the native realm is read-only until the master has been upgraded.
- We need to choose who updates the .security index template. It should only be done in one place.
- We need better testing to make sure that .security keeps working after users have manually used the _xpack/migration/upgrade API. Such tests don't exist now.
  - Also take into account the upgrade of an index that was created on ES < 5.6 and upgraded to 5.6 first.
  - Test that .security works in read-only mode in a mixed cluster.
  - Test that we can write new user credentials after the cluster has been upgraded.
Watcher
Watch CRUD & Execution
- Adding a watch first tries to use the new doc type. If this fails, it retries using the pre-5.6 types. This results in ugly log messages (see TODOs).
- Watch execution runs only on the master, so there is a clean transition between the old and new versions.
- The .watches and .triggered-watches* indices are upgraded manually via the _xpack/migration/upgrade API.
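The "try the new type, then fall back" behaviour for adding a watch can be sketched as follows. This is a hypothetical illustration, not Watcher's code: WatchIndexWriter is an invented minimal interface (not an Elasticsearch client API), TypeMissingException is a local stand-in for the exception seen in the logs, and "watch" as the pre-5.6 type name is an assumption.

```java
import java.util.List;

// Local stand-in for org.elasticsearch.indices.TypeMissingException.
class TypeMissingException extends RuntimeException {
    TypeMissingException(String type) {
        super("type[" + type + "] missing");
    }
}

// Hypothetical minimal write interface; not an Elasticsearch API.
interface WatchIndexWriter {
    void index(String index, String type, String id, String source);
}

class WatchStoreSketch {
    // New type first; "watch" as the legacy type name is an assumption.
    static final List<String> TYPES_IN_ORDER = List.of("doc", "watch");

    // Tries each type in order and returns the one that was accepted.
    // Rethrows the last failure if no type is known to the index.
    static String putWatch(WatchIndexWriter writer, String id, String source) {
        TypeMissingException last = null;
        for (String type : TYPES_IN_ORDER) {
            try {
                writer.index(".watches", type, id, source);
                return type;
            } catch (TypeMissingException e) {
                // Until .watches is upgraded, the "doc" attempt fails on every
                // put; server-side, each failure is also logged, hence the noise.
                last = e;
            }
        }
        throw last;
    }
}
```

The design keeps a single write path for both old and new indices, at the cost of one failed request per put until the index is upgraded.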
Watch history
- The template is automatically updated by the TemplateUpgradeService.
- Since watches execute on the master, the new template is only used once the master moves to the new version.
TODOs
- The following is currently logged until the user manually upgrades the .watches index. We should find a way to avoid it:
[2017-10-06T06:42:14,598][DEBUG][o.e.a.b.TransportShardBulkAction] [.watches][0] failed to execute bulk item (index) BulkShardRequest [[.watches][0]] containing [index {[.watches][doc][wAuN5cXhTiyjyCm58tH6ag_xpack_license_expiration], source[n/a, actual length: [4.5kb], max length: 2kb]}] and a refresh
org.elasticsearch.indices.TypeMissingException: type[doc] missing
at org.elasticsearch.index.mapper.MapperService.documentMapperWithAu
- Delay watch execution on the master until the required template version is visible in the cluster state. (LINK REDACTED)
Monitoring
Exporting
Local exporter:
- The monitoring index templates are upgraded by the monitoring service when the master moves to 5.6.
- The exporter waits until it sees the new template in place (i.e., until the master is on 5.6).
HTTP exporter:
- Always tries to update the remote template when it is set up.
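The local exporter's "wait for the new template" check can be sketched as a simple gate over the templates visible in the cluster state. This is a hypothetical model, not the LocalExporter code: the class, the name-to-version map and the idea of a single required version per template are assumptions, and in this sketch an export attempted before the gate opens is simply skipped.

```java
import java.util.Map;

// Hypothetical model of "export only once the new template is in place".
class LocalExporterGateSketch {
    private final String templateName;   // illustrative template name
    private final int requiredVersion;   // illustrative required version

    LocalExporterGateSketch(String templateName, int requiredVersion) {
        this.templateName = templateName;
        this.requiredVersion = requiredVersion;
    }

    // templates: template name -> version, as seen in the cluster state.
    // Only a 5.6 master installs the new template, so this stays false
    // while the master is still on an older version.
    boolean readyToExport(Map<String, Integer> templates) {
        Integer version = templates.get(templateName);
        return version != null && version >= requiredVersion;
    }
}
```

Because the check only reads the cluster state, non-master nodes never need permission to write templates themselves.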
Monitoring indices
- ES 5.6 uses a new index naming scheme, i.e., new indices are created alongside the old ones as soon as the first data is shipped.
- Old indices are not upgraded; we let them age out under the retention policy.
TODOs
- Move template upgrades to the centralized TemplateUpgradeService. (Beats will take over this responsibility for Monitoring.)
- Research the following log messages, which repeatedly appear in the logs until the upgrade is complete:
13:32:43 [2017-10-04T15:32:20,567][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [node-0] collector [cluster_stats] failed to collect data
13:32:43 java.lang.IllegalStateException: Security index is not on the current version - the native realm will not be operational until the upgrade API is run on the security index
and
[2017-10-06T06:42:14,621][ERROR][o.e.x.m.e.l.LocalExporter] failed to set monitoring watch [wAuN5cXhTiyjyCm58tH6ag_elasticsearch_version_mismatch]
org.elasticsearch.indices.TypeMissingException: type[doc] missing
at org.elasticsearch.index.mapper.MapperService.documentMapperWithAutoCreate(MapperService.java:765) ~[elasticsearch-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
at org.elasticsearch.index.shard.IndexShard.docMapper(IndexShard.java:2147) ~[elasticsearch-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:677) ~[elasticsearch-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
TemplateUpgradeService
The service is in charge of checking index template versions and upgrading them if needed. It currently tries to do so as soon as the first 5.6 node joins the cluster. This fails because the _system user only has the required permissions once the master has moved to a 5.6 node (see LINK REDACTED). The result is repeated ugly messages about the _system user lacking the right permissions. Given the review of the features above, it is safe to move to a simpler model in which templates are updated only by the current master (i.e., once the master is on 5.6).
- Change the TemplateUpgradeService to only run its upgrades once the local node is the master (TemplateUpgradeService should only run on the master #27294).
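The proposed master-only model can be sketched as a pure function that decides which templates to (re)install. This is a hypothetical illustration, not the TemplateUpgradeService implementation: the class name and the use of name-to-version maps for installed and expected templates are assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a master-only template upgrade check.
class MasterOnlyTemplateUpgradeSketch {
    // installed: name -> version currently in the cluster state.
    // expected:  name -> version this node ships with.
    // Returns the templates to install (sorted by name); empty on non-master nodes.
    static Map<String, Integer> templatesToUpgrade(boolean localNodeIsMaster,
                                                   Map<String, Integer> installed,
                                                   Map<String, Integer> expected) {
        Map<String, Integer> upgrades = new TreeMap<>();
        if (!localNodeIsMaster) {
            return upgrades; // only the master touches templates, so _system
                             // never attempts a write it lacks permission for
        }
        for (Map.Entry<String, Integer> e : expected.entrySet()) {
            Integer current = installed.get(e.getKey());
            // Install missing templates and upgrade older ones; never downgrade.
            if (current == null || current < e.getValue()) {
                upgrades.put(e.getKey(), e.getValue());
            }
        }
        return upgrades;
    }
}
```

Running the check only on the master ties template upgrades to the moment the master moves to 5.6, which is also when the other features reviewed above start relying on the new templates.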