Upgrading to 5.6 Review #29818

*Original comment by @bleskes:*

Since the feature was originally implemented (LINK REDACTED, LINK REDACTED), things have changed slightly. I recently spoke with @jaymode, @pickypg and @spinscale to document how things currently work. Here is a summary we can use for future reference, with follow-ups at the end of each section.

@jaymode, @pickypg and @spinscale - please read carefully and correct if needed.


### Security 

#### Native Realm (`.security` index)

- ES 5.6 introduced a new field in the `.security` index mapping. Because dynamic fields are disabled, new nodes cannot write to the index until the field has been added to the mapping (see below for when that happens). Until then, the native realm is read-only on new nodes.
- Once the cluster master moves to version 5.6, `xpack.security.support.NativeRealmMigrator` does the following: 
	- the security code issues a PUT template request to update the `.security` index template. It also updates the mapping of the existing index, if it exists.
	- some internal users are updated depending on ES version.
- `MetaDataUpgrader` is also used to upgrade the index template.
- Users need to manually call the `_xpack/migration/upgrade` API to reindex and remove types.
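A minimal sketch of the gating described above, assuming it can be reduced to two checks: whether the elected master is on 5.6 or later, and whether the `.security` mapping already contains the new field. `SecurityIndexState`, `NEW_FIELD` and the `migrate` method are hypothetical names used for illustration; this is not the `NativeRealmMigrator` API, and the real field name is not given in this issue.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the .security upgrade gating; not the actual X-Pack code. */
final class SecurityIndexState {
    // Placeholder for the field that 5.6 adds to the .security mapping (real name not given here).
    static final String NEW_FIELD = "hypothetical_new_field";

    private final Set<String> mappedFields;

    SecurityIndexState(Set<String> mappedFields) {
        this.mappedFields = Collections.unmodifiableSet(new HashSet<>(mappedFields));
    }

    /** Dynamic fields are disabled, so writes can only succeed once the new field is mapped. */
    boolean isWritable() {
        return mappedFields.contains(NEW_FIELD);
    }

    /** The template and mapping update only runs once the elected master is on 5.6 or later. */
    static boolean shouldMigrate(int masterMajor, int masterMinor) {
        return masterMajor > 5 || (masterMajor == 5 && masterMinor >= 6);
    }

    /** Returns the state after the migrator has updated the template and the existing mapping. */
    SecurityIndexState migrate(int masterMajor, int masterMinor) {
        if (!shouldMigrate(masterMajor, masterMinor)) {
            return this; // old master: the index stays read-only for new nodes
        }
        Set<String> updated = new HashSet<>(mappedFields);
        updated.add(NEW_FIELD);
        return new SecurityIndexState(updated);
    }
}
```

In this model, `new SecurityIndexState(Collections.singleton("username")).migrate(5, 5).isWritable()` is `false`, while `migrate(5, 6)` yields a writable index, mirroring the "read-only until the master moves to 5.6" behaviour.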


#### .audit-* indices
- The mapping never changed, so there is no need to upgrade the template.
- We rely on indices that are not 6.x-compatible falling out of the retention policy, so there is no upgrade scheme for them.

#### TODOs

 - [ ] Document that the native realm is read-only until the master has been upgraded.
 - [ ] We need to choose which component updates the `.security` index template. It should only be done in one place.
 - [ ] We need better tests to make sure that `.security` keeps working after users have manually run the `_xpack/migration/upgrade` API. No such tests exist yet.
	- [ ] Also cover an index that was created on ES < 5.6 and first upgraded to 5.6.
 - [ ] Test that `.security` works in read-only mode in a mixed cluster.
 - [ ] Test that new user credentials can be written after the cluster has been upgraded.


### Watcher

#### Watch CRUD & Execution

- Adding a watch first tries the new `doc` type. If this fails, it retries with the pre-5.6 types. This seems to result in ugly log messages (see the TODOs below).
- Watch execution runs on the master only and thus has a clean transition between old and new.
- The `.watches` and `.triggered_watches` indices are upgraded manually via the `_xpack/migration/upgrade` API.
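The add-watch behaviour in the first bullet can be sketched as a try-then-fallback. The in-memory `WatchesIndex`, the `TypeMissing` exception and the legacy type name `watch` are assumptions made for illustration; the real code goes through a bulk request and fails with `TypeMissingException`, as in the log quoted in the TODOs.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of "try the new doc type, fall back to the pre-5.6 type". */
final class WatchStoreSketch {
    static final String NEW_TYPE = "doc";
    // Assumption: the pre-5.6 mapping type of the .watches index.
    static final String LEGACY_TYPE = "watch";

    /** Stand-in for the TypeMissingException seen in the logs. */
    static final class TypeMissing extends RuntimeException {
        TypeMissing(String type) {
            super("type[" + type + "] missing");
        }
    }

    /** Minimal stand-in for the .watches index: it only knows which types are mapped. */
    static final class WatchesIndex {
        private final Set<String> mappedTypes;

        WatchesIndex(Set<String> mappedTypes) {
            this.mappedTypes = new HashSet<>(mappedTypes);
        }

        /** Indexes a watch under the given type, failing if that type is not mapped. */
        String index(String type, String watchId) {
            if (!mappedTypes.contains(type)) {
                throw new TypeMissing(type);
            }
            return type + "/" + watchId;
        }
    }

    /** Tries the "doc" type first and retries with the legacy type if it is missing. */
    static String putWatch(WatchesIndex index, String watchId) {
        try {
            return index.index(NEW_TYPE, watchId);
        } catch (TypeMissing e) {
            // In a real cluster, this failed first attempt corresponds to the DEBUG log line.
            return index.index(LEGACY_TYPE, watchId);
        }
    }
}
```

In this model, once the `.watches` index has been upgraded and only has the `doc` type, the first attempt succeeds and the failing attempt disappears.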


#### Watch history

- Template is automatically updated by the `TemplateUpgradeService` service.
- Since the watch execution is on the master, we only use the new template once the master moves to a new version.

#### TODOs

 - [ ] The following messages are currently logged until the user manually upgrades the `.watches` index. We should find a way to avoid them:

```
[2017-10-06T06:42:14,598][DEBUG][o.e.a.b.TransportShardBulkAction] [.watches][0] failed to execute bulk item (index) BulkShardRequest [[.watches][0]] containing [index {[.watches][doc][wAuN5cXhTiyjyCm58tH6ag_xpack_license_expiration], source[n/a, actual length: [4.5kb], max length: 2kb]}] and a refresh
org.elasticsearch.indices.TypeMissingException: type[doc] missing
        at org.elasticsearch.index.mapper.MapperService.documentMapperWithAu
```

 - [X] Delay watch execution on the master until the required template version is visible in the cluster state. (LINK REDACTED)
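A minimal sketch of the check behind the completed TODO above, assuming execution can be decided from two facts in the cluster state: whether the local node is the elected master and which version of the watch history template is installed. `ClusterView`, `HISTORY_TEMPLATE` and `REQUIRED_TEMPLATE_VERSION` are hypothetical names, not the real Watcher code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: only execute watches on the master once the template is in place. */
final class WatchExecutionGate {
    // Hypothetical template name and version; the real values live in the Watcher code.
    static final String HISTORY_TEMPLATE = "watch-history-template";
    static final int REQUIRED_TEMPLATE_VERSION = 6;

    /** Minimal view of the cluster state: master flag plus installed template versions. */
    static final class ClusterView {
        final boolean localNodeIsMaster;
        final Map<String, Integer> templateVersions;

        ClusterView(boolean localNodeIsMaster, Map<String, Integer> templateVersions) {
            this.localNodeIsMaster = localNodeIsMaster;
            this.templateVersions = new HashMap<>(templateVersions);
        }
    }

    /** Watches run on the master only, and only after the required history template is visible. */
    static boolean canExecuteWatches(ClusterView state) {
        if (!state.localNodeIsMaster) {
            return false;
        }
        Integer version = state.templateVersions.get(HISTORY_TEMPLATE);
        return version != null && version >= REQUIRED_TEMPLATE_VERSION;
    }
}
```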


### Monitoring

#### Exporting

Local exporter:
- The monitoring index template is upgraded by the monitoring service when the master moves to 5.6.
- The exporter waits until it sees the new template in place (i.e., until the master is on 5.6).

HTTP exporter:
- Always tries to update the remote template when it is set up.
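The difference between the two exporters can be summarized in a small sketch: the local exporter never installs the template itself and only becomes ready once the new template is visible, while the HTTP exporter pushes its template to the remote cluster every time it is set up. `TEMPLATE`, `REQUIRED_VERSION` and the map standing in for a cluster's templates are hypothetical, not the monitoring plugin's API.

```java
import java.util.Map;

/** Hypothetical sketch contrasting local and HTTP exporter setup. */
final class ExporterSetupSketch {
    // Hypothetical template name and version for the monitoring indices.
    static final String TEMPLATE = "monitoring-es";
    static final int REQUIRED_VERSION = 6;

    /** Local exporter: waits until the master (on 5.6) has installed the new template. */
    static boolean localExporterReady(Map<String, Integer> visibleTemplates) {
        Integer version = visibleTemplates.get(TEMPLATE);
        return version != null && version >= REQUIRED_VERSION;
    }

    /** HTTP exporter: always (re)installs its template on the remote cluster during setup. */
    static boolean httpExporterSetup(Map<String, Integer> remoteTemplates) {
        // Stand-in for a PUT template request against the remote cluster.
        remoteTemplates.put(TEMPLATE, REQUIRED_VERSION);
        return true; // the exporter can start shipping documents
    }
}
```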


#### Monitoring indices

- ES 5.6 uses a new index naming scheme, i.e., new indices are created alongside the old ones as soon as the first data is shipped.
- Old indices are not upgraded; we let them age and fall out of the retention policy.
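To illustrate why the old indices need no upgrade path: new documents always go to an index named with the new scheme, while old-scheme indices stop receiving writes and are eventually removed once they are older than the retention cutoff. The sketch assumes a `.monitoring-es-<scheme>-<yyyy.MM.dd>` pattern with illustrative scheme versions `2` (old) and `6` (new); the exact names used by 5.6 are not stated in this issue, and the retention logic is reduced to a simple day cutoff.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of daily index naming plus retention-based cleanup. */
final class MonitoringIndicesSketch {
    // Assumption: illustrative scheme versions, not confirmed by this issue.
    static final int OLD_SCHEME = 2;
    static final int NEW_SCHEME = 6;
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("uuuu.MM.dd");

    /** Name of the daily index that monitoring documents for the given day are written to. */
    static String indexFor(int scheme, LocalDate day) {
        return ".monitoring-es-" + scheme + "-" + DAY.format(day);
    }

    /** Returns the indices, old or new scheme alike, whose date is before the retention cutoff. */
    static List<String> expired(List<String> indices, LocalDate today, int retentionDays) {
        LocalDate cutoff = today.minusDays(retentionDays);
        List<String> result = new ArrayList<>();
        for (String index : indices) {
            // Assumes every name ends with a uuuu.MM.dd date, which is 10 characters long.
            String datePart = index.substring(index.length() - 10);
            if (LocalDate.parse(datePart, DAY).isBefore(cutoff)) {
                result.add(index);
            }
        }
        return result;
    }
}
```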


#### TODOs

 - [x] ~Move template upgrades to the centralized `TemplateUpgradeService`.~ Beats will take over this responsibility for Monitoring.
 - [x] Research the following log messages, which repeatedly appear in the logs until the upgrade is complete:

```
13:32:43 [2017-10-04T15:32:20,567][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [node-0] collector [cluster_stats] failed to collect data
13:32:43 java.lang.IllegalStateException: Security index is not on the current version - the native realm will not be operational until the upgrade API is run on the security index
```

and 

```
[2017-10-06T06:42:14,621][ERROR][o.e.x.m.e.l.LocalExporter] failed to set monitoring watch [wAuN5cXhTiyjyCm58tH6ag_elasticsearch_version_mismatch]
org.elasticsearch.indices.TypeMissingException: type[doc] missing
        at org.elasticsearch.index.mapper.MapperService.documentMapperWithAutoCreate(MapperService.java:765) ~[elasticsearch-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
        at org.elasticsearch.index.shard.IndexShard.docMapper(IndexShard.java:2147) ~[elasticsearch-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
        at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:677) ~[elasticsearch-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
```


### `TemplateUpgradeService`

The service is in charge of checking index template versions and upgrading them if needed. It currently tries to do so as soon as the first 5.6 node joins the cluster. This fails because the `_system` user only has the required permissions once the master has moved to a 5.6 node (see LINK REDACTED), which results in repeated ugly messages about the `_system` user not having the right permissions. Given the review of the features above, it is safe to move to a simpler model in which templates are updated only by the current master (i.e., once the master is on 5.6).

 - [x] Change the `TemplateUpgradeService` to only run its upgrades once the local node is master ( https://github.com/elastic/elasticsearch/pull/27294 )
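The simplified model can be sketched as a check that does nothing unless the local node is the elected master, and otherwise collects the templates whose installed version is missing or older than expected. `ClusterSnapshot`, the `expected` map and the returned list of names are hypothetical; the real service reacts to cluster state changes and issues put-template requests (see the linked PR).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a master-only template upgrade check. */
final class TemplateUpgradeSketch {

    /** Minimal stand-in for the parts of the cluster state the check needs. */
    static final class ClusterSnapshot {
        final boolean localNodeIsMaster;
        final Map<String, Integer> installedTemplateVersions;

        ClusterSnapshot(boolean localNodeIsMaster, Map<String, Integer> installed) {
            this.localNodeIsMaster = localNodeIsMaster;
            this.installedTemplateVersions = installed;
        }
    }

    /**
     * Returns, in sorted order, the names of templates that need to be (re)installed.
     * Non-master nodes return nothing, so a 5.6 node never attempts an upgrade
     * while an older master is still in charge.
     */
    static List<String> templatesToUpgrade(ClusterSnapshot state, Map<String, Integer> expected) {
        if (!state.localNodeIsMaster) {
            return Collections.emptyList();
        }
        List<String> outdated = new ArrayList<>();
        for (Map.Entry<String, Integer> e : new TreeMap<>(expected).entrySet()) {
            Integer installed = state.installedTemplateVersions.get(e.getKey());
            if (installed == null || installed < e.getValue()) {
                outdated.add(e.getKey());
            }
        }
        return outdated;
    }
}
```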

