Enrich index getting deleted while policy is executing #85221

@consulthys

Elasticsearch Version

7.17.1

Installed Plugins

N/A

Java Version

bundled

OS Version

Elastic Cloud

Problem Description

I'm executing an enrich match policy over a big index with 140M+ records, and the whole execution takes roughly 6 hours.

The enrich policy can never run to completion because the EnrichPolicyMaintenanceService, which runs periodically, decides whether enrich indices should be deleted. In my case, it decides that since no alias points at the enrich index being created (see logs below), that index can be marked for deletion.

According to PR #43746, this should not happen:

Synchronization has been added to make sure that no policy executions are running at the time of cleanup, and if any executions do occur, the marking process delays cleanup until next run.

A policy execution is clearly running while the maintenance service is doing its job, so this looks like a bug.
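
To make the mismatch concrete, here is a toy model of the two behaviours. It is not Elasticsearch source code: the class and function names are hypothetical, and the "expected" rule is only my reading of the PR text quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class EnrichIndex:
    """Hypothetical stand-in for an enrich index seen by the maintenance service."""
    name: str                      # e.g. ".enrich-parcel-policy-1647950733289"
    policy: str                    # value of _meta.enrich_policy_name
    aliases: set = field(default_factory=set)


def observed_should_delete(index: EnrichIndex) -> bool:
    """Behaviour seen in the logs: no alias means 'not live', so the index is deleted."""
    return not index.aliases


def expected_should_delete(index: EnrichIndex, running_policies: set) -> bool:
    """Behaviour the PR text implies: leave indices alone while their policy executes."""
    if index.policy in running_policies:
        return False  # delay cleanup until the next maintenance run
    return not index.aliases


in_progress = EnrichIndex(".enrich-parcel-policy-1647950733289", "parcel-policy")
print(observed_should_delete(in_progress))                     # True
print(expected_should_delete(in_progress, {"parcel-policy"}))  # False
```

The index under construction has no alias yet, so a check based on aliases alone cannot distinguish it from a stale index.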

GET _tasks?actions=policy*&detailed
...
        "v_vP5ydxR0uxEqZUlfa8RA:142789" : {
          "node" : "v_vP5ydxR0uxEqZUlfa8RA",
          "id" : 142789,
          "type" : "enrich",
          "action" : "policy_execution",
          "status" : {
            "phase" : "RUNNING",
            "step" : "ReindexRequest"
          },
          "description" : "executing enrich policy [parcel-policy]",
          "start_time_in_millis" : 1647950733280,
          "running_time_in_nanos" : 7049742423,
          "cancellable" : true,
          "cancelled" : false,
          "parent_task_id" : "Bsk0iJs4Rc6nBFMChn-GGA:179608",
          "headers" : { }
        }
      }
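
For anyone who wants to check this programmatically, the sketch below extracts running enrich executions from a parsed `GET _tasks` response. The helper name is hypothetical, and the `nodes` / `tasks` nesting in the sample is an assumption based on the usual Tasks API layout, since the output above is truncated.

```python
def running_enrich_policies(tasks_response: dict) -> list:
    """Return descriptions of enrich policy executions still in the RUNNING phase."""
    running = []
    for node in tasks_response.get("nodes", {}).values():
        for task in node.get("tasks", {}).values():
            if task.get("action") != "policy_execution":
                continue
            if task.get("status", {}).get("phase") == "RUNNING":
                running.append(task.get("description", ""))
    return running


# Trimmed sample built from the task shown above; the outer nesting is assumed.
sample = {
    "nodes": {
        "v_vP5ydxR0uxEqZUlfa8RA": {
            "tasks": {
                "v_vP5ydxR0uxEqZUlfa8RA:142789": {
                    "type": "enrich",
                    "action": "policy_execution",
                    "status": {"phase": "RUNNING", "step": "ReindexRequest"},
                    "description": "executing enrich policy [parcel-policy]",
                }
            }
        }
    }
}

print(running_enrich_policies(sample))
# ['executing enrich policy [parcel-policy]']
```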

Also, the policy seems to be properly attached to the enrich index:

GET .enrich-parcel-policy-1647950733289
=>
{
  ".enrich-parcel-policy-1647950733289" : {
    "aliases" : { },
    "mappings" : {
      "dynamic" : "false",
      "_meta" : {
        "enrich_policy_name" : "parcel-policy",
        "enrich_readme" : "This index is managed by Elasticsearch and should not be modified in any way.",
        "enrich_policy_type" : "match",
        "enrich_match_field" : "key-field"
      },
      "properties" : {
        "key-field" : {
          "type" : "keyword",
          "doc_values" : false
        }
      }
    },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "refresh_interval" : "-1",
        "number_of_shards" : "1",
        "provided_name" : ".enrich-parcel-policy-1647950733289",
        "creation_date" : "1647950733280",
        "number_of_replicas" : "0",
        "uuid" : "eeNg3H_kRjur9ZQTf1xF-A",
        "version" : {
          "created" : "7170199"
        },
        "warmer" : {
          "enabled" : "false"
        }
      }
    }
  }
}

Steps to Reproduce

  1. Create an enrich policy whose execution takes longer than enrich.cleanup_period (15 minutes by default).
  2. Execute it and wait until the next run of EnrichPolicyMaintenanceService.
  3. Observe that the following message appears in the logs: Enrich index [.enrich-xyz] is not marked as a live index since it has no alias information
  4. Verify that the index is subsequently deleted by EnrichPolicyMaintenanceService.

Logs (if relevant)

13:05:33.284 [elasticsearch.server][INFO] Policy [parcel-policy]: Running enrich policy
13:05:33.284 [elasticsearch.server][DEBUG] Policy [parcel-policy]: Checking source indices [[parcels-us]]
13:05:33.288 [elasticsearch.server][DEBUG] Policy [parcel-policy]: Validating [[parcels-us]] source mappings
13:05:33.289 [elasticsearch.server][DEBUG] Policy [parcel-policy]: Creating new enrich index [.enrich-parcel-policy-1647950733289]
13:05:33.306 [elasticsearch.server][INFO] creating index, cause [api], templates [], shards [1]/[0]
13:05:33.598 [elasticsearch.server][DEBUG] Policy [parcel-policy]: Transferring source data to new enrich index [.enrich-parcel-policy-1647950733289]
13:12:09.231 [elasticsearch.server][DEBUG] Triggering scheduled [enrich] maintenance task
13:12:09.234 [elasticsearch.server][DEBUG] Checking if should remove enrich index [.enrich-parcel-policy-1647950733289]
13:12:09.234 [elasticsearch.server][DEBUG] Enrich index [.enrich-parcel-policy-1647950733289] is not marked as a live index since it has no alias information
13:12:09.242 [elasticsearch.server][INFO] deleting index
13:12:09.625 [elasticsearch.server][DEBUG] Completed deletion of stale enrich indices [[.enrich-parcel-policy-1647950733289]]
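
A quick calculation from the timestamps above shows how short-lived the index was compared to the hours the policy needs. This is only arithmetic on the two log lines noted in the comments.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"

created = datetime.strptime("13:05:33.289", FMT)  # "Creating new enrich index"
deleted = datetime.strptime("13:12:09.242", FMT)  # "deleting index"

lifetime = deleted - created
print(lifetime.total_seconds())  # 395.953
```

The index survived under 7 minutes, because the maintenance task fires on its own schedule rather than a fixed delay after the policy starts.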

Workarounds

  1. One workaround is to set enrich.cleanup_period to a duration longer than the time it takes to run the policy (7h in my case). However, this is a static setting that cannot be changed dynamically. Since this cluster runs on Elastic Cloud, which does not allow this setting to be specified in elasticsearch.yml, only Elastic support can change it.

  2. Another, simpler workaround is to manually add an alias to the new enrich index while it is being built, so that the maintenance service believes the index is complete and has an alias:

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": ".enrich-parcel-policy-1647954543422",
        "alias": ".enrich-parcel-policy"
      }
    }
  ]
}
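
Since the index name changes with every execution, the request body has to be rebuilt each time. Here is a minimal sketch that derives the alias from the index name. The helper name is hypothetical, and the `.enrich-<policy>-<timestamp>` naming pattern is inferred only from the index names in this issue.

```python
import json
import re


def alias_request_body(enrich_index: str) -> dict:
    """Build a POST _aliases body that points the base alias at the given enrich index."""
    # Assumed pattern: ".enrich-<policy>-<timestamp>"; strip the trailing numeric suffix.
    match = re.fullmatch(r"(\.enrich-.+)-(\d+)", enrich_index)
    if match is None:
        raise ValueError(f"not an enrich index name: {enrich_index!r}")
    alias = match.group(1)
    return {"actions": [{"add": {"index": enrich_index, "alias": alias}}]}


body = alias_request_body(".enrich-parcel-policy-1647954543422")
print(json.dumps(body, indent=2))
```

The printed JSON matches the request above and can be sent with any HTTP client while the policy is still executing.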
