
Snapshot in ABORTED state after rolling restart of nodes #22000

@desagar

Description

Elasticsearch version: 2.3.1

Plugins installed: [a custom repository plugin]

JVM version: 1.8.0_101

OS version: Oracle Enterprise Linux 6 with Red Hat kernel

Description of the problem including expected versus actual behavior:
We have a 2-node Elasticsearch cluster with a custom repository plugin installed for storing Elasticsearch snapshots. The custom plugin has a bug that occasionally causes it to hang indefinitely while waiting for a connection to the back-end store for our snapshots. When this happened, we performed a rolling restart of the Elasticsearch cluster to clear the hanging thread. After the restart, the cluster state reports the snapshot as ABORTED. However, when we query the snapshot using the snapshot API, it reports that the snapshot is still in progress. As a result, we are unable to take any further snapshots.
According to this link, snapshots in ABORTED status should be cleaned up when the master node is restarted.

Steps to reproduce:
Working on a reproducer; will provide it once I have one.

Provide logs (if relevant):
Please see the attached files containing the cluster state and the snapshot status.
snapshot_status.txt : output of /_snapshot/ppmgmt1645/snapshot_20161130_042001?pretty=true
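To make the inconsistency easier to check after each restart, a small diagnostic along these lines could compare the two views. This is a sketch under stated assumptions, not a tested tool: the names `fetch_json` and `find_mismatched_snapshots` are hypothetical, `http://localhost:9200` is an assumed node address, and the JSON layout (a `snapshots.snapshots` list in `/_cluster/state`, and a top-level `snapshots` list in the `GET /_snapshot/<repo>/<snapshot>` response, each entry carrying `snapshot` and `state` fields) is assumed from 2.x output and should be verified against the attached files.

```python
import json
import urllib.request

# Assumed address of one cluster node; adjust for the real cluster.
ES_URL = "http://localhost:9200"


def fetch_json(path):
    """GET a JSON document from the cluster, e.g. "/_cluster/state"."""
    with urllib.request.urlopen(ES_URL + path) as resp:
        return json.load(resp)


def find_mismatched_snapshots(cluster_state, snapshot_response):
    """Return snapshot names that the cluster state marks ABORTED while
    the snapshot API still reports them as IN_PROGRESS.

    Assumed layout (2.x-style, unverified):
      cluster_state["snapshots"]["snapshots"] -> [{"snapshot": ..., "state": ...}]
      snapshot_response["snapshots"]          -> [{"snapshot": ..., "state": ...}]
    """
    # Snapshots the master's cluster state considers aborted.
    aborted = {
        entry["snapshot"]
        for entry in cluster_state.get("snapshots", {}).get("snapshots", [])
        if entry.get("state") == "ABORTED"
    }
    # Snapshots the snapshot API still reports as running.
    in_progress = {
        entry["snapshot"]
        for entry in snapshot_response.get("snapshots", [])
        if entry.get("state") == "IN_PROGRESS"
    }
    # Names present in both sets are in the inconsistent state.
    return sorted(aborted & in_progress)
```

For this issue it would be called as `find_mismatched_snapshots(fetch_json("/_cluster/state"), fetch_json("/_snapshot/ppmgmt1645/snapshot_20161130_042001"))`; a non-empty result would indicate that the stuck state survived the restart. Matching is by snapshot name only, because the snapshot API response is assumed not to include the repository name.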
