Elasticsearch version: 2.3.1
Plugins installed: [a custom repository plugin]
JVM version: 1.8.0_101
OS version: Oracle Enterprise Linux 6 with Red Hat kernel
Description of the problem including expected versus actual behavior:
We have a two-node Elasticsearch cluster with a custom repository plugin installed, which we use for storing Elasticsearch snapshots. The custom plugin has a bug that occasionally causes it to hang indefinitely while waiting for a connection to the back-end store for our snapshots. When this happened, we performed a rolling restart of the Elasticsearch cluster to clear the hung thread. After the restart, the snapshot is in ABORTED status according to the Elasticsearch cluster state. However, when we query the snapshot using the snapshot API, it reports that the snapshot is still in progress. As a result, we are unable to take any further snapshots.
According to this link, snapshots in ABORTED status should be cleaned up when the master node is restarted.
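To make the discrepancy easier to check, here is a minimal Python sketch that compares the two views. It is only an illustration and rests on assumptions: that the cluster state (`GET /_cluster/state`) lists in-progress snapshots under `snapshots.snapshots` with `repository`, `snapshot` and `state` fields, and that `GET /_snapshot/<repository>/<snapshot>` returns a `snapshots` array with `snapshot` and `state` fields. The helper names `fetch_json`, `in_progress_entries` and `find_mismatches` are hypothetical, and the field layout may differ between Elasticsearch versions.

```python
import json
from urllib.request import urlopen


def fetch_json(base_url, path):
    """GET base_url + path and decode the JSON body (no auth or retries)."""
    with urlopen(base_url + path) as resp:
        return json.loads(resp.read().decode("utf-8"))


def in_progress_entries(cluster_state):
    """Return the snapshot entries recorded in the cluster state.

    Assumption: they live under cluster_state["snapshots"]["snapshots"].
    """
    return cluster_state.get("snapshots", {}).get("snapshots", [])


def find_mismatches(cluster_state, snapshot_info, repository):
    """List snapshots whose cluster-state status differs from the API status.

    Returns (snapshot_name, cluster_state_status, api_status) tuples for the
    given repository.
    """
    # Map snapshot name -> state as reported by the snapshot API.
    api_states = {
        s.get("snapshot"): s.get("state")
        for s in snapshot_info.get("snapshots", [])
    }
    mismatches = []
    for entry in in_progress_entries(cluster_state):
        if entry.get("repository") != repository:
            continue
        name = entry.get("snapshot")
        api_state = api_states.get(name)
        # Only report snapshots that both sources know about but disagree on.
        if api_state is not None and api_state != entry.get("state"):
            mismatches.append((name, entry.get("state"), api_state))
    return mismatches
```

Against a live node this could be used as `find_mismatches(fetch_json(url, "/_cluster/state"), fetch_json(url, "/_snapshot/ppmgmt1645/snapshot_20161130_042001"), "ppmgmt1645")`, where `url` is the node's HTTP address. In our case we would expect it to report the snapshot as ABORTED in the cluster state and as in progress according to the snapshot API.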
Steps to reproduce:
Working on a reproducer; will provide it once I have one.
Provide logs (if relevant):
Please see attached files of cluster state and snapshot status.
snapshot_status.txt : output of /_snapshot/ppmgmt1645/snapshot_20161130_042001?pretty=true