
Commit d2b8cd9

andreidan authored and dakrone committed
Add troubleshooting guide for corrupt repository (elastic#88391)
Co-authored-by: Lee Hinman <[email protected]>
1 parent 3776923 commit d2b8cd9

8 files changed: +272 −0 lines changed
4 binary files added (161 KB, 333 KB, 159 KB, 57 KB): the screenshots repositories.png, repo_details.png, register_repo.png, and register_repo_details.png referenced below.
docs/reference/tab-widgets/troubleshooting/snapshot/corrupt-repository-widget.asciidoc

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
++++
<div class="tabs" data-tab-group="host">
  <div role="tablist" aria-label="Re-add repository">
    <button role="tab"
            aria-selected="true"
            aria-controls="cloud-tab-readd-repo"
            id="cloud-readd-repo">
      Elasticsearch Service
    </button>
    <button role="tab"
            aria-selected="false"
            aria-controls="self-managed-tab-readd-repo"
            id="self-managed-readd-repo"
            tabindex="-1">
      Self-managed
    </button>
  </div>
  <div tabindex="0"
       role="tabpanel"
       id="cloud-tab-readd-repo"
       aria-labelledby="cloud-readd-repo">
++++

include::corrupt-repository.asciidoc[tag=cloud]

++++
  </div>
  <div tabindex="0"
       role="tabpanel"
       id="self-managed-tab-readd-repo"
       aria-labelledby="self-managed-readd-repo"
       hidden="">
++++

include::corrupt-repository.asciidoc[tag=self-managed]

++++
  </div>
</div>
++++
docs/reference/tab-widgets/troubleshooting/snapshot/corrupt-repository.asciidoc

Lines changed: 219 additions & 0 deletions
@@ -0,0 +1,219 @@
// tag::cloud[]
Fixing the corrupted repository will entail making changes in multiple deployments
that write to the same snapshot repository.
Only one deployment may write to a given repository. The deployment
that keeps writing to the repository is called the "primary" deployment (the current cluster),
and the other one(s), where we'll mark the repository as read-only, are the "secondary"
deployments.

First, mark the repository as read-only on the secondary deployments:

**Use {kib}**

//tag::kibana-api-ex[]
. Log in to the {ess-console}[{ecloud} console].
+
. On the **Elasticsearch Service** panel, click the name of your deployment.
+
NOTE: If the name of your deployment is disabled, your {kib} instances might be
unhealthy, in which case please contact https://support.elastic.co[Elastic Support].
If your deployment doesn't include {kib}, all you need to do is
{cloud}/ec-access-kibana.html[enable it first].

. Open your deployment's side navigation menu (placed under the Elastic logo in the upper left corner)
and go to **Stack Management > Snapshot and Restore > Repositories**.
+
[role="screenshot"]
image::images/repositories.png[{kib} Console,align="center"]

. The repositories table should now be visible. Click the pencil icon on the
right side of the repository you want to mark as read-only. On the Edit page that opens,
scroll down, check "Read-only repository", and click "Save".
Alternatively, if deleting the repository altogether is preferable, select the checkbox
to the left of the repository name in the repositories table and click the
red "Remove repository" button at the top left of the table.

At this point, it's only the primary (current) deployment that has the repository marked
as writable.
{es} sees it as corrupt, so the repository needs to be removed and added back so that
{es} can resume using it:

Note that we're now configuring the primary (current) deployment.

. Open the primary deployment's side navigation menu (placed under the Elastic logo in the upper left corner)
and go to **Stack Management > Snapshot and Restore > Repositories**.
+
[role="screenshot"]
image::images/repositories.png[{kib} Console,align="center"]

. Get the details for the repository we'll recreate later by clicking on the repository
name in the repositories table and making note of all the repository configurations
that are displayed on the repository details page (we'll use them when we recreate
the repository). Close the details page using the link at
the bottom left of the page.
+
[role="screenshot"]
image::images/repo_details.png[{kib} Console,align="center"]

. With all the details above noted, delete the repository. Select the
checkbox to the left of the repository and click the red "Remove repository" button
at the top left of the page.

. Recreate the repository by clicking the "Register Repository" button
at the top right corner of the repositories table.
+
[role="screenshot"]
image::images/register_repo.png[{kib} Console,align="center"]

. Fill in the repository name, select the type, and click "Next".
+
[role="screenshot"]
image::images/register_repo_details.png[{kib} Console,align="center"]

. Fill in the repository details (client, bucket, base path, etc.) with the values
you noted down before deleting the repository and click the "Register" button
at the bottom.

. Select "Verify repository" to confirm that your settings are correct and the
deployment can connect to your repository.
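
Verification can also be triggered from the Dev Tools console via the verify
repository API (assuming the repository is named `my-repo`):

[source,console]
----
POST _snapshot/my-repo/_verify
----
// TEST[skip:we're not setting up repos in these tests]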
//end::kibana-api-ex[]
// end::cloud[]

// tag::self-managed[]
Fixing the corrupted repository will entail making changes in multiple clusters
that write to the same snapshot repository.
Only one cluster may write to a given repository. Let's call the cluster
we want to keep writing to the repository the "primary" cluster (the current cluster),
and the other one(s), where we'll mark the repository as read-only, the "secondary"
clusters.

Let's first work on the secondary clusters:

. Get the configuration of the repository:
+
[source,console]
----
GET _snapshot/my-repo
----
// TEST[skip:we're not setting up repos in these tests]
+
The response will look like this:
+
[source,console-result]
----
{
  "my-repo": { <1>
    "type": "s3",
    "settings": {
      "bucket": "repo-bucket",
      "client": "elastic-internal-71bcd3",
      "base_path": "myrepo"
    }
  }
}
----
// TESTRESPONSE[skip:the result is for illustrating purposes only]
+
<1> Represents the current configuration for the repository.

. Using the settings retrieved above, add the `readonly: true` option to mark
it as read-only:
+
[source,console]
----
PUT _snapshot/my-repo
{
  "type": "s3",
  "settings": {
    "bucket": "repo-bucket",
    "client": "elastic-internal-71bcd3",
    "base_path": "myrepo",
    "readonly": true <1>
  }
}
----
// TEST[skip:we're not setting up repos in these tests]
+
<1> Marks the repository as read-only.
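+
To double-check the change, fetch the repository configuration again. A sketch of the
expected response follows (illustrative only; {es} returns repository settings as strings):
+
[source,console]
----
GET _snapshot/my-repo
----
// TEST[skip:we're not setting up repos in these tests]
+
[source,console-result]
----
{
  "my-repo": {
    "type": "s3",
    "settings": {
      "bucket": "repo-bucket",
      "client": "elastic-internal-71bcd3",
      "base_path": "myrepo",
      "readonly": "true"
    }
  }
}
----
// TESTRESPONSE[skip:the result is for illustrating purposes only]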

. Alternatively, deleting the repository is an option using:
+
[source,console]
----
DELETE _snapshot/my-repo
----
// TEST[skip:we're not setting up repos in these tests]
+
The response will look like this:
+
[source,console-result]
------------------------------------------------------------------------------
{
  "acknowledged": true
}
------------------------------------------------------------------------------
// TESTRESPONSE[skip:the result is for illustrating purposes only]

At this point, it's only the primary (current) cluster that has the repository marked
as writable.
{es} sees it as corrupt though, so let's remove the repository and recreate it so that
{es} can resume using it:

Note that we're now configuring the primary (current) cluster.

. Get the configuration of the repository and save it, as we'll use it
to recreate the repository:
+
[source,console]
----
GET _snapshot/my-repo
----
// TEST[skip:we're not setting up repos in these tests]
. Delete the repository:
+
[source,console]
----
DELETE _snapshot/my-repo
----
// TEST[skip:we're not setting up repos in these tests]
+
The response will look like this:
+
[source,console-result]
------------------------------------------------------------------------------
{
  "acknowledged": true
}
------------------------------------------------------------------------------
// TESTRESPONSE[skip:the result is for illustrating purposes only]

. Using the configuration we obtained above, let's recreate the repository:
+
[source,console]
----
PUT _snapshot/my-repo
{
  "type": "s3",
  "settings": {
    "bucket": "repo-bucket",
    "client": "elastic-internal-71bcd3",
    "base_path": "myrepo"
  }
}
----
// TEST[skip:we're not setting up repos in these tests]
+
The response will look like this:
+
[source,console-result]
------------------------------------------------------------------------------
{
  "acknowledged": true
}
------------------------------------------------------------------------------
// TESTRESPONSE[skip:the result is for illustrating purposes only]
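
. To confirm the recreated repository is usable, run the verify repository API
(a final check, assuming the repository is still named `my-repo`); the response
lists the nodes that were able to access the repository:
+
[source,console]
----
POST _snapshot/my-repo/_verify
----
// TEST[skip:we're not setting up repos in these tests]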
// end::self-managed[]

docs/reference/troubleshooting.asciidoc

Lines changed: 2 additions & 0 deletions
@@ -58,6 +58,8 @@ include::troubleshooting/data/start-ilm.asciidoc[]
 
 include::troubleshooting/data/start-slm.asciidoc[]
 
+include::troubleshooting/snapshot/add-repository.asciidoc[]
+
 include::monitoring/troubleshooting.asciidoc[]
 
 include::transform/troubleshooting.asciidoc[leveloffset=+1]
docs/reference/troubleshooting/snapshot/add-repository.asciidoc

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
[[add-repository]]
== Multiple deployments writing to the same snapshot repository

Multiple {es} deployments are writing to the same snapshot repository. {es} doesn't
support this configuration; only one cluster may write to a given
repository.
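
When this happens, {es} marks the repository as corrupted, and subsequent snapshot
operations against it fail. For illustration, such an operation may return a
`repository_exception` along these lines (a sketch; the exact message varies by
{es} version):

[source,console-result]
----
{
  "error": {
    "root_cause": [
      {
        "type": "repository_exception",
        "reason": "[my-repo] Could not read repository data because the contents of the repository do not match its expected state."
      }
    ],
    "type": "repository_exception",
    "reason": "[my-repo] Could not read repository data because the contents of the repository do not match its expected state."
  },
  "status": 500
}
----
// TESTRESPONSE[skip:the result is for illustrating purposes only]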
To remedy the situation, mark the repository as read-only or remove it from all the
other deployments, and re-add (recreate) the repository in the current deployment:

include::{es-repo-dir}/tab-widgets/troubleshooting/snapshot/corrupt-repository-widget.asciidoc[]
