-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Distinguish missing and invalid repositories #85551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| if (unstableNodes.contains(clusterService.getNodeName())) { | ||
| throw new RepositoryException(metadata.name(), "Failed to create repository: current node is not stable"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give a more realistic example of how this might happen? Perhaps a better fix would be to avoid throwing an exception when creating the repository altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In our case ,we use alibaba OSS(an Object Storage service) to store snapshot, but some time the network may not be good enough to connect, which may make the reppository creatation failed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And more general, repository creation logic is open to be implement in repository plugin which is uncontrolled, so if the implement is not good enough creation failed can happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like a bug in your plugin, it should not require a healthy network to create the repository. Compare for instance S3Repository which merely validates some settings in its constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like a bug in your plugin, it should not require a healthy network to create the repository. Compare for instance
S3Repositorywhich merely validates some settings in its constructor.
I‘ve discuss with the plugin owner. It's true there are some network action during repository creation which can cause creation failure.
However, I still think elasticsearch core should not lay it's robustness on plugin implement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think elasticsearch core should not lay it's robustness on plugin implement.
Yes I agree, I just want to be clear that protecting against this NPE will not really address the problem in the plugin. If transient network issues can prevent the plugin repository from even being created on some nodes then you will need to fix this by retrying the put-repository request once the transient issues are resolved. I don't see a great way to detect that a retry is needed here.
server/src/main/java/org/elasticsearch/cluster/service/MasterService.java
Outdated
Show resolved
Hide resolved
DaveCTurner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather fix this using an approach closer to the one in #82457 which avoids the null by creating a dummy repository that returns a more specific exception message than the RepositoryMissingException we get today. The repository isn't really missing, it's invalid on some nodes, and we should distinguish these cases.
|
Pinging @elastic/es-distributed (Team:Distributed) |
|
Thanks, I will update in that way. |
|
@DaveCTurner hi, I've update this PR by adding a new DummyRepository, please help to reivew it. |
|
@DaveCTurner By the way , there is a small PR of mine which has been there for a long time , could you please help to review it as well: #83706 |
DaveCTurner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it, thanks @mushao999. I left a few small comments.
server/src/main/java/org/elasticsearch/repositories/DummyRepository.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/repositories/DummyRepository.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/repositories/RepositoriesService.java
Outdated
Show resolved
Hide resolved
...nternalClusterTest/java/org/elasticsearch/repositories/RepositoriesDuplicatedCreationIT.java
Show resolved
Hide resolved
...nternalClusterTest/java/org/elasticsearch/repositories/RepositoriesDuplicatedCreationIT.java
Show resolved
Hide resolved
|
@DaveCTurner Thanks for your suggestion, I've updated the code. |
|
Should we add elasticsearch/test/framework/src/main/java/org/elasticsearch/test/hamcrest/ThrowableAssertions.java Lines 21 to 31 in caa0b61
|
Yes, I think so. |
so we add it in this PR or maybe open a new one to add it? |
|
Here would be fine with me. |
|
@elasticmachine ok to test |
code added |
|
Looks like your branch is based off an old commit, you will need to merge latest master before CI will pass. |
DaveCTurner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, I left a handful of comments and suggested changes but structurally this is good.
server/src/test/java/org/elasticsearch/repositories/InvalidRepositoryTests.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/repositories/InvalidRepositoryTests.java
Outdated
Show resolved
Hide resolved
test/framework/src/main/java/org/elasticsearch/test/hamcrest/ThrowableAssertions.java
Outdated
Show resolved
Hide resolved
...rTest/java/org/elasticsearch/repositories/encrypted/EncryptedRepositorySecretIntegTests.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/repositories/RepositoriesServiceTests.java
Outdated
Show resolved
Hide resolved
...nternalClusterTest/java/org/elasticsearch/repositories/RepositoriesDuplicatedCreationIT.java
Outdated
Show resolved
Hide resolved
...nternalClusterTest/java/org/elasticsearch/repositories/RepositoriesDuplicatedCreationIT.java
Outdated
Show resolved
Hide resolved
| ); | ||
| // verification should fail with some node has InvalidRepository | ||
| try { | ||
| client().admin().cluster().prepareVerifyRepository(repositoryName).get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest using expectThrows here:
| client().admin().cluster().prepareVerifyRepository(repositoryName).get(); | |
| final var verificationException = expectThrows(RepositoryVerificationException.class, | |
| () -> client().admin().cluster().prepareVerifyRepository(repositoryName).get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can not use expectThrows if we want to assert inner exception.
...nternalClusterTest/java/org/elasticsearch/repositories/RepositoriesDuplicatedCreationIT.java
Outdated
Show resolved
Hide resolved
...nternalClusterTest/java/org/elasticsearch/repositories/RepositoriesDuplicatedCreationIT.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you missed that expectThrows() returns the exception so you can assert more things about it. I think we can use expectThrows rather than the utilities in ThrowableAssertions here, and I'll open a follow-up to remove ThrowableAssertions entirely.
Edit: see #85671
| // verification should fail with some node has InvalidRepository | ||
| boolean verifyPass = false; | ||
| try { | ||
| client().admin().cluster().prepareVerifyRepository(repositoryName).get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mentioned before, but this should be final var repositoryVerificationException = expectThrows(RepositoryVerificationException.class, () -> client().admin().cluster().prepareVerifyRepository(repositoryName).get());
Note that expectThrows returns the exception it caught so you can assert things about it such as its causes and suppressed exceptions.
|
See #85671 for the PR to remove ThrowableAssertions in favour of using expectThrows. |
|
@DaveCTurner Thanks for your suggestion , I've replaced all the |
DaveCTurner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the extra iterations @mushao999
The
RepositoryServiceon each node maintains a repository instance forevery repository defined in the cluster state. The master validates each
repository definition before inserting it into the cluster state, but in
some cases this validation is incomplete. For instance, there may be
node-local configuration of which the master is unaware which prevents
the repository from being instantiated on some other node in the
cluster.
Today if a repository cannot be instantiated then the node will log a
warning and continue as if the repository doesn't exist. This results in
a confusing
RepositoryMissingExceptionwhen trying to use therepository, and various other surprises (e.g. #85550). With this commit
we create a placeholder
InvalidRepositorywhich reports a moreaccurate exception when it is used.
Relates #82457 which did the same sort of thing for unknown plugins.
Closes #85550 since the repository in question is no longer
null.