Persistent volumes for Zookeeper #33
Conversation
I've tested this with GKE. How well does a multi-zone cluster work with volumes? I get a feeling we lose the ability to transition if one zone becomes entirely unavailable, because each volume is created in a zone and the pods get affinity to that zone.
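For reference, the pinning can be seen on the provisioned volumes themselves; a quick check (the label name is what GCE-provisioned PVs get, the zone value is just an example):

```sh
# GCE PDs are zonal; the provisioner labels each PV with its zone, and the pod
# that mounts the claim can then only be scheduled in that zone.
kubectl get pv --show-labels
# look for e.g. failure-domain.beta.kubernetes.io/zone=europe-west1-d
```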
An issue with #32 seems to be that pod deletion is slow. Maybe signals don't reach the Java process.
The selected config is from the jmx_exporter examples.
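For anyone not familiar with it, a jmx_exporter config is a small YAML file of rewrite rules from MBeans to metric names. The sketch below only shows that general shape; the pattern and metric names are illustrative, not copied from the examples:

```yaml
# Illustrative shape of a jmx_exporter config; not the actual file from the examples.
lowercaseOutputName: true
rules:
  # Rewrite a ZooKeeper MBean attribute into a Prometheus gauge (pattern is a guess).
  - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+)><>(\\w+)"
    name: "zookeeper_$2"
    type: GAUGE
```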
… and two that can move automatically at node failures
Suggest a mix of persistent and ephemeral data to improve reliability across zones
and with the mix of PV and emptyDir there's no reason to make PVs faster than host disks. Use 10GB as it is the minimum for standard disks on GKE.
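A sketch of the persistent side of that mix, assuming a PetSet/StatefulSet claim template (the names here are made up; only the 10GB size comes from the commit message):

```yaml
# Sketch only: claim template for the zookeeper members that keep data on a PV.
# The claim name and storage class are assumptions; 10Gi matches the GKE standard-disk minimum.
volumeClaimTemplates:
- metadata:
    name: data
  spec:
    accessModes: [ "ReadWriteOnce" ]
    resources:
      requests:
        storage: 10Gi
```

The remaining members would keep an emptyDir volume so they can move freely between nodes.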
#34 merged with a potential solution for that. Interesting tests remain: go ahead and kill nodes etc. Known remaining issues, non-blockers, with this branch:
I also tested locally, and there's no trace of termination handling there either, though shutdown is really fast. Will test on Kafka instead, after merge, as it has documented graceful shutdown behavior. I will also postpone metrics troubleshooting, as that too benefits from comparison with Kafka. Might simply be about the jmx_exporter config.
Switched Kafka to dynamic provisioning in 10543bf.
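For context, dynamic provisioning here roughly means a StorageClass plus claims that reference it, instead of pre-created PVs. A minimal sketch (class name, size and the annotation usage are assumptions, not necessarily what 10543bf does):

```yaml
# Sketch of dynamic provisioning on GKE; not necessarily the exact manifests in the commit.
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: kafka-broker-disk
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
---
# A claim (or claim template entry) that requests a volume from that class.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: data-kafka-0
  annotations:
    volume.beta.kubernetes.io/storage-class: kafka-broker-disk
spec:
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 10Gi
```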
Good resource on termination: https://pracucci.com/graceful-shutdown-of-kubernetes-pods.html
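The short version as it applies here: the kubelet sends SIGTERM, waits up to terminationGracePeriodSeconds (30s by default), then SIGKILLs, so the process has to both receive the signal and be given enough time. A pod-spec sketch (the 60s value is just an example, not something this repo sets):

```yaml
# Sketch: allow a slow but clean shutdown after SIGTERM before the kubelet sends SIGKILL.
# Only the grace period line matters here; container details omitted.
spec:
  terminationGracePeriodSeconds: 60
```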
After 411192d I get properly logged shutdown behavior in Kafka, taking around 15s with negligible load. So the Alpine shell not forwarding signals might have been the issue.
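If the image's entrypoint is a shell script, the usual fix is to `exec` the server process so the JVM becomes PID 1 and gets SIGTERM directly instead of it stopping at the shell. A sketch (the paths are the stock Kafka ones, not necessarily what this image uses):

```sh
#!/bin/sh
# Without exec the shell stays as PID 1 and may not forward SIGTERM to the JVM;
# with exec the broker process itself receives the signal and can shut down cleanly.
exec bin/kafka-server-start.sh config/server.properties
```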
The README.md used to say:
But that's a risky assumption if you have partitioning, which we aim to improve on with #30.
We can do as suggested in #26.