-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HDFS-16547. [SBN read] Namenode in safe mode should not be transfered to observer state #4201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent seems good to me. This is similar to HDFS-14201 / PR #466, but applying to observer as well as active, which is sensible. Left inline comments.
synchronized void transitionToObserver() throws IOException { | ||
String operationName = "transitionToObserver"; | ||
namesystem.checkSuperuserPrivilege(operationName); | ||
if (namesystem.isInSafeMode()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can guard this by notBecomeActiveInSafemode
. Though the config claims to be about "active" status, the logic in monitorHealth
just generally considers a standby NN as unhealthy if it's in safemode, and I think the intent here is the same as with that config.
Also: namesystem.isInSafeMode()
-> isInSafeMode()
(NameNode
redefines this method)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @xkrogen very much for the review and careful suggestions.
The intent here is really the same as dfs.ha.nn.not-become-active-in-safemode
. It's just that the configuration name looks a bit conflicting.
...-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
Outdated
Show resolved
Hide resolved
...ject/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSZKFailoverController.java
Outdated
Show resolved
Hide resolved
...ject/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSZKFailoverController.java
Outdated
Show resolved
Hide resolved
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments, but all minor.
Can you please also update the documentation for all of the places dfs.ha.nn.not-become-active-in-safemode
is mentioned to document this new behavior?
> find . -name "*.xml" -or -name "*.md" | xargs grep -e "not-become-active-in-safemode"
./hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithQJM.md:* **dfs.ha.nn.not-become-active-in-safemode** - if prevent safe mode namenodes to become active
./hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithQJM.md: <name>dfs.ha.nn.not-become-active-in-safemode</name>
./hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithNFS.md:* **dfs.ha.nn.not-become-active-in-safemode** - if prevent safe mode namenodes to become active
./hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithNFS.md: <name>dfs.ha.nn.not-become-active-in-safemode</name>
./hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml: <name>dfs.ha.nn.not-become-active-in-safemode</name>
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSHAAdmin.java
Outdated
Show resolved
Hide resolved
...roject/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSHAAdminMiniCluster.java
Outdated
Show resolved
Hide resolved
...roject/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSHAAdminMiniCluster.java
Outdated
Show resolved
Hide resolved
...ject/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestHASafeMode.java
Outdated
Show resolved
Hide resolved
...ject/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestHASafeMode.java
Outdated
Show resolved
Hide resolved
...ject/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestHASafeMode.java
Outdated
Show resolved
Hide resolved
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSHAAdmin.java
Outdated
Show resolved
Hide resolved
synchronized void transitionToObserver() throws IOException { | ||
String operationName = "transitionToObserver"; | ||
namesystem.checkSuperuserPrivilege(operationName); | ||
if (notBecomeActiveInSafemode && isInSafeMode()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use a common flag? notBecomeActiveInSafemode
is used to control Active.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the document.
97d06e7
to
8387407
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
The failed unit test is unrelated to the change. This is another issue. |
Hi @ayushtkn , could you please take a look? Thanks. |
Sorry for the delay, LGTM. @tomscut , since the PR is a little stale (21 days from last update), can you merge/rebase? Would be good to get a recent Yetus run as well. Assuming all looks good, I can help merge this. |
💔 -1 overall
This message was automatically generated. |
The failed unit test is unrelated to the change. It also failed in other PR #5135. Seems flaky. |
|
…o observer state (apache#4201) Signed-off-by: Erik Krogen <[email protected]> Reviewed-by: Zengqiang Xu <[email protected]>
…o observer state (apache#4201) Signed-off-by: Erik Krogen <[email protected]> Reviewed-by: Zengqiang Xu <[email protected]>
…o observer state (apache#4201) Signed-off-by: Erik Krogen <[email protected]> Reviewed-by: Zengqiang Xu <[email protected]>
…o observer state (apache#4201) Signed-off-by: Erik Krogen <[email protected]> Reviewed-by: Zengqiang Xu <[email protected]> (cherry picked from commit 8f971b0)
…o observer state (apache#4201) Signed-off-by: Erik Krogen <[email protected]> Reviewed-by: Zengqiang Xu <[email protected]> (cherry picked from commit 8f971b0)
…o observer state (apache#4201) Signed-off-by: Erik Krogen <[email protected]> Reviewed-by: Zengqiang Xu <[email protected]> (cherry picked from commit 8f971b0)
JIRA: HDFS-16547.
Currently, when a Namenode is in safemode(under starting or enter safemode manually), we can transfer this Namenode to Observer by command. This Observer node may receive many requests and then throw a SafemodeException, this causes unnecessary failover on the client.
So Namenode in safe mode should not be transfer to observer state.