-
Couldn't load subscription status.
- Fork 3.4k
HBASE-25032 Wait for region server to become online before adding it to online servers in Master #2769
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
What about just add a flag in report duty to tell master whether the region server is ready to take regions? |
|
@Apache9 At the time of The flow is as follows: The issue we observed which led to the creation of this JIRA is a ~20min delay in RS initializing replication sources when RS is unable to connect to peer clusters because of some kerberos configuration. And since Master considers the RS online, it tries to assign regions, which fails with There are a few ways to address this issue, described here. The approach taken in this PR alters the handling of |
|
Oh, what I mean is the periodical report from RS to master, maybe it is called regionServerReport? FWIW, I think it will be better to let the region server tell master directly, so we do not need to schedule a background task on master to detect the status of a region server. Thanks. |
|
@Apache9 Okay I see. So you are suggesting a background task in RS instead of in Master. That will also require new RPC logic in Master. I am not sure if there is much of a difference in pros/cons between these two methods? Thanks |
Just a simple flag like initialized is enough? After we finish the initialization work, we set the flag to true, and regionServerReport will bring this flag to master, so master will know the region server is ready to take regions. |
|
@Apache9 There is already an Then my question is, do you know why we even do all this setup logic after we report to Master in the first place? Why not move the logic in |
|
Let me read the code again and then try to answer your questions. IIRC there is a problem on the order of register znode on zk and send reportForDuty to master, finally we decide to register to zk first but then there is a problem that if a region server only has a znode on zk but never send a reportForDuty to master, it will be in a strange state forever... |
|
Stumbled upon this one.
Agree. I commented on the jira. I think this is an init sequence problem. If we break down the RPCs into the following, I think it should work.
Looks like that should fix the problem. Doing this async stuff in master only complicates the master state machine. |
11b41c7 to
40cf6a3
Compare
|
@Apache9 @bharathv Thanks for the suggestion. I looked at the code some more and I believe we only need to remove the |
|
🎊 +1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach seems in the right direction to me. Basically we are marking the server live on its first report rather than at the startup RPC. The test failures in precommit seem related, mind checking?
You have to update the comments in multiple places (github doesn't let me comment inline). Places like regionServerReport(), regionServerStartup() etc.
| request.setServerCurrentTime(System.currentTimeMillis() + warningSkew * 2); | ||
| sm.regionServerStartup(request.build(), 0, "0.0.0", ia5); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
undo this?
|
Oh, after checking the code, I remembered what I concerned was another problem... If we do not register our znode on zookeeper after calling regionServerStartup, there is no way for the master to mark this regionserver as dead so it will remain there forever. So I agree with @bharathv that, we should mark a region server as online after its first regionServerReport, instead of regionServerStartup, maybe we could have another map in ServerManager to record the region servers which have called regionServerStartup but no regionServerReport yet, and once there is a regionServerReport, we move it the actual active region server map and allow it to take regions. Thanks. |
12f89c9 to
33c49d8
Compare
|
💔 -1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
33c49d8 to
65e6f7a
Compare
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
@Apache9 I don't think I follow, can you clarify what you meant:
Why do we need another map? Isn't it enough that the server be marked as online upon its first
No need to worry about this I think, since I am not changing this logic. We are registering the znode in |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
| long now = System.currentTimeMillis(); | ||
| if ((now - lastMsg) >= msgInterval) { | ||
| // Register with the Master now that our setup is complete. | ||
| tryRegionServerReport(lastMsg, now); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@caroliney14 You missed my last comment on checking whether tryRegionServerReport() was successful. It can also return without sending a report successfully. Ex: RS starts before master is up. Quoting my last comment here again
You need do this only if tryRegionServerReport() is successful. It can also return without a successful report if the exception thrown is not YouAreDeadException. You may need to make that method return a boolean that tells the caller if the report was successfully sent.
So the code should be something like
boolean reportSuccess = tryRegionServerReport(...)
if (reportSuccess && !online.get()) {
....
}
+1 once that is done.
42f1ec0 to
2eb52a7
Compare
|
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 pending QA. @saintstack Please sign-off on this change unless you have any other comments.
@caroliney14 Thanks for being patient and addressing the concerns, very useful contribution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One suggestion else, this PR is excellent... Nice back and forth there @bharathv and @caroliney14
|
|
||
| // We registered with the Master. Go into run mode. | ||
| // Run mode. | ||
| long lastMsg = System.currentTimeMillis(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about setting this lastMsg here should be set to Long.MAX_LONG so we heartbeat immediately after the report-for-duty... so no lag before the RS is 'online'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you mean set it to LONG.MIN_LONG (or 0), since the line that checks it says if ((now - lastMsg) >= msgInterval), but yes good one 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, do in a follow-on @caroliney14 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@saintstack do we have to do this one? asking because the lastMsg timestamp ultimately ends up getting stored here, not sure if it'll have any impact we care about. also, one stage of the PR build kept failing for some unknown reason when I had pushed up the change long lastMsg = 0; which is why I ended up undoing it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. We not have to do it. It was suggestion. You've done a mountain of great work in here already.
I'll keep an eye on it. Was thinking we heartbeat immediately but that might not be a good idea on a big cluster....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, thank you @saintstack
76c23de to
fc30075
Compare
|
💔 -1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
fc30075 to
db7c7d6
Compare
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
db7c7d6 to
6558c50
Compare
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
…to online servers in Master
6558c50 to
a628295
Compare
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
@bharathv @saintstack the PR build passed -- can we merge this now? |
…s when refreshing state in some procedure implementations Revert "HBASE-25032 Wait for region server to become online before adding it to online servers in Master (#2769)" This reverts commit 1e4639d. Conflicts: hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
No description provided.