HBASE-27389 Add cost function in balancer to consider the cost of bui… #4799
base: master
…lding bucket cache before moving regions
Mind explaining more about the algorithm here? I guess the problem is that when a region moves from RS A to RS B, its block cache on A becomes useless, and B then has to reload the region's blocks into its block cache, which may evict other regions' data? What is the algorithm used to measure this cost? Thanks.
Hello. All the region servers maintain a list of the HFiles that are already cached. This change was made as part of HBASE-27313. The stochastic load balancer uses this information to estimate the cost of moving a region from one region server to another by comparing the ratios of files already prefetched. The higher the ratio of files prefetched on a region server, the lower the cost of moving the region. In addition, the patch also considers the region size to compute the weighted cost of moving the region.
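As a rough illustration of the ratio described above, the sketch below computes the fraction of a region's HFiles that a server reports as cached, then weights it by region size. The class name, method names, and the exact weighting are assumptions made for illustration; they are not taken from the patch.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch; names and weighting are illustrative, not the patch's code.
final class PrefetchRatioSketch {

  // Fraction of a region's HFiles that the server reports as already cached.
  static double prefetchRatio(List<String> regionHFiles, Set<String> cachedOnServer) {
    if (regionHFiles.isEmpty()) {
      return 0.0; // assumption: a region with no HFiles has nothing prefetched
    }
    long cached = regionHFiles.stream().filter(cachedOnServer::contains).count();
    return (double) cached / regionHFiles.size();
  }

  // Weights the ratio by region size so that larger regions count for more.
  static double weightedPrefetch(double ratio, long regionSizeMb) {
    return ratio * regionSizeMb;
  }
}
```

For example, a region with four HFiles of which two appear in the server's cached list has a ratio of 0.5, and a 200 MB region with that ratio gets a weighted value of 100.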
I do not fully understand why a higher prefetched ratio leads to a lower moving cost. If a region's HFiles have all been cached, then after moving we need to fetch all the HFiles again, which is costly?
Thanks for submitting this, @ragarkar! I have left a few questions on the PR, trying to understand how the cost function/balance generator will run.
Also, will this new cost function always be enabled by default on the StochasticLoadBalancer?
hbase-balancer/src/main/java/org/apache/hadoop/hbase/master/balancer/BalancerClusterState.java
...alancer/src/main/java/org/apache/hadoop/hbase/master/balancer/PrefetchCacheCostFunction.java
...e-balancer/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
...r/src/main/java/org/apache/hadoop/hbase/master/balancer/PrefetchBasedCandidateGenerator.java
If we are moving a region from server A to server B and the prefetch ratio on B is higher than that on A, then the cost of moving the region is low. During region movement, there is a likelihood that the files are already prefetched on server B, so there is no need to prefetch them again. The changes in HBASE-27313 already maintain a list of the files that are prefetched on each server.
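The comparison described here can be sketched as a small function: the cost is low when the target server already holds at least as much of the region in cache as the source. The formula below is an assumption chosen to illustrate the idea, and the class and method names are hypothetical; the patch's actual cost function may differ.

```java
// Hypothetical sketch; the formula is an assumption, not the patch's cost function.
final class MoveCostSketch {

  // Cost in [0, 1]: zero when the target has at least as much of the region
  // cached as the source, growing as the source's cached share exceeds the target's.
  static double moveCost(double ratioOnSource, double ratioOnTarget) {
    double lostCache = ratioOnSource - ratioOnTarget;
    return Math.max(0.0, Math.min(1.0, lostCache));
  }
}
```

With this sketch, moving a region from a server where 20% of it is cached to one where 90% is cached costs nothing, while the reverse move has a positive cost.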
…lding bucket cache before moving regions Fixed spotless errors
 * persistence. If this parameter is not set this means that the cache persistence is disabled
 * which means that the prefetch ratios of regions on region servers cannot be calculated and
 * hence the regions should be moved based on how much they have been prefetched on a region
 * server. The prefetch cache cost function is disabled if the multiplier is set to 0.
Let's explicitly mention it's disabled by default.
Let's also mention this:
The prefetch ratio function would be most relevant for non-HDFS deployments, where locality is irrelevant. In those cases, prefetch and region skewness would compete to prevail in the final balancer decision.
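To make the competition between cost functions concrete, here is a simplified sketch of a multiplier-weighted total in which a multiplier of 0 disables a function. The class name, the map layout, and the multiplier values in the usage note are hypothetical, and the real StochasticLoadBalancer computation may differ in detail.

```java
import java.util.Map;

// Hypothetical, simplified sketch of combining weighted cost functions.
final class WeightedCostSketch {

  // Each value holds {multiplier, cost}, where cost is assumed to lie in [0, 1].
  static double totalCost(Map<String, double[]> functions) {
    double total = 0.0;
    for (double[] multiplierAndCost : functions.values()) {
      double multiplier = multiplierAndCost[0];
      if (multiplier <= 0.0) {
        continue; // a multiplier of 0 disables the cost function
      }
      total += multiplier * multiplierAndCost[1];
    }
    return total;
  }
}
```

With locality given a multiplier of 0, only the prefetch and skew terms contribute, so whichever has the larger multiplier-times-cost product dominates the total.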
Done. Made the suggested changes to javadoc.
…lding bucket cache before moving regions Updated javadoc comments.
🎊 +1 overall
This message was automatically generated.
…lding bucket cache before moving regions
Fixed the following issues found during testing:
1. The historical prefetch ratio was maintained in a map keyed by the server name in the format <host,port,startcode>. The startcode changes with every server restart, so the server name is different after each restart. This can cause problems when calculating the historical prefetch ratio, because the balancer may not find the historical entry due to the server name mismatch.
2. Fixed an issue in the region server that could lead to an incorrect prefetch ratio calculation if an HFile in a region is a link.
3. Added a system test to test the changes.
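The server name mismatch in item 1 can be illustrated with a map keyed on host and port only, so that a restart (which changes the startcode) still finds the earlier entry. This is a minimal sketch using plain strings; the class and method names are hypothetical and the patch may solve the problem differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the patch's implementation.
final class HistoricalRatioKeySketch {
  private final Map<String, Float> ratioByHostPort = new HashMap<>();

  // Drops the startcode from a "host,port,startcode" server name.
  static String stableKey(String serverName) {
    String[] parts = serverName.split(",", 3);
    return parts.length < 3 ? serverName : parts[0] + "," + parts[1];
  }

  void record(String serverName, float ratio) {
    ratioByHostPort.put(stableKey(serverName), ratio);
  }

  // Returns null when no ratio was recorded for this host and port.
  Float lookup(String serverName) {
    return ratioByHostPort.get(stableKey(serverName));
  }
}
```

A ratio recorded under one startcode is then found again when the same host and port come back with a new startcode.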
💔 -1 overall
This message was automatically generated.
…lding bucket cache before moving regions Fixed build failures
💔 -1 overall
This message was automatically generated.
…lding bucket cache before moving regions
💔 -1 overall
This message was automatically generated.
…lding bucket cache before moving regions
🎊 +1 overall
This message was automatically generated.
…lding bucket cache before moving regions
🎊 +1 overall
This message was automatically generated.
…lding bucket cache before moving regions
🎊 +1 overall
This message was automatically generated.
…lding bucket cache before moving regions