Commit 4d4c928

Minor improvements to the README of hbck2 (#138)
* Minor improvements to the README of hbck2
* Fix typo and reflow so each sentence is on its own line
* Fix typo and maybe fix whitespace issues as well
* Fix a couple more typos
1 parent a4af0cc commit 4d4c928

1 file changed

+38
-45
lines changed

hbase-hbck2/README.md

Lines changed: 38 additions & 45 deletions
@@ -22,17 +22,15 @@ _HBCK2_ is the repair tool for Apache HBase clusters.
2222

2323
Problems in operation are bugs. The need for an _HBCK2_ fix
2424
is meant as workaround until the bug is fixed and deployed
25-
in a new hbase version.
25+
in a new HBase version.
2626

2727
## _HBCK2_ vs _hbck1_
28-
HBCK2 is the successor to [hbck](https://hbase.apache.org/book.html#hbck.in.depth),
29-
the repair tool that shipped with _hbase-1.x_ (A.K.A _hbck1_). Use _HBCK2_ in place of
30-
_hbck1_ making repairs against hbase-2.x clusters. _hbck1_ should not be run against an
31-
hbase-2.x install. It may do damage. While _hbck1_ is still bundled inside hbase-2.x
32-
-- to minimize surprise -- it is deprecated, to be removed in _hbase-3.x_. Its
33-
write-facility (`-fix`) has been removed. It can report on the state of an hbase-2.x
34-
cluster but its assessments will be inaccurate since it does not understand the internal
35-
workings of an hbase-2.x.
28+
HBCK2 is the successor to [hbck](https://hbase.apache.org/book.html#hbck.in.depth), the repair tool that shipped with _HBase 1.x_ (A.K.A _hbck1_).
29+
Use _HBCK2_ in place of _hbck1_ making repairs against hbase-2.x clusters.
30+
_hbck1_ should not be run against an HBase 2.x installation as it may do damage.
31+
While _hbck1_ is still included in HBase 2.x to avoid surprises, it is now deprecated and will be removed in _HBase 3.x_.
32+
The write-facility (`-fix`) of _hbck1_ has been removed.
33+
It can report on the state of an HBase 2.x cluster but its assessments will be inaccurate since it does not understand all internal workings of HBase 2.x.
3634

3735
_HBCK2_ does not work the way _hbck1_ used to, even for the case where commands are
3836
similarly named across the two versions. See the next section for how the tools
@@ -60,13 +58,13 @@ Run:
6058
```
6159
$ mvn install
6260
```
63-
The built _HBCK2_ jar will be in the `target` sub-directory.
61+
The built _HBCK2_ jar will be in the `target` subdirectory.
6462

6563
## Running _HBCK2_
6664
The _HBCK2_ jar does not include dependencies; it is not built as a 'fat' jar.
6765
Dependencies must be `provided`. Building, adjusting the target hbase version in the
68-
top-level pom to match your deploy will make for the smoothest operation when run
69-
against your deploy (See the parent pom.xml `hbase-operator-tools` for the
66+
top-level pom to match your deployment will make for the smoothest operation when run
67+
against your deployment (See the parent pom.xml `hbase-operator-tools` for the
7068
[hbase.version to set](https://github.com/apache/hbase-operator-tools/blob/master/pom.xml#L126)).
7169

7270
Where runtime interaction between _HBCK2_ and running cluster can get interesting is
@@ -77,15 +75,14 @@ it should fail gracefully. Use an older release or upgrade your cluster (if you
7775
The easiest means of 'providing' _HBCK2_ its dependencies is by launching
7876
_HBCK2_ via the `$HBASE_HOME/bin/hbase` script. The `bin/hbase` script natively
7977
makes mention of `hbck` -- there is a `hbck` option listed in the help output.
80-
By default, running `bin/hbase hbck`, the built-in _hbck1_ tooling will be run.
78+
By default, running `bin/hbase hbck` will run the built-in _hbck1_ tool.
8179
To run _HBCK2_, you need to point at a built _HBCK2_ jar using the `-j` option
8280
as in:
8381
~~~~
8482
$ ${HBASE_HOME}/bin/hbase --config /etc/hbase-conf hbck -j ~/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar
8583
~~~~
86-
where in the above, `/etc/hbase-conf` is where the deploy's configuration lives.
87-
The _HBCK2_ jar is at
88-
`~/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar`.
84+
`/etc/hbase-conf` is where the deployment's configuration lives.
85+
The _HBCK2_ jar is at `~/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar`.
8986
The above command with no options or arguments passed will dump out the _HBCK2_ help:
9087
```
9188
usage: HBCK2 [OPTIONS] COMMAND <ARGS>
@@ -445,13 +442,13 @@ Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
445442
```
446443
... it is because the HDFS jars are not on the CLASSPATH. The default is NOT
447444
to bundle HDFS jars on the CLASSPATH when running `hbck` via `bin/hbase`. Define
448-
`HADOOP_HOME` in the environment so `bin/hbase` can find your local hadoop
449-
install and then it will load its HDFS jars.
445+
`HADOOP_HOME` in the environment so `bin/hbase` can find your local Hadoop
446+
installation, and then it will load its HDFS jars.
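
A minimal sketch of the `HADOOP_HOME` step described above. The `/opt/hadoop` path is a hypothetical placeholder, not something the README specifies; substitute the location of your own Hadoop installation.

```shell
# Hypothetical default path; replace with your actual Hadoop installation.
export HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"

# Confirm what bin/hbase will see in its environment.
echo "HADOOP_HOME=${HADOOP_HOME}"

# Then launch HBCK2 as shown earlier in this README, for example:
# ${HBASE_HOME}/bin/hbase --config /etc/hbase-conf hbck -j ~/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar
```

Exporting the variable in the same shell session that launches `bin/hbase` keeps the setting scoped to that session.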
450447

451448
## _HBCK2_ Overview
452449
_HBCK2_ is currently a simple tool that does one thing at a time only.
453450

454-
In hbase-2.x, the Master is the final arbiter of all state, so a general principal for most
451+
In hbase-2.x, the Master is the final arbiter of all state, so a general principle for most
455452
_HBCK2_ commands is that it asks the Master to effect all repair. This means a Master must be
456453
up before you can run _HBCK2_ commands.
457454

@@ -498,7 +495,7 @@ its _pid_ but also its _ppid_; its parent's _pid_.
498495

499496
Generally all run problem free but if some unforeseen circumstance
500497
arises, the assignment framework may sustain damage requiring
501-
operator intervention. Below we will discuss some such scenarios
498+
operator intervention. Below we will discuss some such scenarios,
502499
but they can manifest in the Master log as a Region being _STUCK_ or
503500
a Procedure transitioning an entity -- a Region or a Table --
504501
may be blocked because another Procedure holds the exclusive lock
@@ -531,10 +528,8 @@ Procedures and Locks as well as the current set of Master Procedure WALs
531528
directory in your hbase install). On startup, on a large
532529
cluster when furious assigning is afoot, this page is
533530
filled with lists of Procedures and Locks. The count of
534-
MasterProcWALs will bloat too. If after the cluster settles,
535-
there is a stuck Lock or Procedure or the count of WALs
536-
doesn't ever come down but only grows, then operator intervention
537-
is needed to alieve the blockage.
531+
MasterProcWALs will bloat too.
532+
If after the cluster settles, there is a stuck Lock or Procedure or the count of WALs doesn't ever come down but only grows, then operator intervention is needed to remove the blockage.
538533

539534
Lists of locks and procedures can also be obtained via the hbase shell:
540535

@@ -544,16 +539,15 @@ $ echo "list_procedures"| hbase shell &> /tmp/procedures.txt
544539
```
545540

546541
#### Master UI: The 'HBCK Report'
547-
An `HBCK Report` page was added to the Master in versions hbase 2.3.0/2.1.6/2.2.1
548-
at `/hbck.jsp`
549-
which shows output from two inspections run by the master on an interval; one
550-
is output by the CatalogJanitor whenever it runs. If overlaps or holes in
551-
`hbase:meta`, the CatalogJanitor half of the page will list what it has found
552-
(otherwise it is quiet). Another background 'chore' process was added to compare
553-
`hbase:meta` and filesystem content making compare; if anomaly, it will make
554-
note in its `HBCK Report` section.
542+
Starting with HBase 2.3.0/2.1.6/2.2.1, the Master UI includes an `HBCK Report` page located at `/hbck.jsp`.
543+
This page displays the output from two inspections run by the Master at regular intervals.
555544

556-
See the 'HBCK Report' page itself for how to force runs of the inspectors.
545+
1. The first is performed by the `CatalogJanitor` and reports any overlaps in regions or holes in `hbase:meta`.
546+
2. The second inspection is a background 'chore' process that compares `hbase:meta` and filesystem content, and makes a note of any anomalies in the HBCK Report section.
547+
548+
If you want to force a run of these inspectors, refer to the HBCK Report page for instructions.
549+
550+
Look at the `fixMeta` command to fix overlaps and holes found by these inspections.
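
As a rough sketch, the report page can also be saved from the command line for later inspection. The host name is an assumption, and `16010` is assumed to be the Master info port (the usual HBase 2.x default); adjust both to match your deployment.

```shell
# Assumed values: change the host and port to match your Master UI.
MASTER_HOST="${MASTER_HOST:-localhost}"
MASTER_INFO_PORT="${MASTER_INFO_PORT:-16010}"

# The report lives at /hbck.jsp on the Master UI.
HBCK_REPORT_URL="http://${MASTER_HOST}:${MASTER_INFO_PORT}/hbck.jsp"
echo "${HBCK_REPORT_URL}"

# Uncomment to save a copy of the page (requires curl and a running Master):
# curl -s "${HBCK_REPORT_URL}" -o /tmp/hbck-report.html
```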
557551

558552

559553
#### The [HBase Canary Tool](http://hbase.apache.org/book.html#_canary)
@@ -597,23 +591,23 @@ $ echo " scan 'hbase:meta', {ROWPREFIXFILTER => 'IntegrationTestBigLinkedList_20
597591

598592
...then grep for _OPENING_ or _CLOSING_ Regions.
599593

600-
To move an _OPENING_ issue to _OPEN_ so it agrees with a table's
594+
To move an _OPENING_ issue to _OPEN_, so it agrees with a table's
601595
_ENABLED_ state, use the `assign` command in the hbase shell to
602596
queue a new Assign Procedure (watch the Master logs to see the
603597
Assign run). If many Regions to assign, use the _HBCK2_ tool. It
604598
can do bulk assigning.
605599

606600
## Fixing Problems
607601

608-
### Some General Principals
602+
### Some General Principles
609603
When making repair, make sure hbase:meta is consistent first
610604
before you go about fixing any other issue type such as a filesystem
611605
deviance. Deviance in the filesystem or problems with assign should
612606
be addressed after the hbase:meta has been put in order. If hbase:meta
613607
is out of whack, the Master cannot make proper placements when adopting orphan
614608
filesystem data or making region assignments.
615609

616-
Other general principles to keep in mind include a Region can not be assigned if
610+
Other general principles to keep in mind include a Region cannot be assigned if
617611
it is in _CLOSING_ state (or the inverse, unassigned if in _OPENING_ state) without
618612
first transitioning via _CLOSED_: Regions must always move from _CLOSED_, to _OPENING_,
619613
to _OPEN_, and then to _CLOSING_, _CLOSED_.
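
The ordering rule above can be sketched as a small shell check. `is_valid_transition` is a hypothetical helper written for illustration, not part of _HBCK2_ or HBase, and it covers only the four states named in this paragraph.

```shell
# Succeeds (exit 0) when moving a Region from state $1 to state $2 follows
# the cycle CLOSED -> OPENING -> OPEN -> CLOSING -> CLOSED.
# Hypothetical helper; simplified to the four states named above.
is_valid_transition() {
  case "$1:$2" in
    CLOSED:OPENING|OPENING:OPEN|OPEN:CLOSING|CLOSING:CLOSED) return 0 ;;
    *) return 1 ;;
  esac
}

# A Region in CLOSING cannot be assigned directly.
if is_valid_transition CLOSING OPENING; then
  echo "allowed"
else
  echo "not allowed: go via CLOSED first"
fi
```

Running the snippet prints `not allowed: go via CLOSED first`, which mirrors the rule that a _CLOSING_ Region has to reach _CLOSED_ before it can be assigned.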
@@ -629,12 +623,11 @@ _CLOSED_ state so it agrees with the table's _DISABLED_
629623
state. In this situation, you may have to temporarily set
630624
the table status to _ENABLED_, just so you can do the
631625
assign, and then set it back again after the unassign.
632-
_HBCK2_ has facility to allow you do this. See the
626+
_HBCK2_ has facility to allow you to do this. See the
633627
_HBCK2_ usage output.
634628

635629
What follows is a mix of notes and prescription that comes of experience running hbase-2.x so far.
636-
The root issues that brought on states described below has been fixed in later versions of hbase
637-
so upgrade if you can so as to avoid scenarios described.
630+
The root issues that brought on the states described below have been fixed in later versions of HBase, so upgrade if you can to avoid the scenarios described.
638631

639632
### Assigning/Unassigning
640633

@@ -720,7 +713,7 @@ echo "scan 'hbase:meta', {COLUMN=>'info:regioninfo'}" | hbase shell
720713
```
721714

722715
_HBCK2_ _addFsRegionsMissingInMeta_ can be used if the above does not show any errors. It reads region
723-
metadata info available on the FS region directories in order to recreate regions
716+
metadata info available on the FS region directories to recreate regions
724717
in hbase:meta. Since it can run with hbase partially operational, it attempts to disable online tables
725718
that are affected by the reported problem and it is going to readd regions to _hbase:meta_.
726719
It can check for specific tables/namespaces, or all tables from all namespaces.
@@ -731,7 +724,7 @@ An example below shows adding missing regions for tables 'tbl_1' in the default
731724
$ HBCK2 addFsRegionsMissingInMeta default:tbl_1 n1:tbl_2 n2
732725
```
733726

734-
As it operates independently from Master, once it finishes successfully, additional steps are
727+
As it operates independently of Master, once it finishes successfully, additional steps are
735728
required to actually have the re-added regions assigned. These are listed below:
736729

737730
1. _addFsRegionsMissingInMeta_ outputs an _assigns_ command with all regions that got re-added. This
@@ -764,14 +757,14 @@ Start the cluster up. It won’t come up fully. It will be stuck because the _na
764757
2019-07-10 18:30:51,090 WARN [master/localhost:16000:becomeActiveMaster] master.HMaster: hbase:namespace,,1562808216225.725a0fe6c2c869d3d0a9ed82bfa80fa3. is NOT online; state={725a0fe6c2c869d3d0a9ed82bfa80fa3 state=CLOSED, ts=1562808619952, server=null}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.
765758
```
766759

767-
To assign the namespace table region, you cannot use the shell. If you use the shell, it will fail with a `PleaseHoldException` because the master is not yet up (it is waiting for the namepace table to come online before it declares itself ‘up’). You have to use the `HBCK2` _assigns_ command. To assign, you will need the namespace encoded name. It shows in the log quoted above: i.e. _725a0fe6c2c869d3d0a9ed82bfa80fa3_ in this case. You will also have to pass the -skip command to ‘skip’ the master version check (without it, your `HBCK2` invocation will also elicit the above `PleaseHoldException` because the master is not yet up). Here is an example adding an assign of the namespace table:
760+
To assign the namespace table region, you cannot use the shell. If you use the shell, it will fail with a `PleaseHoldException` because the master is not yet up (it is waiting for the namespace table to come online before it declares itself ‘up’). You have to use the `HBCK2` _assigns_ command. To assign, you will need the namespace encoded name. It shows in the log quoted above: i.e. _725a0fe6c2c869d3d0a9ed82bfa80fa3_ in this case. You will also have to pass the -skip command to ‘skip’ the master version check (without it, your `HBCK2` invocation will also elicit the above `PleaseHoldException` because the master is not yet up). Here is an example adding an assign of the namespace table:
768761
```
769762
$ HBASE_CLASSPATH_PREFIX=~/checkouts/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar ./bin/hbase org.apache.hbase.HBCK2 -skip assigns 725a0fe6c2c869d3d0a9ed82bfa80fa3
770763
```
771764

772765
If the invocation comes back with ‘Connection refused’, is the Master up? The Master will shut down after a while if it can’t initialize itself. Just restart the cluster/master and rerun the above assigns command.
773766

774-
When the assigns runs successfully, you’ll see it emit the likes of the following. The ‘48’ on the end is the pid of the assign procedure schedule. If the pid returned is ‘-1’, then the master startup has not progressed sufficently… retry. Or, the encoded regionname is incorrect. Check.
767+
When the assigns runs successfully, you’ll see it emit the likes of the following. The ‘48’ on the end is the pid of the assign procedure schedule. If the pid returned is ‘-1’, then the master startup has not progressed sufficiently… retry. Or, the encoded regionname is incorrect. Check.
775768
```
776769
$ HBASE_CLASSPATH_PREFIX=~/checkouts/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar ./bin/hbase org.apache.hbase.HBCK2 -skip assigns 725a0fe6c2c869d3d0a9ed82bfa80fa3
777770
18:40:43.817 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
@@ -785,14 +778,14 @@ master.HMaster: Master has completed initialization 132.515sec
785778
```
786779
It might take a while to appear.
787780
788-
The rebuild of _hbase:meta_ adds the user tables in _DISABLED_ state and the regions in _CLOSED_ mode. Reenable tables via the shell to bring all table regions back online.
781+
The rebuild of _hbase:meta_ adds the user tables in _DISABLED_ state and the regions in _CLOSED_ mode. Re-enable tables via the shell to bring all table regions back online.
789782
Do it one-at-a-time or see the `enable_all ".*"` command to enable all tables in one shot.
790783
791784
The rebuild meta will likely be missing edits and may need subsequent repair and cleaning using facility outlined higher up in this README.
792785
793786
### Dropped reference files, missing hbase.version file, and corrupted hfiles
794787
795-
_HBCK2_ can check for hanging references and corrupt hfiles. You can ask it to sideline bad files which may be needed to get over humps where regions won't online or reads are failing. See the _filesystem_ command in the _HBCK2_ listing. Pass one or more tablename (or 'none' to check all tables). It will report bad files. Pass the _--fix_ option to effect repairs.
788+
_HBCK2_ can check for hanging references and corrupt HFiles. You can ask it to sideline bad files, which may be needed to get over humps where regions won't online or reads are failing. See the _filesystem_ command in the _HBCK2_ listing. Pass one or more tablename (or 'none' to check all tables). It will report bad files. Pass the _--fix_ option to effect repairs.
796789
797790
### Procedure Start-over
798791
