Commit 504fc52

HBASE-22625 documet use scan snapshot feature (#496)
Fix feedback from Clay Baenziger.

Signed-off-by: Clay Baenziger <[email protected]>
1 parent 04feab9 commit 504fc52

1 file changed, 20 insertions(+), 15 deletions(-)

src/main/asciidoc/_chapters/snapshot_scanner.adoc

@@ -31,7 +31,7 @@
 
 In HBase, a scan of a table costs server-side HBase resources reading, formating, and returning data back to the client.
 Luckily, HBase provides a TableSnapshotScanner and TableSnapshotInputFormat (introduced by link:https://issues.apache.org/jira/browse/HBASE-8369[HBASE-8369]),
-which scan snapshot the HBase-written HFiles directly in the HDFS filesystem completely by-passing hbase. This access mode
+which can scan HBase-written HFiles directly in the HDFS filesystem completely by-passing hbase. This access mode
 performs better than going via HBase and can be used with an offline HBase with in-place or exported
 snapshot HFiles.
 
@@ -41,14 +41,14 @@ To read HFiles directly, the user must have sufficient permissions to access sna
 
 TableSnapshotScanner provides a means for running a single client-side scan over snapshot files.
 When using TableSnapshotScanner, we must specify a temporary directory to copy the snapshot files into.
-The client user should have write permissions to this directory, and it should not be a subdirectory of
+The client user should have write permissions to this directory, and the dir should not be a subdirectory of
 the hbase.rootdir. The scanner deletes the contents of the directory once the scanner is closed.
 
 .Use TableSnapshotScanner
 ====
 [source,java]
 ----
-Path restoreDir = new Path("XX"); // restore dir should not be a subdirectory HBase hbase.rootdir
+Path restoreDir = new Path("XX"); // restore dir should not be a subdirectory of hbase.rootdir
 Scan scan = new Scan();
 try (TableSnapshotScanner scanner = new TableSnapshotScanner(conf, restoreDir, snapshotName, scan)) {
 Result result = scanner.next();
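The restore-directory rule in this hunk matters because the scanner deletes the directory's contents on close. The client user needs write access to it, and it must not sit under `hbase.rootdir`. A path containing `..` segments or a trailing slash can break that rule without being obvious. The sketch below is a hypothetical pre-flight check and is not part of HBase. `RestoreDirCheck` and `isUnderRootDir` are illustrative names. It assumes both paths are either fully qualified URIs (such as `hdfs://ns1/hbase`) or both scheme-less, so an unqualified path should first be resolved against `fs.defaultFS`. The same check would apply to the `restoreDir` passed to `initTableSnapshotMapperJob`.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical helper (not an HBase API): rejects restore directories that
// equal or lie beneath hbase.rootdir before a snapshot scan is started.
final class RestoreDirCheck {

    /** True if restoreDir equals rootDir or lies beneath it. */
    static boolean isUnderRootDir(String rootDir, String restoreDir) {
        // normalize() resolves "." and ".." segments in the path.
        URI root = URI.create(rootDir).normalize();
        URI restore = URI.create(restoreDir).normalize();
        // Paths on different filesystems (scheme or authority) cannot nest.
        if (!Objects.equals(root.getScheme(), restore.getScheme())
                || !Objects.equals(root.getAuthority(), restore.getAuthority())) {
            return false;
        }
        String rootPath = stripTrailingSlashes(root.getPath());
        String restorePath = stripTrailingSlashes(restore.getPath());
        // Compare whole path components so "/hbase-restore" is not under "/hbase".
        return restorePath.equals(rootPath) || restorePath.startsWith(rootPath + "/");
    }

    private static String stripTrailingSlashes(String path) {
        String p = path;
        while (p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        return p;
    }

    public static void main(String[] args) {
        String rootDir = "hdfs://ns1/hbase"; // example value of hbase.rootdir
        String restoreDir = "hdfs://ns1/tmp/snapshot-restore"; // example restore dir
        if (isUnderRootDir(rootDir, restoreDir)) {
            throw new IllegalArgumentException(
                "restore dir must not be under hbase.rootdir: " + restoreDir);
        }
        System.out.println("restore dir OK: " + restoreDir);
    }
}
```

The check compares whole path components rather than using a plain string prefix. A prefix test would wrongly treat a sibling such as `/hbase-restore` as being inside `/hbase`.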
@@ -61,14 +61,14 @@ try (TableSnapshotScanner scanner = new TableSnapshotScanner(conf, restoreDir, s
 ====
 
 === TableSnapshotInputFormat
-TableSnapshotInputFormat provide a way to scan over snapshot files in a MapReduce job.
+TableSnapshotInputFormat provides a way to scan over snapshot HFiles in a MapReduce job.
 
 .Use TableSnapshotInputFormat
 ====
 [source,java]
 ----
 Job job = new Job(conf);
-Path restoreDir = new Path("XX"); // restore dir should not be a subdirectory HBase rootdir
+Path restoreDir = new Path("XX"); // restore dir should not be a subdirectory of hbase.rootdir
 Scan scan = new Scan();
 TableMapReduceUtil.initTableSnapshotMapperJob(snapshotName, scan, MyTableMapper.class, MyMapKeyOutput.class, MyMapOutputValueWritable.class, job, true, restoreDir);
 ----
@@ -77,31 +77,31 @@ TableMapReduceUtil.initTableSnapshotMapperJob(snapshotName, scan, MyTableMapper.
 === Permission to access snapshot and data files
 Generally, only the HBase owner or the HDFS admin have the permission to access HFiles.
 
-link:https://issues.apache.org/jira/browse/HBASE-18659[HBASE-18659] use HDFS ACLs to make HBase granted user have the permission to access the snapshot files.
+link:https://issues.apache.org/jira/browse/HBASE-18659[HBASE-18659] uses HDFS ACLs to make HBase granted user have permission to access snapshot files.
 
 ==== link:https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html#ACLs_Access_Control_Lists[HDFS ACLs]
 
 HDFS ACLs supports an "access ACL", which defines the rules to enforce during permission checks, and a "default ACL",
 which defines the ACL entries that new child files or sub-directories receive automatically during creation.
-By HDFS ACLs, HBase sync granted users with read permission to HFiles.
+Via HDFS ACLs, HBase syncs granted users with read permission to HFiles.
 
 ==== Basic idea
 
-The HBase files are orginazed as the following ways:
+The HBase files are organized in the following ways:
 
 * {hbase-rootdir}/.tmp/data/{namespace}/{table}
 * {hbase-rootdir}/data/{namespace}/{table}
 * {hbase-rootdir}/archive/data/{namespace}/{table}
 * {hbase-rootdir}/.hbase-snapshot/{snapshotName}
 
-So the basic idea is to add or remove HDFS ACLs to files of
-global/namespace/table directory when grant or revoke permission to global/namespace/table.
+So the basic idea is to add or remove HDFS ACLs to files of the global/namespace/table directory
+when grant or revoke permission to global/namespace/table.
 
 See the design doc in link:https://issues.apache.org/jira/browse/HBASE-18659[HBASE-18659] for more details.
 
 ==== Configuration to use this feature
 
-* Firstly, make sure that HDFS ACLs is enabled and umask is set to 027
+* Firstly, make sure that HDFS ACLs are enabled and umask is set to 027
 ----
 dfs.namenode.acls.enabled = true
 fs.permissions.umask-mode = 027
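The "basic idea" in this hunk maps the scope of a grant (global, namespace or table) onto directories in the HBase file layout listed above. Below is a hypothetical sketch of that mapping, not HBase's implementation:

* `AclSyncSketch`, `dataDirs` and `directoryAclSpec` are illustrative names.
* The `r-x` permission set is an assumption.
* The `.hbase-snapshot` directory is left out.

Each directory gets an access entry plus a default entry, and a default entry only applies to children created afterwards. Files already inside a directory would therefore need access-only entries, which this sketch does not produce. The printed `hdfs dfs -setfacl` lines are only a readable rendering of the entries, not necessarily how HBase applies them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of which directories a grant touches; not HBase code.
final class AclSyncSketch {
    enum Scope { GLOBAL, NAMESPACE, TABLE }

    /** The three data directories from the layout that a grant at this scope covers. */
    static List<String> dataDirs(String rootDir, Scope scope, String namespace, String table) {
        String suffix;
        switch (scope) {
            case GLOBAL:
                suffix = "";
                break;
            case NAMESPACE:
                suffix = "/" + namespace;
                break;
            default: // TABLE
                suffix = "/" + namespace + "/" + table;
        }
        List<String> dirs = new ArrayList<>();
        dirs.add(rootDir + "/.tmp/data" + suffix);
        dirs.add(rootDir + "/data" + suffix);
        dirs.add(rootDir + "/archive/data" + suffix);
        return dirs;
    }

    /**
     * ACL spec for one directory: an access entry for the directory itself and
     * a default entry inherited by files and sub-directories created later.
     * "r-x" (read plus traverse) is an assumed permission set.
     */
    static String directoryAclSpec(String user) {
        String access = "user:" + user + ":r-x";
        return access + ",default:" + access;
    }

    public static void main(String[] args) {
        // Example: a table-level READ grant for user "alice" on ns1:t1.
        for (String dir : dataDirs("/hbase", Scope.TABLE, "ns1", "t1")) {
            System.out.println("hdfs dfs -setfacl -m " + directoryAclSpec("alice") + " " + dir);
        }
    }
}
```

Using an access entry plus a default entry is also why each named user costs two ACL entries on a directory but only one on a file.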
@@ -119,7 +119,7 @@ hbase.acl.sync.to.hdfs.enable=true
 ----
 
 * Modify table scheme to enable this feature for a specified table, this config is
-false by default for every table, this means the HBase granted acls will not be synced to HDFS
+false by default for every table, this means the HBase granted ACLs will not be synced to HDFS
 ----
 alter 't1', CONFIGURATION => {'hbase.acl.sync.to.hdfs.enable' => 'true'}
 ----
@@ -137,11 +137,16 @@ HDFS has a config which limits the max ACL entries num for one directory or file
 ----
 dfs.namenode.acls.max.entries = 32(default value)
 ----
-The 32 entries include four fixed users for each directory or file: owner, group, other and mask. For a directory, the four users contain 8 ACL entries(access and default) and for a file, the four users contain 4 ACL entries(access). This means there are 24 ACL entries left for named users or groups.
+The 32 entries include four fixed users for each directory or file: owner, group, other, and mask.
+For a directory, the four users contain 8 ACL entries(access and default) and for a file, the four
+users contain 4 ACL entries(access). This means there are 24 ACL entries left for named users or groups.
 
-Based on this limitation, we can only sync up to 12 HBase granted users' ACLs. This means, if a table enable this feature, then the total users with table, namespace of this table, global READ permission should not be greater than 12.
+Based on this limitation, we can only sync up to 12 HBase granted users' ACLs. This means, if a table
+enables this feature, then the total users with table, namespace of this table, global READ permission
+should not be greater than 12.
 =====
 
 =====
-There are some cases that this coprocessor has not handled or could not handle, so the user HDFS ACLs are not syned normally. Such as a reference link to another hfile of other tables.
+There are some cases that this coprocessor has not handled or could not handle, so the user HDFS ACLs
+are not synced normally. It will not make a reference link to another hfile of other tables.
 =====
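The 12-user limit in this hunk follows from simple arithmetic on the numbers it gives. The sketch below reproduces that calculation. The class and method names are illustrative. Like the text, it assumes each named user on a directory consumes one access entry and one default entry, while a file needs only the access entry.

```java
// Illustrative reproduction of the ACL-entry budget described in the text.
final class AclEntryBudget {
    // Fixed base users per the text: owner, group, other and mask.
    static final int FIXED_USERS = 4;

    /** Entries one user consumes on a path: access + default on a directory, access only on a file. */
    static int entriesPerUser(boolean isDirectory) {
        return isDirectory ? 2 : 1;
    }

    /** Named users or groups that still fit on one path under maxEntries. */
    static int namedUsersThatFit(int maxEntries, boolean isDirectory) {
        int perUser = entriesPerUser(isDirectory);
        int left = maxEntries - FIXED_USERS * perUser; // 32 - 8 = 24 for a directory
        return Math.max(0, left) / perUser;
    }

    /** A synced user needs entries on both directories and files, so the tighter bound applies. */
    static int syncableUsers(int maxEntries) {
        return Math.min(namedUsersThatFit(maxEntries, true), namedUsersThatFit(maxEntries, false));
    }

    public static void main(String[] args) {
        int maxEntries = 32; // default of dfs.namenode.acls.max.entries, per the text
        System.out.println("per directory: " + namedUsersThatFit(maxEntries, true));
        System.out.println("per file: " + namedUsersThatFit(maxEntries, false));
        System.out.println("syncable users: " + syncableUsers(maxEntries));
    }
}
```

With the default of 32 entries, this prints 12 per directory, 28 per file and 12 syncable users. Directories are the binding constraint, which gives the limit of 12 stated above.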
