Commit f550d0a

wchevreuil authored and apurtell committed
HBASE-26265 Update ref guide to mention the new store file tracker im… (apache#3942)
1 parent 072fcf4 commit f550d0a

2 files changed, +146 -0 lines changed

Lines changed: 145 additions & 0 deletions
@@ -0,0 +1,145 @@
////
/**
*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
////

[[storefiletracking]]
= Store File Tracking
:doctype: book
:numbered:
:toc: left
:icons: font
:experimental:

== Overview

This feature introduces an abstraction layer to track the store files still used or needed by store
engines. This allows different approaches for identifying the store files required by a given store
to be plugged in.

Historically, HBase internals have relied on creating hfiles in temporary directories first, then renaming
those files into the actual store directory at operation commit time. This is a simple and convenient
way to separate transient files from finalised files that are ready to serve client reads.
This approach works well with strongly consistent file systems. However, less consistent file systems
have grown in popularity, mainly object stores that can be used like file systems. On these, the
dependency on atomic rename operations starts to introduce performance penalties. The Amazon S3 object
store, in particular, has been the most affected deployment because it lacks atomic renames. The HBase
community temporarily worked around this problem by building a distributed locking layer called HBOSS
to guarantee the atomicity of operations against S3.

With *Store File Tracking*, the specific Store File Tracking implementation decides where new hfiles
are created and how to proceed upon commit.
The implementation can be set at the HBase service level in *hbase-site.xml*. It can also be set at the
Table or Column Family level via the TableDescriptor configuration.
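The Ruby sketch below shows how one effective tracker name could be resolved from the three levels just mentioned. Ruby is used because it underlies the HBase shell. The property name and the DEFAULT fallback come from this guide. The `effective_tracker` helper is hypothetical, and so is the precedence it assumes, in which the most specific level wins (column family over table over global).

```ruby
TRACKER_IMPL_KEY = 'hbase.store.file-tracker.impl'

# Hypothetical resolution of the tracker for one store.
# ASSUMPTION: column family settings override table settings, which override
# the global (hbase-site.xml) settings.
def effective_tracker(global_conf, table_conf = {}, family_conf = {})
  # Hash#merge lets later hashes win, so merge from least to most specific.
  merged = global_conf.merge(table_conf).merge(family_conf)
  # Omitting the property altogether falls back to DEFAULT.
  merged.fetch(TRACKER_IMPL_KEY, 'DEFAULT')
end

puts effective_tracker({})                                       # DEFAULT
puts effective_tracker({ TRACKER_IMPL_KEY => 'FILE' })           # FILE
puts effective_tracker({}, {}, { TRACKER_IMPL_KEY => 'FILE' })   # FILE
```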

NOTE: When the store file tracking implementation is specified in *hbase-site.xml*, this configuration is also
propagated into a table's configuration at table creation time. This avoids dangerous configuration
mismatches between processes, which could potentially lead to data loss.
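The propagation described in the note can be modelled as follows. At creation time, a globally configured tracker is copied into the new table's own configuration. From then on, every process reads the same value from the table. The `table_conf_at_creation` helper is hypothetical. Keeping a tracker that the table names explicitly is an assumption of this sketch.

```ruby
TRACKER_IMPL_KEY = 'hbase.store.file-tracker.impl'

# Hypothetical helper: returns the configuration a new table would be stored with.
# ASSUMPTION: a tracker chosen explicitly for the table is left untouched.
def table_conf_at_creation(global_conf, requested_table_conf)
  conf = requested_table_conf.dup
  global_impl = global_conf[TRACKER_IMPL_KEY]
  # Propagate only when hbase-site.xml actually names a tracker.
  conf[TRACKER_IMPL_KEY] = global_impl if global_impl && !conf.key?(TRACKER_IMPL_KEY)
  conf
end

site = { TRACKER_IMPL_KEY => 'FILE' }
puts table_conf_at_creation(site, {})[TRACKER_IMPL_KEY]   # FILE
```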

== Available Implementations

The initial version of Store File Tracking provides three built-in implementations:

* DEFAULT
* FILE
* MIGRATION

### DEFAULT

As the name suggests, this is the Store File Tracking implementation used by default when no explicit
configuration has been defined. The DEFAULT tracker implements the standard approach of using temporary
directories and renames. This is the same (implicit) approach HBase used to track store files in all
previous versions.
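Here is a simplified local-file-system model of the DEFAULT approach. A file is fully written under a temporary directory and becomes visible in the store directory only when it is renamed there on commit. The `.tmp` and `cf` directory names and the `write_then_commit` helper are illustrative assumptions. They are not HBase's actual layout or API.

```ruby
require 'fileutils'
require 'tmpdir'

# Simplified model of the DEFAULT approach on a local file system.
# ASSUMPTION: the '.tmp' directory name and this helper are illustrative only.
def write_then_commit(region_dir, family, file_name, data)
  tmp_dir = File.join(region_dir, '.tmp')
  store_dir = File.join(region_dir, family)
  FileUtils.mkdir_p([tmp_dir, store_dir])

  # 1. Write the whole file outside the store directory (transient file).
  tmp_path = File.join(tmp_dir, file_name)
  File.write(tmp_path, data)

  # 2. Commit by renaming it into the store directory. Within one local POSIX
  #    file system this rename is atomic; object stores such as S3 lack that.
  final_path = File.join(store_dir, file_name)
  File.rename(tmp_path, final_path)
  final_path
end

Dir.mktmpdir do |region_dir|
  path = write_then_commit(region_dir, 'cf', 'hfile-1', 'cells')
  puts File.basename(path)                                  # hfile-1
  puts Dir.children(File.join(region_dir, '.tmp')).empty?   # true
end
```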

### FILE

A file tracker implementation that creates new files directly in the store directory, avoiding the
need for rename operations. It keeps a list of committed hfiles in memory, backed by meta files in
each store directory. Whenever a new hfile is committed, the list of _tracked files_ in the given
store is updated and a new meta file is written with the updated list. The previous meta file, which
now contains an outdated list, is discarded.
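The following toy Ruby model mirrors the FILE tracker behaviour described above. Hfiles are written in place, and the tracked list lives in memory, backed by a meta file that is rewritten on each commit. `ToyFileTracker`, the `.filelist` directory, the one-name-per-line format and the `filelist-<seq>` names are all invented for illustration. HBase's real meta file format is not reproduced here.

```ruby
require 'tmpdir'

# Toy model of the FILE tracker idea.
# ASSUMPTIONS: '.filelist', the plain-text list format and the sequence-numbered
# file names are invented for this sketch.
class ToyFileTracker
  attr_reader :tracked

  def initialize(store_dir)
    @meta_dir = File.join(store_dir, '.filelist')
    Dir.mkdir(@meta_dir) unless Dir.exist?(@meta_dir)
    @tracked = load_latest
  end

  # Called once the hfile has been fully written in the store directory.
  # No rename happens; the file becomes part of the store by being tracked.
  def commit(hfile_name)
    @tracked += [hfile_name]
    write_meta_file
  end

  private

  # Meta files ordered by sequence number, oldest first.
  def meta_files
    Dir.children(@meta_dir).sort_by { |name| name.split('-').last.to_i }
  end

  def load_latest
    latest = meta_files.last
    latest ? File.readlines(File.join(@meta_dir, latest), chomp: true) : []
  end

  def write_meta_file
    previous = meta_files
    seq = previous.empty? ? 1 : previous.last.split('-').last.to_i + 1
    File.write(File.join(@meta_dir, "filelist-#{seq}"), @tracked.join("\n"))
    # Older meta files now hold an outdated list, so they are discarded.
    previous.each { |name| File.delete(File.join(@meta_dir, name)) }
  end
end

Dir.mktmpdir do |store_dir|
  tracker = ToyFileTracker.new(store_dir)
  File.write(File.join(store_dir, 'hfile-a'), 'cells')  # written in place
  tracker.commit('hfile-a')
  File.write(File.join(store_dir, 'hfile-b'), 'cells')  # written, never committed
  p ToyFileTracker.new(store_dir).tracked               # ["hfile-a"]
end
```

In this model, `hfile-b` sits in the store directory but is not reported. Only files recorded in the latest meta file count as part of the store.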

### MIGRATION

A special implementation used when switching between Store File Tracking implementations on
pre-existing tables that already contain data. The files of such tables are therefore already being
tracked according to a specific implementation's logic.
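This in-memory sketch illustrates the MIGRATION idea. The destination tracker builds its list from the source tracker's list, so the new tracker starts with the files the current one already knows about. The class and method names (`InMemoryTracker`, `MigratingTracker`, `load`, `add`, `set`) are hypothetical. Recording new files in both trackers while migrating is a choice made by this sketch, not a documented guarantee.

```ruby
# Hypothetical stand-in for a real tracker, keeping its file list in memory.
class InMemoryTracker
  attr_reader :files

  def initialize(files = [])
    @files = files.dup
  end

  def load
    @files.dup
  end

  def add(new_files)
    @files |= new_files
  end

  def set(files)
    @files = files.dup
  end
end

# Hypothetical migrating tracker wrapping a source and a destination tracker.
class MigratingTracker
  def initialize(src, dst)
    @src = src
    @dst = dst
  end

  # The destination builds its list from what the current (source) tracker knows.
  def load
    files = @src.load
    @dst.set(files)
    files
  end

  # ASSUMPTION: new files are recorded in both trackers until the switch is finalised.
  def add(new_files)
    @src.add(new_files)
    @dst.add(new_files)
  end
end

src = InMemoryTracker.new(%w[hfile-1 hfile-2])  # e.g. what DEFAULT currently tracks
dst = InMemoryTracker.new                       # e.g. a FILE tracker with no list yet
migration = MigratingTracker.new(src, dst)
migration.load
migration.add(['hfile-3'])
p dst.files  # ["hfile-1", "hfile-2", "hfile-3"]
```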

== Usage

For fresh deployments that don't yet contain any user data, the *FILE* implementation can simply be set as
the value of the *hbase.store.file-tracker.impl* property in the global *hbase-site.xml* configuration,
before HBase is started for the first time. Omitting this property sets the *DEFAULT* implementation.

Consider clusters that already hold data and are upgraded to a version of HBase containing the store file
tracking feature. On these, the Store File Tracking implementation can only be changed through the
*MIGRATION* implementation, so that the _new tracker_ can safely build its list of tracked files based
on the list of the _current tracker_.

NOTE: The MIGRATION tracker should NOT be set in the global configuration. To use it, follow the section
below about setting Store File Tracking at Table or Column Family level.
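The two rules for the global configuration can be restated as a hypothetical pre-start check. First, omitting *hbase.store.file-tracker.impl* means DEFAULT. Second, MIGRATION must not be set globally. `validate_global_tracker` is not an HBase API, and the sketch assumes *hbase-site.xml* has already been read into a Ruby hash.

```ruby
TRACKER_IMPL_KEY = 'hbase.store.file-tracker.impl'

# Hypothetical check of the global (hbase-site.xml) tracker setting.
# ASSUMPTION: the site configuration is already available as a Hash.
def validate_global_tracker(site_conf)
  # Omitting the property means DEFAULT.
  impl = site_conf.fetch(TRACKER_IMPL_KEY, 'DEFAULT')
  # MIGRATION belongs in table or column family configuration only.
  if impl == 'MIGRATION'
    raise ArgumentError, 'MIGRATION must be set per table or column family, not globally'
  end
  impl
end

puts validate_global_tracker({})                              # DEFAULT
puts validate_global_tracker({ TRACKER_IMPL_KEY => 'FILE' })  # FILE
```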

### Configuring for Table or Column Family

Setting Store File Tracking configuration globally may not always be possible or desired, for example,
in the case of upgraded clusters with pre-existing user data.
Store File Tracking can instead be set in the Table or Column Family configuration.
For example, to specify the *FILE* implementation in the table configuration at table creation time,
apply the following:

----
create 'my-table', 'f1', 'f2', {CONFIGURATION => {'hbase.store.file-tracker.impl' => 'FILE'}}
----

To define *FILE* for a specific Column Family:

----
create 'my-table', {NAME => 'f1', CONFIGURATION => {'hbase.store.file-tracker.impl' => 'FILE'}}
----

### Switching trackers at Table or Column Family

A very common scenario is to set Store File Tracking on pre-existing HBase deployments that have
been upgraded to a version that supports this feature. To apply the FILE tracker, tables effectively
need to be migrated from the DEFAULT tracker to the FILE tracker. As explained previously, this
process requires the special MIGRATION tracker implementation, which can only be
specified at Table or Column Family level.
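The switching rules for tables that already hold data can be summarised as a small, hypothetical check on the old and new table configurations:

* A direct change of implementation is rejected.
* MIGRATION must name the tracker currently in use as its source, and must also name a destination.
* Leaving MIGRATION is accepted only towards that destination.

`check_tracker_change` is an illustration, not an HBase API. This sketch makes no claim about where, or whether, HBase itself performs such validation.

```ruby
IMPL = 'hbase.store.file-tracker.impl'
SRC  = 'hbase.store.file-tracker.migration.src.impl'
DST  = 'hbase.store.file-tracker.migration.dst.impl'

# Hypothetical validation of a table-level tracker change on a table with data.
def check_tracker_change(old_conf, new_conf)
  old_impl = old_conf.fetch(IMPL, 'DEFAULT')
  new_impl = new_conf.fetch(IMPL, 'DEFAULT')
  return :unchanged if old_impl == new_impl

  # Entering MIGRATION: the source must be the tracker currently in use.
  if new_impl == 'MIGRATION'
    unless new_conf[SRC] == old_impl && new_conf.key?(DST)
      raise ArgumentError, "MIGRATION needs #{SRC}=#{old_impl} and a #{DST}"
    end
    return :migrating
  end

  # Finalising: leaving MIGRATION is only allowed towards its destination.
  if old_impl == 'MIGRATION'
    return :finalised if new_impl == old_conf[DST]
    raise ArgumentError, "expected #{IMPL}=#{old_conf[DST]}"
  end

  raise ArgumentError, "switch #{old_impl} -> #{new_impl} through MIGRATION"
end

migrating = { IMPL => 'MIGRATION', SRC => 'DEFAULT', DST => 'FILE' }
p check_tracker_change({}, migrating)                              # :migrating
p check_tracker_change(migrating, migrating.merge({ IMPL => 'FILE' }))  # :finalised
```

The two calls mirror the DEFAULT to FILE `alter` steps in this section: first entering MIGRATION, then setting the destination as the tracker once regions are online again.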

For example, to switch the _tracker_ from *DEFAULT* to *FILE* in a table configuration:

----
alter 'my-table', CONFIGURATION => {'hbase.store.file-tracker.impl' => 'MIGRATION',
'hbase.store.file-tracker.migration.src.impl' => 'DEFAULT',
'hbase.store.file-tracker.migration.dst.impl' => 'FILE'}
----

To apply a similar switch at Column Family level:

----
alter 'my-table', {NAME => 'f1', CONFIGURATION => {'hbase.store.file-tracker.impl' => 'MIGRATION',
'hbase.store.file-tracker.migration.src.impl' => 'DEFAULT',
'hbase.store.file-tracker.migration.dst.impl' => 'FILE'}}
----

Once all regions of the table are online again, don't forget to disable MIGRATION. Do this by setting
*hbase.store.file-tracker.impl* to the value previously given to *hbase.store.file-tracker.migration.dst.impl*.
In the above example, that would be as follows:

----
alter 'my-table', CONFIGURATION => {'hbase.store.file-tracker.impl' => 'FILE'}
----

src/main/asciidoc/book.adoc

Lines changed: 1 addition & 0 deletions
@@ -87,6 +87,7 @@ include::_chapters/zookeeper.adoc[]
 include::_chapters/community.adoc[]
 include::_chapters/hbtop.adoc[]
 include::_chapters/tracing.adoc[]
+include::_chapters/store_file_tracking.adoc[]
 
 
 = Appendix
