From 3a28fd775465db4d339ed3ea562aecfce7ec1057 Mon Sep 17 00:00:00 2001 From: Wellington Chevreuil Date: Tue, 14 Dec 2021 00:55:56 +0000 Subject: [PATCH 1/3] HBASE-26265 Update ref guide to mention the new store file tracker implementations --- .../_chapters/store_file_tracking.adoc | 175 ++++++++++++++++++ src/main/asciidoc/book.adoc | 1 + 2 files changed, 176 insertions(+) create mode 100644 src/main/asciidoc/_chapters/store_file_tracking.adoc diff --git a/src/main/asciidoc/_chapters/store_file_tracking.adoc b/src/main/asciidoc/_chapters/store_file_tracking.adoc new file mode 100644 index 000000000000..bb277e3cede6 --- /dev/null +++ b/src/main/asciidoc/_chapters/store_file_tracking.adoc @@ -0,0 +1,175 @@ +//// +/** + * + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +//// + +[[storefiletracking]] += Store File Tracking +:doctype: book +:numbered: +:toc: left +:icons: font +:experimental: + +== Overview + +This feature introduces an abstraction layer to track store files still used/needed by store +engines, allowing for plugging different approaches of identifying store +files required by the given store. 
+ +Historically, HBase internals have relied on creating hfiles on temporary directories first, renaming +those files to the actual store directory at operation commit time. That's a simple and convenient +way to separate transient from already finalised files that are ready to serve client reads with data. +This approach works well with strong consistent file systems, but with the popularity of less consistent +file systems, mainly Object Store file systems, dependency on rename operations starts to introduce +performance penalties. Amazon S3 Object Store, in particular, has been the most affected deployment, +due to the its lack of atomic renames, requiring an additional locking layer implemented by HBOSS, +to guarantee consistency and integrity of operations. + +With *Store File Tracking*, decision on where to originally create new hfiles and how to proceed upon +commit is delegated to the specific Store File Tracking implementation. +It can be set at individual Table or Column Family configurations, as well as in processes +*hbase-site.xml* configuration file. + +NOTE: When specified in *hbase_site.xml*, this configuration is also saved into tables configuration +at table creation time. This is to avoid dangerous configuration mismatches between processes, which +could potentially lead to data loss. + +== Available Implementations + +Store File Tracking initial version provides three builtin implementations: + +* DEFAULT +* FILE +* MIGRATION + +### DEFAULT + +As per the name, this is the Store File Tracking implementation used by default when now explicit +configuration has been defined. The DEFAULT tracker implements the standard approach using temporary +directories and renames. + +### FILE + +A file tracker implementation that creates new files straight in the store directory, avoiding the +need for rename operations. It keeps a list of committed hfiles in memory, backed by meta files, in +each store directory. 
Whenever a new hfile is committed, the list of _tracked files_ in the given +store is updated and a new meta file is written with this list's contents, discarding the previous +meta file now containing an outdated list. + +### MIGRATION + +A special implementation to be used when swapping between Store File Tracking implementations on +pre-existing tables that already contain data, and therefore, files being tracked under a specific +logic. + +== Usage + +For fresh deployments that don't yet contain any user data, *FILE* implementation can be just set as +value for *hbase.store.file-tracker.impl* property in global *hbase-site.xml* configuration, prior +to the first hbase start. Omitting this property sets the *DEFAULT* implementation. + +### Switching implementations globally + +For running clusters with tables already containing data, Store File Tracking implementation can +only be changed with the *MIGRATION* implementation, so that the _new tracker_ can safely build its +list of tracked files based on the list of the _current tracker_. Additional to the +*hbase.store.file-tracker.impl* property, *MIGRATION* requires the +*hbase.store.file-tracker.migration.src.impl* and *hbase.store.file-tracker.migration.dst.impl*, +where the _current_ and _new_ tracker should be specified. For example, to set *MIGRATION* from +*DEFAULT* to *FILE*, the following should be set in the global config: + +---- + + hbase.store.file-tracker.impl + MIGRATION + + + hbase.store.file-tracker.migration.src.impl + DEFAULT + + + hbase.store.file-tracker.migration.dst.impl + FILE + +---- + +A cluster restart would be needed to effectivelly apply the above configuration. + +NOTE: When MIGRATION is defined globally, new tables creation is not allowed. + +Once cluster has completely started and all regions have already become online, *MIGRATION* tracker +can be disabled and the _new_ implementation should be the one set in *hbase.store.file-tracker.impl*. 
+On the above example, the new configuration would be: + +---- + + hbase.store.file-tracker.impl + FILE + +---- + +Restart the cluster again to complete migration and allow new tables creation to be executed again. + +### Configuring for Table or Column Family + +The previous example conveniently allows to set Store File Tracker desired configuration on a single +place for all cluster tables. That may not be always desired, either because clusters restarts can be +discouraging, or maybe because some user domain might be more critical to experiment a new feature. +Whatever the reason, Store File Tracking can be set at Table or Column Family level configuration. +For example, to specify *FILE* implementation in the table configuration at table creation time, +the following should be applied: + +---- +create 'my-table', 'f1', 'f2', {CONFIGURATION => {'hbase.store.file-tracker.impl' => 'FILE'}} +---- + +To define *FILE* for a specific Column Family: + +---- +create 'my-table', {NAME=> '1', CONFIGURATION => {'hbase.store.file-tracker.impl' => 'FILE'}} +---- + +### Switching trackers at Table or Column Family + +Similarly to when switching implementations at global configuration, when switching _trackers_ for +individual tables or column families, the *MIGRATION* tracker is also required. 
For example, to +switch _tracker_ from *DEFAULT* to *FILE* in a table configuration: + +---- +alter 'my-table', CONFIGURATION => {'hbase.store.file-tracker.impl' => 'MIGRATION', +'hbase.store.file-tracker.migration.src.impl' => 'DEFAULT', +'hbase.store.file-tracker.migration.dst.impl' => 'FILE'} +---- + +To apply a similar switch at the column family level: + +---- +alter 'my-table', {NAME => 'f1', CONFIGURATION => {'hbase.store.file-tracker.impl' => 'MIGRATION', +'hbase.store.file-tracker.migration.src.impl' => 'DEFAULT', +'hbase.store.file-tracker.migration.dst.impl' => 'FILE'}} +---- + +Once all table regions are online again, remember to disable MIGRATION by setting +*hbase.store.file-tracker.impl* to the value of *hbase.store.file-tracker.migration.dst.impl*. In the above +example, that would be as follows: + +---- +alter 'my-table', CONFIGURATION => {'hbase.store.file-tracker.impl' => 'FILE'} +---- diff --git a/src/main/asciidoc/book.adoc b/src/main/asciidoc/book.adoc index a622786e58d1..b8c648e8bb6c 100644 --- a/src/main/asciidoc/book.adoc +++ b/src/main/asciidoc/book.adoc @@ -89,6 +89,7 @@ include::_chapters/zookeeper.adoc[] include::_chapters/community.adoc[] include::_chapters/hbtop.adoc[] include::_chapters/tracing.adoc[] +include::_chapters/store_file_tracking.adoc[] = Appendix From 6cc51d749c6c64151a15d551d4b31c5200686ca8 Mon Sep 17 00:00:00 2001 From: Wellington Chevreuil Date: Tue, 14 Dec 2021 22:22:12 +0000 Subject: [PATCH 2/3] addressing review suggestions. 
--- .../_chapters/store_file_tracking.adoc | 31 +++++++++++-------- 1 file changed, 18 insertions(+), 13 deletions(-) diff --git a/src/main/asciidoc/_chapters/store_file_tracking.adoc b/src/main/asciidoc/_chapters/store_file_tracking.adoc index bb277e3cede6..71fe023076cb 100644 --- a/src/main/asciidoc/_chapters/store_file_tracking.adoc +++ b/src/main/asciidoc/_chapters/store_file_tracking.adoc @@ -37,17 +37,17 @@ Historically, HBase internals have relied on creating hfiles on temporary direct those files to the actual store directory at operation commit time. That's a simple and convenient way to separate transient from already finalised files that are ready to serve client reads with data. This approach works well with strong consistent file systems, but with the popularity of less consistent -file systems, mainly Object Store file systems, dependency on rename operations starts to introduce -performance penalties. Amazon S3 Object Store, in particular, has been the most affected deployment, -due to the its lack of atomic renames, requiring an additional locking layer implemented by HBOSS, -to guarantee consistency and integrity of operations. +file systems, mainly Object Store which can be used like file systems, dependency on atomic rename operations starts to introduce +performance penalties. The Amazon S3 Object Store, in particular, has been the most affected deployment, +due to its lack of atomic renames. The HBase community temporarily bypassed this problem by building a distributed locking layer called HBOSS, +to guarantee atomicity of operations against S3. With *Store File Tracking*, decision on where to originally create new hfiles and how to proceed upon commit is delegated to the specific Store File Tracking implementation. -It can be set at individual Table or Column Family configurations, as well as in processes -*hbase-site.xml* configuration file. 
+The implementation can be set at the HBase service level in *hbase-site.xml* or at the +Table or Column Family level via the TableDescriptor configuration. -NOTE: When specified in *hbase_site.xml*, this configuration is also saved into tables configuration +NOTE: When the store file tracking implementation is specified in *hbase-site.xml*, this configuration is also propagated into the table's configuration at table creation time. This is to avoid dangerous configuration mismatches between processes, which could potentially lead to data loss. @@ -61,9 +61,9 @@ Store File Tracking initial version provides three builtin implementations: ### DEFAULT -As per the name, this is the Store File Tracking implementation used by default when now explicit +As per the name, this is the Store File Tracking implementation used by default when no explicit configuration has been defined. The DEFAULT tracker implements the standard approach using temporary -directories and renames. +directories and renames. This is the (implicit) approach that all previous HBase versions used to track store files. ### FILE @@ -87,10 +87,15 @@ to the first hbase start. Omitting this property sets the *DEFAULT* implementati ### Switching implementations globally -For running clusters with tables already containing data, Store File Tracking implementation can -only be changed with the *MIGRATION* implementation, so that the _new tracker_ can safely build its -list of tracked files based on the list of the _current tracker_. Additional to the -*hbase.store.file-tracker.impl* property, *MIGRATION* requires the +For clusters with data that are upgraded to a version of HBase containing the store file tracking +feature, the Store File Tracking implementation can only be changed with the *MIGRATION* +implementation, so that the _new tracker_ can safely build its list of tracked files based on the +list of the _current tracker_. 
+ +NOTE: Switch implementations globally is only possible when no Store File Tracking configuration +has ben explicitly set in *hbase-site.xml* nor in table descriptors. + +Additional to the *hbase.store.file-tracker.impl* property, *MIGRATION* requires the *hbase.store.file-tracker.migration.src.impl* and *hbase.store.file-tracker.migration.dst.impl*, where the _current_ and _new_ tracker should be specified. For example, to set *MIGRATION* from *DEFAULT* to *FILE*, the following should be set in the global config: From 1a7667c075ba0a5ba00001b15928f293c0cea9d9 Mon Sep 17 00:00:00 2001 From: Wellington Chevreuil Date: Thu, 16 Dec 2021 18:14:05 +0000 Subject: [PATCH 3/3] removing section about setting MIGRATION in global config --- .../_chapters/store_file_tracking.adoc | 59 ++++--------------- 1 file changed, 12 insertions(+), 47 deletions(-) diff --git a/src/main/asciidoc/_chapters/store_file_tracking.adoc b/src/main/asciidoc/_chapters/store_file_tracking.adoc index 71fe023076cb..74d802f386c5 100644 --- a/src/main/asciidoc/_chapters/store_file_tracking.adoc +++ b/src/main/asciidoc/_chapters/store_file_tracking.adoc @@ -85,59 +85,20 @@ For fresh deployments that don't yet contain any user data, *FILE* implementatio value for *hbase.store.file-tracker.impl* property in global *hbase-site.xml* configuration, prior to the first hbase start. Omitting this property sets the *DEFAULT* implementation. -### Switching implementations globally - For clusters with data that are upgraded to a version of HBase containing the store file tracking feature, the Store File Tracking implementation can only be changed with the *MIGRATION* implementation, so that the _new tracker_ can safely build its list of tracked files based on the list of the _current tracker_. -NOTE: Switch implementations globally is only possible when no Store File Tracking configuration -has ben explicitly set in *hbase-site.xml* nor in table descriptors. 
- -Additional to the *hbase.store.file-tracker.impl* property, *MIGRATION* requires the -*hbase.store.file-tracker.migration.src.impl* and *hbase.store.file-tracker.migration.dst.impl*, -where the _current_ and _new_ tracker should be specified. For example, to set *MIGRATION* from -*DEFAULT* to *FILE*, the following should be set in the global config: - ----- - - hbase.store.file-tracker.impl - MIGRATION - - - hbase.store.file-tracker.migration.src.impl - DEFAULT - - - hbase.store.file-tracker.migration.dst.impl - FILE - ----- - -A cluster restart would be needed to effectivelly apply the above configuration. +NOTE: The MIGRATION tracker should NOT be set in the global configuration. To use it, follow the section below +about setting Store File Tracking at Table or Column Family configuration. -NOTE: When MIGRATION is defined globally, new tables creation is not allowed. - -Once cluster has completely started and all regions have already become online, *MIGRATION* tracker -can be disabled and the _new_ implementation should be the one set in *hbase.store.file-tracker.impl*. -On the above example, the new configuration would be: - ----- - - hbase.store.file-tracker.impl - FILE - ----- - -Restart the cluster again to complete migration and allow new tables creation to be executed again. ### Configuring for Table or Column Family -The previous example conveniently allows to set Store File Tracker desired configuration on a single -place for all cluster tables. That may not be always desired, either because clusters restarts can be -discouraging, or maybe because some user domain might be more critical to experiment a new feature. -Whatever the reason, Store File Tracking can be set at Table or Column Family level configuration. +Setting Store File Tracking configuration globally may not always be possible or desired, for example, +in the case of upgraded clusters with pre-existing user data. +Store File Tracking can also be set at the Table or Column Family level. 
For example, to specify *FILE* implementation in the table configuration at table creation time, the following should be applied: @@ -153,9 +114,13 @@ create 'my-table', {NAME=> '1', CONFIGURATION => {'hbase.store.file-tracker.impl ### Switching trackers at Table or Column Family -Similarly to when switching implementations at global configuration, when switching _trackers_ for -individual tables or column families, the *MIGRATION* tracker is also required. For example, to -switch _tracker_ from *DEFAULT* to *FILE* in a table configuration: +A very common scenario is to set Store File Tracking on pre-existing HBase deployments that have +been upgraded to a version that supports this feature. To apply the FILE tracker, tables effectively +need to be migrated from the DEFAULT tracker to the FILE tracker. As explained previously, this +process requires the special MIGRATION tracker implementation, which can only be +specified at the Table or Column Family level. + +For example, to switch _tracker_ from *DEFAULT* to *FILE* in a table configuration: ---- alter 'my-table', CONFIGURATION => {'hbase.store.file-tracker.impl' => 'MIGRATION',