From 313142cc0b2311bdbc70877d2755d8fc77ceb6d2 Mon Sep 17 00:00:00 2001
From: Mate Szalay-Beko
Date: Tue, 9 Jul 2019 17:25:28 +0200
Subject: [PATCH 1/3] HBASE-21606 document meta table load metrics

---
 src/main/asciidoc/_chapters/ops_mgt.adoc | 77 ++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/src/main/asciidoc/_chapters/ops_mgt.adoc b/src/main/asciidoc/_chapters/ops_mgt.adoc
index 2f139ddd4ba6..835fda7393ec 100644
--- a/src/main/asciidoc/_chapters/ops_mgt.adoc
+++ b/src/main/asciidoc/_chapters/ops_mgt.adoc
@@ -1738,6 +1738,83 @@ hbase.regionserver.authenticationFailures::
 hbase.regionserver.mutationsWithoutWALCount ::
   Count of writes submitted with a flag indicating they should bypass the write ahead log
 
+[[rs_meta_metrics]]
+=== Meta Table Load Metrics
+
+HBase meta table metrics collection feature is available in HBase 1.4+ but it is disabled by default, as it can
+affect the performance of the cluster. When it is enabled, it helps to monitor client access patterns by collecting
+the following statistics:
+
+* number of get, put and delete operations on the `hbase:meta` table
+* number of get, put and delete operations made by the top-N clients
+* number of operations related to each table
+* number of operations related to the top-N regions
+
+When to use the feature::
+  This feature can help to identify hot spots in the meta table by showing the regions or tables where the meta info is
+  modified (e.g. by create, drop, split or move tables) or retrieved most frequently. It can also help to find misbehaving
+  client applications by showing which clients are using the meta table most heavily, which can for example suggest the
+  lack of meta table buffering or the lack of re-using open client connections in the client application.
+
+.Possible side-effects of enabling this feature
+[WARNING]
+====
+Having large number of clients and regions in the cluster can cause the registration and tracking of a large amount of
+metrics, which can increase the memory and CPU footprint of the HBase region server handling the `hbase:meta` table.
+It can also cause the significant increase of the JMX dump size, which can affect the monitoring or log aggregation
+system you use beside HBase. It is recommended to turn on this feature only during debugging.
+====
+
+Where to find the metrics::
+  Each metric attribute name will start with the ‘MetaTable_’ prefix. For all the metrics you will see five different
+  JMX attributes: count, mean rate, 1 minute rate, 5 minute rate and 15 minute rate. You will find these metrics in JMX
+  under the following MBean:
+  `Hadoop -> HBase -> RegionServer -> Coprocessor.Region.CP_org.apache.hadoop.hbase.coprocessor.MetaTableMetrics`
+
+Configuration::
+  To turn on this feature, you have to enable a custom coprocessor by adding the following section to hbase-site.xml.
+  This coprocessor will run on all the HBase RegionServers, but will be active (i.e. consume memory / CPU) only on
+  the region, where the `hbase:meta` table is located. It will produce JMX metrics which can be downloaded from the
+  web UI of the given RegionServer or by a simple REST call.
+
+.Enabling the Meta Table Metrics feature
+[source,xml]
+----
+<property>
+    <name>hbase.coprocessor.region.classes</name>
+    <value>org.apache.hadoop.hbase.coprocessor.MetaTableMetrics</value>
+</property>
+----
+
+.How the top-N metrics are calculated?
+[NOTE]
+====
+The 'top-N' type of metrics will be counted using the lossy count algorithm, which is about to identify elements in a
+data stream whose frequency count exceed a user-given threshold. The frequency computed by this algorithm is not always
+accurate, but has an error threshold that can be specified by the user as a configuration parameter.
+The run time space required by the algorithm is inversely proportional to the specified error threshold, hence larger
+the error parameter, the smaller the footprint and the less accurate are the metrics. (see the following paper:
+link:http://www.vldb.org/conf/2002/S10P03.pdf[Motwani, R; Manku, G.S (2002). "Approximate frequency counts over data streams"])
+
+You can specify the error rate of the algorithm as a floating-point value between 0 and 1 (exclusive), it's default
+value is 0.02. Having the error rate set to `E` and having `N` as the total number of meta table operations, then
+(assuming the random distribution of the activity of low frequency elements) at most `7 / E` meters will be kept and
+each kept element will have a frequency higher than `E * N`.
+
+An example: Let’s assume we are interested in the HBase clients that are most active in accessing the meta table.
+When there was 1,000,000 operations on the meta table so far and the error rate parameter is set to 0.02, then we can
+assume that only at most 350 client IP address related counters will be present in JMX and each of these clients
+accessed the meta table at least 20,000 times.
+
+[source,xml]
+----
+<property>
+    <name>hbase.util.default.lossycounting.errorrate</name>
+    <value>0.02</value>
+</property>
+----
+====
+
 [[ops.monitoring]]
 == HBase Monitoring
 
From 5efe755e0bfb8ad38f5f311e83e4b87dbcb78652 Mon Sep 17 00:00:00 2001
From: Mate Szalay-Beko
Date: Thu, 11 Jul 2019 11:21:31 +0200
Subject: [PATCH 2/3] HBASE-21606 implement code review comments

---
 src/main/asciidoc/_chapters/ops_mgt.adoc | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/src/main/asciidoc/_chapters/ops_mgt.adoc b/src/main/asciidoc/_chapters/ops_mgt.adoc
index 835fda7393ec..a1b7119a08cc 100644
--- a/src/main/asciidoc/_chapters/ops_mgt.adoc
+++ b/src/main/asciidoc/_chapters/ops_mgt.adoc
@@ -1765,7 +1765,7 @@ It can also cause the significant increase of the JMX dump size, which can affec
 system you use beside HBase. It is recommended to turn on this feature only during debugging.
 ====
 
-Where to find the metrics::
+Where to find the metrics in JMX::
   Each metric attribute name will start with the ‘MetaTable_’ prefix. For all the metrics you will see five different
   JMX attributes: count, mean rate, 1 minute rate, 5 minute rate and 15 minute rate. You will find these metrics in JMX
   under the following MBean:
@@ -1774,8 +1774,9 @@ Where to find the metrics::
 Configuration::
   To turn on this feature, you have to enable a custom coprocessor by adding the following section to hbase-site.xml.
   This coprocessor will run on all the HBase RegionServers, but will be active (i.e. consume memory / CPU) only on
-  the region, where the `hbase:meta` table is located. It will produce JMX metrics which can be downloaded from the
-  web UI of the given RegionServer or by a simple REST call.
+  the server, where the `hbase:meta` table is located. It will produce JMX metrics which can be downloaded from the
+  web UI of the given RegionServer or by a simple REST call. These metrics will not be present in the JMX dump of the
+  other RegionServers.
 
 .Enabling the Meta Table Metrics feature
 [source,xml]
@@ -1789,16 +1790,17 @@ Configuration::
 .How the top-N metrics are calculated?
 [NOTE]
 ====
-The 'top-N' type of metrics will be counted using the lossy count algorithm, which is about to identify elements in a
-data stream whose frequency count exceed a user-given threshold. The frequency computed by this algorithm is not always
-accurate, but has an error threshold that can be specified by the user as a configuration parameter.
-The run time space required by the algorithm is inversely proportional to the specified error threshold, hence larger
-the error parameter, the smaller the footprint and the less accurate are the metrics. (see the following paper:
-link:http://www.vldb.org/conf/2002/S10P03.pdf[Motwani, R; Manku, G.S (2002). "Approximate frequency counts over data streams"])
+The 'top-N' type of metrics will be counted using the Lossy Counting Algorithm (as defined in
+link:http://www.vldb.org/conf/2002/S10P03.pdf[Motwani, R; Manku, G.S (2002). "Approximate frequency counts over data streams"]),
+which is designed to identify elements in a data stream whose frequency count exceed a user-given threshold.
+The frequency computed by this algorithm is not always accurate but has an error threshold that can be specified by the
+user as a configuration parameter. The run time space required by the algorithm is inversely proportional to the
+specified error threshold, hence larger the error parameter, the smaller the footprint and the less accurate are the
+metrics.
 
 You can specify the error rate of the algorithm as a floating-point value between 0 and 1 (exclusive), it's default
 value is 0.02. Having the error rate set to `E` and having `N` as the total number of meta table operations, then
-(assuming the random distribution of the activity of low frequency elements) at most `7 / E` meters will be kept and
+(assuming the uniform distribution of the activity of low frequency elements) at most `7 / E` meters will be kept and
 each kept element will have a frequency higher than `E * N`.
 
 An example: Let’s assume we are interested in the HBase clients that are most active in accessing the meta table.
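
The NOTE block touched by this patch describes the top-N counters only in prose. For orientation, here is a minimal Python sketch of the Lossy Counting scheme from the cited Manku and Motwani paper. It is not the HBase implementation, and the names `lossy_count` and `expected_bounds` are hypothetical. `expected_bounds` only evaluates the `7 / E` and `E * N` expressions quoted in the documentation text, with rounding added to keep the results as whole numbers.

```python
import math


def lossy_count(stream, error_rate):
    """Approximate per-item counts for `stream` (illustrative sketch, not HBase code)."""
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be between 0 and 1 (exclusive)")
    bucket_width = math.ceil(1.0 / error_rate)
    entries = {}  # item -> [observed count, maximum possible undercount]
    seen = 0
    for item in stream:
        seen += 1
        bucket = math.ceil(seen / bucket_width)
        if item in entries:
            entries[item][0] += 1
        else:
            # A new item may have been dropped in earlier buckets,
            # so it could have been undercounted by up to bucket - 1.
            entries[item] = [1, bucket - 1]
        if seen % bucket_width == 0:
            # End of a bucket: drop items that cannot be frequent enough.
            entries = {k: v for k, v in entries.items() if v[0] + v[1] > bucket}
    return {k: v[0] for k, v in entries.items()}, seen


def expected_bounds(error_rate, total_operations):
    """Evaluate the 7 / E and E * N expressions from the documentation text."""
    max_meters = round(7 / error_rate)
    min_frequency = round(error_rate * total_operations)
    return max_meters, min_frequency


if __name__ == "__main__":
    counts, total = lossy_count(["a", "b", "a", "c", "a", "d", "a", "e"], 0.25)
    print(counts, total)
    print(expected_bounds(0.02, 1_000_000))
```

A larger `error_rate` gives a smaller bucket width, so rare items are pruned sooner and fewer counters are kept. This is the footprint versus accuracy trade-off the NOTE describes.
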
From b14bfd65b87a4798e6a1c6a554045e3507fd587c Mon Sep 17 00:00:00 2001
From: Mate Szalay-Beko
Date: Fri, 12 Jul 2019 14:23:22 +0200
Subject: [PATCH 3/3] HBASE-21606 adding example MetaTable metrics to the documentation

---
 src/main/asciidoc/_chapters/ops_mgt.adoc | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/src/main/asciidoc/_chapters/ops_mgt.adoc b/src/main/asciidoc/_chapters/ops_mgt.adoc
index a1b7119a08cc..41965851ffe2 100644
--- a/src/main/asciidoc/_chapters/ops_mgt.adoc
+++ b/src/main/asciidoc/_chapters/ops_mgt.adoc
@@ -1750,6 +1750,7 @@ the following statistics:
 * number of operations related to each table
 * number of operations related to the top-N regions
 
+
 When to use the feature::
   This feature can help to identify hot spots in the meta table by showing the regions or tables where the meta info is
   modified (e.g. by create, drop, split or move tables) or retrieved most frequently. It can also help to find misbehaving
@@ -1769,7 +1770,21 @@ Where to find the metrics in JMX::
   Each metric attribute name will start with the ‘MetaTable_’ prefix. For all the metrics you will see five different
   JMX attributes: count, mean rate, 1 minute rate, 5 minute rate and 15 minute rate. You will find these metrics in JMX
   under the following MBean:
-  `Hadoop -> HBase -> RegionServer -> Coprocessor.Region.CP_org.apache.hadoop.hbase.coprocessor.MetaTableMetrics`
+  `Hadoop -> HBase -> RegionServer -> Coprocessor.Region.CP_org.apache.hadoop.hbase.coprocessor.MetaTableMetrics`.
+
+.Examples: some Meta Table metrics you can see in your JMX dump
+[source,json]
+----
+{
+  "MetaTable_get_request_count": 77309,
+  "MetaTable_put_request_mean_rate": 0.06339092997186495,
+  "MetaTable_table_MyTestTable_request_15min_rate": 1.1020599841623246,
+  "MetaTable_client_/172.30.65.42_lossy_request_count": 1786,
+  "MetaTable_client_/172.30.65.45_put_request_5min_rate": 0.6189810954855728,
+  "MetaTable_region_1561131112259.c66e4308d492936179352c80432ccfe0._lossy_request_count": 38342,
+  "MetaTable_region_1561131043640.5bdffe4b9e7e334172065c853cf0caa6._lossy_request_1min_rate": 0.04925099917433935
+}
+----
 
 Configuration::
   To turn on this feature, you have to enable a custom coprocessor by adding the following section to hbase-site.xml.
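
To pick the `MetaTable_` attributes out of a saved JMX dump, something like the following Python sketch could be used. It assumes the dump has a top-level `{"beans": [...]}` layout, which this patch does not show, and the bean `name` in the sample is likewise an assumed spelling. The function names `meta_table_metrics` and `top_lossy_counts` are hypothetical helpers; the sample values are taken from the JSON example in the patch.

```python
import json


def meta_table_metrics(jmx_dump_text, prefix="MetaTable_"):
    """Collect every attribute whose name starts with `prefix` from all beans."""
    dump = json.loads(jmx_dump_text)
    found = {}
    for bean in dump.get("beans", []):  # assumed layout: {"beans": [ {...}, ... ]}
        for name, value in bean.items():
            if name.startswith(prefix):
                found[name] = value
    return found


def top_lossy_counts(metrics, kind, limit=5):
    """Return the largest lossy request counts for kind 'client' or 'region'."""
    head = f"MetaTable_{kind}_"
    tail = "_lossy_request_count"
    picked = [(n, v) for n, v in metrics.items()
              if n.startswith(head) and n.endswith(tail)]
    return sorted(picked, key=lambda pair: pair[1], reverse=True)[:limit]


# Sample dump for illustration only; the bean name is an assumption.
SAMPLE_DUMP = json.dumps({"beans": [{
    "name": "Hadoop:service=HBase,name=RegionServer,"
            "sub=Coprocessor.Region.CP_org.apache.hadoop.hbase.coprocessor.MetaTableMetrics",
    "MetaTable_get_request_count": 77309,
    "MetaTable_client_/172.30.65.42_lossy_request_count": 1786,
    "MetaTable_region_1561131112259.c66e4308d492936179352c80432ccfe0._lossy_request_count": 38342,
}]})

if __name__ == "__main__":
    metrics = meta_table_metrics(SAMPLE_DUMP)
    for name, count in top_lossy_counts(metrics, "client"):
        print(name, count)
```

Filtering by prefix rather than by bean name keeps the sketch independent of the exact MBean naming on a given cluster.
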