Commit e1d7dc8

HBASE-24405 : Document hbase:slowlog related operations (#1747)
Signed-off-by: ramkrish86 <[email protected]>
Signed-off-by: Anoop Sam John <[email protected]>
1 parent 9d9f07b commit e1d7dc8

4 files changed: 151 additions, 0 deletions

hbase-common/src/main/resources/hbase-default.xml

Lines changed: 12 additions & 0 deletions
@@ -1947,6 +1947,18 @@ possible configurations would overwhelm and obscure the important.
       responses with complete data.
     </description>
   </property>
+  <property>
+    <name>hbase.regionserver.slowlog.systable.enabled</name>
+    <value>false</value>
+    <description>
+      Should be enabled only if hbase.regionserver.slowlog.buffer.enabled is enabled. If enabled
+      (true), all slow/large RPC logs would be persisted to system table hbase:slowlog (in addition
+      to the in-memory ring buffer at each RegionServer). The records are stored in increasing
+      order of time. Operators can scan the table with various combinations of ColumnValueFilter.
+      More details are provided in the doc section:
+      "Get Slow/Large Response Logs from System table hbase:slowlog"
+    </description>
+  </property>
   <property>
     <name>hbase.master.metafixer.max.merge.count</name>
     <value>64</value>

src/main/asciidoc/_chapters/hbase-default.adoc

Lines changed: 18 additions & 0 deletions
@@ -2246,6 +2246,24 @@ The percent of region server RPC threads failed to abort RS.
 `false`
 
 
+[[hbase.regionserver.slowlog.systable.enabled]]
+*`hbase.regionserver.slowlog.systable.enabled`*::
++
+.Description
+
+Should be enabled only if hbase.regionserver.slowlog.buffer.enabled is enabled.
+If enabled (true), all slow/large RPC logs would be persisted to system table
+hbase:slowlog (in addition to the in-memory ring buffer at each RegionServer).
+The records are stored in increasing order of time.
+Operators can scan the table with various combinations of ColumnValueFilter and
+time range.
+More details are provided in the doc section:
+"Get Slow/Large Response Logs from System table hbase:slowlog"
+
++
+.Default
+`false`
+
 
 [[hbase.master.metafixer.max.merge.count]]
 *`hbase.master.metafixer.max.merge.count`*::

src/main/asciidoc/_chapters/ops_mgt.adoc

Lines changed: 3 additions & 0 deletions
@@ -2079,6 +2079,9 @@ Examples:
 
 ----
 
+include::slow_log_responses_from_systable.adoc[]
+
+
 === Block Cache Monitoring
 
 Starting with HBase 0.98, the HBase Web UI includes the ability to monitor and report on the performance of the block cache.
Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
////
/**
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
////

[[slow_log_responses_from_systable]]
==== Get Slow/Large Response Logs from System table hbase:slowlog

The section above describes the following Admin APIs:

* get_slowlog_responses
* get_largelog_responses
* clear_slowlog_responses

All of these APIs read the online in-memory ring buffers of individual
RegionServers and aggregate the logs for display to the end user. Because the
logs are held only in memory, restarting a RegionServer discards everything it
held in memory, and the previous logs are lost. What if we want to persist all
of these logs? What if we want to store them so that an operator can retrieve
historical records with filters, e.g. all slow/large RPC logs that were
triggered by user1 and relate to the region
`cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf.`?

A system table that stores such logs in increasing (though not strictly
increasing) order of time can help operators debug historical events
(scan, get, put, compaction, flush etc.) with detailed inputs.

The config that enables the system table to be created and to store all log
events is `hbase.regionserver.slowlog.systable.enabled`.

The default value of this config is `false`. If it is set to `true`
(note: `hbase.regionserver.slowlog.buffer.enabled` must also be `true`),
a chore running in every RegionServer persists the slow/large logs into the
table hbase:slowlog. By default the chore runs every 10 minutes; the interval
can be configured with the key `hbase.slowlog.systable.chore.duration`.
By default, each RegionServer stores up to 1000 slow/large logs in an internal
queue (config key: `hbase.regionserver.slowlog.systable.queue.size`); the chore
retrieves the logs from the queue and inserts them into hbase:slowlog in a batch.
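
The queue-and-chore flow described above can be pictured with a small model. The following Ruby sketch is not HBase code: the class name `SlowLogQueueModel` and its methods are hypothetical, and the overflow policy (dropping the oldest record once the queue is full) is an assumption, since this section does not say what happens when more records than the queue size arrive between two chore runs.

```ruby
# Hypothetical model of the queue-plus-chore flow; this is NOT HBase code.
class SlowLogQueueModel
  attr_reader :persisted

  def initialize(capacity = 1000)
    @capacity = capacity  # stands in for hbase.regionserver.slowlog.systable.queue.size
    @queue = []           # in-memory queue of pending slow/large log records
    @persisted = []       # stands in for rows written to hbase:slowlog
  end

  # Called once for every slow/large RPC record.
  def offer(record)
    # Assumption: when the queue is full, the oldest record is dropped.
    @queue.shift if @queue.size >= @capacity
    @queue.push(record)
  end

  # Called once per chore run (every 10 minutes by default).
  # Drains the queue and "inserts" its contents as a single batch.
  def run_chore
    batch = @queue.dup
    @queue.clear
    @persisted.concat(batch)
    batch.size
  end
end

model = SlowLogQueueModel.new(3)          # tiny capacity to make overflow visible
5.times { |i| model.offer("rpc-#{i}") }
written = model.run_chore
puts "persisted #{written} records: #{model.persisted.inspect}"
```

Under these assumptions, at most one queue's worth of records reaches the table per chore run, so a busy RegionServer may benefit from a larger queue size or a shorter chore interval.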

hbase:slowlog has a single column family: `info`.
`info` contains multiple qualifiers, which are the same attributes returned as
part of the `get_slowlog_responses` API response:

* info:call_details
* info:client_address
* info:method_name
* info:param
* info:processing_time
* info:queue_time
* info:region_name
* info:response_size
* info:server_class
* info:start_time
* info:type
* info:username

An example of 2 rows from a scan of hbase:slowlog:
[source]
----
\x024\xC1\x03\xE9\x04\xF5@ column=info:call_details, timestamp=2020-05-16T14:58:14.211Z, value=Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)
\x024\xC1\x03\xE9\x04\xF5@ column=info:client_address, timestamp=2020-05-16T14:58:14.211Z, value=172.20.10.2:57347
\x024\xC1\x03\xE9\x04\xF5@ column=info:method_name, timestamp=2020-05-16T14:58:14.211Z, value=Scan
\x024\xC1\x03\xE9\x04\xF5@ column=info:param, timestamp=2020-05-16T14:58:14.211Z, value=region { type: REGION_NAME value: "hbase:meta,,1" } scan { column { family: "info" } attribute { name: "_isolationle
vel_" value: "\x5C000" } start_row: "cluster_test,33333333,99999999999999" stop_row: "cluster_test,," time_range { from: 0 to: 9223372036854775807 } max_versions: 1 cache_blocks
: true max_result_size: 2097152 reversed: true caching: 10 include_stop_row: true readType: PREAD } number_of_rows: 10 close_scanner: false client_handles_partials: true client_
handles_heartbeats: true track_scan_metrics: false
\x024\xC1\x03\xE9\x04\xF5@ column=info:processing_time, timestamp=2020-05-16T14:58:14.211Z, value=18
\x024\xC1\x03\xE9\x04\xF5@ column=info:queue_time, timestamp=2020-05-16T14:58:14.211Z, value=0
\x024\xC1\x03\xE9\x04\xF5@ column=info:region_name, timestamp=2020-05-16T14:58:14.211Z, value=hbase:meta,,1
\x024\xC1\x03\xE9\x04\xF5@ column=info:response_size, timestamp=2020-05-16T14:58:14.211Z, value=1575
\x024\xC1\x03\xE9\x04\xF5@ column=info:server_class, timestamp=2020-05-16T14:58:14.211Z, value=HRegionServer
\x024\xC1\x03\xE9\x04\xF5@ column=info:start_time, timestamp=2020-05-16T14:58:14.211Z, value=1589640743732
\x024\xC1\x03\xE9\x04\xF5@ column=info:type, timestamp=2020-05-16T14:58:14.211Z, value=ALL
\x024\xC1\x03\xE9\x04\xF5@ column=info:username, timestamp=2020-05-16T14:58:14.211Z, value=user2
\x024\xC1\x06X\x81\xF6\xEC column=info:call_details, timestamp=2020-05-16T14:59:58.764Z, value=Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)
\x024\xC1\x06X\x81\xF6\xEC column=info:client_address, timestamp=2020-05-16T14:59:58.764Z, value=172.20.10.2:57348
\x024\xC1\x06X\x81\xF6\xEC column=info:method_name, timestamp=2020-05-16T14:59:58.764Z, value=Scan
\x024\xC1\x06X\x81\xF6\xEC column=info:param, timestamp=2020-05-16T14:59:58.764Z, value=region { type: REGION_NAME value: "cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf." } scan { a
ttribute { name: "_isolationlevel_" value: "\x5C000" } start_row: "cccccccc" time_range { from: 0 to: 9223372036854775807 } max_versions: 1 cache_blocks: true max_result_size: 2
097152 caching: 2147483647 include_stop_row: false } number_of_rows: 2147483647 close_scanner: false client_handles_partials: true client_handles_heartbeats: true track_scan_met
rics: false
\x024\xC1\x06X\x81\xF6\xEC column=info:processing_time, timestamp=2020-05-16T14:59:58.764Z, value=24
\x024\xC1\x06X\x81\xF6\xEC column=info:queue_time, timestamp=2020-05-16T14:59:58.764Z, value=0
\x024\xC1\x06X\x81\xF6\xEC column=info:region_name, timestamp=2020-05-16T14:59:58.764Z, value=cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf.
\x024\xC1\x06X\x81\xF6\xEC column=info:response_size, timestamp=2020-05-16T14:59:58.764Z, value=211227
\x024\xC1\x06X\x81\xF6\xEC column=info:server_class, timestamp=2020-05-16T14:59:58.764Z, value=HRegionServer
\x024\xC1\x06X\x81\xF6\xEC column=info:start_time, timestamp=2020-05-16T14:59:58.764Z, value=1589640743932
\x024\xC1\x06X\x81\xF6\xEC column=info:type, timestamp=2020-05-16T14:59:58.764Z, value=ALL
\x024\xC1\x06X\x81\xF6\xEC column=info:username, timestamp=2020-05-16T14:59:58.764Z, value=user1
----
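
In the sample rows, `info:start_time` holds an epoch timestamp in milliseconds, while the cell timestamp printed by the shell is several minutes later. A minimal Ruby sketch for turning the stored value into a readable UTC time follows; the helper name `millis_to_utc` is made up for illustration, and the snippet is plain Ruby rather than an HBase API.

```ruby
require 'time'

# Convert epoch milliseconds (as stored in info:start_time) to an ISO-8601 UTC string.
def millis_to_utc(ms)
  # Time.at takes whole seconds plus a microsecond remainder.
  Time.at(ms / 1000, (ms % 1000) * 1000).utc.iso8601(3)
end

# start_time of the first sample row above
puts millis_to_utc(1589640743732)  # => 2020-05-16T14:52:23.732Z
```

Comparing this value with the row's cell timestamp (2020-05-16T14:58:14.211Z) shows the gap between when the RPC started and when the record was written to the table.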

Operators can use ColumnValueFilter to filter records based on region_name, username,
client_address etc.
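
As a rough illustration of such a filter, the Ruby sketch below builds a filter-language string for an exact match on one `info` qualifier. It assumes that `ColumnValueFilter` is registered in the shell's filter language in your HBase version and that the values contain no single quotes; the helper name `slowlog_filter` is hypothetical.

```ruby
# Build a filter-language expression that matches one info: qualifier exactly.
# Assumptions: the named filter is registered in the shell's filter language,
# and neither argument contains a single quote.
def slowlog_filter(qualifier, value, filter: 'ColumnValueFilter')
  "#{filter}('info', '#{qualifier}', =, 'binary:#{value}')"
end

by_user = slowlog_filter('username', 'user1')
puts by_user

# In the HBase shell, the string would be passed to scan like this:
#   scan 'hbase:slowlog', { FILTER => by_user }
```

Per its Javadoc, ColumnValueFilter returns only the matched cell rather than the entire row. If whole rows are needed, or if several conditions must hold at once (for example username and region_name), `SingleColumnValueFilter` may be a better fit; it can be selected here with the `filter:` argument.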

Time range based queries are also useful.
Example:
[source]
----
scan 'hbase:slowlog', { TIMERANGE => [1589621394000, 1589637999999] }
----
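
The TIMERANGE bounds are epoch timestamps in milliseconds and are matched against cell timestamps; the lower bound is inclusive and the upper bound is exclusive. The sketch below shows one way to derive such bounds from UTC wall-clock times in plain Ruby. The helper name `utc_millis` is an illustrative assumption, not part of the HBase shell.

```ruby
# Convert a UTC wall-clock time to epoch milliseconds.
def utc_millis(year, month, day, hour = 0, min = 0, sec = 0)
  Time.utc(year, month, day, hour, min, sec).to_i * 1000
end

from_ms = utc_millis(2020, 5, 16, 9, 29, 54)   # inclusive lower bound
to_ms   = utc_millis(2020, 5, 16, 14, 6, 40)   # exclusive upper bound

puts "scan 'hbase:slowlog', { TIMERANGE => [#{from_ms}, #{to_ms}] }"
```

The lower bound computed here equals the 1589621394000 used in the example above; the upper bound is rounded up to the next whole second.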
