|  | 
|  | 1 | +//// | 
|  | 2 | +/** | 
|  | 3 | + * | 
|  | 4 | + * Licensed to the Apache Software Foundation (ASF) under one | 
|  | 5 | + * or more contributor license agreements.  See the NOTICE file | 
|  | 6 | + * distributed with this work for additional information | 
|  | 7 | + * regarding copyright ownership.  The ASF licenses this file | 
|  | 8 | + * to you under the Apache License, Version 2.0 (the | 
|  | 9 | + * "License"); you may not use this file except in compliance | 
|  | 10 | + * with the License.  You may obtain a copy of the License at | 
|  | 11 | + * | 
|  | 12 | + *     http://www.apache.org/licenses/LICENSE-2.0 | 
|  | 13 | + * | 
|  | 14 | + * Unless required by applicable law or agreed to in writing, software | 
|  | 15 | + * distributed under the License is distributed on an "AS IS" BASIS, | 
|  | 16 | + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | 
|  | 17 | + * See the License for the specific language governing permissions and | 
|  | 18 | + * limitations under the License. | 
|  | 19 | + */ | 
|  | 20 | +//// | 
|  | 21 | +
 | 
|  | 22 | +[[slow_log_responses_from_systable]] | 
|  | 23 | +==== Get Slow/Large Response Logs from System table hbase:slowlog | 
|  | 24 | +
 | 
|  | 25 | +The above section provides details about Admin APIs: | 
|  | 26 | +
 | 
|  | 27 | +* get_slowlog_responses | 
|  | 28 | +* get_largelog_responses | 
|  | 29 | +* clear_slowlog_responses | 
|  | 30 | +
 | 
|  | 31 | +All of the above APIs access the online in-memory ring buffers of | 
|  | 32 | +individual RegionServers and accumulate logs from those ring buffers to display | 
|  | 33 | +to the end user. However, since the logs are stored in memory, when a RegionServer is | 
|  | 34 | +restarted, all the objects held in the memory of that RegionServer are cleaned up | 
|  | 35 | +and the previous logs are lost. What if we want to persist all these logs forever? | 
|  | 36 | +What if we want to store them in such a manner that an operator can get all historical | 
|  | 37 | +records with some filters? For example: get me all large/slow RPC logs that were triggered by | 
|  | 38 | +user1 and are related to the region | 
|  | 39 | +cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf. | 
|  | 40 | +
 | 
|  | 41 | +A system table that stores such logs in increasing (though not strictly increasing) | 
|  | 42 | +order of time can help operators debug historical events | 
|  | 43 | +(scan, get, put, compaction, flush, etc.) with detailed inputs. | 
|  | 44 | +
 | 
|  | 45 | +The config that enables the system table to be created and to store all log events is | 
|  | 46 | +`hbase.regionserver.slowlog.systable.enabled`. | 
|  | 47 | +
 | 
|  | 48 | +The default value for this config is `false`. If it is set to `true` | 
|  | 49 | +(Note: `hbase.regionserver.slowlog.buffer.enabled` should also be `true`), | 
|  | 50 | +a chore running in every RegionServer will persist the slow/large logs into | 
|  | 51 | +the table hbase:slowlog. By default the chore runs every 10 minutes. The interval can be configured | 
|  | 52 | +with the key `hbase.slowlog.systable.chore.duration`. By default, a RegionServer will | 
|  | 53 | +store up to 1000 (config key: `hbase.regionserver.slowlog.systable.queue.size`) | 
|  | 54 | +slow/large logs in an internal queue, and the chore will retrieve these logs | 
|  | 55 | +from the queue and insert them into hbase:slowlog in batches. | 
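The settings described above could be combined as in the following sketch. Placing them in `hbase-site.xml` on each RegionServer is an assumption based on how HBase settings are usually supplied. The queue size shown is the default quoted above and is included only to show where it would be tuned.

```xml
<!-- Sketch only: assumes these properties are set in hbase-site.xml on each RegionServer -->
<property>
  <name>hbase.regionserver.slowlog.buffer.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hbase.regionserver.slowlog.systable.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Default quoted in the text; change only if the internal queue needs resizing -->
  <name>hbase.regionserver.slowlog.systable.queue.size</name>
  <value>1000</value>
</property>
```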
|  | 56 | +
 | 
|  | 57 | +hbase:slowlog has a single ColumnFamily: `info`. | 
|  | 58 | +`info` contains multiple qualifiers, which are the same attributes present in | 
|  | 59 | +the `get_slowlog_responses` API response. | 
|  | 60 | +
 | 
|  | 61 | +* info:call_details | 
|  | 62 | +* info:client_address | 
|  | 63 | +* info:method_name | 
|  | 64 | +* info:param | 
|  | 65 | +* info:processing_time | 
|  | 66 | +* info:queue_time | 
|  | 67 | +* info:region_name | 
|  | 68 | +* info:response_size | 
|  | 69 | +* info:server_class | 
|  | 70 | +* info:start_time | 
|  | 71 | +* info:type | 
|  | 72 | +* info:username | 
|  | 73 | +
 | 
|  | 74 | +An example of 2 rows from an hbase:slowlog scan result: | 
|  | 75 | +[source] | 
|  | 76 | +---- | 
|  | 77 | +
 | 
|  | 78 | + \x024\xC1\x03\xE9\x04\xF5@                                  column=info:call_details, timestamp=2020-05-16T14:58:14.211Z, value=Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest) | 
|  | 79 | + \x024\xC1\x03\xE9\x04\xF5@                                  column=info:client_address, timestamp=2020-05-16T14:58:14.211Z, value=172.20.10.2:57347 | 
|  | 80 | + \x024\xC1\x03\xE9\x04\xF5@                                  column=info:method_name, timestamp=2020-05-16T14:58:14.211Z, value=Scan | 
|  | 81 | + \x024\xC1\x03\xE9\x04\xF5@                                  column=info:param, timestamp=2020-05-16T14:58:14.211Z, value=region { type: REGION_NAME value: "hbase:meta,,1" } scan { column { family: "info" } attribute { name: "_isolationle | 
|  | 82 | +                                                             vel_" value: "\x5C000" } start_row: "cluster_test,33333333,99999999999999" stop_row: "cluster_test,," time_range { from: 0 to: 9223372036854775807 } max_versions: 1 cache_blocks | 
|  | 83 | +                                                             : true max_result_size: 2097152 reversed: true caching: 10 include_stop_row: true readType: PREAD } number_of_rows: 10 close_scanner: false client_handles_partials: true client_ | 
|  | 84 | +                                                             handles_heartbeats: true track_scan_metrics: false | 
|  | 85 | + \x024\xC1\x03\xE9\x04\xF5@                                  column=info:processing_time, timestamp=2020-05-16T14:58:14.211Z, value=18 | 
|  | 86 | + \x024\xC1\x03\xE9\x04\xF5@                                  column=info:queue_time, timestamp=2020-05-16T14:58:14.211Z, value=0 | 
|  | 87 | + \x024\xC1\x03\xE9\x04\xF5@                                  column=info:region_name, timestamp=2020-05-16T14:58:14.211Z, value=hbase:meta,,1 | 
|  | 88 | + \x024\xC1\x03\xE9\x04\xF5@                                  column=info:response_size, timestamp=2020-05-16T14:58:14.211Z, value=1575 | 
|  | 89 | + \x024\xC1\x03\xE9\x04\xF5@                                  column=info:server_class, timestamp=2020-05-16T14:58:14.211Z, value=HRegionServer | 
|  | 90 | + \x024\xC1\x03\xE9\x04\xF5@                                  column=info:start_time, timestamp=2020-05-16T14:58:14.211Z, value=1589640743732 | 
|  | 91 | + \x024\xC1\x03\xE9\x04\xF5@                                  column=info:type, timestamp=2020-05-16T14:58:14.211Z, value=ALL | 
|  | 92 | + \x024\xC1\x03\xE9\x04\xF5@                                  column=info:username, timestamp=2020-05-16T14:58:14.211Z, value=user2 | 
|  | 93 | + \x024\xC1\x06X\x81\xF6\xEC                                  column=info:call_details, timestamp=2020-05-16T14:59:58.764Z, value=Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest) | 
|  | 94 | + \x024\xC1\x06X\x81\xF6\xEC                                  column=info:client_address, timestamp=2020-05-16T14:59:58.764Z, value=172.20.10.2:57348 | 
|  | 95 | + \x024\xC1\x06X\x81\xF6\xEC                                  column=info:method_name, timestamp=2020-05-16T14:59:58.764Z, value=Scan | 
|  | 96 | + \x024\xC1\x06X\x81\xF6\xEC                                  column=info:param, timestamp=2020-05-16T14:59:58.764Z, value=region { type: REGION_NAME value: "cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf." } scan { a | 
|  | 97 | +                                                             ttribute { name: "_isolationlevel_" value: "\x5C000" } start_row: "cccccccc" time_range { from: 0 to: 9223372036854775807 } max_versions: 1 cache_blocks: true max_result_size: 2 | 
|  | 98 | +                                                             097152 caching: 2147483647 include_stop_row: false } number_of_rows: 2147483647 close_scanner: false client_handles_partials: true client_handles_heartbeats: true track_scan_met | 
|  | 99 | +                                                             rics: false | 
|  | 100 | + \x024\xC1\x06X\x81\xF6\xEC                                  column=info:processing_time, timestamp=2020-05-16T14:59:58.764Z, value=24 | 
|  | 101 | + \x024\xC1\x06X\x81\xF6\xEC                                  column=info:queue_time, timestamp=2020-05-16T14:59:58.764Z, value=0 | 
|  | 102 | + \x024\xC1\x06X\x81\xF6\xEC                                  column=info:region_name, timestamp=2020-05-16T14:59:58.764Z, value=cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf. | 
|  | 103 | + \x024\xC1\x06X\x81\xF6\xEC                                  column=info:response_size, timestamp=2020-05-16T14:59:58.764Z, value=211227 | 
|  | 104 | + \x024\xC1\x06X\x81\xF6\xEC                                  column=info:server_class, timestamp=2020-05-16T14:59:58.764Z, value=HRegionServer | 
|  | 105 | + \x024\xC1\x06X\x81\xF6\xEC                                  column=info:start_time, timestamp=2020-05-16T14:59:58.764Z, value=1589640743932 | 
|  | 106 | + \x024\xC1\x06X\x81\xF6\xEC                                  column=info:type, timestamp=2020-05-16T14:59:58.764Z, value=ALL | 
|  | 107 | + \x024\xC1\x06X\x81\xF6\xEC                                  column=info:username, timestamp=2020-05-16T14:59:58.764Z, value=user1 | 
|  | 108 | +---- | 
|  | 109 | +
 | 
|  | 110 | +Operators can use ColumnValueFilter to filter records based on region_name, username, | 
|  | 111 | +client_address, etc. | 
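As a rough illustration of such filtering, the sketch below builds a scan command for the query from the introduction (logs triggered by user1 for one region). It swaps in `SingleColumnValueFilter` rather than `ColumnValueFilter`, because `SingleColumnValueFilter` returns the whole matching row and can be combined with `AND` across different qualifiers, whereas `ColumnValueFilter` returns only the matched cell. The helper names `slowlog_filter` and `slowlog_scan_command` are hypothetical. The code only builds a string and does not contact a cluster.

```ruby
# Hypothetical helper: build a filter-language string that matches rows whose
# info:<qualifier> cells equal the given values (all conditions must hold).
def slowlog_filter(conditions)
  conditions.map do |qualifier, value|
    # This sketch does not escape quotes inside values, so reject them outright.
    raise ArgumentError, 'single quotes are not handled in this sketch' if value.include?("'")
    "SingleColumnValueFilter('info', '#{qualifier}', =, 'binary:#{value}')"
  end.join(' AND ')
end

# Hypothetical helper: wrap the filter in a scan command for the HBase shell.
def slowlog_scan_command(conditions)
  "scan 'hbase:slowlog', { FILTER => \"#{slowlog_filter(conditions)}\" }"
end

# Print a command that can be pasted into the HBase shell.
puts slowlog_scan_command({
  'username' => 'user1',
  'region_name' => 'cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf.'
})
```

With default arguments, `SingleColumnValueFilter` also lets through rows that lack the tested column. That should not matter here if every row carries all of the `info` qualifiers listed above.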
|  | 112 | +
 | 
|  | 113 | +Time-range-based queries are also useful. | 
|  | 114 | +Example: | 
|  | 115 | +[source] | 
|  | 116 | +---- | 
|  | 117 | +scan 'hbase:slowlog', { TIMERANGE => [1589621394000, 1589637999999] } | 
|  | 118 | +---- | 
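The `TIMERANGE` bounds are epoch timestamps in milliseconds, which are awkward to write by hand. The following sketch derives them from ISO 8601 UTC timestamps and prints the scan command. The helper name `epoch_millis` is hypothetical and the code only builds a string. Note that `TIMERANGE` is applied to cell timestamps, which in the sample rows above are a few minutes later than `info:start_time`, and that the upper bound is exclusive.

```ruby
require 'time'

# Hypothetical helper: convert an ISO 8601 timestamp to epoch milliseconds.
def epoch_millis(iso8601)
  t = Time.iso8601(iso8601)
  t.to_i * 1000 + t.nsec / 1_000_000
end

from = epoch_millis('2020-05-16T09:29:54Z')
to   = epoch_millis('2020-05-16T14:06:39.999Z')

# Print a command that can be pasted into the HBase shell.
puts "scan 'hbase:slowlog', { TIMERANGE => [#{from}, #{to}] }"
```

These two timestamps correspond to the bounds used in the example above.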