-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-26472 Adhere to semantic conventions regarding table data operations #3906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-26472 Adhere to semantic conventions regarding table data operations #3906
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
taklwu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just few minor comments.
| validatePutsInRowMutations(mutations, conn.connConf.getMaxKeyValueSize()); | ||
| final Supplier<Span> supplier = new TableOperationSpanBuilder() | ||
| .setTableName(tableName) | ||
| .setOperation(HBaseSemanticAttributes.Operation.BATCH); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] I found you have a class instance check for RowMutations , can't we just use .setOperation(mutations) here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No -- this is confusing -- we're both wrong. The operation of the CheckAndMutateBuilder instance is a CHECK_AND_PUT. That operation is a "container" that ships multiple underlying operations. thenMutate allows the CHECH_AND_PUT to apply the contents of mutations ; thenDelete sends a DELETE, &c.
My patch on HBASE-26473 introduces the span attribute db.hbase.container_operations as a general mechanism for handling these operations that wrap other operations. That commit is on the feature branch PR that I posted earlier, ac1cfbb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we haven't gotten to that PR yet, I'm asking in the community what they suggest for this type of operation. https://cloud-native.slack.com/archives/C01QZFGMLQ7/p1638564279059600
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest patch corrects the operation type for the checkandmutate cases @taklwu pointed out. thanks for noticing!
hbase-client/src/main/java/org/apache/hadoop/hbase/client/RawAsyncTableImpl.java
Outdated
Show resolved
Hide resolved
hbase-client/src/main/java/org/apache/hadoop/hbase/client/RawAsyncTableImpl.java
Outdated
Show resolved
Hide resolved
hbase-client/src/main/java/org/apache/hadoop/hbase/client/trace/TableOperationSpanBuilder.java
Show resolved
Hide resolved
|
also, if possible please fix the checkstyle as well. |
|
|
||
| @Override | ||
| public CompletableFuture<Result> get(Get get) { | ||
| final Supplier<Span> supplier = new TableOperationSpanBuilder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we extract a method to avoid so many duplicated lines? At least the setTableName is always the same...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I supposed, but I'm not sure what it gets us. I think it's useful to see the entirety of builder arguments in place where its used, but it's entirely a style thing. Instead, I could add an instance method like
private <B> TableOperationSpanBuilder<B> newTableOperationSpanBuilder() {
return new TableOperationSpanBuilder<>().setTableName(tableName);
}Would this match your preference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just pass in the operation too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The operation argument is polymorphic, so I'd have to implement several identical methods, each with a different operation type in their signature. I have wrapped up invocations of TableOperationSpanBuilder as described.
hbase-client/src/main/java/org/apache/hadoop/hbase/client/trace/TableOperationSpanBuilder.java
Show resolved
Hide resolved
|
|
||
| @SuppressWarnings("unchecked") | ||
| public Span build() { | ||
| final String name = attributes.getOrDefault(DB_OPERATION, unknown) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have different types for Scan, in the past, we only traced scanAll, so I named the span as 'scanAll', and leave the name 'scan' to be used in the future. So what is the plan now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding of the spec says that for anything that is a direct user action should have a span name that matches the DB operation. In this patch, I interpret those operations to map to our table data action verbs -- "get", "put", &c. -- basically matching up to our shell interface. "scan" would be another such user action. It's a good point that "scanAll" exists in the java client API but not in the shell API...
I think that AsyncTable.scanAll makes sense to use the DB operation name "SCAN", as I have here. I also noticed that AsyncTable<C>.scan(Scan, C) does not have a tracing test. I think we would being back the code you used to have, where each client-side call to scanner.next is traced. You mentioned that it could result in many thousands of scans, so you removed it. But I think this is the correct way to handle this part of our API. Anyway, the distributed tracing implementations seem to limit the number of scan children per parent to ~256. I think we should emit scans that actually happen, and leave it up to the tracing service to handle truncation or summarization of scan children.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've raised this question over here: https://cloud-native.slack.com/archives/C01QZFGMLQ7/p1638556336052800
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed HBASE-26545.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However the "long" scan is traced, the both scan and scanAll methods are "scan" operations from the client's perspective, so I think it's find for both of them to have db.operation="SCAN".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can we trace long scan in the Table code? We just return a Scanner to client or users just pass in a Consumer, I do not think we can have a big span to trace them all? It depends on how user call us.
And what you said about the scanner.next, I was not talking about client side scan, I was talking about the server side RegionScanner, a client scan request can lead to thousands or even more of RegionScanner.next if filter is used, which will impact the scan performance a lot, as it could scan all the whole region but return nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we will always have the rpc method to be traced, so even if we do nothing in the scan method, we could still see a lot of rpc spans when scanning. This is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And what you said about the scanner.next, I was not talking about client side scan, I was talking about the server side RegionScanner...
Okay, understood. We can discuss that separately.
And we will always have the rpc method to be traced, so even if we do nothing in the scan method, we could still see a lot of rpc spans when scanning. This is true.
It sounds like we need to open an operation-level span, just to encapsulate all the RPC spans.
| + (tableName != null ? tableName.getNameWithNamespaceInclAsString() : unknown); | ||
| final SpanBuilder builder = TraceUtil.getGlobalTracer() | ||
| .spanBuilder(name) | ||
| // TODO: what about clients embedded in Master/RegionServer/Gateways/&c? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can not recall clearly, but for a long trace path, we could visit lots of services, so it should be OK to have multiple CLIENT and SERVER kind spans?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think lots of services, each with their own CLIENT and SERVER spans is expected in otel. An online HBase application's traced request might start with a CLIENT span coming from a web browser, then a SERVER span coming from the web server, then an HBase CLIENT span and a corresponding Region Server SERVER span, the HDFS CLIENT span, the Data Node SERVER span. Within the client JS application, the web server, Region Server, the Data Node, there could be several INTERNAL spans.
I believe this is the intended behavior of tracing.
Where it gets less clear for me are spans that are not the "main" logic of the activity. For example that extends the scenario I described, what if the HBase client needs to reach out to META to populate the region location, and needs to reach out to a master to locate META? How do we represent those spans -- are they more CLIENT/SERVER pairs, or are they INTERNAL/SERVER pairs? I think that they should all be CLIENT/SERVER pairs, because responsibility of control crosses between logical component boundaries.
It is because of questions like these that I started by working on the "simple" spans of table data operations -- I think these are the most obvious to implement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reminds me, I had started an effort to update our checkstyle config so that these single-line short expressions would be accepted. I think I was held up due to a missing feature in checkstyle, but I lost track of it. Will be back. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
ced47b6 to
303f1ce
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
| public static final AttributeKey<String> DB_NAME = SemanticAttributes.DB_NAME; | ||
| public static final AttributeKey<String> NAMESPACE_KEY = SemanticAttributes.DB_HBASE_NAMESPACE; | ||
| public static final AttributeKey<String> DB_OPERATION = SemanticAttributes.DB_OPERATION; | ||
| public static final AttributeKey<String> TABLE_KEY = AttributeKey.stringKey("db.hbase.table"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we can drop the _KEY part here as none of these constants we import from SemanticAttributes use this naming convention.
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
Build failure is due to build machine issue, INFRA-22591. |
taklwu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…tions Follows the guidance outlined in https://github.com/open-telemetry/opentelemetry-specification/blob/3e380e2/specification/trace/semantic_conventions/database.dm * all table data operations are assumed to be of type CLIENT * populate `db.name` and `db.operation` attributes * name table data operation spans as `db.operation` `db.name`:`db.hbase.table` note: this implementation deviates from the recommended `db.name`.`db.sql.table` and instead uses HBase's native String representation of namespace:tablename.
* correct places where CheckAndPut span names are emitted as "BATCH"
* extend TestAsyncTableTracing to include coverage for both
`AsyncTable.{CheckAndMutateBuilder,CheckAndMutateWithFilterBuilder}`
* add another method to TableOperationSpanBuilder that accepts `Collection<? extends Row>`
c361b9c to
d4b1421
Compare
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
Follows the guidance outlined in https://github.com/open-telemetry/opentelemetry-specification/blob/3e380e2/specification/trace/semantic_conventions/database.dm
db.nameanddb.operationattributesdb.operationdb.name:db.hbase.tablenote: this implementation deviates from the recommended
db.name.db.sql.tableand insteaduses HBase's native String representation of namespace:tablename.