
Rally PMC performance regression due to compression #36399

@Tim-Brooks

In early November we saw a noticeable performance regression on the PMC track in our nightly benchmarks. It only affected multi-node (network) workloads and was apparent in both the netty and nio runs.

This is very visible on the 90-day graph.

@dliappis traced it to #35357. I investigated further and identified the root cause.

Individual actions in Elasticsearch can set compression to true in TransportRequestOptions.

TransportRequestOptions.Builder builder = TransportRequestOptions.builder();
builder.withCompress(true);
TransportRequestOptions options = builder.build();

A number of actions set this option manually:

  • TransportGetTaskAction - false
  • BulkAction - true
  • TransportNodesAction.AsyncAction - depends
  • TransportTasksAction - depends
  • PublishClusterStateAction - false
  • RemoteRecoveryTargetHandler - depends

Prior to #35357, this option had no effect. If it was set to false, TcpTransport would overwrite it whenever transport.tcp.compress was set to true.

if (compress) {
    options = TransportRequestOptions.builder(options).withCompress(true).build();
}

If it was set to true, TcpTransport would only respect it if transport.tcp.compress was also set to true.

private boolean canCompress(TransportRequest request) {
    return this.compress && (!(request instanceof BytesTransportRequest));
}
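To make the interaction concrete, here is a minimal model of the pre-#35357 decision. It assumes TcpTransport combined the (possibly overwritten) per-request option with canCompress() using `&&`; the class, method, and parameter names are hypothetical, and the three booleans stand in for transport.tcp.compress, the flag set via withCompress(...), and whether the request is a BytesTransportRequest.

```java
// Hypothetical model of the pre-#35357 decision; this is NOT the actual TcpTransport code.
public final class PreChangeCompression {

    // settingEnabled: stands in for transport.tcp.compress
    // requested:      stands in for TransportRequestOptions.Builder#withCompress(...)
    // isBytesRequest: stands in for "request instanceof BytesTransportRequest"
    public static boolean shouldCompress(boolean settingEnabled, boolean requested, boolean isBytesRequest) {
        boolean effectiveOption = requested;
        if (settingEnabled) {
            // The setting overwrote a per-request false with true.
            effectiveOption = true;
        }
        // canCompress() also required the setting, so a per-request true alone was ignored.
        boolean canCompress = settingEnabled && !isBytesRequest;
        return effectiveOption && canCompress;
    }

    public static void main(String[] args) {
        // Print the full truth table for a non-bytes request.
        for (boolean setting : new boolean[] {false, true}) {
            for (boolean requested : new boolean[] {false, true}) {
                System.out.printf("setting=%b requested=%b -> compress=%b%n",
                        setting, requested, shouldCompress(setting, requested, false));
            }
        }
    }
}
```

For every non-bytes request the result simply equals the setting, which is why the per-request flag never changed the outcome.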

After #35357, we still overwrite false when compression is enabled in the settings. However, we inadvertently started to respect true: even with compression disabled in the settings, we now compress any request whose TransportRequestOptions asks for it.

private boolean canCompress(TransportRequest request) {
    return request instanceof BytesTransportRequest == false;
}
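The same kind of sketch for the post-#35357 logic, again assuming the option and canCompress() are combined with `&&` (names hypothetical, not the real TcpTransport code), shows how the effective rule becomes "setting OR request":

```java
// Hypothetical model of the post-#35357 decision; this is NOT the actual TcpTransport code.
public final class PostChangeCompression {

    public static boolean shouldCompress(boolean settingEnabled, boolean requested, boolean isBytesRequest) {
        // The setting still overwrites a per-request false with true...
        boolean effectiveOption = settingEnabled || requested;
        // ...but canCompress() no longer consults the setting.
        boolean canCompress = !isBytesRequest;
        return effectiveOption && canCompress;
    }

    public static void main(String[] args) {
        // Benchmark-like case: compression disabled in settings,
        // but the action explicitly asks for it.
        System.out.println(shouldCompress(false, true, false)); // prints true
    }
}
```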

This one change caused the regression. The PMC benchmark probably sends messages that request compression (likely bulk requests, since BulkAction sets it to true), while our benchmarks set the compression setting to false. These messages were therefore not compressed before the change and are compressed after it. Since the benchmark machines are (I think) connected by a fast network, compressing them likely adds CPU cost while the bandwidth savings are negligible.

This raises the question of what we want the behavior to be. The behavior prior to #35357 makes little sense, since setting compression in TransportRequestOptions never changed the outcome.

Do we want:

  1. Compression is only used if both the request and the setting say true?
  2. Compression is used if either the request or the setting says true? The setting would overwrite a per-request false. This is equivalent to the post-#35357 behavior.
  3. TransportRequestOptions to be the primary choice, with the setting as the fallback? This is tricky right now because TransportRequestOptions uses a boolean for the compression flag, so we cannot tell whether false was set explicitly or never set at all.
  4. Remove compress from TransportRequestOptions and always compress based on the setting? This is equivalent to the pre-#35357 behavior.
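A rough sketch of the four options side by side may help the discussion. All names here are hypothetical. To express option 3, it assumes the per-request flag becomes tri-state (here an `Optional<Boolean>`, where empty means "never set"); a nullable `Boolean` or a dedicated enum would work equally well.

```java
import java.util.Optional;

// Hypothetical sketch of the four candidate policies; names are illustrative only.
public final class CompressionPolicies {

    public enum Policy { BOTH, EITHER, REQUEST_THEN_SETTING, SETTING_ONLY }

    // requested is empty when the action never called withCompress(...),
    // a distinction that a plain boolean cannot express.
    public static boolean shouldCompress(Policy policy, boolean settingEnabled, Optional<Boolean> requested) {
        boolean requestedTrue = requested.orElse(false);
        switch (policy) {
            case BOTH:                 // option 1
                return settingEnabled && requestedTrue;
            case EITHER:               // option 2 (post-#35357)
                return settingEnabled || requestedTrue;
            case REQUEST_THEN_SETTING: // option 3: explicit request wins, setting is the fallback
                return requested.orElse(settingEnabled);
            case SETTING_ONLY:         // option 4 (pre-#35357)
                return settingEnabled;
            default:
                throw new IllegalArgumentException("unknown policy: " + policy);
        }
    }

    public static void main(String[] args) {
        // Benchmark-like configuration: setting off, action explicitly requests compression.
        for (Policy policy : Policy.values()) {
            System.out.println(policy + " -> " + shouldCompress(policy, false, Optional.of(true)));
        }
    }
}
```

Under that benchmark-like configuration only EITHER and REQUEST_THEN_SETTING compress; under options 1 and 4 the PMC messages would stay uncompressed.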

@jasontedor @s1monw
