Merged
72 commits
3c6c424
Initial import for TDigest forking.
kkrik-es May 9, 2023
d719767
Fix MedianTest.
kkrik-es May 10, 2023
e816b57
Fix Dist.
kkrik-es May 11, 2023
0079943
Fix AVLTreeDigest.quantile to match Dist for uniform centroids.
kkrik-es May 11, 2023
0a18e70
Update docs/changelog/96086.yaml
kkrik-es May 15, 2023
96ea462
Fix `MergingDigest.quantile` to match `Dist` on uniform distribution.
kkrik-es May 12, 2023
56350ee
Add merging to TDigestState.hashCode and .equals.
kkrik-es May 16, 2023
d731aee
Fix style violations for tdigest library.
kkrik-es May 16, 2023
18c1cd5
Fix typo.
kkrik-es May 16, 2023
f96b4d2
Fix more style violations.
kkrik-es May 16, 2023
6d0201b
Fix more style violations.
kkrik-es May 16, 2023
9d48856
Fix remaining style violations in tdigest library.
kkrik-es May 16, 2023
68e489f
Update results in docs based on the forked tdigest.
kkrik-es May 17, 2023
2b3b463
Fix YAML tests in aggs module.
kkrik-es May 18, 2023
0d6ea1e
Fix YAML tests in x-pack/plugin.
kkrik-es May 18, 2023
98c1146
Skip failing V7 compat tests in modules/aggregations.
kkrik-es May 18, 2023
d49ed67
Fix TDigest library unittests.
kkrik-es May 22, 2023
13f17e6
Remove YAML test versions for older releases.
kkrik-es May 22, 2023
5d9ae8e
Fix test failures in docs and mixed cluster.
kkrik-es May 22, 2023
1ad14f8
Reduce buffer sizes in MergingDigest to avoid oom.
kkrik-es May 22, 2023
af93c26
Exclude more failing V7 compatibility tests.
kkrik-es May 23, 2023
d82ec3e
Update results for JdbcCsvSpecIT tests.
kkrik-es May 23, 2023
c9ab354
Update results for JdbcDocCsvSpecIT tests.
kkrik-es May 23, 2023
c19de44
Revert unrelated change.
kkrik-es May 23, 2023
9753468
More test fixes.
kkrik-es May 23, 2023
a23ae32
Use version skips instead of blacklisting in mixed cluster tests.
kkrik-es May 23, 2023
054cd90
Switch TDigestState back to AVLTreeDigest.
kkrik-es May 23, 2023
2ccefde
Update docs and tests with AVLTreeDigest output.
kkrik-es May 23, 2023
3801358
Update flaky test.
kkrik-es May 23, 2023
52ae1ff
Remove dead code, esp around tracking of incoming data.
kkrik-es May 24, 2023
6c05469
Update docs/changelog/96086.yaml
kkrik-es May 24, 2023
082ac3c
Delete docs/changelog/96086.yaml
kkrik-es May 24, 2023
5352c96
Remove explicit compression calls.
kkrik-es May 24, 2023
90b21bc
Merge remote-tracking branch 'upstream/fix/95903' into fix/95903
kkrik-es May 24, 2023
0003908
Revert "Remove explicit compression calls."
kkrik-es May 24, 2023
91fd594
Remove explicit compression calls to MedianAbsoluteDeviation input.
kkrik-es May 24, 2023
4ff173a
Add unittests for AVL and merging digest accuracy.
kkrik-es May 25, 2023
281bfc7
Fix spotless violations.
kkrik-es May 25, 2023
e382efa
Delete redundant tests and benchmarks.
kkrik-es May 29, 2023
60ec86f
Fix spotless violation.
kkrik-es May 29, 2023
fc512a7
Use the old implementation of AVLTreeDigest.
kkrik-es May 29, 2023
8bd54c8
Merge branch 'main' into fix/95903
kkrik-es May 29, 2023
e53bcc8
Update docs with latest percentile results.
kkrik-es May 29, 2023
6b29dc7
Update docs with latest percentile results.
kkrik-es May 29, 2023
b66664c
Merge branch 'main' into fix/95903
kkrik-es May 29, 2023
fe3facd
Remove repeated compression calls.
kkrik-es May 29, 2023
7389bdc
Update more percentile results.
kkrik-es May 29, 2023
0ff89ae
Use approximate percentile values in integration tests.
kkrik-es May 30, 2023
497202b
Fix expected percentile value in test.
kkrik-es May 30, 2023
093abb7
Revert in-place node updates in AVL tree.
kkrik-es Jun 1, 2023
b0c7870
Add SortingDigest and HybridDigest.
kkrik-es Jun 2, 2023
40f1861
Remove deps to the 3.2 library.
kkrik-es Jun 2, 2023
248560d
Remove unused licenses for tdigest.
kkrik-es Jun 2, 2023
2336b11
Revert changes for SortingDigest and HybridDigest.
kkrik-es Jun 6, 2023
ec87dea
Remove unused Histogram classes and unit tests.
kkrik-es Jun 6, 2023
5e06d79
Remove Comparison class, not used.
kkrik-es Jun 6, 2023
3e12dd4
Merge branch 'main' into fix/95903
kkrik-es Jun 6, 2023
063ef27
Small fixes.
kkrik-es Jun 8, 2023
c028d9c
Merge branch 'main' into fix/95903
kkrik-es Jun 8, 2023
dc739af
Add javadoc and tests.
kkrik-es Jun 9, 2023
17b03cf
Remove special logic for singletons in the boundaries.
kkrik-es Jun 10, 2023
079a8c3
Revert changes to expected values in tests.
kkrik-es Jun 10, 2023
47414c7
Revert changes to expected values in tests.
kkrik-es Jun 10, 2023
b8f7c00
Merge branch 'main' into fix/95903
kkrik-es Jun 10, 2023
7718dbb
Tentatively restore percentile rank expected results.
kkrik-es Jun 10, 2023
4df40da
Use cdf version from 3.2
kkrik-es Jun 12, 2023
77c13eb
Revert "Tentatively restore percentile rank expected results."
kkrik-es Jun 12, 2023
4724905
Revert remaining changes compared to main.
kkrik-es Jun 12, 2023
64950c4
Revert excluded V7 compat tests.
kkrik-es Jun 12, 2023
7251150
Exclude V7 compat tests still failing.
kkrik-es Jun 12, 2023
4a78005
Exclude V7 compat tests still failing.
kkrik-es Jun 12, 2023
97ad6a9
Restore bySize function in TDigest and subclasses.
kkrik-es Jun 13, 2023
@@ -0,0 +1,79 @@
/*
* Licensed to Elasticsearch B.V. under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch B.V. licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*
* This project is based on a modification of https://github.com/tdunning/t-digest which is licensed under the Apache 2.0 License.
*/

package org.elasticsearch.benchmark.tdigest;

import org.elasticsearch.tdigest.Sort;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;

import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.TimeUnit;

/** Explores the performance of Sort on pathological input data. */
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 10, time = 3, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 20, time = 2, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@Threads(1)
@State(Scope.Thread)
public class SortBench {
    private final int size = 100000;
    private final double[] values = new double[size];

    // 0 keeps the random order, 1 pre-sorts ascending, -1 pre-sorts descending.
    @Param({ "0", "1", "-1" })
    public int sortDirection;

    @Setup
    public void setup() {
        // Fixed seed so every run sorts the same input.
        Random prng = new Random(999983);
        for (int i = 0; i < size; i++) {
            values[i] = prng.nextDouble();
        }
        if (sortDirection > 0) {
            Arrays.sort(values);
        } else if (sortDirection < 0) {
            Arrays.sort(values);
            Sort.reverse(values, 0, values.length);
        }
    }

    @Benchmark
    public void quicksort() {
        // Start from the identity permutation on every invocation.
        int[] order = new int[size];
        for (int i = 0; i < size; i++) {
            order[i] = i;
        }
        Sort.sort(order, values, null, values.length);
    }
}
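For readers unfamiliar with index sorting, the following is a minimal sketch of the kind of operation the `quicksort` benchmark appears to exercise: ordering an array of positions by the values they point at. It uses only the JDK and is not the implementation in `org.elasticsearch.tdigest.Sort`; the names `IndexSortSketch` and `sortedOrder` are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

class IndexSortSketch {
    /** Returns the positions 0..n-1 ordered so that values[result[k]] is non-decreasing in k. */
    static int[] sortedOrder(double[] values) {
        return IntStream.range(0, values.length)
            .boxed()
            // Compare positions by the value stored at each position.
            .sorted(Comparator.comparingDouble(i -> values[i]))
            .mapToInt(Integer::intValue)
            .toArray();
    }

    public static void main(String[] args) {
        double[] values = { 0.3, 0.1, 0.2 };
        // The values array itself is left untouched; only the order array is produced.
        System.out.println(Arrays.toString(sortedOrder(values)));
    }
}
```

The boxed stream keeps the sketch short at the cost of allocation, which is presumably why a benchmarked library routine would work on primitive arrays instead.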
@@ -0,0 +1,131 @@
/*
* Licensed to Elasticsearch B.V. under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch B.V. licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*
* This project is based on a modification of https://github.com/tdunning/t-digest which is licensed under the Apache 2.0 License.
*/

package org.elasticsearch.benchmark.tdigest;

import org.elasticsearch.tdigest.AVLTreeDigest;
import org.elasticsearch.tdigest.MergingDigest;
import org.elasticsearch.tdigest.TDigest;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.profile.GCProfiler;
import org.openjdk.jmh.profile.StackProfiler;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 3, time = 3, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@Threads(1)
@State(Scope.Thread)
public class TDigestBench {

    public enum TDigestFactory {
        MERGE {
            @Override
            TDigest create(double compression) {
                return new MergingDigest(compression, (int) (10 * compression));
            }
        },
        AVL_TREE {
            @Override
            TDigest create(double compression) {
                return new AVLTreeDigest(compression);
            }
        };

        abstract TDigest create(double compression);
    }

    @Param({ "100", "300" })
    double compression;

    @Param({ "MERGE", "AVL_TREE" })
    TDigestFactory tdigestFactory;

    // Note: despite its name, "NORMAL" selects random.nextDouble(), i.e. uniform samples in [0, 1).
    // Only "GAUSSIAN" draws from a normal distribution.
    @Param({ "NORMAL", "GAUSSIAN" })
    String distribution;

    Random random;
    TDigest tdigest;

    double[] data = new double[1000000];

    @Setup
    public void setUp() {
        random = ThreadLocalRandom.current();
        tdigest = tdigestFactory.create(compression);

        Supplier<Double> nextRandom = () -> distribution.equals("GAUSSIAN") ? random.nextGaussian() : random.nextDouble();
        // Pre-populate the digest so the benchmark measures adds to a non-empty structure.
        for (int i = 0; i < 10000; ++i) {
            tdigest.add(nextRandom.get());
        }

        for (int i = 0; i < data.length; ++i) {
            data[i] = nextRandom.get();
        }
    }

    @State(Scope.Thread)
    public static class ThreadState {
        int index = 0;
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public void add(ThreadState state) {
        // Cycle through the pre-generated samples.
        if (state.index >= data.length) {
            state.index = 0;
        }
        tdigest.add(data[state.index++]);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder().include(".*" + TDigestBench.class.getSimpleName() + ".*")
            .warmupIterations(5)
            .measurementIterations(5)
            .addProfiler(GCProfiler.class)
            .addProfiler(StackProfiler.class)
            .build();

        new Runner(opt).run();
    }
}
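A note on the `distribution` parameter: the supplier in `setUp()` calls `random.nextGaussian()` only for `"GAUSSIAN"` and falls back to `random.nextDouble()` for any other value, including `"NORMAL"`. The sketch below mirrors that selection with JDK calls only, to show that the two settings feed the digest very different inputs. `DistributionSketch`, `sample`, and `countOutsideUnitInterval` are hypothetical names, and the seed is arbitrary.

```java
import java.util.Random;

class DistributionSketch {
    /** Mirrors the benchmark's selection: "GAUSSIAN" uses nextGaussian(), anything else nextDouble(). */
    static double sample(Random random, String distribution) {
        return distribution.equals("GAUSSIAN") ? random.nextGaussian() : random.nextDouble();
    }

    /** Counts how many of n samples fall outside the half-open interval [0, 1). */
    static int countOutsideUnitInterval(Random random, String distribution, int n) {
        int outside = 0;
        for (int i = 0; i < n; i++) {
            double x = sample(random, distribution);
            if (x < 0.0 || x >= 1.0) {
                outside++;
            }
        }
        return outside;
    }

    public static void main(String[] args) {
        // nextDouble() is uniform on [0, 1), so no sample can fall outside that interval.
        System.out.println("NORMAL:   " + countOutsideUnitInterval(new Random(42), "NORMAL", 1000));
        // nextGaussian() has mean 0 and standard deviation 1, so many samples fall outside.
        System.out.println("GAUSSIAN: " + countOutsideUnitInterval(new Random(42), "GAUSSIAN", 1000));
    }
}
```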
@@ -61,6 +61,7 @@ public class InternalDistributionModuleCheckTaskProvider {
"org.elasticsearch.preallocate",
"org.elasticsearch.securesm",
"org.elasticsearch.server",
+ "org.elasticsearch.tdigest",
"org.elasticsearch.xcontent"
);

@@ -75,7 +76,7 @@ public class InternalDistributionModuleCheckTaskProvider {

private static final Function<ModuleReference, String> toName = mref -> mref.descriptor().name();

- private InternalDistributionModuleCheckTaskProvider() {};
+ private InternalDistributionModuleCheckTaskProvider() {}

/** Registers the checkModules tasks, which contains all checks relevant to ES Java Modules. */
static TaskProvider<Task> registerCheckModulesTask(Project project, TaskProvider<Copy> checkExtraction) {
@@ -53,16 +53,16 @@ The response will look like this:
"aggregations": {
"load_time_ranks": {
"values": {
- "500.0": 90.01,
- "600.0": 100.0
+ "500.0": 55.0,
+ "600.0": 64.0
}
}
}
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
- // TESTRESPONSE[s/"500.0": 90.01/"500.0": 55.00000000000001/]
- // TESTRESPONSE[s/"600.0": 100.0/"600.0": 64.0/]
+ // TESTRESPONSE[s/"500.0": 55.0/"500.0": 55.00000000000001/]
+ // TESTRESPONSE[s/"600.0": 64.0/"600.0": 64.0/]

From this information you can determine you are hitting the 99% load time target but not quite
hitting the 95% load time target
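To make the numbers in such a response concrete: a percentile rank can be read as the percentage of observed values at or below a threshold. The sketch below computes that exactly over an in-memory array, under that assumed definition. The aggregation itself returns an approximation, so its output need not match an exact computation. `PercentileRankSketch`, `percentileRank`, and the sample load times are hypothetical.

```java
class PercentileRankSketch {
    /** Returns the percentage (0 to 100) of observed values that are less than or equal to threshold. */
    static double percentileRank(double[] observed, double threshold) {
        if (observed.length == 0) {
            return Double.NaN; // no data, no meaningful rank
        }
        int atOrBelow = 0;
        for (double v : observed) {
            if (v <= threshold) {
                atOrBelow++;
            }
        }
        return 100.0 * atOrBelow / observed.length;
    }

    public static void main(String[] args) {
        // Hypothetical load times in milliseconds.
        double[] loadTimes = { 100, 200, 300, 400 };
        System.out.println(percentileRank(loadTimes, 200)); // half of the samples are <= 200
    }
}
```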
@@ -101,20 +101,20 @@ Response:
"values": [
{
"key": 500.0,
- "value": 90.01
+ "value": 55.0
},
{
"key": 600.0,
- "value": 100.0
+ "value": 64.0
}
]
}
}
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
- // TESTRESPONSE[s/"value": 90.01/"value": 55.00000000000001/]
- // TESTRESPONSE[s/"value": 100.0/"value": 64.0/]
+ // TESTRESPONSE[s/"value": 55.0/"value": 55.00000000000001/]
+ // TESTRESPONSE[s/"value": 64.0/"value": 64.0/]


==== Script
@@ -1,4 +1,4 @@
- Apache License
+ Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

28 changes: 28 additions & 0 deletions libs/tdigest/NOTICES.txt
@@ -0,0 +1,28 @@
Elastic-t-digest

Copyright 2023 Elasticsearch B.V.

--
This project is based on a modification of https://github.com/tdunning/t-digest which is licensed under the Apache 2.0 License.

Licensed to Elasticsearch B.V. under one or more contributor
license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright
ownership. Elasticsearch B.V. licenses this file to you under
the Apache License, Version 2.0 (the "License"); you may
not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.

--
The code for the t-digest was originally authored by Ted Dunning

Adrien Grand contributed the heart of the AVLTreeDigest (https://github.com/jpountz)
41 changes: 41 additions & 0 deletions libs/tdigest/build.gradle
@@ -0,0 +1,41 @@
import org.elasticsearch.gradle.internal.conventions.precommit.LicenseHeadersTask

/*
* Licensed to Elasticsearch B.V. under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch B.V. licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
apply plugin: 'elasticsearch.build'
apply plugin: 'elasticsearch.publish'
Contributor

Keep in mind that once this is merged, we'll also need to open a pull request to the release manager to ensure this new artifact is published as part of our release process.

Contributor Author

Thanks, is there documentation or some example about what the PR should contain?

dependencies {
  testImplementation(project(":test:framework")) {
    exclude group: 'org.elasticsearch', module: 'elasticsearch-tdigest'
  }
  testImplementation 'org.junit.jupiter:junit-jupiter:5.8.1'
}

tasks.named('forbiddenApisMain').configure {
  // t-digest does not depend on core, so only jdk signatures should be checked
  replaceSignatureFiles 'jdk-signatures'
}

ext.projectLicenses.set(['The Apache Software License, Version 2.0': 'http://www.apache.org/licenses/LICENSE-2.0'])
licenseFile.set(rootProject.file('licenses/APACHE-LICENSE-2.0.txt'))

tasks.withType(LicenseHeadersTask.class).configureEach {
  approvedLicenses = ['Apache', 'Generated', 'Vendored']
}
22 changes: 22 additions & 0 deletions libs/tdigest/src/main/java/module-info.java
@@ -0,0 +1,22 @@
/*
* Licensed to Elasticsearch B.V. under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch B.V. licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

module org.elasticsearch.tdigest {
    exports org.elasticsearch.tdigest;
}