string terms is very slow when there are millions of buckets #30117

@pakein

Elasticsearch version: 6.1.3

In GlobalOrdinalsStringTermsAggregator, when there are several levels of aggregation and both the number of parent aggregation buckets and valueCount exceed 100,000, the number of iterations of the following loop can explode:

    for (long globalTermOrd = 0; globalTermOrd < valueCount; ++globalTermOrd)

My temporary workaround is to loop over bucketOrds instead:

            // Reflective handle on the private "keys" array of LongHash (bucketOrd -> globalOrd).
            private static Field keysField;
            static {
                try {
                    keysField = LongHash.class.getDeclaredField("keys");
                    keysField.setAccessible(true);
                } catch (Exception e) {
                    LOGGER.error(e.getMessage(), e);
                }
            }

            // Only take the shortcut when fewer buckets were collected than there are
            // global ordinals and empty buckets are not requested (min_doc_count > 0).
            if (bucketOrds != null && bucketOrds.size() < valueCount && bucketCountThresholds.getMinDocCount() > 0) {
                try {
                    loop = false; // skip the original loop over all global ordinals
                    LongArray keys = (LongArray) keysField.get(bucketOrds);
                    for (long i = 0; i < keys.size(); i++) {
                        // i is the bucketOrd
                        int bucketDocCount = bucketDocCount(i);
                        if (bucketDocCount == 0) {
                            continue; // skip empty slots
                        }
                        long globalTermOrd = keys.get(i);
                        if (includeExclude != null && !acceptedGlobalOrdinals.get(globalTermOrd)) {
                            continue;
                        }
                        otherDocCount += bucketDocCount;
                        spare.globalOrd = globalTermOrd;
                        spare.bucketOrd = i;
                        spare.docCount = bucketDocCount;
                        if (bucketCountThresholds.getShardMinDocCount() <= spare.docCount) {
                            spare = ordered.insertWithOverflow(spare);
                            if (spare == null) {
                                spare = new OrdBucket(-1, 0, null, showTermDocCountError, 0);
                            }
                        }
                    }
                } catch (IllegalAccessException e) {
                    LOGGER.error(e.getMessage(), e);
                    loop = true; // fall back to the original loop over all global ordinals
                }
            }
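
To make the cost difference concrete, here is a minimal standalone sketch that does not use any Elasticsearch classes. The class SparseBucketScan and its methods denseScan and sparseScan are hypothetical names invented for this illustration. Two parallel arrays stand in for the bucketOrd to globalOrd mapping and the per-bucket doc counts, and a HashMap stands in for the reverse lookup. It only counts iterations and documents, and assumes nothing about how the real aggregator stores its buckets.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: not Elasticsearch code.
final class SparseBucketScan {
    private final long[] keys;      // keys[bucketOrd] = globalOrd (stand-in for the hash's key array)
    private final int[] docCounts;  // docCounts[bucketOrd] = number of docs collected in that bucket
    private final Map<Long, Integer> ordToBucket = new HashMap<>();

    SparseBucketScan(long[] keys, int[] docCounts) {
        this.keys = keys;
        this.docCounts = docCounts;
        for (int bucketOrd = 0; bucketOrd < keys.length; bucketOrd++) {
            ordToBucket.put(keys[bucketOrd], bucketOrd);
        }
    }

    // Visits every global ordinal: valueCount iterations regardless of how many buckets exist.
    // Returns {iterations, total docs found}.
    long[] denseScan(long valueCount) {
        long iterations = 0;
        long docs = 0;
        for (long globalOrd = 0; globalOrd < valueCount; ++globalOrd) {
            iterations++;
            Integer bucketOrd = ordToBucket.get(globalOrd);
            if (bucketOrd == null || docCounts[bucketOrd] == 0) {
                continue; // no bucket, or an empty one, for this ordinal
            }
            docs += docCounts[bucketOrd];
        }
        return new long[] {iterations, docs};
    }

    // Visits only the buckets that were actually created: keys.length iterations.
    // Returns {iterations, total docs found}.
    long[] sparseScan() {
        long iterations = 0;
        long docs = 0;
        for (int bucketOrd = 0; bucketOrd < keys.length; bucketOrd++) {
            iterations++;
            if (docCounts[bucketOrd] == 0) {
                continue; // skip empty buckets, as with min_doc_count > 0
            }
            docs += docCounts[bucketOrd];
        }
        return new long[] {iterations, docs};
    }

    public static void main(String[] args) {
        SparseBucketScan scan =
            new SparseBucketScan(new long[] {7, 42_000, 999_999}, new int[] {3, 0, 5});
        long[] dense = scan.denseScan(1_000_000);
        long[] sparse = scan.sparseScan();
        System.out.println("dense:  " + dense[0] + " iterations, " + dense[1] + " docs");
        System.out.println("sparse: " + sparse[0] + " iterations, " + sparse[1] + " docs");
    }
}
```

With three buckets and one million ordinals, both scans find the same 8 documents, but the dense scan takes 1,000,000 iterations and the sparse scan takes 3. Repeating the dense scan once per parent bucket multiplies that gap by the number of parent buckets.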
