Skip to content

Conversation

@jpountz
Copy link
Contributor

@jpountz jpountz commented May 23, 2018

In spite of the existing caching, I have seen a number of nodes hot threads
where one thread had been spending all its cpu on computing the size of a
directory. I am proposing to move the caching of the store size to IndexShard
so that it has access to the existing logic regarding whether a shard is active
or not in order to be able to cache the store size more agressively.

The tricky bit is that an inactive shard might still be merged, which may have
a significant impact on the store size.

This should be especially useful for time-based data since most indices are
typically inactive.

In spite of the existing caching, I have seen a number of nodes hot threads
where one thread had been spending all its cpu on computing the size of a
directory. I am proposing to move the caching of the store size to `IndexShard`
so that it has access to the existing logic regarding whether a shard is active
or not in order to be able to cache the store size more agressively.

The tricky bit is that an inactive shard might still be merged, which may have
a significant impact on the store size.

This should be especially useful for time-based data since most indices are
typically inactive.
@jpountz jpountz added >enhancement WIP :Distributed Indexing/Store Issues around managing unopened Lucene indices. If it touches Store.java, this is a likely label. v7.0.0 v6.4.0 labels May 23, 2018
@jpountz jpountz requested a review from s1monw May 23, 2018 15:38
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@jpountz
Copy link
Contributor Author

jpountz commented May 23, 2018

WIP because of the lack of test.

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that this is much simpler yet, I think it's incorrect or has too many corner cases. like if a reader gets closed we don't refresh the stats since we don't see the deletes. It's not good enough. sorry for pushing down that route.

private final CounterMetric totalMergesNumDocs = new CounterMetric();
private final CounterMetric totalMergesSizeInBytes = new CounterMetric();
private final CounterMetric currentMerges = new CounterMetric();
private final AtomicLong currentMerges = new AtomicLong();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did this change?

private final Set<OnGoingMerge> onGoingMerges = ConcurrentCollections.newConcurrentSet();
private final Set<OnGoingMerge> readOnlyOnGoingMerges = Collections.unmodifiableSet(onGoingMerges);
private final MergeSchedulerConfig config;
private volatile long lastMergeMillis;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need these changes. I guess it would be enough to check Engine#getMergeStats() and then do:

mergeStats.current > 0 || mergeStats.total != previousStats.total?

}

@Override
protected boolean needsRefresh() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why this is so complex, wouldn't it be enough to override needsRefresh()?

private MergeStats previousStats = new MergeStats();

if (super.needsRefresh()) {
  boolean refresh = false;
  if (isActive()) {
    MergeStats mergeStats = getEngine().getMergeStats();
    refresh = mergeStats.current > 0 || mergeStats.total != previousStats.total;
    previousStats = mergeStats;
  }
  return refresh;
}

if (active.get() == false) {
// We refresh when transitioning to an inactive state to make
// it easier to cache the store size.
refresh("transition to inactive");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 we should not refresh anything except of the internal reader. Visibility guarantees are important.

@jpountz
Copy link
Contributor Author

jpountz commented May 30, 2018

Closed in favor of the original PR.

@jpountz jpountz closed this May 30, 2018
@jpountz jpountz removed :Distributed Indexing/Store Issues around managing unopened Lucene indices. If it touches Store.java, this is a likely label. >enhancement WIP v6.4.0 v7.0.0 labels May 30, 2018
@jpountz jpountz deleted the store_size_cache branch May 30, 2018 14:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants