Add internal _primary_term doc values field, fix _seq_no indexing #21637

dakrone · 2016-11-17T20:39:12Z

This adds the _primary_term field internally to the mappings. This field is
populated with the current shard's primary term.

It is intended to be used for collision resolution when two document copies have
the same sequence id. For that purpose only doc_values are needed, so doc_values
for the field are stored but the field itself is not indexed.
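
To make the intended use concrete, here is a minimal sketch of how a tie on the sequence number could be broken with the primary term. The PR does not define the resolution rule; the class name `DocCopy`, the method `resolve`, and the rule itself (higher sequence number wins, and on a tie the higher primary term wins) are assumptions for illustration only.

```java
// Hypothetical model of one copy of a document; none of these names come from the PR.
final class DocCopy {
    final long seqNo;       // the value that would be stored in _seq_no
    final long primaryTerm; // the value that would be stored in _primary_term

    DocCopy(long seqNo, long primaryTerm) {
        this.seqNo = seqNo;
        this.primaryTerm = primaryTerm;
    }

    // Assumed rule: the higher sequence number wins; when both copies carry the
    // same sequence number, the copy written under the higher primary term wins.
    static DocCopy resolve(DocCopy a, DocCopy b) {
        if (a.seqNo != b.seqNo) {
            return a.seqNo > b.seqNo ? a : b;
        }
        return a.primaryTerm >= b.primaryTerm ? a : b;
    }
}
```

Only a per-document lookup of the two values is needed for this kind of comparison, which is consistent with keeping doc_values for `_primary_term` without indexing it.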

This also fixes the _seq_no field so that its doc_values are retrievable (they
were previously stored but irretrievable) and changes the stats implementation
to use the points API to retrieve the min/max, which is more efficient than
iterating over every doc value. Additionally, even though we intend to be
able to search on _seq_no, it was previously not searchable. This commit makes
it searchable.
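
The difference between the two ways of computing the stats can be sketched in plain Java. This is a model, not Lucene code: `SegmentSeqNos` and `SeqNoStats` are hypothetical names, and the precomputed `min`/`max` fields stand in for the per-segment summary that a points structure keeps.

```java
// Hypothetical stand-in for one segment; this is not a Lucene class.
final class SegmentSeqNos {
    final long[] docValues; // one _seq_no value per document
    final long min;         // per-segment summary, as a points structure would keep
    final long max;

    SegmentSeqNos(long[] docValues) {
        this.docValues = docValues.clone();
        long lo = Long.MAX_VALUE;
        long hi = Long.MIN_VALUE;
        for (long v : docValues) {
            lo = Math.min(lo, v);
            hi = Math.max(hi, v);
        }
        this.min = lo;
        this.max = hi;
    }
}

final class SeqNoStats {
    // Old approach in this model: visit every document's value.
    static long[] minMaxByScan(SegmentSeqNos[] segments) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (SegmentSeqNos segment : segments) {
            for (long v : segment.docValues) {
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
        }
        return new long[] {min, max};
    }

    // New approach in this model: combine one summary per segment.
    static long[] minMaxBySummary(SegmentSeqNos[] segments) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (SegmentSeqNos segment : segments) {
            min = Math.min(min, segment.min);
            max = Math.max(max, segment.max);
        }
        return new long[] {min, max};
    }
}
```

Both methods return the same pair, but the second does work proportional to the number of segments rather than the number of documents.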

There is no user-visible _primary_term field. Instead, the fields are
updated by calling:

index.parsedDoc().updateSeqID(seqNum, primaryTerm);

This includes example methods in Versions and Engine for retrieving the
sequence id values from the index (see Engine.getSequenceID) that are only
used in unit tests. These will be extended/replaced by actual implementations
once we make use of sequence numbers as a conflict resolution measure.

Relates to #10708
Supersedes #21480

P.S. As a side effect of this commit, SlowCompositeReaderWrapper cannot be
used for documents that contain _seq_no because it is a Point value and SCRW
cannot wrap documents with points, so the tests have been updated to loop
through the LeafReaderContexts now instead.

dakrone · 2016-11-17T20:44:03Z

Because I'm curious, I'm also going to measure the impact this has on indexing performance, using Rally to get before and after numbers. I will post the results here when that is complete.

dakrone · 2016-11-18T00:01:58Z

With _primary_term and _seq_no (this PR):

|   Lap |                         Metric |            Operation |     Value |   Unit |
|------:|-------------------------------:|---------------------:|----------:|-------:|
|   All |                  Indexing time |                      |    16.611 |    min |
|   All |                     Merge time |                      |   7.02977 |    min |
|   All |                   Refresh time |                      |   1.59738 |    min |
|   All |                     Flush time |                      |  0.239417 |    min |
|   All |            Merge throttle time |                      |   3.23633 |    min |
|   All |               Median CPU usage |                      |     401.9 |      % |
|   All |             Total Young Gen GC |                      |    11.778 |      s |
|   All |               Total Old Gen GC |                      |     1.262 |      s |
|   All |                     Index size |                      |   2.59816 |     GB |
|   All |                Totally written |                      |   13.2428 |     GB |
|   All |         Heap used for segments |                      |   13.5155 |     MB |
|   All |       Heap used for doc values |                      |  0.124355 |     MB |
|   All |            Heap used for terms |                      |   11.9919 |     MB |
|   All |            Heap used for norms |                      | 0.0654907 |     MB |
|   All |           Heap used for points |                      |  0.690285 |     MB |
|   All |    Heap used for stored fields |                      |  0.643501 |     MB |
|   All |                  Segment count |                      |        87 |        |
|   All |                 Min Throughput |         index-append |   61006.2 | docs/s |
|   All |              Median Throughput |         index-append |   61178.9 | docs/s |
|   All |                 Max Throughput |         index-append |   61315.8 | docs/s |
|   All |      50.0th percentile latency |         index-append |   571.986 |     ms |
|   All |      90.0th percentile latency |         index-append |   820.688 |     ms |
|   All |      99.0th percentile latency |         index-append |   996.761 |     ms |
|   All |       100th percentile latency |         index-append |   1104.55 |     ms |
|   All | 50.0th percentile service time |         index-append |   571.986 |     ms |
|   All | 90.0th percentile service time |         index-append |   820.688 |     ms |
|   All | 99.0th percentile service time |         index-append |   996.761 |     ms |
|   All |  100th percentile service time |         index-append |   1104.55 |     ms |

Without _primary_term and old _seq_no (master):

|   Lap |                         Metric |            Operation |     Value |   Unit |
|------:|-------------------------------:|---------------------:|----------:|-------:|
|   All |                  Indexing time |                      |   16.2738 |    min |
|   All |                     Merge time |                      |   5.25365 |    min |
|   All |                   Refresh time |                      |   1.59378 |    min |
|   All |                     Flush time |                      |  0.263417 |    min |
|   All |            Merge throttle time |                      |    1.9489 |    min |
|   All |               Median CPU usage |                      |     397.6 |      % |
|   All |             Total Young Gen GC |                      |    11.777 |      s |
|   All |               Total Old Gen GC |                      |     1.143 |      s |
|   All |                     Index size |                      |   2.57957 |     GB |
|   All |                Totally written |                      |   12.0763 |     GB |
|   All |         Heap used for segments |                      |   14.2795 |     MB |
|   All |       Heap used for doc values |                      | 0.0839424 |     MB |
|   All |            Heap used for terms |                      |   12.9326 |     MB |
|   All |            Heap used for norms |                      | 0.0606079 |     MB |
|   All |           Heap used for points |                      |  0.556252 |     MB |
|   All |    Heap used for stored fields |                      |  0.646149 |     MB |
|   All |                  Segment count |                      |        79 |        |
|   All |                 Min Throughput |         index-append |   61491.9 | docs/s |
|   All |              Median Throughput |         index-append |   61605.3 | docs/s |
|   All |                 Max Throughput |         index-append |   62084.6 | docs/s |
|   All |      50.0th percentile latency |         index-append |   514.811 |     ms |
|   All |      90.0th percentile latency |         index-append |   746.988 |     ms |
|   All |      99.0th percentile latency |         index-append |   1049.82 |     ms |
|   All |       100th percentile latency |         index-append |   1076.12 |     ms |
|   All | 50.0th percentile service time |         index-append |   514.811 |     ms |
|   All | 90.0th percentile service time |         index-append |   746.988 |     ms |
|   All | 99.0th percentile service time |         index-append |   1049.82 |     ms |
|   All |  100th percentile service time |         index-append |   1076.12 |     ms |
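
For readers comparing the two tables, the relative differences can be computed with a small helper. The input values are copied from the tables above; the class and method names are illustrative. By this arithmetic, median throughput is roughly 0.7% lower with this PR and merge time is roughly a third higher. Each table reports a single benchmark run, so small differences may be within run-to-run variation.

```java
final class BenchmarkDelta {
    // Relative change of `after` compared with `before`, in percent.
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // before = master, after = this PR; values taken from the Rally tables.
        System.out.printf("median throughput: %+.2f%%%n", percentChange(61605.3, 61178.9));
        System.out.printf("merge time:        %+.2f%%%n", percentChange(5.25365, 7.02977));
        System.out.printf("index size:        %+.2f%%%n", percentChange(2.57957, 2.59816));
        System.out.printf("p50 latency:       %+.2f%%%n", percentChange(514.811, 571.986));
    }
}
```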

dakrone · 2016-11-18T00:02:16Z

retest this please

jasontedor · 2016-12-05T21:08:44Z

retest this please

jasontedor

I've read through all the production code. I think it looks right, I left some comments on some small things that I noticed.

I have not read the tests yet.

I will give the production code a final super-careful read tomorrow, and I will read all the tests then too. I just want to get the small comments that I have in front of you sooner rather than later.

jasontedor · 2016-12-05T21:29:35Z

core/src/main/java/org/elasticsearch/common/lucene/uid/Versions.java

Nit: if we are going to abbreviate here, can we be consistent with elsewhere: loadSeqNo?

jasontedor · 2016-12-05T21:29:50Z

core/src/main/java/org/elasticsearch/common/lucene/uid/Versions.java

Nit: seqnum -> seq_no

jasontedor · 2016-12-05T21:31:48Z

core/src/main/java/org/elasticsearch/index/engine/Engine.java

Can we please not use Tuple here? I'm fine with a wrapper class, anything but Tuple (plus, with a wrapper class, no boxing). 😄

As long as I can make it not an inner class :D

I'll update this to use an actual class
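
A minimal sketch of the kind of wrapper class being asked for. The name `SeqNoAndPrimaryTerm` is hypothetical, since the thread does not say what the class ended up being called. A generic pair such as `Tuple<Long, Long>` has to box both values, whereas a dedicated class can hold them as primitive `long` fields and gives each value a meaningful accessor name.

```java
// Hypothetical name; the review thread does not name the final class.
final class SeqNoAndPrimaryTerm {
    private final long seqNo;       // primitive field, so no Long boxing
    private final long primaryTerm; // primitive field, so no Long boxing

    SeqNoAndPrimaryTerm(long seqNo, long primaryTerm) {
        this.seqNo = seqNo;
        this.primaryTerm = primaryTerm;
    }

    long getSeqNo() {
        return seqNo;
    }

    long getPrimaryTerm() {
        return primaryTerm;
    }
}
```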

jasontedor · 2016-12-05T21:32:37Z

core/src/main/java/org/elasticsearch/index/engine/Engine.java

Nit: seqNum -> seqNo

jasontedor · 2016-12-05T21:33:55Z

core/src/main/java/org/elasticsearch/index/mapper/ParseContext.java

Should this method be renamed to seqID or something like that (and throughout this class)?

jasontedor · 2016-12-05T21:35:10Z

core/src/main/java/org/elasticsearch/index/mapper/ParseContext.java

seqNo -> seqID?

jasontedor · 2016-12-05T21:35:17Z

core/src/main/java/org/elasticsearch/index/mapper/ParseContext.java

seqNo -> seqID?

jasontedor · 2016-12-05T22:17:41Z

core/src/main/java/org/elasticsearch/common/lucene/uid/Versions.java

If I'm reading this correctly, the context variable is never used so can we just assign to leaf directly (one less thing to think about).

jasontedor · 2016-12-05T22:40:28Z

core/src/main/java/org/elasticsearch/common/lucene/uid/Versions.java

seqno -> _seq_no

jasontedor · 2016-12-05T23:15:23Z

core/src/main/java/org/elasticsearch/common/lucene/uid/Versions.java

Same thing, I don't think this variable is necessary.

dakrone · 2016-12-06T00:08:53Z

Thanks for taking a look @jasontedor, I pushed some commits for the small comments.

jasontedor

I think we can remove Engine#getSequenceID.

jasontedor · 2016-12-07T21:58:24Z

core/src/main/java/org/elasticsearch/index/engine/Engine.java

Sorry for not noticing this sooner (I was blinded by the Tuple), but I don't think that we need this method. Most of the time we do not need the primary term (it's only used to resolve conflicts in the sequence number, and we can just load both separately then) so I think that we can safely drop this (and thus drop the wrapper class).

That's part of the reason I used a Tuple in the first place, I expect this method to go away in the future, though maybe not necessarily in this PR (see my comment on the PR)

dakrone · 2016-12-07T23:17:35Z

I think we can remove Engine#getSequenceID.

Sure, that's only in there to show how to get this for the next person down the line. It's actually used for the unit test.

However, if we remove that, then we could also remove Versions.loadSeqNo and Versions.loadPrimaryTerm, since nothing else would use them. For that reason, I'm in favor of keeping it. I fully expect it to go away down the line when we figure out how we actually want to use _seq_no and _primary_term, but in the meantime it is a very handy bit of code-documentation for how to retrieve these values for anyone dealing with _seq_no and _primary_term.

dakrone · 2016-12-07T23:43:50Z

@jasontedor I pushed two commits that I believe address your concerns while still keeping what I wanted around.

jasontedor

The change looks good, I left a comment, two nits, and a request. We are basically there though.

jasontedor · 2016-12-08T01:18:45Z

core/src/main/java/org/elasticsearch/common/lucene/uid/Versions.java

Nit: primary term -> _primary_term

jasontedor · 2016-12-08T01:26:14Z

core/src/main/java/org/elasticsearch/action/bulk/TransportShardBulkAction.java

Why not primary.getPrimaryTerm()? I don't think it matters since we are going to drop this on the floor and there isn't an assigned sequence number here, but we can set the primary term to the correct value, so why not?

Sure, I'll use primary.getPrimaryTerm instead

jasontedor · 2016-12-08T01:33:24Z

core/src/test/java/org/elasticsearch/index/fielddata/AbstractFieldDataImplTestCase.java

I see this was already like this, but this can go on a single line.

jasontedor · 2016-12-08T01:34:01Z

core/src/test/java/org/elasticsearch/index/fielddata/AbstractFieldDataImplTestCase.java

Can you leave a comment why this is needed?

dakrone · 2016-12-08T02:13:39Z

@jasontedor thanks again, I pushed another two commits addressing your comments.

bleskes · 2016-12-08T09:25:55Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

random drive by question - why is the primary term part of the index result? it's already part of index and index result is supposed to capture the dynamic things that the engine has assigned.

That's fair, I poked around the code and I do not think it's needed on the result at all. Can you confirm @dakrone?

I included it because the sequence number is included in the result. Also, it's used when constructing a new Index op from an Engine.IndexResult:

    public Index(Engine.Index index, Engine.IndexResult indexResult) {
        this.id = index.id();
        this.type = index.type();
        this.source = index.source();
        this.routing = index.routing();
        this.parent = index.parent();
        this.seqNo = indexResult.getSeqNo(); // <-- here
        this.primaryTerm = indexResult.getPrimaryTerm(); // <-- and here
        this.version = indexResult.getVersion();
        this.versionType = index.versionType();
        this.autoGeneratedIdTimestamp = index.getAutoGeneratedIdTimestamp();
    }

(Also used when creating a new Delete op from a DeleteResult)

I see. The seqNo and the term do not necessarily always go together. The seqNo is the location of the operation and the term is the authority to put it there. I like the fact that the result object only contains the things that the internal engine creates / changes. Seq# are owned by the engine (on a primary). Terms are owned by the shard. I would prefer to remove the term. At least in the example you gave (Translog.Index#Index(Index, IndexResult)), it's readily available from the index operation.

Okay, I'll remove it then!
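
The outcome of this exchange can be sketched with simplified stand-ins for `Engine.Index`, `Engine.IndexResult` and `Translog.Index`. The class names below are hypothetical and the fields are reduced to the ones under discussion. The point is that the translog entry reads the primary term from the operation and only the engine-assigned values from the result.

```java
// Simplified, hypothetical stand-ins; these are not the Elasticsearch classes.
final class IndexOp {
    final String id;
    final long primaryTerm; // owned by the shard, known before the engine runs

    IndexOp(String id, long primaryTerm) {
        this.id = id;
        this.primaryTerm = primaryTerm;
    }
}

final class IndexOpResult {
    final long seqNo;   // assigned by the engine (on a primary)
    final long version; // assigned by the engine

    IndexOpResult(long seqNo, long version) {
        this.seqNo = seqNo;
        this.version = version;
    }
}

final class TranslogIndexEntry {
    final String id;
    final long seqNo;
    final long primaryTerm;
    final long version;

    TranslogIndexEntry(IndexOp op, IndexOpResult result) {
        this.id = op.id;
        this.seqNo = result.seqNo;         // dynamic value, taken from the result
        this.primaryTerm = op.primaryTerm; // taken from the operation, not the result
        this.version = result.version;     // dynamic value, taken from the result
    }
}
```

With this split, the result object carries nothing that the caller already knew when it built the operation.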

bleskes · 2016-12-08T09:28:33Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

another drive by question - should we do it here, or separate this into two, more explicit, flows - one when we create an Index operation on a replica (where we set the seq no based on the incoming request) and one here when we operate as a primary? If you guys feel the current version is more intuitive, I'm good but I wanted to get confirmation this was considered.

I prefer having exactly one place where this is updated, as it's less likely to get out of sync than if it were separated. It's also easier to find when it's only in a single place.

I'm totally open to suggestions otherwise though, @jasontedor you're about to use this PR, do you have a preference?

I prefer it the way that you have it.

jasontedor

LGTM.

dakrone · 2016-12-09T03:24:42Z

Thanks @jasontedor and @bleskes

dakrone added :Sequence IDs v6.0.0-alpha1 labels Nov 17, 2016

clintongormley added the >enhancement label Nov 19, 2016

dakrone force-pushed the index-seq-id-and-primary-term branch from 30eca01 to 26f2a38 Compare December 5, 2016 20:13

jasontedor reviewed Dec 5, 2016

jasontedor self-assigned this Dec 7, 2016

jasontedor requested changes Dec 7, 2016

jasontedor reviewed Dec 8, 2016

bleskes reviewed Dec 8, 2016

jasontedor approved these changes Dec 9, 2016

dakrone force-pushed the index-seq-id-and-primary-term branch from 2275850 to ee22a47 Compare December 9, 2016 02:47

dakrone merged commit ee22a47 into elastic:master Dec 9, 2016

dakrone deleted the index-seq-id-and-primary-term branch January 23, 2017 17:22

clintongormley added :Engine :Distributed Indexing/Engine and removed :Sequence IDs labels Feb 14, 2018
