@dakrone
Member

@dakrone dakrone commented Nov 17, 2016

This adds the _primary_term field internally to the mappings. This field is
populated with the current shard's primary term.

It is intended to be used for collision resolution when two document copies have
the same sequence id. Therefore, doc_values for the field are stored but the
field itself is not indexed.
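
As a rough illustration of that collision-resolution idea, here is a minimal sketch in plain Java. It assumes that when two copies share a sequence number, the copy written under the higher primary term wins. That rule and the names DocCopy, SeqIdConflictDemo and resolve are hypothetical and are not code from this PR.

```java
// Hypothetical model of one copy of a document; not an Elasticsearch class.
final class DocCopy {
    final long seqNo;        // position of the operation
    final long primaryTerm;  // term of the primary that wrote this copy
    final String source;

    DocCopy(long seqNo, long primaryTerm, String source) {
        this.seqNo = seqNo;
        this.primaryTerm = primaryTerm;
        this.source = source;
    }
}

final class SeqIdConflictDemo {
    // Assumed rule: for equal sequence numbers, the higher primary term wins.
    static DocCopy resolve(DocCopy a, DocCopy b) {
        if (a.seqNo != b.seqNo) {
            throw new IllegalArgumentException(
                "copies do not collide: " + a.seqNo + " vs " + b.seqNo);
        }
        return a.primaryTerm >= b.primaryTerm ? a : b;
    }

    public static void main(String[] args) {
        DocCopy fromOldPrimary = new DocCopy(42L, 1L, "written under term 1");
        DocCopy fromNewPrimary = new DocCopy(42L, 2L, "written under term 2");
        System.out.println(resolve(fromOldPrimary, fromNewPrimary).source);
    }
}
```

The term is only consulted once the sequence numbers are known to collide, which matches the point that it does not need to be indexed for search.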

This also fixes the _seq_no field so that doc_values are retrievable (they
were previously stored but irretrievable) and changes the stats implementation
to more efficiently use the points API to retrieve the min/max instead of
iterating on each doc_value value. Additionally, even though we intend to be
able to search on the field, it was previously not searchable. This commit makes
it searchable.
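
To make the stats change more concrete, the following is a plain Java model of the difference between scanning every per-document value and combining one min/max summary per segment. It does not use Lucene's points API. SegmentModel, SeqNoStatsDemo and their methods are hypothetical names, and the per-segment summary is only an assumed analogue of what a points index can report.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one index segment; not a Lucene class.
final class SegmentModel {
    final long[] seqNos; // per-document values, comparable to doc values
    final long min;      // per-segment summary computed once
    final long max;

    SegmentModel(long... seqNos) {
        if (seqNos.length == 0) {
            throw new IllegalArgumentException("segment must contain at least one value");
        }
        this.seqNos = seqNos.clone();
        this.min = Arrays.stream(seqNos).min().getAsLong();
        this.max = Arrays.stream(seqNos).max().getAsLong();
    }
}

final class SeqNoStatsDemo {
    // Visits every per-document value: cost grows with the number of documents.
    static long maxByScanning(List<SegmentModel> segments) {
        long max = Long.MIN_VALUE;
        for (SegmentModel segment : segments) {
            for (long value : segment.seqNos) {
                max = Math.max(max, value);
            }
        }
        return max;
    }

    // Combines one summary per segment: cost grows with the number of segments.
    static long maxFromSummaries(List<SegmentModel> segments) {
        long max = Long.MIN_VALUE;
        for (SegmentModel segment : segments) {
            max = Math.max(max, segment.max);
        }
        return max;
    }

    public static void main(String[] args) {
        List<SegmentModel> segments =
            Arrays.asList(new SegmentModel(0L, 3L, 1L), new SegmentModel(9L, 4L));
        System.out.println(maxByScanning(segments) == maxFromSummaries(segments));
    }
}
```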

There is no user-visible _primary_term field. Instead, the fields are
updated by calling:

index.parsedDoc().updateSeqID(seqNum, primaryTerm);

This includes example methods in Versions and Engine for retrieving the
sequence id values from the index (see Engine.getSequenceID) that are only
used in unit tests. These will be extended/replaced by actual implementations
once we make use of sequence numbers as a conflict resolution measure.

Relates to #10708
Supersedes #21480

P.S. As a side effect of this commit, SlowCompositeReaderWrapper cannot be
used for documents that contain _seq_no because it is a Point value and SCRW
cannot wrap documents with points, so the tests have been updated to loop
through the LeafReaderContexts now instead.

@dakrone
Member Author

dakrone commented Nov 17, 2016

Because I'm curious, I'm also going to use Rally to measure the impact this has on indexing performance, with before and after numbers. I will post the results here when that is complete.

@dakrone
Member Author

dakrone commented Nov 18, 2016

With _primary_term and _seq_no (this PR):

|   Lap |                         Metric |            Operation |     Value |   Unit |
|------:|-------------------------------:|---------------------:|----------:|-------:|
|   All |                  Indexing time |                      |    16.611 |    min |
|   All |                     Merge time |                      |   7.02977 |    min |
|   All |                   Refresh time |                      |   1.59738 |    min |
|   All |                     Flush time |                      |  0.239417 |    min |
|   All |            Merge throttle time |                      |   3.23633 |    min |
|   All |               Median CPU usage |                      |     401.9 |      % |
|   All |             Total Young Gen GC |                      |    11.778 |      s |
|   All |               Total Old Gen GC |                      |     1.262 |      s |
|   All |                     Index size |                      |   2.59816 |     GB |
|   All |                Totally written |                      |   13.2428 |     GB |
|   All |         Heap used for segments |                      |   13.5155 |     MB |
|   All |       Heap used for doc values |                      |  0.124355 |     MB |
|   All |            Heap used for terms |                      |   11.9919 |     MB |
|   All |            Heap used for norms |                      | 0.0654907 |     MB |
|   All |           Heap used for points |                      |  0.690285 |     MB |
|   All |    Heap used for stored fields |                      |  0.643501 |     MB |
|   All |                  Segment count |                      |        87 |        |
|   All |                 Min Throughput |         index-append |   61006.2 | docs/s |
|   All |              Median Throughput |         index-append |   61178.9 | docs/s |
|   All |                 Max Throughput |         index-append |   61315.8 | docs/s |
|   All |      50.0th percentile latency |         index-append |   571.986 |     ms |
|   All |      90.0th percentile latency |         index-append |   820.688 |     ms |
|   All |      99.0th percentile latency |         index-append |   996.761 |     ms |
|   All |       100th percentile latency |         index-append |   1104.55 |     ms |
|   All | 50.0th percentile service time |         index-append |   571.986 |     ms |
|   All | 90.0th percentile service time |         index-append |   820.688 |     ms |
|   All | 99.0th percentile service time |         index-append |   996.761 |     ms |
|   All |  100th percentile service time |         index-append |   1104.55 |     ms |

Without _primary_term and old _seq_no (master):

|   Lap |                         Metric |            Operation |     Value |   Unit |
|------:|-------------------------------:|---------------------:|----------:|-------:|
|   All |                  Indexing time |                      |   16.2738 |    min |
|   All |                     Merge time |                      |   5.25365 |    min |
|   All |                   Refresh time |                      |   1.59378 |    min |
|   All |                     Flush time |                      |  0.263417 |    min |
|   All |            Merge throttle time |                      |    1.9489 |    min |
|   All |               Median CPU usage |                      |     397.6 |      % |
|   All |             Total Young Gen GC |                      |    11.777 |      s |
|   All |               Total Old Gen GC |                      |     1.143 |      s |
|   All |                     Index size |                      |   2.57957 |     GB |
|   All |                Totally written |                      |   12.0763 |     GB |
|   All |         Heap used for segments |                      |   14.2795 |     MB |
|   All |       Heap used for doc values |                      | 0.0839424 |     MB |
|   All |            Heap used for terms |                      |   12.9326 |     MB |
|   All |            Heap used for norms |                      | 0.0606079 |     MB |
|   All |           Heap used for points |                      |  0.556252 |     MB |
|   All |    Heap used for stored fields |                      |  0.646149 |     MB |
|   All |                  Segment count |                      |        79 |        |
|   All |                 Min Throughput |         index-append |   61491.9 | docs/s |
|   All |              Median Throughput |         index-append |   61605.3 | docs/s |
|   All |                 Max Throughput |         index-append |   62084.6 | docs/s |
|   All |      50.0th percentile latency |         index-append |   514.811 |     ms |
|   All |      90.0th percentile latency |         index-append |   746.988 |     ms |
|   All |      99.0th percentile latency |         index-append |   1049.82 |     ms |
|   All |       100th percentile latency |         index-append |   1076.12 |     ms |
|   All | 50.0th percentile service time |         index-append |   514.811 |     ms |
|   All | 90.0th percentile service time |         index-append |   746.988 |     ms |
|   All | 99.0th percentile service time |         index-append |   1049.82 |     ms |
|   All |  100th percentile service time |         index-append |   1076.12 |     ms |

@dakrone
Member Author

dakrone commented Nov 18, 2016

retest this please

@dakrone dakrone force-pushed the index-seq-id-and-primary-term branch from 30eca01 to 26f2a38 Compare December 5, 2016 20:13
@jasontedor
Member

retest this please

Member

@jasontedor jasontedor left a comment

I've read through all the production code. I think it looks right, I left some comments on some small things that I noticed.

I have not read the tests yet.

I will give the production code a final super-careful read tomorrow, and I will read all the tests then too. I just want to get the small comments that I have in front of you sooner rather than later.

Member

Nit: if we are going to abbreviate here, can we be consistent with elsewhere: loadSeqNo?

Member

Nit: seqnum -> seq_no

Member

Can we please not use Tuple here? I'm fine with a wrapper class, anything but Tuple (plus, with a wrapper class, no boxing). 😄

Member Author

As long as I can make it not an inner class :D

I'll update this to use an actual class
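
For readers unfamiliar with the boxing concern, a wrapper of the kind discussed here might look like the sketch below. The class name SequenceId and its accessors are hypothetical and are not the class that was pushed to this PR. The point is only that two primitive long fields avoid the Long boxing that a Tuple<Long, Long> would require.

```java
// Hypothetical wrapper class; names are illustrative, not taken from the PR.
final class SequenceId {
    private final long seqNo;        // primitive, so no Long allocation
    private final long primaryTerm;  // primitive, so no Long allocation

    SequenceId(long seqNo, long primaryTerm) {
        this.seqNo = seqNo;
        this.primaryTerm = primaryTerm;
    }

    long seqNo() {
        return seqNo;
    }

    long primaryTerm() {
        return primaryTerm;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof SequenceId)) {
            return false;
        }
        SequenceId that = (SequenceId) other;
        return seqNo == that.seqNo && primaryTerm == that.primaryTerm;
    }

    @Override
    public int hashCode() {
        return 31 * Long.hashCode(seqNo) + Long.hashCode(primaryTerm);
    }

    @Override
    public String toString() {
        return "SequenceId{seqNo=" + seqNo + ", primaryTerm=" + primaryTerm + "}";
    }
}

final class SequenceIdDemo {
    public static void main(String[] args) {
        SequenceId id = new SequenceId(7L, 3L);
        System.out.println(id.seqNo() + " written under term " + id.primaryTerm());
    }
}
```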

Member

Nit: seqNum -> seqNo

Member

Should this method be named to seqID or something like that (and throughout this class)?

Member

seqNo -> seqID?

Member

seqNo -> seqID?

Member

If I'm reading this correctly, the context variable is never used so can we just assign to leaf directly (one less thing to think about).

Member

seqno -> _seq_no

Member

Same thing, I don't think this variable is necessary.

@dakrone
Member Author

dakrone commented Dec 6, 2016

Thanks for taking a look @jasontedor, I pushed some commits for the small comments.

@jasontedor jasontedor self-assigned this Dec 7, 2016
Member

@jasontedor jasontedor left a comment

I think we can remove Engine#getSequenceID.

Member

@jasontedor jasontedor Dec 7, 2016

Sorry for not noticing this sooner (I was blinded by the Tuple), but I don't think that we need this method. Most of the time we do not need the primary term (it's only used to resolve conflicts in the sequence number, and we can just load both separately then) so I think that we can safely drop this (and thus drop the wrapper class).

Member Author

That's part of the reason I used a Tuple in the first place, I expect this method to go away in the future, though maybe not necessarily in this PR (see my comment on the PR)

@dakrone
Member Author

dakrone commented Dec 7, 2016

> I think we can remove Engine#getSequenceID.

Sure, that's only in there to show how to get this for the next person down the line. It's actually used for the unit test.

However, if we remove that, then we could also remove the Versions.loadSeqNo and Versions.loadPrimaryTerm since no one would use those, on down the line. For that reason, I'm in favor of keeping it. I fully expect it to go away down the line when we figure out how we actually want to use _seq_no and _primary_term, but in the meantime it is a very handy bit of code-documentation for how to retrieve these values for anyone dealing with _seq_no and _primary_term.

@dakrone
Member Author

dakrone commented Dec 7, 2016

@jasontedor I pushed two commits that I believe address your concerns while still keeping what I wanted around.

Member

@jasontedor jasontedor left a comment

The change looks good, I left a comment, two nits, and a request. We are basically there though.

Member

Nit: primary term -> _primary_term

Member

Why not primary.getPrimaryTerm()? I don't think it matters since we are going to drop this on the floor and there isn't an assigned sequence number here, but we can set the primary term to the correct value, so why not?

Member Author

Sure, I'll use primary.getPrimaryTerm instead

Member

I see this was already like this, but this can go on a single line.

Member

Can you leave a comment why this is needed?

@dakrone
Member Author

dakrone commented Dec 8, 2016

@jasontedor thanks again, I pushed another two commits addressing your comments.

Contributor

random drive by question - why is the primary term part of the index result? it's already part of index and index result is supposed to capture the dynamic things that the engine has assigned.

Member

That's fair, I poked around the code and I do not think it's needed on the result at all. Can you confirm @dakrone?

Member Author

@dakrone dakrone Dec 8, 2016

I included it because the sequence number is included in the result. Also, it's used when constructing a new Index op from an Engine.IndexResult:

        public Index(Engine.Index index, Engine.IndexResult indexResult) {
            this.id = index.id();
            this.type = index.type();
            this.source = index.source();
            this.routing = index.routing();
            this.parent = index.parent();
            this.seqNo = indexResult.getSeqNo();             // <-- here
            this.primaryTerm = indexResult.getPrimaryTerm(); // <-- and here
            this.version = indexResult.getVersion();
            this.versionType = index.versionType();
            this.autoGeneratedIdTimestamp = index.getAutoGeneratedIdTimestamp();
        }

(Also used when creating a new Delete op from a DeleteResult)

Contributor

@bleskes bleskes Dec 8, 2016

I see. The seqNo and the term do not necessarily always go together. The seqNo is the location of the operation and the term is the authority to put it there. I like the fact that the result object only contains the things that the internal engine creates / changes. Seq# are owned by the engine (on a primary). Terms are owned by the shard. I would prefer to remove the term. At least in the example you gave (Translog.Index#Index(Index, IndexResult)), it's readily available from the index operation.

Member Author

Okay, I'll remove it then!
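
The outcome of this thread can be sketched roughly as follows: the sequence number and version still come from the engine's result, while the primary term is read from the index operation itself. The classes IndexOp, IndexOpResult and TranslogIndexOp are hypothetical, heavily simplified stand-ins for Engine.Index, Engine.IndexResult and Translog.Index. The primaryTerm() accessor on the operation is an assumption based on the comment that the term is readily available there.

```java
// Hypothetical stand-in for Engine.Index: carries what the shard already knows.
final class IndexOp {
    private final String id;
    private final long primaryTerm; // owned by the shard, known before the engine runs

    IndexOp(String id, long primaryTerm) {
        this.id = id;
        this.primaryTerm = primaryTerm;
    }

    String id() {
        return id;
    }

    long primaryTerm() {
        return primaryTerm;
    }
}

// Hypothetical stand-in for Engine.IndexResult: only what the engine assigns.
final class IndexOpResult {
    private final long seqNo;
    private final long version;

    IndexOpResult(long seqNo, long version) {
        this.seqNo = seqNo;
        this.version = version;
    }

    long getSeqNo() {
        return seqNo;
    }

    long getVersion() {
        return version;
    }
}

// Hypothetical stand-in for Translog.Index.
final class TranslogIndexOp {
    final String id;
    final long seqNo;
    final long primaryTerm;
    final long version;

    TranslogIndexOp(IndexOp index, IndexOpResult result) {
        this.id = index.id();
        this.seqNo = result.getSeqNo();         // assigned by the engine
        this.primaryTerm = index.primaryTerm(); // taken from the operation, not the result
        this.version = result.getVersion();
    }
}

final class TranslogTermDemo {
    public static void main(String[] args) {
        TranslogIndexOp op =
            new TranslogIndexOp(new IndexOp("doc-1", 2L), new IndexOpResult(17L, 1L));
        System.out.println(op.id + " seqNo=" + op.seqNo + " term=" + op.primaryTerm);
    }
}
```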

Contributor

another drive by question - should we do it here, or separate this into two more explicit flows: one when we create an Index operation on a replica (where we set the sequence number based on the incoming request) and one here when we operate as a primary? If you guys feel the current version is more intuitive, I'm good, but I wanted to get confirmation this was considered.

Member Author

I prefer having exactly one place where this is updated, as it's less likely to get out of sync than if it were separated. It's also easier to find when it's only in a single place.

Member Author

I'm totally open to suggestions otherwise though, @jasontedor you're about to use this PR, do you have a preference?

Member

I prefer it the way that you have it.

Member

@jasontedor jasontedor left a comment

LGTM.

@dakrone dakrone force-pushed the index-seq-id-and-primary-term branch from 2275850 to ee22a47 Compare December 9, 2016 02:47
@dakrone dakrone merged commit ee22a47 into elastic:master Dec 9, 2016
@dakrone
Member Author

dakrone commented Dec 9, 2016

Thanks @jasontedor and @bleskes

@dakrone dakrone deleted the index-seq-id-and-primary-term branch January 23, 2017 17:22
@clintongormley clintongormley added :Engine :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Sequence IDs labels Feb 14, 2018

Labels

:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement v6.0.0-alpha1
