-
Notifications
You must be signed in to change notification settings - Fork 246
DRIVERS-1607: Snapshot reads on Secondaries #1022
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| @@ -0,0 +1,706 @@ | |||
| { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
snapshot-sessions.yml will be added after json test version will be finalized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI we usually develop (and review) the yaml version because it's easier to read and supports comments and anchors to reduce the verbosity. Also the json file is auto-generated from the yaml.
ajdavis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm so happy that we're choosing the ClientSession API!
| this property is false. | ||
|
|
||
| Snapshot reads and causal consistency are mutually exclusive. Therefore if ``isSnapshot`` is set to true, | ||
| ``causalConsistency`` property is set to false. Client MUST throw an Error if both ``isSnapshot`` and ``causalConsistency`` are set to true. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The server permits a snapshot read to begin with afterClusterTime so it's causally after some previous operation. Within a snapshot read, of course, you can't be causally consistent. This API doesn't expose the "begin a snapshot read causally after an operation" feature. That might be fine, and we could enhance the API later if it turns out that users want that feature. I want to confirm it's our intention to omit the feature now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirmed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jyemin @ajdavis
Making sure that we want drivers forbid requesting both causalConsistency and isSnapshot (need to add a test for that).
I lean to raising an error in such case in driver, as otherwise due to drivers possible implementation differences it could be that some would send both atClusterTime and afterClusterTime, and some only one of the fields. Which will result in inconsistent behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's start with making it an error. It's easier to go from error to not error than the other way around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just FYI, libmongoc defines mongoc_session_opt_t with a field flags that is an enum mongoc_session_flag_t. I'll add another variant for isSnapshot, and by nature of the enum, there will be no way to specify causalConsistency and isSnapshot simultaneously. I'll still throw an error if users try to, but worth noting that if other drivers use a similar enum, users won't be able to do this anyways.
| * assert that the command does not have an ``atClusterTime``. | ||
|
|
||
| 2. | Subsequent snapshot reads on a ``ClientSession`` should read from the snapshot of the first read in that session. | ||
| | Test SHOULD introduce two variations one for primaries and additional for secondary reads |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jyemin currently with test against secondaries only, should we duplicate the tests to run on primary as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would be more a test of the server than of drivers, so I don't think it's necessary. What do you think, @ajdavis?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is tested on the server in snapshot_reads.js, no need for drivers to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
| Snapshot Reads Specification | ||
| ================================ | ||
|
|
||
| :Spec Title: Snapshot Reads Specification (See the registry of specs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to call out a registry of specs here
| :Spec Title: Snapshot Reads Specification (See the registry of specs) | |
| :Spec Title: Snapshot Reads Specification |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what's the convention here as this referral appears in causal-consistency.rst and driver-sessions.rst.
Removed for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe those are leftovers from the templates. I'll give the specs a quick look to clean these up. Thanks for pointing out other instances.
| specifications: | ||
|
|
||
| - `Causal Consistency <../causal-consistency/causal-consistency.rst#sessionoptions-changes>`__ | ||
| - `Snapshot Reads <../sessions/snapshot-sessions.rst#sessionoptions-changes>`__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to mention this, but you should add a make target for session tests in the unified test runner Makefile. This ensures all unified session tests are checked against the schema during the CI build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks! Added, and will verify when yml tests are added.
| { | ||
| "minServerVersion": "5.0", | ||
| "topologies": [ | ||
| "sharded-replicaset" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be:
topologies: [ "replicaset", "sharded-replicaset" ]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed.
| "client": "client0", | ||
| "sessionOptions": { | ||
| "isSnapshot": true, | ||
| "snapshotTimestamp": 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not clear what your intention is here. In the spec this option is not defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've marked this test as optional, for drivers that can alter snapshotTimestamp field internally.
Not sure what's the best approach here: Mandate drivers to provide functionality to set snapshotTimestamp or extract this test as optional prose test.
| "commandStartedEvent": { | ||
| "command": { | ||
| "find": "collection0", | ||
| "readConcern": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add commandStartedEvent assertions to all the tests, demonstrating that atClusterTime is included when appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's problematic as we don't have a way to obtain actual atClusterTime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done with $$exists.
| } | ||
| }, | ||
| { | ||
| "session": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there really a need for all these session objects? Each test runs in isolation, so a new session instance is created for each test. So there is no re-use issues between tests. I suspect we can have just one session with an isSnapshot: true option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need, removed.
- atClusterTime in Distinct command result documented
| ] | ||
| }, | ||
| { | ||
| "description": "First snapshot read does send atClusterTime", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be does not
| .. code:: typescript | ||
| interface ClientSession { | ||
| Optional<BsonTimestamp> snapshotTimestamp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I think about it, it's not clear why we need this to be a public property if there is no way to initialize a new ClientSession with its value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds right, currently there is not need for snapshotTimestamp in public API. Removed.
As ClientSession.snapshotTimestamp is likely to be introduced later, drivers still need this field as part of ClientSession enitity as mentioned here.
| 2. aggregate | ||
| 3. distinct | ||
|
|
||
| Snapshot read commands errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced its necessary to specify this, since retryable reads are only enabled via an allow-list anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, removed.
|
|
||
| Drivers MUST NOT retry errors in SnapshotError category for snapshot reads if ``atClusterTime`` is supplied. | ||
|
|
||
| Test Plan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check with others, but I think this whole section should move to the README.rst in the tests subdirectory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree; seems like the normal place for prose tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| replica set or a sharded cluster supports snapshot reads. | ||
| The server ``minSnapshotHistoryWindowInSeconds`` parameter SHOULD be configured to match the test execution time. | ||
|
|
||
| 1. | The first read in a snapshot session must not send ``atClusterTime`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only needs to be included for prose tests. You don't have to duplicate what's already specified in YAML.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Barring tests with distinct (see my other comment), spec tests seem to pass in C. Will implement prose test and re-review.
| "id": "session0", | ||
| "client": "client0", | ||
| "sessionOptions": { | ||
| "isSnapshot": true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we specify "causalConsistency": false here and on session1? causalConsistency is true by default which will cause an error with "isSnapshot": true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default causalConsistency value is "not supplied". So in that case it should be translated to !isSnapshot (now it's translated to true)?
I think drivers should account for this case, as we don't want mandate users to always specify "causalConsistency": false if snapshot session is requested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that specifying causalConsistency = false just for isSnapshot = true should not be a burden on the users.
But if a user (or a test) sets isSnapshot to true, drivers should set causalConsistency to false? I suppose I'm confused about when drivers should explicitly error because of simultaneously true isSnapshot and causalConsistency values, and when drivers should quietly switch one or the other off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, this point is not clear.
I suggest that causalConsistency defaults to true only if no snapshot where requested.
| ] | ||
| }, | ||
| { | ||
| "description": "Distinct operation with snapshot", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot run this test and Mixed operation with snapshot in libmongoc since we don't have a dedicated distinct helper, so I can't verify that these work, but they look good to me.
rstam
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| specification defines how sessions are used to implement snapshot reads. | ||
|
|
||
| Snapshot reads | ||
| Reads with readconcern level ``snapshot`` that occur outside of transactions on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
readconcern -> read concern
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| MongoClient changes | ||
| =================== | ||
|
|
||
| There are no API changes to ``MongoClient`` to support snapshot reads. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit confusing to have "MongoClient/MongoDatabase/MongoCollection changes" sections when there aren't any changes at all. Can we remove those sections? We could add one sentence to the rationale which says something like "This spec does not require any client, db, collection API changes."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, we might not need even that cliet/db/collection section.
@rstam do you think we could skip those no-changes sections?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
| snapshot reads are implemented by: | ||
|
|
||
| 1. Saving the ``atClusterTime`` returned by 5.0+ servers for the first find/aggregate/distinct operation in a | ||
| property ``snapshotTimestamp`` of the ``ClientSession`` object. Drivers MUST save the ``atClusterTime`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name snapshotTimestamp is inconsistent with the other "time" properties, clusterTime, and operationTime. Can we rename it to snapshotTime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. snapshotTimestamp is not part of public API now, and in fact in C# it's called snapshotTime internally for this consistency reason.
@ajdavis do you have any objections to snapshotTime name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm okay. This line gave me the impression that the snapshotTimestamp needs to be a public session property:
property
snapshotTimestampof theClientSessionobject
Can you clarify that it must be private? Perhaps just saying "...private snapshotTime property..."
I still think it's more consistent to name it "snapshotTime" though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
benjirewis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM given new changes to causal-consistency. Prose test looks good in libmongoc; just had a question about the set server parameter.
source/sessions/tests/README.rst
Outdated
| ====================== | ||
| Snapshot sessions tests require server of version 5.0 or higher and | ||
| replica set or a sharded cluster deployment. | ||
| The server ``minSnapshotHistoryWindowInSeconds`` parameter SHOULD be configured to match the test execution time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be set? Why is this important for the new tests, and by "test execution time", do you mean how long it takes the spec + prose tests to run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to ensure that the snapshot the test reads still exists on the server.
IIRC the default history is 10 seconds, which was not sufficient at least for .Net tests.
So drivers should be aware about this parameter and set it as needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was 10 seconds not long enough? Are there snapshot reads tests that take longer than 10 seconds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was not sufficient when running in Debug mode.
Apparently it's 5 seconds and .Net tests in release are much quicker. So no changes to EG needed (at least in .Net)
I'd still suggest to track this parameter for such slower dubug mode runs. Not sure what's the best wording though, maybe:
The server
minSnapshotHistoryWindowInSecondsparameter MAY be configured to match the tests execution time.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps something like:
"Note that by default the server only keeps the snapshot alive for 10 seconds. Debugging a driver in the middle of a snapshot session may lead to SnapshotTooOld errors. Drivers can work around this issue by increasing the server's minSnapshotHistoryWindowInSeconds parameter, for example:
client.admin.command('setParameter', 1, minSnapshotHistoryWindowInSeconds=60)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| snapshot reads are implemented by: | ||
|
|
||
| 1. Saving the ``atClusterTime`` returned by 5.0+ servers for the first find/aggregate/distinct operation in a | ||
| property ``snapshotTimestamp`` of the ``ClientSession`` object. Drivers MUST save the ``atClusterTime`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm okay. This line gave me the impression that the snapshotTimestamp needs to be a public session property:
property
snapshotTimestampof theClientSessionobject
Can you clarify that it must be private? Perhaps just saying "...private snapshotTime property..."
I still think it's more consistent to name it "snapshotTime" though.
|
|
||
| .. code:: typescript | ||
| options = new SessionOptions(isSnapshot = true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename isSnapshot to snapshot? Including is in the name doesn't add any meaning, it's just a filler word. It's also inconsistent with other boolean options in the driver api. For example causalConsistency isn't called isCausalConsistency or isCausallyConsistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point re. consistency. Renamed.
| @@ -0,0 +1,706 @@ | |||
| { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI we usually develop (and review) the yaml version because it's easier to read and supports comments and anchors to reduce the verbosity. Also the json file is auto-generated from the yaml.
source/sessions/tests/README.rst
Outdated
| Default snapshot history window on the server is 5 seconds. Running the test in debug mode, or in any other slow configuration | ||
| may lead to `SnapshotTooOld` errors. Drivers can work around this issue by increasing the server's `minSnapshotHistoryWindowInSeconds` parameter, for example: | ||
|
|
||
| .. code:: typescript |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be "python".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
| - Setting both ``snapshot`` and ``causalConsistency`` is not allowed | ||
|
|
||
| * ``client.startSession(snapshot = true, causalConsistency = true)`` | ||
| * Assert that an error was raised by driver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a unified test for this? It could run the client.startSession operation with snapshot = true, causalConsistency = true arguments and assert that we raise a client-side error with isClientError: true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might be a good idea, but session entities creation stage does not have error validation, and we don't have startSession operation. Introducing startSession operation for all drivers instead of prose test seemed overkill in this case.
Is there any simple way to achieve that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and we don't have startSession operation.
The unified test runner allows us to call any method on the client including startSession. I don't think it's overkill but we don't need to do it in this PR. I'll open a drivers ticket for it.
Edit: or maybe this would require a "schemaVersion" bump in the unified test format. Still I think it's worthwhile to be able to call startSession within tests.
| 2. Passing that ``snapshotTime`` in the ``atClusterTime`` field of the ``readConcern`` field | ||
| for subsequent snapshot read operations (for find/aggregate/distinct commands). | ||
|
|
||
| Server Command Responses |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the driver starts a snapshot session but the server does not return the atClusterTime field the driver needs to raise a client side error indicating that server does not support snapshot reads. If we don't do this then snapshot reads can silently fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
False alarm. The problem was that my implementation neglected to set readConcern.level to ‘snapshot’. When the driver correctly sends ‘snapshot’ old servers will reject the request with this error:
{'operationTime': Timestamp(1624572085, 12), 'ok': 0.0, 'errmsg': 'read concern level snapshot is only valid in a transaction', 'code': 72, 'codeName': 'InvalidOptions', '$clusterTime': {'clusterTime': Timestamp(1624572085, 12), 'signature': {'hash': b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 'keyId': 0}}}
Can we add a unified test case that confirms that the server returns an error when drivers try to use a snapshot session? It should run on 3.6-4.4.
source/sessions/tests/README.rst
Outdated
| ====================== | ||
| Snapshot sessions tests require server of version 5.0 or higher and | ||
| replica set or a sharded cluster deployment. | ||
| Default snapshot history window on the server is 5 seconds. Running the test in debug mode, or in any other slow configuration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default was updated to 5 minutes in SERVER-47855 so this section is probably not needed at all anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it's still better to track minSnapshotHistoryWindowInSeconds parameter, for any future updates or longer debugger sessions.
ShaneHarvey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add the .yml versions of the two new tests?:
source/sessions/tests/unified/snapshot-sessions-not-supported-server-error.yml
source/sessions/tests/unified/snapshot-sessions.yml
ShaneHarvey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Rebased and merged here. |
| collection-management: HAS_AJV | ||
| @ajv test -s $(SCHEMA) -d "../../collection-management/tests/*.yml" --valid | ||
|
|
||
| sessions: HAS_AJV |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: "sessions" should also have been added to the all target above. See: #1026 (comment)
No description provided.