Skip to content

Conversation

@BorisDog
Copy link
Contributor

@BorisDog BorisDog commented Jun 16, 2021

No description provided.

@BorisDog BorisDog requested review from ajdavis, dariakp, jyemin and rstam June 16, 2021 10:51
@@ -0,0 +1,706 @@
{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

snapshot-sessions.yml will be added after json test version will be finalized.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI we usually develop (and review) the yaml version because it's easier to read and supports comments and anchors to reduce the verbosity. Also the json file is auto-generated from the yaml.

Copy link
Member

@ajdavis ajdavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm so happy that we're choosing the ClientSession API!

this property is false.

Snapshot reads and causal consistency are mutually exclusive. Therefore if ``isSnapshot`` is set to true,
``causalConsistency`` property is set to false. Client MUST throw an Error if both ``isSnapshot`` and ``causalConsistency`` are set to true.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The server permits a snapshot read to begin with afterClusterTime so it's causally after some previous operation. Within a snapshot read, of course, you can't be causally consistent. This API doesn't expose the "begin a snapshot read causally after an operation" feature. That might be fine, and we could enhance the API later if it turns out that users want that feature. I want to confirm it's our intention to omit the feature now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jyemin @ajdavis
Making sure that we want drivers forbid requesting both causalConsistency and isSnapshot (need to add a test for that).
I lean to raising an error in such case in driver, as otherwise due to drivers possible implementation differences it could be that some would send both atClusterTime and afterClusterTime, and some only one of the fields. Which will result in inconsistent behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's start with making it an error. It's easier to go from error to not error than the other way around.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI, libmongoc defines mongoc_session_opt_t with a field flags that is an enum mongoc_session_flag_t. I'll add another variant for isSnapshot, and by nature of the enum, there will be no way to specify causalConsistency and isSnapshot simultaneously. I'll still throw an error if users try to, but worth noting that if other drivers use a similar enum, users won't be able to do this anyways.

* assert that the command does not have an ``atClusterTime``.

2. | Subsequent snapshot reads on a ``ClientSession`` should read from the snapshot of the first read in that session.
| Test SHOULD introduce two variations one for primaries and additional for secondary reads
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jyemin currently with test against secondaries only, should we duplicate the tests to run on primary as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would be more a test of the server than of drivers, so I don't think it's necessary. What do you think, @ajdavis?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is tested on the server in snapshot_reads.js, no need for drivers to do it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

Snapshot Reads Specification
================================

:Spec Title: Snapshot Reads Specification (See the registry of specs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to call out a registry of specs here

Suggested change
:Spec Title: Snapshot Reads Specification (See the registry of specs)
:Spec Title: Snapshot Reads Specification

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what's the convention here as this referral appears in causal-consistency.rst and driver-sessions.rst.
Removed for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe those are leftovers from the templates. I'll give the specs a quick look to clean these up. Thanks for pointing out other instances.

@BorisDog BorisDog requested a review from alcaeus June 17, 2021 10:57
specifications:

- `Causal Consistency <../causal-consistency/causal-consistency.rst#sessionoptions-changes>`__
- `Snapshot Reads <../sessions/snapshot-sessions.rst#sessionoptions-changes>`__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to mention this, but you should add a make target for session tests in the unified test runner Makefile. This ensures all unified session tests are checked against the schema during the CI build.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks! Added, and will verify when yml tests are added.

{
"minServerVersion": "5.0",
"topologies": [
"sharded-replicaset"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be:

topologies: [ "replicaset", "sharded-replicaset" ]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed.

"client": "client0",
"sessionOptions": {
"isSnapshot": true,
"snapshotTimestamp": 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear what your intention is here. In the spec this option is not defined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've marked this test as optional, for drivers that can alter snapshotTimestamp field internally.

Not sure what's the best approach here: Mandate drivers to provide functionality to set snapshotTimestamp or extract this test as optional prose test.

"commandStartedEvent": {
"command": {
"find": "collection0",
"readConcern": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add commandStartedEvent assertions to all the tests, demonstrating that atClusterTime is included when appropriate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's problematic as we don't have a way to obtain actual atClusterTime.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with $$exists.

}
},
{
"session": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there really a need for all these session objects? Each test runs in isolation, so a new session instance is created for each test. So there is no re-use issues between tests. I suspect we can have just one session with an isSnapshot: true option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need, removed.

BorisDog added 2 commits June 17, 2021 18:03
- atClusterTime in Distinct command result documented
@BorisDog BorisDog requested a review from jyemin June 17, 2021 19:49
]
},
{
"description": "First snapshot read does send atClusterTime",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be does not

.. code:: typescript
interface ClientSession {
Optional<BsonTimestamp> snapshotTimestamp;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I think about it, it's not clear why we need this to be a public property if there is no way to initialize a new ClientSession with its value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds right, currently there is not need for snapshotTimestamp in public API. Removed.

As ClientSession.snapshotTimestamp is likely to be introduced later, drivers still need this field as part of ClientSession enitity as mentioned here.

2. aggregate
3. distinct

Snapshot read commands errors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced its necessary to specify this, since retryable reads are only enabled via an allow-list anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, removed.


Drivers MUST NOT retry errors in SnapshotError category for snapshot reads if ``atClusterTime`` is supplied.

Test Plan
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check with others, but I think this whole section should move to the README.rst in the tests subdirectory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree; seems like the normal place for prose tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

replica set or a sharded cluster supports snapshot reads.
The server ``minSnapshotHistoryWindowInSeconds`` parameter SHOULD be configured to match the test execution time.

1. | The first read in a snapshot session must not send ``atClusterTime``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only needs to be included for prose tests. You don't have to duplicate what's already specified in YAML.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks.

@BorisDog BorisDog requested review from benjirewis and jyemin June 23, 2021 09:09
Copy link
Contributor

@benjirewis benjirewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Barring tests with distinct (see my other comment), spec tests seem to pass in C. Will implement prose test and re-review.

"id": "session0",
"client": "client0",
"sessionOptions": {
"isSnapshot": true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we specify "causalConsistency": false here and on session1? causalConsistency is true by default which will cause an error with "isSnapshot": true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default causalConsistency value is "not supplied". So in that case it should be translated to !isSnapshot (now it's translated to true)?
I think drivers should account for this case, as we don't want mandate users to always specify "causalConsistency": false if snapshot session is requested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that specifying causalConsistency = false just for isSnapshot = true should not be a burden on the users.

But if a user (or a test) sets isSnapshot to true, drivers should set causalConsistency to false? I suppose I'm confused about when drivers should explicitly error because of simultaneously true isSnapshot and causalConsistency values, and when drivers should quietly switch one or the other off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, this point is not clear.
I suggest that causalConsistency defaults to true only if no snapshot where requested.

]
},
{
"description": "Distinct operation with snapshot",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot run this test and Mixed operation with snapshot in libmongoc since we don't have a dedicated distinct helper, so I can't verify that these work, but they look good to me.

Copy link
Contributor

@rstam rstam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

specification defines how sessions are used to implement snapshot reads.

Snapshot reads
Reads with readconcern level ``snapshot`` that occur outside of transactions on
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readconcern -> read concern

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

MongoClient changes
===================

There are no API changes to ``MongoClient`` to support snapshot reads.
Copy link
Member

@ShaneHarvey ShaneHarvey Jun 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit confusing to have "MongoClient/MongoDatabase/MongoCollection changes" sections when there aren't any changes at all. Can we remove those sections? We could add one sentence to the rationale which says something like "This spec does not require any client, db, collection API changes."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, we might not need even that cliet/db/collection section.
@rstam do you think we could skip those no-changes sections?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

snapshot reads are implemented by:

1. Saving the ``atClusterTime`` returned by 5.0+ servers for the first find/aggregate/distinct operation in a
property ``snapshotTimestamp`` of the ``ClientSession`` object. Drivers MUST save the ``atClusterTime``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name snapshotTimestamp is inconsistent with the other "time" properties, clusterTime, and operationTime. Can we rename it to snapshotTime?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. snapshotTimestamp is not part of public API now, and in fact in C# it's called snapshotTime internally for this consistency reason.
@ajdavis do you have any objections to snapshotTime name?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm okay. This line gave me the impression that the snapshotTimestamp needs to be a public session property:

property snapshotTimestamp of the ClientSession object

Can you clarify that it must be private? Perhaps just saying "...private snapshotTime property..."

I still think it's more consistent to name it "snapshotTime" though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@benjirewis benjirewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM given new changes to causal-consistency. Prose test looks good in libmongoc; just had a question about the set server parameter.

======================
Snapshot sessions tests require server of version 5.0 or higher and
replica set or a sharded cluster deployment.
The server ``minSnapshotHistoryWindowInSeconds`` parameter SHOULD be configured to match the test execution time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be set? Why is this important for the new tests, and by "test execution time", do you mean how long it takes the spec + prose tests to run?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to ensure that the snapshot the test reads still exists on the server.
IIRC the default history is 10 seconds, which was not sufficient at least for .Net tests.
So drivers should be aware about this parameter and set it as needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was 10 seconds not long enough? Are there snapshot reads tests that take longer than 10 seconds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was not sufficient when running in Debug mode.
Apparently it's 5 seconds and .Net tests in release are much quicker. So no changes to EG needed (at least in .Net)

I'd still suggest to track this parameter for such slower dubug mode runs. Not sure what's the best wording though, maybe:

The server minSnapshotHistoryWindowInSeconds parameter MAY be configured to match the tests execution time.

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps something like:
"Note that by default the server only keeps the snapshot alive for 10 seconds. Debugging a driver in the middle of a snapshot session may lead to SnapshotTooOld errors. Drivers can work around this issue by increasing the server's minSnapshotHistoryWindowInSeconds parameter, for example:

client.admin.command('setParameter', 1, minSnapshotHistoryWindowInSeconds=60)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

snapshot reads are implemented by:

1. Saving the ``atClusterTime`` returned by 5.0+ servers for the first find/aggregate/distinct operation in a
property ``snapshotTimestamp`` of the ``ClientSession`` object. Drivers MUST save the ``atClusterTime``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm okay. This line gave me the impression that the snapshotTimestamp needs to be a public session property:

property snapshotTimestamp of the ClientSession object

Can you clarify that it must be private? Perhaps just saying "...private snapshotTime property..."

I still think it's more consistent to name it "snapshotTime" though.


.. code:: typescript
options = new SessionOptions(isSnapshot = true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename isSnapshot to snapshot? Including is in the name doesn't add any meaning, it's just a filler word. It's also inconsistent with other boolean options in the driver api. For example causalConsistency isn't called isCausalConsistency or isCausallyConsistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point re. consistency. Renamed.

@@ -0,0 +1,706 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI we usually develop (and review) the yaml version because it's easier to read and supports comments and anchors to reduce the verbosity. Also the json file is auto-generated from the yaml.

Default snapshot history window on the server is 5 seconds. Running the test in debug mode, or in any other slow configuration
may lead to `SnapshotTooOld` errors. Drivers can work around this issue by increasing the server's `minSnapshotHistoryWindowInSeconds` parameter, for example:

.. code:: typescript
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be "python".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

- Setting both ``snapshot`` and ``causalConsistency`` is not allowed

* ``client.startSession(snapshot = true, causalConsistency = true)``
* Assert that an error was raised by driver
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a unified test for this? It could run the client.startSession operation with snapshot = true, causalConsistency = true arguments and assert that we raise a client-side error with isClientError: true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might be a good idea, but session entities creation stage does not have error validation, and we don't have startSession operation. Introducing startSession operation for all drivers instead of prose test seemed overkill in this case.
Is there any simple way to achieve that?

Copy link
Member

@ShaneHarvey ShaneHarvey Jun 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and we don't have startSession operation.

The unified test runner allows us to call any method on the client including startSession. I don't think it's overkill but we don't need to do it in this PR. I'll open a drivers ticket for it.

Edit: or maybe this would require a "schemaVersion" bump in the unified test format. Still I think it's worthwhile to be able to call startSession within tests.

2. Passing that ``snapshotTime`` in the ``atClusterTime`` field of the ``readConcern`` field
for subsequent snapshot read operations (for find/aggregate/distinct commands).

Server Command Responses
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the driver starts a snapshot session but the server does not return the atClusterTime field the driver needs to raise a client side error indicating that server does not support snapshot reads. If we don't do this then snapshot reads can silently fail.

Copy link
Member

@ShaneHarvey ShaneHarvey Jun 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

False alarm. The problem was that my implementation neglected to set readConcern.level to ‘snapshot’. When the driver correctly sends ‘snapshot’ old servers will reject the request with this error:

{'operationTime': Timestamp(1624572085, 12), 'ok': 0.0, 'errmsg': 'read concern level snapshot is only valid in a transaction', 'code': 72, 'codeName': 'InvalidOptions', '$clusterTime': {'clusterTime': Timestamp(1624572085, 12), 'signature': {'hash': b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 'keyId': 0}}}

Can we add a unified test case that confirms that the server returns an error when drivers try to use a snapshot session? It should run on 3.6-4.4.

======================
Snapshot sessions tests require server of version 5.0 or higher and
replica set or a sharded cluster deployment.
Default snapshot history window on the server is 5 seconds. Running the test in debug mode, or in any other slow configuration
Copy link
Member

@ShaneHarvey ShaneHarvey Jun 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default was updated to 5 minutes in SERVER-47855 so this section is probably not needed at all anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it's still better to track minSnapshotHistoryWindowInSeconds parameter, for any future updates or longer debugger sessions.

@BorisDog BorisDog requested a review from ShaneHarvey June 25, 2021 17:16
Copy link
Member

@ShaneHarvey ShaneHarvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the .yml versions of the two new tests?:

 source/sessions/tests/unified/snapshot-sessions-not-supported-server-error.yml
 source/sessions/tests/unified/snapshot-sessions.yml

@BorisDog BorisDog requested a review from ShaneHarvey June 25, 2021 21:07
Copy link
Member

@ShaneHarvey ShaneHarvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@BorisDog
Copy link
Contributor Author

Rebased and merged here.

collection-management: HAS_AJV
@ajv test -s $(SCHEMA) -d "../../collection-management/tests/*.yml" --valid

sessions: HAS_AJV
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: "sessions" should also have been added to the all target above. See: #1026 (comment)

@BorisDog BorisDog deleted the DRIVERS-1607 branch April 1, 2022 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants