
Commit 07382ee

DOCSP-14130 Guidance on consistency when querying an ADL (#126)
* DOCSP-14130 Guidance on consistency when querying an ADL
* DOCSP-14130 updates for review feedback
* DOCSP-14130 updates for copy review feedback
1 parent df1ab35 commit 07382ee

1 file changed: +89 -73 lines changed

source/query/query-data-lake.txt

Lines changed: 89 additions & 73 deletions
@@ -12,29 +12,31 @@ Querying Your Data Lake
1212
:depth: 2
1313
:class: singlecol
1414

15+
You can use the MongoDB Query Language (MQL) on {+adl+} to query and
16+
analyze data on your data store. {+adl+} supports most, but not all the
17+
standard server commands. To learn more about the supported and
18+
unsupported MongoDB server commands and aggregation pipeline stages,
19+
see :ref:`data-lake-mql-support`.
20+
1521
You can run up to 30 simultaneous queries on your {+dl+} against:
1622

1723
- Data in your |s3| bucket.
1824
- Documents in your MongoDB |service| cluster.
1925
- Data in files hosted at publicly accessible |url|\s.
2026

21-
.. seealso::
27+
.. see::
2228

2329
- :doc:`How to Connect to Your Data Lake </tutorial/connect>`
24-
- :doc:`How to Run Queries Against Your Data Lake </tutorial/run-queries>`
30+
- :doc:`How to Run Queries Against Your Data Lake
31+
</tutorial/run-queries>`
2532

2633
.. _query-s3:
2734

2835
Querying Data on S3
2936
-------------------
3037

31-
You can use {+adl+} to query and analyze data on your cloud object store
32-
using MongoDB Query Language (MQL). {+adl+} supports most, but not all the
33-
standard server commands. To learn more about the supported and unsupported
34-
MongoDB server commands and aggregation pipeline stages, see
35-
:ref:`data-lake-mql-support`.
36-
37-
To query data on |s3|, your {+dl+} storage :ref:`configuration
38+
You can use {+adl+} to query and analyze data on your cloud object
39+
store. To query data on |s3|, your {+dl+} storage :ref:`configuration
3840
<datalake-configuration-file>` must contain settings that define:
3941

4042
- Your |s3| {+data-lake-store+}.
@@ -75,34 +77,37 @@ To query data on |s3|, your {+dl+} storage :ref:`configuration
7577
]
7678
}
7779

78-
To learn more about these settings, see :ref:`datalake-configuration-file`.
80+
To learn more about these settings, see
81+
:ref:`datalake-configuration-file`.
82+
83+
{+dl+} creates the virtual databases and collections you specified in
84+
your {+dl+} configuration for the data in your |s3| store. When you
85+
:doc:`connect </tutorial/connect>` to your {+dl+} and :doc:`run queries
86+
</tutorial/run-queries>`, {+dl+} processes your queries against the
87+
data and returns the query results.
7988

80-
{+dl+} creates the virtual databases and collections you specified in your
81-
{+dl+} configuration for the data in your |s3| store. When you :doc:`connect
82-
</tutorial/connect>` to your {+dl+} and :doc:`run queries
83-
</tutorial/run-queries>`, {+dl+} processes your queries against the data and
84-
returns the query results.
89+
When :doc:`deploying </tutorial/deploy>` your {+dl+}, if you specified
90+
an |s3| bucket with both read and write permissions or |aws| |s3|
91+
:aws:`s3:PutObject </AmazonS3/latest/dev/using-with-s3-actions.html#using-with-s3-actions-related-to-objects>`
92+
permission, you can also save your query results in your |s3| bucket
93+
using :ref:`adl-out-stage` to |s3|.
8594

86-
When :doc:`deploying </tutorial/deploy>` your {+dl+}, if you specified an |s3|
87-
bucket with both read and write permissions or |aws| |s3| :aws:`s3:PutObject
88-
</AmazonS3/latest/dev/using-with-s3-actions.html#using-with-s3-actions-related-to-objects>`
89-
permission, you can also save your query results in your |s3| bucket using
90-
:ref:`adl-out-stage` to |s3|.
95+
If you successfully create or update an object on your |s3| data store,
96+
{+dl+} returns the latest version of that object for any subsequent
97+
read requests and all list operations of the objects also reflect the
98+
changes. If your query contains multiple stages, each stage receives
99+
the most recent data available from the data store as that stage is
100+
processed.
91101

92102
.. _query-atlas:
93103

94104
Querying Data in Your |service| Cluster
95105
---------------------------------------
96106

97-
You can use {+adl+} to query and analyze data in your |service| cluster using
98-
MongoDB Query Language (MQL). {+adl+} supports most, but not all the standard
99-
server commands. To learn more about the supported and unsupported MongoDB
100-
server commands and aggregation pipeline stages, see
101-
:ref:`data-lake-mql-support`.
102-
103-
To query data in your |service| cluster, your {+dl+} storage
104-
:ref:`configuration <datalake-configuration-file>` must contain settings that
105-
define:
107+
You can use {+adl+} to query and analyze data in your |service|
108+
cluster. To query data in your |service| cluster, your {+dl+} storage
109+
:ref:`configuration <datalake-configuration-file>` must contain
110+
settings that define:
106111

107112
- Your |service| {+data-lake-store+}.
108113
- {+dl+} virtual databases and collections that map to your
@@ -140,19 +145,21 @@ define:
140145
]
141146
}
142147

143-
To learn more about these settings, see :ref:`datalake-configuration-file`.
148+
To learn more about these settings, see
149+
:ref:`datalake-configuration-file`.
144150

145-
{+dl+} automatically detects the file format and creates the virtual databases
146-
and collections you specified in your {+dl+} configuration. When you
147-
:doc:`connect </tutorial/connect>` to your {+dl+} and run queries, {+dl+}
148-
processes your queries against the data and returns the query results.
151+
{+dl+} automatically detects the file format and creates the virtual
152+
databases and collections you specified in your {+dl+} configuration.
153+
When you :doc:`connect </tutorial/connect>` to your {+dl+} and run
154+
queries, {+dl+} processes your queries against the data and returns the
155+
query results.
149156

150-
If you query a collection in {+adl+} that is mapped to only one |service|
151-
collection, {+adl+} acts as a proxy and forwards your query to |service|.
152-
When acting as a proxy, {+adl+} doesn't scan data into its virtual collection
153-
to process the query, thus improving performance and reducing cost. This
154-
optimization is not available for queries on {+adl+} collections that are
155-
mapped to multiple |service| collections.
157+
If you query a collection in {+adl+} that is mapped to only one
158+
|service| collection, {+adl+} acts as a proxy and forwards your query
159+
to |service|. When acting as a proxy, {+adl+} doesn't scan data into
160+
its virtual collection to process the query, thus improving performance
161+
and reducing cost. This optimization is not available for queries on
162+
{+adl+} collections that are mapped to multiple |service| collections.
156163

157164
.. example::
158165

@@ -203,31 +210,37 @@ mapped to multiple |service| collections.
203210
]
204211
}
205212

206-
For the above storage configuration, {+adl+} acts as a proxy for queries
207-
on ``foo`` collection and forwards the queries to |service|. This
208-
performance and cost optimization is not available for queries on ``barbaz``
209-
collection because ``barbaz`` is mapped to multiple |service| collections.
213+
For the above storage configuration, {+adl+} acts as a proxy for
214+
queries on ``foo`` collection and forwards the queries to |service|.
215+
This performance and cost optimization is not available for queries
216+
on ``barbaz`` collection because ``barbaz`` is mapped to multiple
217+
|service| collections.
218+
219+
You can also save your query results in your |service| cluster using
220+
:ref:`adl-out-stage` to |service|.
221+
222+
If you successfully create or update a document in your collection on
223+
the |service| cluster, {+dl+} returns the latest version of that
224+
document for any subsequent read requests and all list operations of
225+
the collection also reflect the changes. If your query contains
226+
multiple stages, each stage receives the most recent data available
227+
from the data store as that stage is processed.
210228

211229
.. _query-http:
212230

213-
Querying Data at a |http| |url|
214-
-------------------------------
231+
Querying Data at a |http| or |https| |url|
232+
------------------------------------------
215233

216234
.. include:: /includes/extracts/fact-http-beta-message.rst
217235

218-
You can use {+adl+} to query and analyze data in files hosted at publicly
219-
accessible |url|\s using MongoDB Query Language (MQL). To learn more about the
220-
supported data formats, see :ref:`data-lake-data-formats`. {+adl+} supports
221-
most, but not all the standard server commands. To learn more about the
222-
supported and unsupported MongoDB server commands and aggregation pipeline
223-
stages, see :ref:`data-lake-mql-support`.
224-
225-
To query data in your publicly accessible |url|\s, your {+dl+} storage
226-
:ref:`configuration <datalake-configuration-file>` must contain settings that
227-
define:
236+
You can use {+adl+} to query and analyze data in files hosted at
237+
publicly accessible |url|\s. To query data in your publicly accessible
238+
|url|\s, your {+dl+} storage :ref:`configuration
239+
<datalake-configuration-file>` must contain settings that define:
228240

229241
- Your |http| {+data-lake-store+}.
230-
- {+dl+} virtual databases and collections that map to your {+data-lake-store+}.
242+
- {+dl+} virtual databases and collections that map to your
243+
{+data-lake-store+}.
231244

232245
.. example::
233246

@@ -263,27 +276,29 @@ define:
263276
]
264277
}
265278

266-
To learn more about these settings, see :ref:`datalake-configuration-file`.
279+
To learn more about these settings, see
280+
:ref:`datalake-configuration-file`.
267281

268-
{+dl+} creates the virtual databases and collections you specified in your
269-
{+dl+} configuration for the data in your |url|. {+dl+} also creates one
270-
partition for each |url| in your collection. When you :doc:`connect
271-
</tutorial/connect>` to your {+dl+} and run queries, {+dl+} processes your
272-
queries against the data and returns the query results.
282+
{+dl+} creates the virtual databases and collections you specified in
283+
your {+dl+} configuration for the data in your |url|. {+dl+} also
284+
creates one partition for each |url| in your collection. When you
285+
:doc:`connect </tutorial/connect>` to your {+dl+} and run queries,
286+
{+dl+} processes your queries against the data and returns the query
287+
results.
273288

274289
.. _federated-queries:
275290

276291
Running Federated Queries
277292
-------------------------
278293

279294
You can use {+adl+} to query and analyze a unified view of data in your
280-
|service| cluster, |s3| bucket, and at your |http| URL. For federated queries,
281-
your {+dl+} storage :ref:`configuration <datalake-configuration-file>` must
282-
contain the settings that define:
295+
|service| cluster, |s3| bucket, and at your |http| URL. For federated
296+
queries, your {+dl+} storage :ref:`configuration
297+
<datalake-configuration-file>` must contain the settings that define:
283298

284299
- Your |s3|, |service|, and |http| {+data-lake-stores+}.
285-
- {+dl+} virtual databases and collections that map to your |s3|, |service|,
286-
and |http| {+data-lake-store+}\s.
300+
- {+dl+} virtual databases and collections that map to your |s3|,
301+
|service|, and |http| {+data-lake-store+}\s.
287302

288303
.. example::
289304

@@ -342,12 +357,13 @@ contain the settings that define:
342357
]
343358
}
344359

345-
To learn more about these settings, see :ref:`datalake-configuration-file`.
360+
To learn more about these settings, see
361+
:ref:`datalake-configuration-file`.
346362

347-
When you :doc:`connect </tutorial/connect>` to your {+dl+} and run federated
348-
queries, {+dl+} combines data from your |service| cluster and |s3| bucket
349-
in virtual databases and collections and returns a union of data in the
350-
results.
363+
When you :doc:`connect </tutorial/connect>` to your {+dl+} and run
364+
federated queries, {+dl+} combines data from your |service| cluster,
365+
|s3| bucket, and |http| store in virtual databases and collections and
366+
returns a union of data in the results.
351367

352368
.. toctree::
353369
:titlesonly:
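
Note on the proxy rule in the "Querying Data in Your |service| Cluster" hunk: the changed text says a virtual collection mapped to only one |service| collection is proxied, while one mapped to several is not. The rule can be sketched in Python. This is a minimal sketch, assuming the storage configuration is available as a dict with ``stores`` and ``databases[].collections[].dataSources`` keys. The function name ``proxy_eligible_collections`` and every store, database, cluster, and collection name in the sample are hypothetical. The check only mirrors the rule as worded in the docs; it is not how Data Lake decides internally.

```python
def proxy_eligible_collections(storage_config):
    """Map "db.collection" -> True when the virtual collection has exactly
    one data source and that source names a single Atlas collection.

    Assumption: Atlas stores are marked with provider == "atlas" and a data
    source names its target with an explicit "collection" key.
    """
    atlas_stores = {
        store["name"]
        for store in storage_config.get("stores", [])
        if store.get("provider") == "atlas"
    }

    eligible = {}
    for db in storage_config.get("databases", []):
        for coll in db.get("collections", []):
            sources = coll.get("dataSources", [])
            is_single_atlas_source = (
                len(sources) == 1
                and sources[0].get("storeName") in atlas_stores
                and "collection" in sources[0]
            )
            eligible[f'{db["name"]}.{coll["name"]}'] = is_single_atlas_source
    return eligible


# Hypothetical configuration shaped like the foo / barbaz example in the diff.
config = {
    "stores": [
        {"name": "atlasStore", "provider": "atlas", "clusterName": "myCluster"}
    ],
    "databases": [
        {
            "name": "sample",
            "collections": [
                {
                    "name": "foo",
                    "dataSources": [
                        {"storeName": "atlasStore", "database": "db1", "collection": "a"}
                    ],
                },
                {
                    "name": "barbaz",
                    "dataSources": [
                        {"storeName": "atlasStore", "database": "db1", "collection": "b"},
                        {"storeName": "atlasStore", "database": "db1", "collection": "c"},
                    ],
                },
            ],
        }
    ],
}

print(proxy_eligible_collections(config))
```

Running a check like this against a draft configuration is one way to see in advance which virtual collections would lose the proxy optimization once a second data source is added.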
