@@ -12,29 +12,31 @@ Querying Your Data Lake
12
12
:depth: 2
13
13
:class: singlecol
14
14
15
+ You can use the MongoDB Query Language (MQL) on {+adl+} to query and
16
+ analyze data on your data store. {+adl+} supports most, but not all, of the
17
+ standard server commands. To learn more about the supported and
18
+ unsupported MongoDB server commands and aggregation pipeline stages,
19
+ see :ref:`data-lake-mql-support`.
20
+
15
21
You can run up to 30 simultaneous queries on your {+dl+} against:
16
22
17
23
- Data in your |s3| bucket.
18
24
- Documents in your MongoDB |service| cluster.
19
25
- Data in files hosted at publicly accessible |url|\s.
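The 30-query limit above can be respected on the client side by capping how many queries are in flight at once. The following is a minimal sketch, not part of {+adl+} or any driver: ``run_all`` and ``MAX_SIMULTANEOUS_QUERIES`` are hypothetical names, and each callable stands in for whatever code issues a query through your driver.

```python
from concurrent.futures import ThreadPoolExecutor

# Limit stated in the text above; the constant name is hypothetical.
MAX_SIMULTANEOUS_QUERIES = 30


def run_all(query_fns, max_in_flight=MAX_SIMULTANEOUS_QUERIES):
    """Run zero-argument callables, keeping at most max_in_flight running.

    Results are returned in the same order as query_fns.
    """
    if not 1 <= max_in_flight <= MAX_SIMULTANEOUS_QUERIES:
        raise ValueError("max_in_flight must be between 1 and 30")
    # The pool size is the cap: extra callables wait in the pool's queue
    # until a worker thread becomes free.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        futures = [pool.submit(fn) for fn in query_fns]
        return [future.result() for future in futures]
```

A thread pool is used here because its worker count doubles as the concurrency cap, so no separate semaphore is needed.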
20
26
21
- .. seealso ::
27
+ .. see ::
22
28
23
29
- :doc:`How to Connect to Your Data Lake </tutorial/connect>`
24
- - :doc:`How to Run Queries Against Your Data Lake </tutorial/run-queries>`
30
+ - :doc:`How to Run Queries Against Your Data Lake
31
+ </tutorial/run-queries>`
25
32
26
33
.. _query-s3:
27
34
28
35
Querying Data on S3
29
36
-------------------
30
37
31
- You can use {+adl+} to query and analyze data on your cloud object store
32
- using MongoDB Query Language (MQL). {+adl+} supports most, but not all the
33
- standard server commands. To learn more about the supported and unsupported
34
- MongoDB server commands and aggregation pipeline stages, see
35
- :ref:`data-lake-mql-support`.
36
-
37
- To query data on |s3|, your {+dl+} storage :ref:`configuration
38
+ You can use {+adl+} to query and analyze data on your cloud object
39
+ store. To query data on |s3|, your {+dl+} storage :ref:`configuration
38
40
<datalake-configuration-file>` must contain settings that define:
39
41
40
42
- Your |s3| {+data-lake-store+}.
@@ -75,34 +77,37 @@ To query data on |s3|, your {+dl+} storage :ref:`configuration
75
77
]
76
78
}
77
79
78
- To learn more about these settings, see :ref:`datalake-configuration-file`.
80
+ To learn more about these settings, see
81
+ :ref:`datalake-configuration-file`.
82
+
83
+ {+dl+} creates the virtual databases and collections you specified in
84
+ your {+dl+} configuration for the data in your |s3| store. When you
85
+ :doc:`connect </tutorial/connect>` to your {+dl+} and :doc:`run queries
86
+ </tutorial/run-queries>`, {+dl+} processes your queries against the
87
+ data and returns the query results.
79
88
80
- {+dl+} creates the virtual databases and collections you specified in your
81
- {+dl+} configuration for the data in your |s3| store. When you :doc:`connect
82
- </tutorial/connect>` to your {+dl+} and :doc:`run queries
83
- </tutorial/run-queries>`, {+dl+} processes your queries against the data and
84
- returns the query results.
89
+ When :doc:`deploying </tutorial/deploy>` your {+dl+}, if you specified
90
+ an |s3| bucket with both read and write permissions or |aws| |s3|
91
+ :aws:`s3:PutObject </AmazonS3/latest/dev/using-with-s3-actions.html#using-with-s3-actions-related-to-objects>`
92
+ permission, you can also save your query results in your |s3| bucket
93
+ using :ref:`adl-out-stage` to |s3|.
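As a rough illustration of saving results to |s3|, the sketch below appends an ``$out`` stage to an aggregation pipeline. The helper names ``s3_out_stage`` and ``with_s3_output`` are hypothetical, and the field names inside ``$out`` (``bucket``, ``region``, ``filename``, ``format``) are an assumption about the stage's shape; check :ref:`adl-out-stage` for the authoritative syntax before relying on them.

```python
def s3_out_stage(bucket, region, filename, format_name="json"):
    """Build an $out stage document targeting an S3 bucket.

    The field names are assumed, not taken from this page.
    """
    return {
        "$out": {
            "s3": {
                "bucket": bucket,
                "region": region,
                "filename": filename,
                "format": {"name": format_name},
            }
        }
    }


def with_s3_output(pipeline, **s3_options):
    """Return a new pipeline with the $out stage appended as the last stage."""
    # Copy the input so the caller's pipeline list is left unchanged.
    return list(pipeline) + [s3_out_stage(**s3_options)]
```

The resulting list could then be passed to your driver's aggregate call, assuming the bucket was configured with write access as described above.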
85
94
86
- When :doc:`deploying </tutorial/deploy>` your {+dl+}, if you specified an |s3|
87
- bucket with both read and write permissions or |aws| |s3| :aws:`s3:PutObject
88
- </AmazonS3/latest/dev/using-with-s3-actions.html#using-with-s3-actions-related-to-objects>`
89
- permission, you can also save your query results in your |s3| bucket using
90
- :ref:`adl-out-stage` to |s3|.
95
+ If you successfully create or update an object on your |s3| data store,
96
+ {+dl+} returns the latest version of that object for any subsequent
97
+ read requests, and all list operations on the objects also reflect the
98
+ changes. If your query contains multiple stages, each stage receives
99
+ the most recent data available from the data store as that stage is
100
+ processed.
91
101
92
102
.. _query-atlas:
93
103
94
104
Querying Data in Your |service| Cluster
95
105
---------------------------------------
96
106
97
- You can use {+adl+} to query and analyze data in your |service| cluster using
98
- MongoDB Query Language (MQL). {+adl+} supports most, but not all the standard
99
- server commands. To learn more about the supported and unsupported MongoDB
100
- server commands and aggregation pipeline stages, see
101
- :ref:`data-lake-mql-support`.
102
-
103
- To query data in your |service| cluster, your {+dl+} storage
104
- :ref:`configuration <datalake-configuration-file>` must contain settings that
105
- define:
107
+ You can use {+adl+} to query and analyze data in your |service|
108
+ cluster. To query data in your |service| cluster, your {+dl+} storage
109
+ :ref:`configuration <datalake-configuration-file>` must contain
110
+ settings that define:
106
111
107
112
- Your |service| {+data-lake-store+}.
108
113
- {+dl+} virtual databases and collections that map to your
@@ -140,19 +145,21 @@ define:
140
145
]
141
146
}
142
147
143
- To learn more about these settings, see :ref:`datalake-configuration-file`.
148
+ To learn more about these settings, see
149
+ :ref:`datalake-configuration-file`.
144
150
145
- {+dl+} automatically detects the file format and creates the virtual databases
146
- and collections you specified in your {+dl+} configuration. When you
147
- :doc:`connect </tutorial/connect>` to your {+dl+} and run queries, {+dl+}
148
- processes your queries against the data and returns the query results.
151
+ {+dl+} automatically detects the file format and creates the virtual
152
+ databases and collections you specified in your {+dl+} configuration.
153
+ When you :doc:`connect </tutorial/connect>` to your {+dl+} and run
154
+ queries, {+dl+} processes your queries against the data and returns the
155
+ query results.
149
156
150
- If you query a collection in {+adl+} that is mapped to only one |service|
151
- collection, {+adl+} acts as a proxy and forwards your query to |service|.
152
- When acting as a proxy, {+adl+} doesn't scan data into its virtual collection
153
- to process the query thus improving performance and reducing cost. This
154
- optimization is not available for queries on {+adl+} collections that are
155
- mapped to multiple |service| collections.
157
+ If you query a collection in {+adl+} that is mapped to only one
158
+ |service| collection, {+adl+} acts as a proxy and forwards your query
159
+ to |service|. When acting as a proxy, {+adl+} doesn't scan data into
160
+ its virtual collection to process the query, thus improving performance
161
+ and reducing cost. This optimization is not available for queries on
162
+ {+adl+} collections that are mapped to multiple |service| collections.
156
163
157
164
.. example::
158
165
@@ -203,31 +210,37 @@ mapped to multiple |service| collections.
203
210
]
204
211
}
205
212
206
- For the above storage configuration, {+adl+} acts as a proxy for queries
207
- on ``foo`` collection and forwards the queries to |service|. This
208
- performance and cost optimization is not available for queries on ``barbaz``
209
- collection because ``barbaz`` is mapped to multiple |service| collections.
213
+ For the above storage configuration, {+adl+} acts as a proxy for
214
+ queries on the ``foo`` collection and forwards the queries to |service|.
215
+ This performance and cost optimization is not available for queries
216
+ on the ``barbaz`` collection because ``barbaz`` is mapped to multiple
217
+ |service| collections.
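The proxy rule described above can be expressed as a small check over a storage configuration. This is a sketch under stated assumptions: ``proxy_eligible`` is a hypothetical helper, ``EXAMPLE_CONFIG`` is an illustrative stand-in rather than the configuration from this page, and the key names (``stores``, ``provider``, ``databases``, ``collections``, ``dataSources``, ``storeName``) are assumed to match the storage configuration format.

```python
# Illustrative configuration only; the store, database, and source
# collection names are hypothetical.
EXAMPLE_CONFIG = {
    "stores": [{"name": "atlasStore", "provider": "atlas"}],
    "databases": [
        {
            "name": "sample",
            "collections": [
                {
                    "name": "foo",
                    "dataSources": [
                        {"storeName": "atlasStore", "collection": "foo"}
                    ],
                },
                {
                    "name": "barbaz",
                    "dataSources": [
                        {"storeName": "atlasStore", "collection": "bar"},
                        {"storeName": "atlasStore", "collection": "baz"},
                    ],
                },
            ],
        }
    ],
}


def proxy_eligible(config, database, collection):
    """Return True if the virtual collection maps to exactly one Atlas source."""
    atlas_stores = {
        store["name"]
        for store in config.get("stores", [])
        if store.get("provider") == "atlas"
    }
    for db in config.get("databases", []):
        if db.get("name") != database:
            continue
        for coll in db.get("collections", []):
            if coll.get("name") != collection:
                continue
            sources = coll.get("dataSources", [])
            return (
                len(sources) == 1
                and sources[0].get("storeName") in atlas_stores
            )
    # Unknown database or collection: nothing to proxy.
    return False
```

With this stand-in configuration, ``foo`` qualifies for proxying and ``barbaz`` does not, mirroring the example above.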
218
+
219
+ You can also save your query results in your |service| cluster using
220
+ :ref:`adl-out-stage` to |service|.
221
+
222
+ If you successfully create or update a document in your collection on
223
+ the |service| cluster, {+dl+} returns the latest version of that
224
+ document for any subsequent read requests, and all list operations on
225
+ the collection also reflect the changes. If your query contains
226
+ multiple stages, each stage receives the most recent data available
227
+ from the data store as that stage is processed.
210
228
211
229
.. _query-http:
212
230
213
- Querying Data at a |http| |url|
214
- -------------------------------
231
+ Querying Data at a |http| or |https| |url|
232
+ ------------------------------------------
215
233
216
234
.. include:: /includes/extracts/fact-http-beta-message.rst
217
235
218
- You can use {+adl+} to query and analyze data in files hosted at publicly
219
- accessible |url|\s using MongoDB Query Language (MQL). To learn more about the
220
- supported data formats, see :ref:`data-lake-data-formats`. {+adl+} supports
221
- most, but not all the standard server commands. To learn more about the
222
- supported and unsupported MongoDB server commands and aggregation pipeline
223
- stages, see :ref:`data-lake-mql-support`.
224
-
225
- To query data in your publicly accessible |url|\s, your {+dl+} storage
226
- :ref:`configuration <datalake-configuration-file>` must contain settings that
227
- define:
236
+ You can use {+adl+} to query and analyze data in files hosted at
237
+ publicly accessible |url|\s. To query data in your publicly accessible
238
+ |url|\s, your {+dl+} storage :ref:`configuration
239
+ <datalake-configuration-file>` must contain settings that define:
228
240
229
241
- Your |http| {+data-lake-store+}.
230
- - {+dl+} virtual databases and collections that map to your {+data-lake-store+}.
242
+ - {+dl+} virtual databases and collections that map to your
243
+ {+data-lake-store+}.
231
244
232
245
.. example::
233
246
@@ -263,27 +276,29 @@ define:
263
276
]
264
277
}
265
278
266
- To learn more about these settings, see :ref:`datalake-configuration-file`.
279
+ To learn more about these settings, see
280
+ :ref:`datalake-configuration-file`.
267
281
268
- {+dl+} creates the virtual databases and collections you specified in your
269
- {+dl+} configuration for the data in your |url|. {+dl+} also creates one
270
- partition for each |url| in your collection. When you :doc:`connect
271
- </tutorial/connect>` to your {+dl+} and run queries, {+dl+} processes your
272
- queries against the data and returns the query results.
282
+ {+dl+} creates the virtual databases and collections you specified in
283
+ your {+dl+} configuration for the data in your |url|. {+dl+} also
284
+ creates one partition for each |url| in your collection. When you
285
+ :doc:`connect </tutorial/connect>` to your {+dl+} and run queries,
286
+ {+dl+} processes your queries against the data and returns the query
287
+ results.
273
288
274
289
.. _federated-queries:
275
290
276
291
Running Federated Queries
277
292
-------------------------
278
293
279
294
You can use {+adl+} to query and analyze a unified view of data in your
280
- |service| cluster, |s3| bucket, and at your |http| URL. For federated queries,
281
- your {+dl+} storage :ref:`configuration <datalake-configuration-file>` must
282
- contain the settings that define:
295
+ |service| cluster, |s3| bucket, and at your |http| URL. For federated
296
+ queries, your {+dl+} storage :ref:`configuration
297
+ <datalake-configuration-file>` must contain the settings that define:
283
298
284
299
- Your |s3|, |service|, and |http| {+data-lake-stores+}.
285
- - {+dl+} virtual databases and collections that map to your |s3|, |service|,
286
- and |http| {+data-lake-store+}\s.
300
+ - {+dl+} virtual databases and collections that map to your |s3|,
301
+ |service|, and |http| {+data-lake-store+}\s.
287
302
288
303
.. example::
289
304
@@ -342,12 +357,13 @@ contain the settings that define:
342
357
]
343
358
}
344
359
345
- To learn more about these settings, see :ref:`datalake-configuration-file`.
360
+ To learn more about these settings, see
361
+ :ref:`datalake-configuration-file`.
346
362
347
- When you :doc:`connect </tutorial/connect>` to your {+dl+} and run federated
348
- queries, {+dl+} combines data from your |service| cluster and |s3| bucket
349
- in virtual databases and collections and returns a union of data in the
350
- results.
363
+ When you :doc:`connect </tutorial/connect>` to your {+dl+} and run
364
+ federated queries, {+dl+} combines data from your |service| cluster,
365
+ |s3| bucket, and |http| store in virtual databases and collections and
366
+ returns a union of data in the results.
351
367
352
368
.. toctree::
353
369
:titlesonly: