Commit ba0a2fb

DOCSP-12365 Support defaultFormat for HTTP stores (#75)
* DOCSP-12365 Support defaultFormat for HTTP stores
* DOCSP-12365 fix for merge related errors
  DOCSP-12365 fixes
  DOCSP-12365 indentation error
  DOCSP-12365 indentation error
* DOCSP-12365 updates for tech review feedback
1 parent 4f30ca0 commit ba0a2fb

5 files changed, +107 -58 lines changed

source/includes/extracts-common-conf-params.yaml

Lines changed: 13 additions & 20 deletions
@@ -1,29 +1,22 @@
11
ref: param-default-format
22
content: |
3-
4-
.. datalakeconf:: databases.[n].collections.[n].dataSources.[n].defaultFormat
5-
6-
*Optional.* Specifies the default format {+data-lake-short+} assumes
7-
if it encounters a file without an extension while searching the
8-
:datalakeconf:`~databases.[n].collections.[n].dataSources.[n].storeName`.
9-
10-
If omitted, {+data-lake-short+} attempts to detect the file type by
11-
processing a few bytes of the file.
3+
*Optional.* Default format that {+data-lake-short+} assumes
4+
if it encounters a file without an extension while searching the
5+
:datalakeconf:`~databases.[n].collections.[n].dataSources.[n].storeName`.
126
13-
.. note::
14-
15-
If your file format is ``CSV`` or ``TSV``, you must include a header
16-
row in your data. See :ref:`data-lake-csv-tsv-data` for more
17-
information.
18-
19-
The following values are valid for the ``defaultFormat`` field:
7+
The following values are valid for the ``defaultFormat`` field:
208
21-
``.json, .json.gz, .bson, .bson.gz, .avro, .avro.gz, .orc, .tsv, .tsv.gz,
22-
.csv, .csv.gz, .parquet``
9+
``.json, .json.gz, .bson, .bson.gz, .avro, .avro.gz, .orc, .tsv, .tsv.gz,
10+
.csv, .csv.gz, .parquet``
2311
24-
.. seealso::
12+
.. note::
13+
14+
If your file format is ``CSV`` or ``TSV``, you must include a header
15+
row in your data. See :ref:`data-lake-csv-tsv-data` for more
16+
information.
2517
26-
:ref:`data-lake-data-formats`
18+
If omitted, {+data-lake-short+} attempts to detect the file type by
19+
processing a few bytes of the file.
2720
---
2821
ref: param-max-wildcard-collections
2922
content: |
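The fallback that the ``param-default-format`` extract describes can be sketched roughly as follows. This is only an illustration, not the Data Lake implementation: ``resolveFormat`` is a hypothetical helper, and treating any URL whose path does not end in a listed extension as "a file without an extension" is an assumption made for the sketch.

```javascript
// Formats the extract lists as valid values for defaultFormat.
const VALID_FORMATS = [
  ".json", ".json.gz", ".bson", ".bson.gz", ".avro", ".avro.gz",
  ".orc", ".tsv", ".tsv.gz", ".csv", ".csv.gz", ".parquet",
];

// Hypothetical helper: returns the format to assume for a URL,
// or null when the caller would have to inspect the file contents.
function resolveFormat(url, defaultFormat) {
  const path = new URL(url).pathname.toLowerCase();

  // A recognized extension on the file name takes priority.
  const fromExtension = VALID_FORMATS.find((ext) => path.endsWith(ext));
  if (fromExtension) {
    return fromExtension;
  }

  // No defaultFormat configured: signal that detection is needed.
  if (defaultFormat === undefined) {
    return null;
  }

  // Reject values outside the documented list.
  if (!VALID_FORMATS.includes(defaultFormat)) {
    throw new Error(`unsupported defaultFormat: ${defaultFormat}`);
  }
  return defaultFormat;
}
```

With this sketch, ``resolveFormat("https://www.datacenter-metrics.com/data", ".json")`` falls back to ``.json``, while a URL ending in ``.csv.gz`` keeps its own extension regardless of the default.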

source/query/query-data-lake.txt

Lines changed: 21 additions & 9 deletions
@@ -174,7 +174,8 @@ define:
174174
{
175175
"name" : "<store-name>",
176176
"provider": "http",
177-
"urls": ["<url>"]
177+
"urls": ["<url>"],
178+
"defaultFormat" : "<string>"
178179
}
179180
],
180181
"databases" : [
@@ -186,7 +187,8 @@ define:
186187
"dataSources" : [
187188
{
188189
"storeName" : "<store-name>",
189-
"urls" : ["<url>"]
190+
"urls" : ["<url>"],
191+
"defaultFormat" : "<string>"
190192
}
191193
]
192194
}
@@ -209,14 +211,13 @@ Running Federated Queries
209211
-------------------------
210212

211213
You can use {+adl+} to query and analyze a unified view of data in your
212-
|service| cluster and |s3| bucket. For federated queries, your {+dl+}
213-
storage :ref:`configuration <datalake-configuration-file>` must contain the
214-
settings that define:
214+
|service| cluster, |s3| bucket, and |http| URL. For federated queries,
215+
your {+dl+} storage :ref:`configuration <datalake-configuration-file>` must
216+
contain the settings that define:
215217

216-
- Your |s3| {+data-lake-store+}.
217-
- Your |service| {+data-lake-store+}.
218-
- {+dl+} virtual databases and collections that map to your |s3| and |service|
219-
{+data-lake-store+}\s.
218+
- Your |s3|, |service|, and |http| {+data-lake-store+}\s.
219+
- {+dl+} virtual databases and collections that map to your |s3|, |service|,
220+
and |http| {+data-lake-store+}\s.
220221

221222
.. example::
222223

@@ -237,6 +238,12 @@ settings that define:
237238
"bucket" : "<s3-bucket-name>",
238239
"prefix" : "<file-path-prefix>",
239240
"delimiter" : "<path-separator>"
241+
},
242+
{
243+
"name" : "<store-name>",
244+
"provider": "http",
245+
"urls": ["<url>"],
246+
"defaultFormat" : "<string>"
240247
}
241248
],
242249
"databases" : [
@@ -254,6 +261,11 @@ settings that define:
254261
{
255262
"storeName" : "<s3-store-name>",
256263
"path" : "<path-to-file>"
264+
},
265+
{
266+
"storeName" : "<store-name>",
267+
"urls" : ["<url>"],
268+
"defaultFormat" : "<string>"
257269
}
258270
]
259271
}
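Because every ``dataSources`` entry in a federated configuration points at a store by ``storeName``, a quick consistency check before saving the configuration can catch typos. The following is a minimal sketch under assumptions: ``findUnknownStoreNames`` is a hypothetical helper, and it assumes each database and collection object carries a ``name`` field.

```javascript
// Hypothetical helper: lists dataSources whose storeName does not
// match the name of any entry in the "stores" array.
function findUnknownStoreNames(config) {
  const defined = new Set((config.stores || []).map((store) => store.name));
  const unknown = [];

  for (const db of config.databases || []) {
    for (const coll of db.collections || []) {
      for (const source of coll.dataSources || []) {
        if (!defined.has(source.storeName)) {
          unknown.push(`${db.name}.${coll.name} -> ${source.storeName}`);
        }
      }
    }
  }
  return unknown;
}
```

An empty result means every data source maps to a defined S3, Atlas, or HTTP store.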

source/reference/cli/collections/create-collections-views.txt

Lines changed: 13 additions & 3 deletions
@@ -97,7 +97,7 @@ Syntax
9797

9898
.. code-block:: sh
9999

100-
db.runCommand({ "create" : "<collection-name>", "dataSources" : [{ "storeName" : "<store-name>", "urls" : [ "<url>" ] }]})
100+
db.runCommand({ "create" : "<collection-name>", "dataSources" : [{ "storeName" : "<store-name>", "urls" : [ "<url>" ], "defaultFormat" : "<file-extension>" }]})
101101

102102
.. tab:: Views
103103
:tabid: views
@@ -249,6 +249,14 @@ Parameters
249249
``dataSources.storeName`` parameter.
250250
- yes
251251

252+
* - ``dataSources.defaultFormat``
253+
- string
254+
- .. include:: /includes/extracts/cli-param-default-format.rst
255+
256+
If included, the specified format only applies to the
257+
|url|\s in the ``dataSource``.
258+
- no
259+
252260
.. tab:: Views
253261
:tabid: views
254262

@@ -498,7 +506,7 @@ Examples
498506
.. code-block:: json
499507

500508
use sampleDB
501-
db.runCommand({ "create" : "http-collection", "dataSources" : [{ "storeName" : "http-store", "urls" : [ "https://atlas-data-lake.s3.amazonaws.com/json/sample_airbnb/listingsAndReviews.json","https://atlas-data-lake.s3.amazonaws.com/json/sample_weatherdata/data.json" ] }]})
509+
db.runCommand({ "create" : "http-collection", "dataSources" : [{ "storeName" : "http-store", "urls" : [ "https://atlas-data-lake.s3.amazonaws.com/json/sample_airbnb/listingsAndReviews.json","https://atlas-data-lake.s3.amazonaws.com/json/sample_weatherdata/data.json" ], "defaultFormat" : ".json" }]})
502510

503511
The previous command returns the following output:
504512

@@ -526,7 +534,8 @@ Examples
526534
"urls" : [
527535
"https://atlas-data-lake.s3.amazonaws.com/json/sample_airbnb/listingsAndReviews.json",
528536
"https://atlas-data-lake.s3.amazonaws.com/json/sample_weatherdata/data.json"
529-
]
537+
],
538+
"defaultFormat" : ".json"
530539
}
531540
],
532541
"databases" : [
@@ -538,6 +547,7 @@ Examples
538547
"dataSources" : [
539548
{
540549
"storeName" : "http-store",
550+
"defaultFormat" : ".json",
541551
"urls" : [
542552
"https://atlas-data-lake.s3.amazonaws.com/json/sample_airbnb/listingsAndReviews.json",
543553
"https://atlas-data-lake.s3.amazonaws.com/json/sample_weatherdata/data.json"

source/reference/cli/stores/create-store.txt

Lines changed: 14 additions & 4 deletions
@@ -67,7 +67,7 @@ Syntax
6767

6868
.. code-block:: sh
6969

70-
db.runCommand({ createStore: <store-name>, provider: <storage-provider>, urls: [ <url> ] })
70+
db.runCommand({ createStore: <store-name>, provider: <storage-provider>, urls: [ <url> ], defaultFormat: <file-extension> })
7171

7272
.. _dl-create-store-cmd-params:
7373

@@ -188,6 +188,14 @@ Parameters
188188
{+data-lake-store+}.
189189
- no
190190

191+
* - ``defaultFormat``
192+
- string
193+
- .. include:: /includes/extracts/cli-param-default-format.rst
194+
195+
If included, the specified format only applies to the |url|\s in
196+
the store.
197+
- no
198+
191199
.. _dl-create-store-cmd-output:
192200

193201
Output
@@ -247,7 +255,8 @@ fails, see :ref:`dl-create-store-cmd-errors` for recommended solutions.
247255
"provider" : "<storage-provider>",
248256
"urls" : [
249257
"<url>"
250-
]
258+
],
259+
"defaultFormat" : "<file-extension>"
251260
}
252261
}
253262

@@ -320,7 +329,7 @@ The following example uses the ``createStore`` command to create a new
320329
.. code-block:: json
321330

322331
use sample
323-
db.runCommand({ createStore: "myStore", provider: "http", urls: ["https://atlas-data-lake.s3.amazonaws.com/json/sample_airbnb/listingsAndReviews.json"] })
332+
db.runCommand({ createStore: "myStore", provider: "http", urls: ["https://atlas-data-lake.s3.amazonaws.com/json/sample_airbnb/listingsAndReviews.json"], defaultFormat: ".json" })
324333

325334
The previous command prints the following:
326335

@@ -334,7 +343,8 @@ The following example uses the ``createStore`` command to create a new
334343
"provider" : "http",
335344
"urls" : [
336345
"https://atlas-data-lake.s3.amazonaws.com/json/sample_airbnb/listingsAndReviews.json"
337-
]
346+
],
347+
"defaultFormat" : ".json"
338348
}
339349
}
340350

source/reference/format/data-lake-configuration.txt

Lines changed: 46 additions & 22 deletions
@@ -208,7 +208,8 @@ Click on the tab below to learn more about the {+dl+} configuration for that dat
208208
"urls" : [
209209
"https://www.datacenter-hardware.com/data.json",
210210
"https://www.datacenter-software.com/data.json"
211-
]
211+
],
212+
"defaultFormat" : ".json"
212213
}
213214
],
214215
"databases" : [
@@ -221,8 +222,9 @@ Click on the tab below to learn more about the {+dl+} configuration for that dat
221222
{
222223
"storeName" : "httpStore",
223224
"urls" : [
224-
"https://www.datacenter-metrics.com/data.json"
225-
]
225+
"https://www.datacenter-metrics.com/data"
226+
],
227+
"defaultFormat" : ".json"
226228
}
227229
]
228230
}
@@ -275,7 +277,8 @@ to run federated queries.
275277
"urls" : [
276278
"https://www.datacenter-hardware.com/data.json",
277279
"https://www.datacenter-software.com/data.json"
278-
]
280+
],
281+
"defaultFormat" : ".json"
279282
}
280283
],
281284
"databases" : [
@@ -298,7 +301,8 @@ to run federated queries.
298301
"storeName" : "httpStore",
299302
"urls": [
300303
"https://www.datacenter-metrics.com/data.json"
301-
]
304+
],
305+
"defaultFormat" : ".json"
302306
}
303307
]
304308
}
@@ -422,7 +426,8 @@ The {+data-lake-short+} configuration has the following format:
422426
{
423427
"name" : "<string>",
424428
"provider": "<string>",
425-
"urls": ["<string>"]
429+
"urls": ["<string>"],
430+
"defaultFormat" : "<string>"
426431
}
427432
],
428433
"databases" : [
@@ -434,7 +439,8 @@ The {+data-lake-short+} configuration has the following format:
434439
"dataSources" : [
435440
{
436441
"storeName" : "<string>",
437-
"urls" : ["<string>"]
442+
"urls" : ["<string>"],
443+
"defaultFormat" : "<string>"
438444
}
439445
]
440446
}
@@ -509,7 +515,8 @@ The {+data-lake-short+} configuration has the following format:
509515
{
510516
"name" : "<string>",
511517
"provider" : "<string>",
512-
"urls" : ["<string>"]
518+
"urls" : ["<string>"],
519+
"defaultFormat" : "<string>"
513520
}
514521
]
515522

@@ -661,6 +668,17 @@ The {+data-lake-short+} configuration has the following format:
661668
not generate any virtual {+adl+} databases or collections that
662669
reference the {+data-lake-store+}.
663670

671+
.. datalakeconf:: stores.[n].defaultFormat
672+
673+
.. include:: /includes/extracts/param-default-format.rst
674+
675+
The specified format only applies to the |url|\s specified in the
676+
:datalakeconf:`stores` object.
677+
678+
.. seealso::
679+
680+
:ref:`data-lake-data-formats`
681+
664682
.. _datalake-databases-reference:
665683

666684
``databases``
@@ -737,7 +755,8 @@ The {+data-lake-short+} configuration has the following format:
737755
"dataSources" : [
738756
{
739757
"storeName" : "<string>",
740-
"urls" : ["<string>"]
758+
"urls" : ["<string>"],
759+
"defaultFormat" : "<string>"
741760
}
742761
]
743762
}
@@ -857,7 +876,17 @@ The {+data-lake-short+} configuration has the following format:
857876

858877
.. include:: /includes/fact-path-delimiter.rst
859878

860-
.. include:: /includes/extracts/param-default-format.rst
879+
.. datalakeconf:: databases.[n].collections.[n].dataSources.[n].defaultFormat
880+
881+
.. include:: /includes/extracts/param-default-format.rst
882+
883+
.. seealso::
884+
885+
:ref:`data-lake-data-formats`
886+
887+
.. datalakeconf:: databases.[n].maxWildcardCollections
888+
889+
.. include:: /includes/extracts/param-max-wildcard-collections.rst
861890

862891
.. tab:: Atlas Cluster
863892
:tabid: atlas
@@ -888,21 +917,16 @@ The {+data-lake-short+} configuration has the following format:
888917
If omitted, {+dl+} uses the :datalakeconf:`~stores.[n].urls` in the
889918
specified :datalakeconf:`~databases.[n].collections.[n].dataSources.[n].storeName`.
890919

891-
.. tabs::
892-
:hidden:
893-
894-
.. tab:: S3
895-
:tabid: s3
896-
897-
.. datalakeconf:: databases.[n].maxWildcardCollections
920+
.. datalakeconf:: databases.[n].collections.[n].dataSources.[n].defaultFormat
921+
922+
.. include:: /includes/extracts/param-default-format.rst
898923

899-
.. include:: /includes/extracts/param-max-wildcard-collections.rst
924+
The specified format only applies to the |url|\s specified in the
925+
:datalakeconf:`databases.[n].collections.[n].dataSources` object.
900926

901-
.. tab:: Atlas
902-
:tabid: atlas
927+
.. seealso::
903928

904-
.. tab:: HTTP
905-
:tabid: http
929+
:ref:`data-lake-data-formats`
906930

907931
.. datalakeconf:: databases.[n].views
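The scoping rules added above (a store-level ``defaultFormat`` applies to the URLs in ``stores``, a data-source-level one applies to the URLs in ``dataSources``, and a data source without ``urls`` uses the store's URLs) could be modeled as in the following sketch. It is one reading of the text, not the service's actual resolution logic, and ``effectiveSources`` is a hypothetical helper.

```javascript
// Hypothetical helper: pairs each URL a data source would read with
// the defaultFormat that is in scope for that URL.
function effectiveSources(dataSource, stores) {
  const store = stores.find((s) => s.name === dataSource.storeName);
  if (!store) {
    throw new Error(`unknown store: ${dataSource.storeName}`);
  }

  // Assumption: an absent or empty "urls" array counts as omitted.
  const hasOwnUrls =
    Array.isArray(dataSource.urls) && dataSource.urls.length > 0;

  const urls = hasOwnUrls ? dataSource.urls : store.urls;
  const defaultFormat = hasOwnUrls
    ? dataSource.defaultFormat
    : store.defaultFormat;

  return urls.map((url) => ({ url: url, defaultFormat: defaultFormat }));
}
```

Keeping the two scopes separate means a store-level format never leaks onto URLs that a collection declares for itself.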
908932
