Commit 8551574

DOCSP-10755 doc for padded numeric values for attribute types (#34)
* DOCSP-10755 doc for padded numeric values for attribute types
1 parent d84954b commit 8551574

2 files changed

+66
-2
lines changed

source/reference/examples/path-syntax-examples.txt

Lines changed: 13 additions & 1 deletion
@@ -261,6 +261,10 @@ example filename include the following fields:
 Queries that include *all* generated fields can be targeted to only
 those files that match the specified values.

+.. seealso::
+
+   :ref:`parse-padded-numeric-values`
+
 .. _datalake-advanced-path-parse-range:

 Identify Ranges of Queryable Data from Filename
@@ -312,6 +316,10 @@ and ``max`` date.
 undesired behavior. {+data-lake-short+} does *not* perform any
 validation that the underlying data conforms to this constraint.

+.. seealso::
+
+   :ref:`parse-padded-numeric-values`
+
 .. _datalake-advanced-parse-nested-fields:

 Identify Nested Fields from Filename
@@ -507,14 +515,18 @@ a small set of filtered files:
         "dataSources" : [
           {
             "storeName" : "accountingArchive",
-            "path" : "/invoices/{invoiceNumber string}/{year int}/{month int}/{day int}/*"
+            "path" : "/invoices/{invoiceNumber string}/{year int}/{month int:\\d{2}}/{day int:\\d{2}}/*"
           }
         ]
       }
     ]
   }
 }

+.. seealso::
+
+   :ref:`parse-padded-numeric-values`
+
 .. _datalake-advanced-path-generate-collection:

 Generate Dynamic Collection Names from File Path

source/supported-unsupported/supported-partition-attributes.txt

Lines changed: 53 additions & 1 deletion
@@ -6,6 +6,12 @@ Supported Partition Attribute Types

 .. default-domain:: mongodb

+.. contents:: On this page
+   :local:
+   :backlinks: none
+   :depth: 2
+   :class: singlecol
+
 The following table lists the supported data types for partition attributes and
 an example :datalakeconf:`~databases.[n].collections.[n].dataSources.[n].path`
 for each data type:
@@ -35,6 +41,10 @@ for each data type:
 In the above ``path`` examples, ``phone`` is interpreted
 as a string.

+.. seealso::
+
+   :ref:`parse-null-values`
+
 * - ``int``
   - Parses the filename as an integer.
   - filename: ``/zipcodes/90210.json``
@@ -44,6 +54,10 @@ for each data type:
 In the above example, ``zipcode`` is interpreted
 as an integer.

+.. seealso::
+
+   :ref:`parse-padded-numeric-values`
+
 * - ``isodate``
   - Parses the filename in `RFC 3339 <https://tools.ietf.org/html/rfc3339>`_
     format as an ISO-8601 format date.
@@ -89,6 +103,10 @@ for each data type:
 In the above example, ``startTimestamp`` is interpreted
 as a Unix timestamp in seconds.

+.. seealso::
+
+   :ref:`parse-padded-numeric-values`
+
 * - ``epoch_millis``
   - Parses the filename as a Unix timestamp in milliseconds.
   - filename: ``/metrics/1549046112000.json``
@@ -98,6 +116,10 @@ for each data type:
 In the above example, ``startTimestamp`` is interpreted
 as a Unix timestamp in milliseconds.

+.. seealso::
+
+   :ref:`parse-padded-numeric-values`
+
 * - ``objectid``
   - Parses the filename as an
     :manual:`ObjectId </reference/method/ObjectId/>`.
@@ -123,7 +145,9 @@ for each data type:
 {+adl+} supports the `Package Syntax
 <https://golang.org/pkg/regexp/syntax/>`__ for regular expressions
 in the path to the filename.
-
+
+.. _parse-null-values:
+
 Parsing Null Values from Filenames
 ----------------------------------

@@ -142,3 +166,31 @@ attribute types except ``string``. For example, consider the following |s3|
 For the path ``/records/{month string}/*``, {+dl+} does not add any
 computed fields for the ``month`` attribute to documents generated
 from the third record in the above store.
+
+.. _parse-padded-numeric-values:
+
+Parsing Padded Numbers from Filenames
+-------------------------------------
+
+For attribute types like ``int``, ``epoch_millis``, and ``epoch_secs``,
+if you want {+dl+} to correctly parse numeric values that are padded
+with leading zeros in the path to the file, specify the number
+of digits in the padded value using regular expressions. For example,
+consider a |s3| store with the following files:
+
+.. code-block:: text
+   :copyable: false
+
+   |--users
+      |--001.json
+      |--002.json
+      ...
+
+The following ``path`` syntax uses a regular expression with the
+``int`` attribute type to specify the number of digits in the
+filename:
+
+.. code-block:: sh
+   :copyable: false
+
+   /users/{user_id int:\\d{3}}
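
Reviewer's sketch (not part of the commit): the padded-number rule added above can be illustrated with a small Python model. This is only an approximation under stated assumptions. It is not how the service parses paths, the helper name ``parse_user_id`` is hypothetical, and it assumes that the ``.json`` extension is not part of the attribute value and that the doubled backslash in the documented path is string escaping for the regular expression ``\d{3}``.

```python
import re

# Hypothetical stand-in for the path template /users/{user_id int:\d{3}}.
# Assumption: the attribute value is exactly three digits and the ".json"
# extension is not part of it.
USER_PATH = re.compile(r"/users/(?P<user_id>\d{3})\.json")


def parse_user_id(path):
    """Return user_id as an int, or None if the path does not match."""
    match = USER_PATH.fullmatch(path)
    if match is None:
        return None
    # int() accepts leading zeros in a decimal string, so "001" becomes 1.
    return int(match.group("user_id"))


for name in ("/users/001.json", "/users/002.json", "/users/1.json"):
    print(name, "->", parse_user_id(name))
```

In this model the fixed width ``\d{3}`` accepts ``001`` and ``002`` and yields the integers 1 and 2, while an unpadded name such as ``/users/1.json`` does not match the template at all.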
