
Commit 79106c3

DOCSP-11143 doc for collStats (#47)
DOCSP-11143 minor fix
1 parent 9d0a83c commit 79106c3


3 files changed (+151, -6 lines)


source/reference/pipeline/aggr-pipeline.txt

Lines changed: 2 additions & 0 deletions
@@ -13,6 +13,7 @@ this section describe the alternate syntax that {+data-lake-short+}
 introduces for the following :manual:`pipeline
 </core/aggregation-pipeline/>` stages:
 
+- :ref:`adl-collstats-stage`
 - :ref:`adl-lookup-stage`
 - :ref:`adl-out-stage`
 
@@ -21,6 +22,7 @@ introduces for the following :manual:`pipeline
 .. toctree::
    :titlesonly:
 
+   /reference/pipeline/collstats
    /reference/pipeline/lookup-stage
    /reference/pipeline/out
    /reference/pipeline/sql
Lines changed: 143 additions & 0 deletions
@@ -0,0 +1,143 @@
.. _adl-collstats-stage:

==============
``$collStats``
==============

.. default-domain:: mongodb

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 2
   :class: singlecol

``$collStats`` returns statistics for a given collection and must be the
first stage in the aggregation pipeline. For more information, see
:manual:`$collStats </reference/operator/aggregation/collStats/>`. In
{+data-lake+}, ``$collStats`` only retrieves information about the
partitions of a given collection or view.

.. _adl-collstats-syntax:

Syntax
------

In {+adl+}, :manual:`$collStats </reference/operator/aggregation/collStats/>`
accepts only an empty document. It does not support any of the optional
fields that the MongoDB server supports, and it returns an error if you
specify an unsupported option.

.. code-block:: sh

   db.<collection-or-view-name>.aggregate([ { "$collStats" : {} } ])
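
The two rules above, that ``$collStats`` must be the first stage and that in
{+adl+} its argument must be an empty document, can also be checked on the
client before a pipeline is sent. The following is a minimal sketch in plain
JavaScript, so it should also run in ``mongosh``. ``checkAdlCollStatsPipeline``
is a hypothetical helper, not part of any MongoDB driver or shell API, and the
server remains the authority on which pipelines it accepts.

.. code-block:: javascript

   // Hypothetical helper (not part of any MongoDB API): checks a pipeline
   // against the two documented Data Lake rules for $collStats before the
   // pipeline is sent. Returns a list of problems; an empty list means none
   // were found.
   function checkAdlCollStatsPipeline(pipeline) {
     const problems = [];
     pipeline.forEach((stage, index) => {
       if (!Object.prototype.hasOwnProperty.call(stage, "$collStats")) {
         return; // not a $collStats stage, nothing to check
       }
       // Rule 1: $collStats must be the first stage.
       if (index !== 0) {
         problems.push(`$collStats must be the first stage, found at index ${index}`);
       }
       // Rule 2: the argument must be an empty document; Data Lake rejects
       // every optional field that the MongoDB server accepts.
       const options = stage["$collStats"];
       if (options === null || typeof options !== "object" || Array.isArray(options)) {
         problems.push("$collStats argument must be a document");
       } else {
         for (const key of Object.keys(options)) {
           problems.push(`$collStats option '${key}' is not supported in Data Lake`);
         }
       }
     });
     return problems;
   }

   // Usage:
   checkAdlCollStatsPipeline([{ $collStats: {} }]);                    // []
   checkAdlCollStatsPipeline([{ $match: {} }, { $collStats: {} }]);    // one problem: not the first stage
   checkAdlCollStatsPipeline([{ $collStats: { latencyStats: {} } }]);  // one problem: latencyStats
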

.. _adl-collstats-output:

Output
------

``$collStats`` returns one document for each partition. Each document
contains the following fields:

.. list-table::
   :header-rows: 1
   :widths: 20 10 70

   * - Field
     - Type
     - Description

   * - ``ns``
     - string
     - The namespace of the current collection or view, in the format
       ``[database].[collection|view]``.

   * - ``partition``
     - document
     - Details about the partition, such as its source, format, size, and
       :ref:`partition attributes <datalake-path-attribute-types>`, if any.

   * - ``partition.format``
     - string
     - The format of the file. For data in an |s3| bucket, the value can be
       any of the :ref:`data-lake-data-formats`. For data in an |service|
       cluster, the value is ``MONGO``.

   * - ``partition.attributes``
     - document
     - The :ref:`partition attributes <datalake-path-attribute-types>` for
       this partition, as defined in the
       :datalakeconf:`~databases.[n].collections.[n].dataSources.[n].path`
       for |s3| partitions. An empty document indicates that the partition's
       data source has no partition attributes.

   * - ``partition.size``
     - int
     - The size of the partition, in bytes.

   * - ``partition.source``
     - string
     - The source of the partition. The value is one of the following:

       - The path to the file on |s3|.
       - The name of the cluster, for partitions on |service|.
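
Because every output document has the fixed shape described in the table, it
can be processed with ordinary code once the documents are collected, for
example with ``db.abc.aggregate([ { $collStats: {} } ]).toArray()`` in
``mongosh``. The following JavaScript sketch assumes that shape and assumes
``partition.size`` is a plain number; ``describePartition`` is a hypothetical
helper name, not an existing API.

.. code-block:: javascript

   // Hypothetical helper: turns one $collStats output document into a short
   // description. Per the table above, format is "MONGO" for partitions in an
   // Atlas cluster, where source is the cluster name; otherwise source is the
   // path to the file on S3.
   function describePartition(doc) {
     const { format, attributes, size, source } = doc.partition;
     const location = format === "MONGO" ? `cluster ${source}` : `file ${source}`;
     // An empty attributes document means the data source has no partition attributes.
     const names = Object.keys(attributes || {});
     const attrText = names.length === 0
       ? "no partition attributes"
       : names.map((name) => `${name}=${attributes[name]}`).join(", ");
     return `${doc.ns}: ${format} ${location}, size ${size}, ${attrText}`;
   }

   // Usage, with the document from the Atlas cluster example in the Examples section:
   describePartition({
     ns: "atlasDb.sampleColl",
     partition: { format: "MONGO", attributes: {}, size: 94362191, source: "mySandboxCluster" },
   });
   // → "atlasDb.sampleColl: MONGO cluster mySandboxCluster, size 94362191, no partition attributes"
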

.. _adl-collstats-egs:

Examples
--------

The following example shows :manual:`$collStats
</reference/operator/aggregation/collStats/>` syntax for retrieving the
partitions of the ``s3Db.abc`` collection, which has three files in an |s3|
{+data-lake-store+}:

.. code-block:: sh

   use s3Db
   db.abc.aggregate([ { $collStats: {} } ])

The preceding command returns the following output:

.. code-block:: json
   :copyable: false

   { "ns" : "s3Db.abc", "partition" : { "format" : "JSON", "attributes" : { "year" : NumberLong(2018) }, "size" : 139, "source" : "s3://my-bucket/s3Db/abc/2018/1.json?delimiter=%2F&region=us-east-1" } }
   { "ns" : "s3Db.abc", "partition" : { "format" : "JSON", "attributes" : { "year" : NumberLong(2017) }, "size" : 124, "source" : "s3://my-bucket/s3Db/abc/2017/1.json?delimiter=%2F&region=us-east-1" } }
   { "ns" : "s3Db.abc", "partition" : { "format" : "JSON", "attributes" : { "year" : NumberLong(2017) }, "size" : 130, "source" : "s3://my-bucket/s3Db/abc/2017/2.json?delimiter=%2F&region=us-east-1" } }

The following example shows :manual:`$collStats
</reference/operator/aggregation/collStats/>` syntax for retrieving the
partitions of the ``atlasDb.sampleColl`` collection in the |service| cluster
named ``mySandboxCluster``:

.. code-block:: sh

   use atlasDb
   db.sampleColl.aggregate([ { $collStats: {} } ])

The preceding command returns the following output:

.. code-block:: json
   :copyable: false

   { "ns" : "atlasDb.sampleColl", "partition" : { "format" : "MONGO", "attributes" : { }, "size" : 94362191, "source" : "mySandboxCluster" } }
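
The ``s3Db.abc`` output can also be summarized per partition attribute, for
example to see how much data each ``year`` holds. The following minimal
JavaScript sketch groups the three ``s3Db.abc`` documents by ``year`` on the
client, not in {+adl+}. ``totalsByAttribute`` is a hypothetical helper, and the
``NumberLong(...)`` years are written as plain numbers so that the snippet also
runs outside ``mongosh``.

.. code-block:: javascript

   // Output of the s3Db.abc example above, with NumberLong(...) years written
   // as plain numbers.
   const s3DbAbcPartitions = [
     { ns: "s3Db.abc", partition: { format: "JSON", attributes: { year: 2018 }, size: 139, source: "s3://my-bucket/s3Db/abc/2018/1.json?delimiter=%2F&region=us-east-1" } },
     { ns: "s3Db.abc", partition: { format: "JSON", attributes: { year: 2017 }, size: 124, source: "s3://my-bucket/s3Db/abc/2017/1.json?delimiter=%2F&region=us-east-1" } },
     { ns: "s3Db.abc", partition: { format: "JSON", attributes: { year: 2017 }, size: 130, source: "s3://my-bucket/s3Db/abc/2017/2.json?delimiter=%2F&region=us-east-1" } },
   ];

   // Hypothetical helper: counts partitions and totals their sizes for each
   // value of one partition attribute (for example "year"). Partitions that
   // lack the attribute are grouped under "(none)".
   function totalsByAttribute(docs, attribute) {
     const totals = new Map();
     for (const { partition } of docs) {
       const value = (partition.attributes || {})[attribute];
       const key = value === undefined ? "(none)" : String(value);
       const entry = totals.get(key) || { partitions: 0, totalSize: 0 };
       entry.partitions += 1;
       entry.totalSize += partition.size;
       totals.set(key, entry);
     }
     return totals;
   }

   const byYear = totalsByAttribute(s3DbAbcPartitions, "year");
   // byYear.get("2018") → { partitions: 1, totalSize: 139 }
   // byYear.get("2017") → { partitions: 2, totalSize: 254 }
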

.. _adl-collstats-errors:

Errors
------

If the :manual:`$collStats </reference/operator/aggregation/collStats/>`
argument document contains any option that the MongoDB server allows but
{+adl+} does not, {+adl+} returns an error similar to the following:

.. code-block:: json
   :copyable: false

   {
     "ok" : 0,
     "errmsg" : "$collStats param 'latencyStats' is not valid for Data Lake, correlationID = 1622929884a47d16f4888a1c",
     "code" : 9,
     "codeName" : "FailedToParse"
   }
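
A client that builds ``$collStats`` options dynamically may want to detect this
error and report which option was rejected. The following JavaScript sketch
checks a response document shaped like the one above. ``unsupportedCollStatsOption``
is a hypothetical helper, and matching on the ``errmsg`` text is an assumption
based only on this sample message; the wording may differ in practice.

.. code-block:: javascript

   // Hypothetical helper: returns the name of the rejected option when a
   // response document matches the sample error above, or null otherwise.
   // The errmsg pattern is copied from the sample error and is an assumption.
   function unsupportedCollStatsOption(response) {
     if (!response || response.ok !== 0 || response.codeName !== "FailedToParse") {
       return null;
     }
     const match = /\$collStats param '([^']+)' is not valid for Data Lake/.exec(response.errmsg || "");
     return match ? match[1] : null;
   }

   // Usage, with the sample error above:
   unsupportedCollStatsOption({
     ok: 0,
     errmsg: "$collStats param 'latencyStats' is not valid for Data Lake, correlationID = 1622929884a47d16f4888a1c",
     code: 9,
     codeName: "FailedToParse",
   });
   // → "latencyStats"
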

source/supported-unsupported/mql-support.txt

Lines changed: 6 additions & 6 deletions
@@ -124,17 +124,17 @@ Diagnostic Commands
 * ``indexSizes``
 
 The following fields are added to the response. You can use
-these fields to verify what S3 partitions are being used to
+these fields to verify which partitions are being used to
 populate a collection.
 
 **partitions.format**
-The file format of the S3 partition.
+The file format of the partition.
 
 **partitions.attributes**
-The filtering attributes of the S3 partition.
+The filtering attributes of the partition.
 
-**partitions.url**
-The URL of the backing data of the partition.
+**partitions.source**
+The |s3| URL or |service| cluster name of the partition's backing data.
 
 **partitions.size**
 The size, in bytes, of the partition.
@@ -154,7 +154,7 @@ Diagnostic Commands
 "partitions": [{
 "format": <file format>,
 "attributes": <filtering attributes>,
-"url": <url to the backing data of the partition>,
+"source": <S3 url or Atlas cluster name>,
 "size": <size, in bytes, of the partition>
 }, ...],
 "partitionCount": <number of partitions>,
