Commit b8fb4ae

DOCSP-14317 Add SQL schema generation for wildcard collections (#124)
* DOCSP-14317 Add SQL schema generation for wildcard collections
* DOCSP-14317 updates for review feedback
1 parent 07382ee commit b8fb4ae

4 files changed: 97 additions and 79 deletions

source/admin/query-with-sql.txt

Lines changed: 47 additions & 37 deletions
Original file line number | Diff line number | Diff line change
@@ -10,54 +10,64 @@ Querying with SQL
1010

1111
{+adl+} supports SQL format queries through the :ref:`JDBC driver
1212
<jdbc-driver>` for {+adl+} and using the :ref:`adl-sql-stage`
13-
:ref:`aggregation pipeline <adl-aggregation-pipeline>` stage. To support
14-
SQL format queries, {+adl+} automatically creates a |json| schema that maps
15-
to a relational schema of columns, tables, and databases for all new
16-
collections, except wildcard (``*``) collections, and views in the {+dl+}
17-
storage configuration. To learn more about the schema, see
13+
:ref:`aggregation pipeline <adl-aggregation-pipeline>` stage. To
14+
support SQL format queries, {+adl+} automatically creates a |json|
15+
schema that maps to a relational schema of columns, tables, and
16+
databases for all new collections and views in the {+dl+} storage
17+
configuration. To learn more about the schema, see
1818
:ref:`sql-schema-format`.
1919

20-
{+dl+} automatically generates a schema for a new non-wildcard collection or
21-
view in the storage configuration when you:
20+
{+dl+} automatically generates a schema for a collection or view in the
21+
storage configuration when you:
2222

2323
- :ref:`Create <dl-create-collection-views-cmd>` the collection or view
2424
in the storage configuration.
2525

26-
- :ref:`Rename <dl-rename-collection-cmd>` a collection or view that does not
27-
already have a schema. If you rename a collection or view that already has
28-
a schema, the schema is also renamed. {+dl+} does not generate a new schema
29-
for a renamed collection or view if it already exists.
26+
- :ref:`Rename <dl-rename-collection-cmd>` a collection or view that
27+
does not already have a schema. If you rename a collection or view
28+
that already has a schema, the schema is also renamed. {+dl+} does
29+
not generate a new schema for a renamed collection or view if it
30+
already exists.
3031

3132
- :ref:`Set <datalake-setstorageconfig>` the storage configuration.
3233

34+
In addition, for wildcard (``*``) collections, {+dl+} generates a
35+
schema when it discovers the collections in the :ref:`namespace catalog
36+
<manage-ns-catalog-cli>` for the wildcard (``*``) collections.
37+
3338
.. include:: /includes/fact-schema-for-existing-collections.rst
3439

35-
By default, {+dl+} samples data from only one randomly selected document in
36-
your non-wildcard collection or view to generate a |json| schema. If your
37-
collection or view contains polymorphic data, you can provide a larger
38-
sampling size to {+dl+} to generate a new schema or you can manually
39-
construct and set the schema.
40-
41-
You can manually generate schemas for all collections and views using the
42-
:ref:`sqlgenerateschema-cmd` command, set or update the schema for your
43-
collections or views using the :ref:`sqlsetschema-cmd` command, and view
44-
the stored schema using the :ref:`sqlgetschema-cmd` command.
45-
46-
Once the SQL schema is set up, you can query your {+adl+} collections or views
47-
through the :ref:`JDBC driver <jdbc-driver>` for {+adl+} and using the
48-
:ref:`adl-sql-stage` :ref:`aggregation pipeline <adl-aggregation-pipeline>`
49-
stage.
50-
51-
You can manually delete a schema for a collection or view by running the
52-
:ref:`sqlsetschema-cmd` command with an empty schema document. {+dl+}
53-
automatically removes the schema for a collection or view when you:
54-
55-
- :ref:`Drop the collection or view <dl-drop-collection-views-cmd>` from the
56-
storage configuration.
57-
- :ref:`Modify the storage configuration <datalake-setstorageconfig>` to
58-
remove the collection or view from the storage configuration.
59-
- :ref:`Drop the database <dl-drop-database-cmd>` that contains the collection
60-
or view from the storage configuration.
40+
By default, {+dl+} samples data from only one randomly selected
41+
document in your collection or view to generate a |json| schema. If
42+
your collection or view contains polymorphic data, you can provide a
43+
larger sampling size to {+dl+} to generate a new schema or you can
44+
manually construct and set the schema.
45+
46+
You can manually generate schemas for all collections and views using
47+
the :ref:`sqlgenerateschema-cmd` command, set or update the schema for
48+
your collections or views using the :ref:`sqlsetschema-cmd` command,
49+
and view the stored schema using the :ref:`sqlgetschema-cmd` command.
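
As a rough illustration of the three commands named in this paragraph, the following sketch builds the command documents as plain Python dictionaries. It does not connect to {+dl+}. The option names ``sampleNamespaces``, ``sampleSize``, ``setSchemas``, and ``schema``, as well as the ``sales.orders`` namespace, are assumptions that do not appear in this diff; confirm them against the linked command reference pages before use.

```python
# Hypothetical helpers: they only build command documents.
# Option names other than the command names themselves are assumptions.

def generate_schema_cmd(namespaces, sample_size=1, set_schemas=True):
    """Build a sqlGenerateSchema command document.

    sample_size defaults to 1 to mirror the default single-document
    sampling described in the text; raise it for polymorphic data.
    """
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    # The command name must be the first key; dicts keep insertion order.
    return {
        "sqlGenerateSchema": 1,
        "sampleNamespaces": list(namespaces),
        "sampleSize": sample_size,
        "setSchemas": set_schemas,
    }


def set_schema_cmd(collection, json_schema):
    """Build a sqlSetSchema command document for one collection or view."""
    return {
        "sqlSetSchema": collection,
        "schema": {"version": 1, "jsonSchema": json_schema},
    }


def get_schema_cmd(collection):
    """Build a sqlGetSchema command document for one collection or view."""
    return {"sqlGetSchema": collection}


generate_cmd = generate_schema_cmd(["sales.orders"], sample_size=100)
```

With a driver such as PyMongo, a document like ``generate_cmd`` could then be passed to ``Database.command()``, assuming a connection to the {+dl+} instance.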
50+
51+
Once the SQL schema is set up, you can query your {+adl+} collections
52+
or views through the :ref:`JDBC driver <jdbc-driver>` for {+adl+} and
53+
using the :ref:`adl-sql-stage` :ref:`aggregation pipeline
54+
<adl-aggregation-pipeline>` stage.
55+
56+
You can manually delete a schema for a collection or view by running
57+
the :ref:`sqlsetschema-cmd` command with an empty schema document.
58+
{+dl+} automatically removes the schema for a collection or view when
59+
you:
60+
61+
- :ref:`Drop the collection or view <dl-drop-collection-views-cmd>`
62+
from the storage configuration.
63+
- :ref:`Modify the storage configuration <datalake-setstorageconfig>`
64+
to remove the collection or view from the storage configuration.
65+
- :ref:`Drop the database <dl-drop-database-cmd>` that contains the
66+
collection or view from the storage configuration.
67+
68+
In addition, for a wildcard (``*``) collection, {+dl+} deletes the
69+
schema when it discovers that the collection has been removed from
70+
the :ref:`namespace catalog <manage-ns-catalog-cli>`.
6171

6272
.. toctree::
6373
:titlesonly:
Lines changed: 9 additions & 6 deletions
@@ -1,10 +1,13 @@
11
.. note::
22

33
{+dl+} automatically generates schemas for only new collections and
4-
views in the :ref:`storage configuration <datalake-configuration-file>`.
5-
Existing :manual:`namespaces </reference/limits/#faq-dev-namespace>`
6-
will not have auto-generated schemas. If you want {+dl+} to automatically
7-
generate schemas for your existing non-wildcard collections and views in
8-
the storage configuration, :ref:`remove <datalake-setstorageconfig>` the :datalakeconf:`databases` in your {+dl+} storage configuration and then
9-
:ref:`update <datalake-setstorageconfig>` your {+dl+} storage
4+
views in the :ref:`storage configuration
5+
<datalake-configuration-file>` or :ref:`namespace catalog
6+
<manage-ns-catalog-cli>`. Existing :manual:`namespaces
7+
</reference/limits/#faq-dev-namespace>` will not have auto-generated
8+
schemas. If you want {+dl+} to automatically generate schemas for
9+
your existing collections and views in the storage configuration,
10+
:ref:`remove <datalake-setstorageconfig>` the
11+
:datalakeconf:`databases` in your {+dl+} storage configuration and
12+
then :ref:`update <datalake-setstorageconfig>` your {+dl+} storage
1013
configuration with the old configuration.

source/reference/cli/sql/sqlgenerateschema.txt

Lines changed: 2 additions & 0 deletions
@@ -142,6 +142,7 @@ more about the fields in the output, see
142142
:ref:`sqlgenerateschema-output`.
143143

144144
.. code-block:: json
145+
:copyable: false
145146

146147
{
147148
"ok" : 1,
@@ -218,6 +219,7 @@ more about the fields in the output, see
218219
:ref:`sqlgenerateschema-output`.
219220

220221
.. code-block:: json
222+
:copyable: false
221223

222224
{
223225
"ok" : 1,

source/reference/format/sql-schema-format.txt

Lines changed: 39 additions & 36 deletions
@@ -19,29 +19,31 @@ SQL Schema Format
1919
Overview
2020
--------
2121

22-
To query data using SQL, {+adl+} needs to be aware of the schema for that
23-
data. {+dl+} automatically generates a |json| schema for all new collections,
24-
except wildcard collections, and views. To learn more about auto-generated
22+
To query data using SQL, {+adl+} needs to be aware of the schema for
23+
that data. {+dl+} automatically generates a |json| schema for all new
24+
collections and views. To learn more about auto-generated
2525
schemas, see :ref:`query-with-sql`.
2626

27-
By default, {+dl+} samples data from a single document in your collection
28-
or view to generate a |json| schema. If your collection or view contains
29-
polymorphic data, you can provide a larger sampling size to {+dl+} when
30-
manually :ref:`generating <sqlgenerateschema-cmd>` the schema.
31-
32-
{+dl+} maps |json| schemas to relational schemas. MongoDB's :manual:`flexible
33-
schema model </core/data-modeling-introduction/>` allows a given field to
34-
contain data of multiple types, while relational databases restrict columns to
35-
a single data type. The following sections describe the fields supported in
36-
the |json| schema, the |bson| types that are supported in a relational schema,
37-
and how {+dl+} resolves conflicts for polymorphic fields when mapped to
27+
By default, {+dl+} samples data from a single document in your
28+
collection or view to generate a |json| schema. If your collection or
29+
view contains polymorphic data, you can provide a larger sampling size
30+
to {+dl+} when manually :ref:`generating <sqlgenerateschema-cmd>` the
31+
schema.
32+
33+
{+dl+} maps |json| schemas to relational schemas. MongoDB's
34+
:manual:`flexible schema model </core/data-modeling-introduction/>`
35+
allows a given field to contain data of multiple types, while
36+
relational databases restrict columns to a single data type. The
37+
following sections describe the fields supported in the |json| schema,
38+
the |bson| types that are supported in a relational schema, and how
39+
{+dl+} resolves conflicts for polymorphic fields when mapped to
3840
relational schema.
3941

4042
|json| Schema Format
4143
--------------------
4244

43-
The schema for a collection is a document with two fields: ``jsonSchema``
44-
and ``version``.
45+
The schema for a collection is a document with two fields:
46+
``jsonSchema`` and ``version``.
4547

4648
.. code-block:: json
4749

@@ -50,9 +52,9 @@ and ``version``.
5052
"jsonSchema" : {}
5153
}
5254

53-
The ``version`` field represents the version of the schema format used by
54-
the document and the value is always ``1``. The ``jsonSchema`` field is
55-
a document that describes the schema of the :manual:`namespace
55+
The ``version`` field represents the version of the schema format used
56+
by the document and the value is always ``1``. The ``jsonSchema`` field
57+
is a document that describes the schema of the :manual:`namespace
5658
</reference/limits/#faq-dev-namespace>`.
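
The two-field shape described here can be sketched in a few lines of Python. The helper names below are hypothetical, and the ``bsonType`` and ``properties`` keys in the sample are assumptions about what the ``jsonSchema`` document contains; only the ``version`` and ``jsonSchema`` fields and the fixed value ``1`` come from the text.

```python
def make_schema_document(json_schema):
    """Wrap a JSON schema in the two-field schema document."""
    if not isinstance(json_schema, dict):
        raise TypeError("json_schema must be a dict")
    # The text states that version is always 1.
    return {"version": 1, "jsonSchema": json_schema}


def is_schema_document(doc):
    """Check that doc has exactly the fields version (== 1) and jsonSchema."""
    if not isinstance(doc, dict) or set(doc) != {"version", "jsonSchema"}:
        return False
    version = doc["version"]
    # bool is a subclass of int in Python, so reject True explicitly.
    if isinstance(version, bool) or version != 1:
        return False
    return isinstance(doc["jsonSchema"], dict)


# Sample JSON schema; the keys used here are illustrative assumptions.
sample_schema = make_schema_document(
    {"bsonType": "object", "properties": {"name": {"bsonType": "string"}}}
)
```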
5759

5860
.. _sql-json-schema-fields:
@@ -77,18 +79,18 @@ Supported BSON Types
7779
{+dl+} only supports the following |bson| types when mapping |json|
7880
schema to relational schema:
7981

80-
- ``double``
81-
- ``string``
82-
- ``object``
8382
- ``array``
8483
- ``binData``
85-
- ``objectId``
8684
- ``bool``
8785
- ``date``
88-
- ``null``
86+
- ``decimal``
87+
- ``double``
8988
- ``int``
9089
- ``long``
91-
- ``decimal``
90+
- ``null``
91+
- ``object``
92+
- ``objectId``
93+
- ``string``
9294

9395
Other types are ignored in the relational schema. Fields with
9496
composite types, such as objects and arrays, are handled specially.
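
A minimal sketch of the two rules stated here, under several assumptions: the input is a JSON schema that uses ``bsonType`` and ``properties`` keys, nested object fields become columns whose names are joined with dots (the Document Conflicts section in this diff mentions dot notation), and array fields are kept as a single column because their special handling is not shown in this excerpt. The function name is hypothetical and this is not {+dl+}'s actual mapping code.

```python
# Supported types, copied from the list above.
SUPPORTED_BSON_TYPES = frozenset({
    "array", "binData", "bool", "date", "decimal", "double",
    "int", "long", "null", "object", "objectId", "string",
})


def relational_columns(json_schema, prefix=""):
    """Map a JSON schema to {column name: bsonType}, flattening objects."""
    columns = {}
    for field, spec in json_schema.get("properties", {}).items():
        name = f"{prefix}.{field}" if prefix else field
        bson_type = spec.get("bsonType")
        # Polymorphic fields (a list of types) are out of scope for this
        # sketch; unsupported types are ignored, as the text describes.
        if not isinstance(bson_type, str) or bson_type not in SUPPORTED_BSON_TYPES:
            continue
        if bson_type == "object":
            # Each nested field maps to its own column.
            columns.update(relational_columns(spec, name))
        else:
            columns[name] = bson_type
    return columns


example_schema = {
    "bsonType": "object",
    "properties": {
        "_id": {"bsonType": "objectId"},
        "a": {
            "bsonType": "object",
            "properties": {
                "b": {"bsonType": "int"},
                "ts": {"bsonType": "timestamp"},  # not in the supported list
            },
        },
        "tags": {"bsonType": "array"},
    },
}
example_columns = relational_columns(example_schema)
```

Here ``example_columns`` contains ``_id``, ``a.b``, and ``tags``; the ``a.ts`` field is dropped because ``timestamp`` is not in the supported list.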
@@ -97,8 +99,8 @@ Object Fields
9799
~~~~~~~~~~~~~
98100

99101
Object fields are flattened such that each nested field maps to its
100-
own column in the relational schema. For example, consider the following
101-
``eg`` collection:
102+
own column in the relational schema. For example, consider the
103+
following ``eg`` collection:
102104

103105
.. code-block:: json
104106

@@ -237,12 +239,13 @@ are representations of the above schema:
237239
Type Conversion Conflicts
238240
-------------------------
239241

240-
MongoDB's :manual:`flexible schema model </core/data-modeling-introduction/>`
241-
allows a given field to contain data of multiple types, while relational
242-
databases restrict columns to a single data type. When {+dl+} maps the
243-
|json| schema to relational schema, type conflicts can occur if a field
244-
is polymorphic. There are two main categories of type conversion conflicts
245-
that might occur when there are multiple data types:
242+
MongoDB's :manual:`flexible schema model
243+
</core/data-modeling-introduction/>` allows a given field to contain
244+
data of multiple types, while relational databases restrict columns to
245+
a single data type. When {+dl+} maps the |json| schema to relational
246+
schema, type conflicts can occur if a field is polymorphic. There are
247+
two main categories of type conversion conflicts that might occur when
248+
there are multiple data types:
246249

247250
- Conflicts between scalar types
248251
- Conflicts involving composite types like documents and arrays
@@ -281,9 +284,9 @@ Document Conflicts
281284
##################
282285

283286
When a conflict occurs involving a document, {+dl+} displays the fields
284-
of the document type as separate columns using dot notation. For example,
285-
consider a collection named ``conflict`` that contains the following
286-
documents:
287+
of the document type as separate columns using dot notation. For
288+
example, consider a collection named ``conflict`` that contains the
289+
following documents:
287290

288291
.. code-block:: javascript
289292
:copyable: false
