
Commit 714f41d

Chris Cho authored and schmalliso committed
DOCSP-19155: port timeseries migration usage example (#184)
* DOCSP-19155: port timeseries migration usage example
1 parent ebf7c3f commit 714f41d

10 files changed: +206 −43 lines changed

source/sink-connector.txt

Lines changed: 3 additions & 3 deletions
@@ -23,17 +23,17 @@ Overview
 --------
 
 This section focuses on the **{+sink-connector+}**.
-The {+sink-connector+} is a {+kc+} connector that reads data from {+ak+} and
+The {+sink-connector+} is a {+kc+} connector that reads data from {+ak+} and
 writes data to MongoDB.
 
 Configuration Properties
 ------------------------
 
-To learn about configuration options for your sink connector, see the
+To learn about configuration options for your sink connector, see the
 :ref:`Configuration Properties <kafka-sink-configuration-properties>` section.
 
 Fundamentals
 ------------
 
-To learn how features of the sink connector work and how to configure them, see the
+To learn how features of the sink connector work and how to configure them, see the
 :ref:`Fundamentals <kafka-sink-fundamentals>` section.

source/sink-connector/configuration-properties/post-processors.txt

Lines changed: 0 additions & 6 deletions
@@ -123,12 +123,6 @@ Settings
 | **Description:**
 | The class that specifies the ``WriteModelStrategy`` the connector should
   use for :manual:`Bulk Writes </core/bulk-write-operations/index.html>`.
-
-.. seealso::
-
-   For information on how to create your own strategy, see the tutorial
-   on :doc:`Write Strategies </tutorials/write-strategies/>`.
-
 |
 | **Default**:

source/tutorials.txt

Lines changed: 3 additions & 8 deletions
@@ -6,11 +6,6 @@ Tutorials
 :titlesonly:
 :maxdepth: 1
 
-   Set up Sink and Source Connectors on Docker </tutorials/set-up-on-docker>
-   Set up a Development Environment for Customization </tutorials/set-up-development-environment>
-   Use Built-In and Custom Write Strategies </tutorials/write-strategies>
-   Use Built-In and Custom Post-processors </tutorials/post-processors>
-   Handle Errors in the Sink and Source Connectors </tutorials/handle-errors>
-   Replicate data with the Change Data Capture Handler </tutorials/replicate-with-cdc>
-
-asdf
+   Replicate Data with the Change Data Capture Handler </tutorials/replicate-with-cdc>
+   Migrate an Existing Collection to a Time Series Collection </tutorials/migrate-time-series>
+

source/tutorials/handle-errors.txt

Lines changed: 0 additions & 5 deletions
This file was deleted.
source/tutorials/migrate-time-series.txt

Lines changed: 198 additions & 0 deletions
@@ -0,0 +1,198 @@
.. _tutorial-migrate-time-series:

==========================================================
Migrate an Existing Collection to a Time Series Collection
==========================================================

.. default-domain:: mongodb

In this tutorial, you can learn how to convert an existing MongoDB
collection to a **time series collection** using the {+mkc+}.

Time series collections efficiently store sequences of measurements
over a period of time. Time series data consists of measurement data collected
over time, metadata that describes the measurement, and the time of the
measurement.

You can configure the source connector to read your existing MongoDB
collection and the sink connector to write Kafka topic data into a MongoDB
time series collection.

To learn more about MongoDB time series collections, see the MongoDB
manual page on :manual:`Time Series Collections </core/timeseries-collections/>`.
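
For illustration, the following ``mongosh`` snippet is a minimal sketch of
how you could create an equivalent time series collection by hand. The
database, collection, and field names are the ones this tutorial's scenario
assumes; the sink connector described later creates the collection for you
automatically, so you don't need to run this yourself.

.. code-block:: javascript

   // Minimal sketch (assumes the names used later in this tutorial).
   // Create a collection that treats the "tx_time" field as the time
   // of each measurement.
   db.getSiblingDB("Stocks").createCollection("StockDataMigrate", {
     timeseries: { timeField: "tx_time" }
   })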

Scenario
--------

Suppose you accumulated stock price data in a MongoDB collection and have
the following needs:

- Store the price data more efficiently
- Maintain the ability to analyze stock performance over time using
  aggregation operators

After reading about MongoDB time series collections, you decide to migrate
your existing collection into a time series collection. Learn how to perform
this migration in the following sections.

Steps to Migrate to a Time Series Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To migrate an existing MongoDB collection to a time series collection, you
need to perform the following tasks:

1) :ref:`Identify the time field <time-series-identify-field>` in the existing
   stock price data document.
2) :ref:`Configure a source connector <time-series-source-config>` to copy
   the existing collection data to a Kafka topic.
3) :ref:`Configure a sink connector <time-series-sink-config>` to copy the
   Kafka topic data to the time series collection.
4) :ref:`Verify the connector migrated the data <time-series-verify-collection>` to the time
   series collection.

.. _time-series-identify-field:

Identify the Time Field
~~~~~~~~~~~~~~~~~~~~~~~

Before you create a time series collection, you need to identify the
**time field**. The time field is the document field that MongoDB uses to
distinguish the time of the measurement. The value of this field can be
a string, an integer, or an ISO date. If the value is a string or an
integer, set the ``timeseries.timefield.auto.convert`` setting to instruct
the connector to automatically convert the value to a date.

The following document shows the format of the stock price data documents in
the existing MongoDB collection:

.. code-block:: javascript
   :copyable: false

   {
     tx_time: 2021-07-12T05:20:35Z,
     symbol: 'WSV',
     company_name: 'WORRIED SNAKEBITE VENTURES',
     price: 21.22,
     _id: ObjectId("...")
   }

For this scenario, assume you stored these documents in a collection named
``PriceData`` in the ``Stocks`` database.

You identify that the ``tx_time`` field distinguishes the time of the
measurements, and specify it as your time field in your sink connector
configuration.

Learn how to set your time field and field conversion in the
:ref:`time series configuration guide <time-series-sink-config>`.
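
If you're not sure which BSON type your time values currently use, you can
check a sample document before configuring the conversion. The following
``mongosh`` aggregation is one way to do that; it assumes the
``Stocks.PriceData`` namespace from this scenario.

.. code-block:: javascript

   // Report the BSON type of the tx_time field in one sample document.
   // A result of "string" means the sink connector needs
   // timeseries.timefield.auto.convert enabled.
   db.getSiblingDB("Stocks").PriceData.aggregate([
     { $limit: 1 },
     { $project: { _id: 0, txTimeType: { $type: "$tx_time" } } }
   ])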

.. _time-series-source-config:

Configure the Source Connector
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To copy data from the ``PriceData`` MongoDB collection and publish it
to the ``marketdata.Stocks.PriceData`` Kafka topic, create a source
connector with the following configuration:

.. code-block:: json

   {
     "name": "mongo-source-marketdata",
     "config": {
       "tasks.max": "1",
       "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
       "key.converter": "org.apache.kafka.connect.storage.StringConverter",
       "value.converter": "org.apache.kafka.connect.json.JsonConverter",
       "publish.full.document.only": "true",
       "connection.uri": "<your connection uri>",
       "topic.prefix": "marketdata",
       "database": "Stocks",
       "collection": "PriceData",
       "copy.existing": "true"
     }
   }

.. note::

   If you insert documents into a collection during the copying process,
   the connector inserts them after the process is complete.

After you start your source connector with the preceding configuration,
the connector starts the copying process. Once the process is complete,
you should see the following message in the log:

.. code-block:: none
   :copyable: false

   Finished copying existing data from the collection(s).

Your data from the ``PriceData`` MongoDB collection is now available in
the ``marketdata.Stocks.PriceData`` Kafka topic.
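
Because the configuration sets ``publish.full.document.only`` to ``true``,
each record's value contains only the source document itself. As a rough
sketch (the exact serialization depends on your converter settings), a
record value on the topic might resemble the following, with ``tx_time``
still carried as a string:

.. code-block:: json

   {
     "_id": {"$oid": "..."},
     "tx_time": "2021-07-12T05:20:35Z",
     "symbol": "WSV",
     "company_name": "WORRIED SNAKEBITE VENTURES",
     "price": 21.22
   }

This string-typed time field is why the sink connector configuration in the
next section enables time field conversion.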

.. _time-series-sink-config:

Configure the Sink Connector
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To consume data from the ``marketdata.Stocks.PriceData`` Kafka topic and write
it to a time series collection named ``StockDataMigrate`` in a database
named ``Stocks``, you can create the following sink connector configuration:

.. code-block:: json
   :emphasize-lines: 12-14

   {
     "name": "mongo-sink-marketdata",
     "config": {
       "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
       "tasks.max": "1",
       "topics": "marketdata.Stocks.PriceData",
       "connection.uri": "<your connection uri>",
       "database": "Stocks",
       "collection": "StockDataMigrate",
       "key.converter": "org.apache.kafka.connect.storage.StringConverter",
       "value.converter": "org.apache.kafka.connect.json.JsonConverter",
       "timeseries.timefield": "tx_time",
       "timeseries.timefield.auto.convert": "true",
       "timeseries.timefield.auto.convert.date.format": "yyyy-MM-dd'T'HH:mm:ss'Z'"
     }
   }

.. tip::

   The sink connector configuration above uses the time field date
   format converter. Alternatively, you can use the ``TimestampConverter``
   Single Message Transform (SMT) to convert the ``tx_time`` field from a
   ``String`` to an ``ISODate``. When using the ``TimestampConverter`` SMT,
   you must define a schema for the data in the Kafka topic.

   For information on how to use the ``TimestampConverter`` SMT, see the
   `TimestampConverter <https://docs.confluent.io/platform/current/connect/transforms/timestampconverter.html#timestampconverter>`__
   Confluent documentation.
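
If you choose the SMT approach instead, the following fragment is a sketch
of the additional properties you would add to the sink connector ``config``
map. The transform alias ``ConvertTxTime`` is illustrative; see the
Confluent documentation linked above for the authoritative property
reference.

.. code-block:: json

   {
     "transforms": "ConvertTxTime",
     "transforms.ConvertTxTime.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
     "transforms.ConvertTxTime.field": "tx_time",
     "transforms.ConvertTxTime.target.type": "Timestamp",
     "transforms.ConvertTxTime.format": "yyyy-MM-dd'T'HH:mm:ss'Z'"
   }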

After your sink connector finishes processing the topic data, the documents
in the ``StockDataMigrate`` time series collection contain the ``tx_time``
field with an ``ISODate`` type value.

.. _time-series-verify-collection:

Verify the Collection Data
~~~~~~~~~~~~~~~~~~~~~~~~~~

At this point, your time series collection should contain all the market data
from your ``PriceData`` collection. The following shows the format of the
documents in the ``StockDataMigrate`` time series collection:

.. code-block:: javascript

   {
     tx_time: 2021-07-12T20:05:35.000Z,
     symbol: 'WSV',
     company_name: 'WORRIED SNAKEBITE VENTURES',
     price: 21.22,
     _id: ObjectId("...")
   }

To learn how to verify that a collection is of type **timeseries**, see the
instructions on how to :manual:`Check if a Collection is of Type Time Series </core/timeseries-collections/#check-if-a-collection-is-of-type-time-series>`
in the MongoDB manual.
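
As a quick check, you can also ask the server directly. The following
``mongosh`` snippet is one way to do this; a time series collection reports
``type: 'timeseries'`` in its collection listing.

.. code-block:: javascript

   // List the migrated collection and inspect its reported type.
   db.getSiblingDB("Stocks").getCollectionInfos({ name: "StockDataMigrate" })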

source/tutorials/post-processors.txt

Lines changed: 0 additions & 5 deletions
This file was deleted.

source/tutorials/set-up-development-environment.txt

Lines changed: 0 additions & 5 deletions
This file was deleted.

source/tutorials/set-up-on-docker.txt

Lines changed: 0 additions & 5 deletions
This file was deleted.

source/tutorials/write-strategies.txt

Lines changed: 0 additions & 5 deletions
This file was deleted.

source/whats-new.txt

Lines changed: 2 additions & 1 deletion
@@ -55,7 +55,8 @@ Sink Connector
 
 - Added support for :ref:`automatic time-series collection creation <sink-configuration-time-series>`
   in MongoDB 5.0 to efficiently store sequences of measurements over a period
-  of time
+  of time. Learn how to configure connectors to :ref:`<tutorial-migrate-time-series>`.
+
 - Improved the error logging for bulk write exceptions
 
 Source Connector
