| 2 | 2 | Kafka and Kafka Connect |
| 3 | 3 | ======================= |
| 4 | 4 | |
| 5 | | -asdf |
| 5 | +.. default-domain:: mongodb |
| 6 | + |
| 7 | +.. contents:: On this page |
| 8 | + :local: |
| 9 | + :backlinks: none |
| 10 | + :depth: 1 |
| 11 | + :class: singlecol |
| 12 | + |
| 13 | +Overview |
| 14 | +~~~~~~~~ |
| 15 | + |
| 16 | +In this guide, you can learn the following foundational information about Apache |
| 17 | +Kafka and Kafka Connect: |
| 18 | + |
| 19 | +- What Apache Kafka and Kafka Connect are |
| 20 | +- What problems Apache Kafka and Kafka Connect solve |
| 21 | +- Why Apache Kafka and Kafka Connect are useful |
| 22 | +- How data moves through an Apache Kafka and Kafka Connect pipeline |
| 23 | + |
| 24 | +Apache Kafka |
| 25 | +~~~~~~~~~~~~ |
| 26 | + |
| 27 | +Apache Kafka is an open-source publish/subscribe messaging system. Apache Kafka |
| 28 | +provides a flexible, **fault-tolerant**, and **horizontally scalable** system to |
| 29 | +move data between datastores and applications. A system is fault tolerant |
| 30 | +if the system can continue operating even if certain components of the |
| 31 | +system stop working. A system is horizontally scalable if the system can be |
| 32 | +expanded to handle larger workloads by adding more machines rather than by |
| 33 | +improving a machine's hardware. |
| 34 | + |
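The publish/subscribe model described above can be illustrated with a toy, in-memory sketch. This is not the Apache Kafka client API: ``ToyBroker``, ``publish``, and ``poll`` are hypothetical names invented for this example. It only aims to show that publishers append messages to a named topic and that each subscriber reads the topic independently, at its own position.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a message broker (hypothetical, not Kafka's API)."""

    def __init__(self):
        # topic name -> append-only list of messages
        self._logs = defaultdict(list)
        # (topic, subscriber) -> index of the next message to read
        self._offsets = defaultdict(int)

    def publish(self, topic, message):
        """Append a message to the end of the topic's log."""
        self._logs[topic].append(message)

    def poll(self, topic, subscriber):
        """Return the messages this subscriber has not read yet."""
        start = self._offsets[(topic, subscriber)]
        messages = self._logs[topic][start:]
        # Advance this subscriber's position to the end of the log.
        self._offsets[(topic, subscriber)] = len(self._logs[topic])
        return messages


broker = ToyBroker()
broker.publish("orders", {"id": 1})
broker.publish("orders", {"id": 2})

# Two independent subscribers each receive every message on the topic.
billing = broker.poll("orders", "billing")
shipping = broker.poll("orders", "shipping")
```

Because the publisher never addresses a subscriber directly, you can add or remove subscribers without changing the publishing code. A real Apache Kafka deployment adds durability, partitioning, and replication on top of this basic idea.
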
| 35 | +For more information on Apache Kafka, see the following resources: |
| 36 | + |
| 37 | +- `Confluent "What is Apache Kafka?" Page <https://www.confluent.io/what-is-apache-kafka/>`__ |
| 38 | +- `Apache Kafka Official Documentation <https://kafka.apache.org/documentation/>`__ |
| 39 | + |
| 40 | +Kafka Connect |
| 41 | +~~~~~~~~~~~~~ |
| 42 | + |
| 43 | +Kafka Connect is a component of Apache Kafka that solves the problem of |
| 44 | +connecting Apache Kafka to datastores such as MongoDB. Kafka Connect solves this |
| 45 | +problem by providing the following resources: |
| 46 | + |
| 47 | +- A fault-tolerant runtime for transferring data to and from datastores. |
| 48 | +- A framework for the Apache Kafka community to share solutions for |
| 49 | + connecting Apache Kafka to different datastores. |
| 50 | + |
| 51 | +The Kafka Connect framework defines an API for developers to write reusable |
| 52 | +**connectors**. Connectors enable Kafka Connect deployments |
| 53 | +to interact with a specific datastore as a data source or a data sink. The |
| 54 | +MongoDB Kafka Connector is one of these connectors. |
| 55 | + |
| 56 | +For more information on Kafka Connect, see the following resources: |
| 57 | + |
| 58 | +- `Confluent Kafka Connect Page <https://docs.confluent.io/platform/current/connect/index.html>`__ |
| 59 | +- `Apache Kafka Official Documentation, Kafka Connect Guide <https://kafka.apache.org/documentation/#connect>`__ |
| 60 | +- `Apache Foundation Video Walk-Through of the Kafka Connect Framework <https://www.youtube.com/watch?v=EXviLqXFoQI>`__ |
| 61 | + |
| 62 | +.. tip:: Use Kafka Connect instead of Producer/Consumer Clients when Connecting to Datastores |
| 63 | + |
| 64 | + While you could write your own application to connect Apache Kafka to a |
| 65 | + specific datastore using producer and consumer clients, Kafka Connect may be |
| 66 | + a better fit for you. Here are some reasons to use Kafka Connect: |
| 67 | + |
| 68 | + - Kafka Connect has a fault-tolerant distributed architecture to ensure a |
| 69 | + reliable pipeline. |
| 70 | + - A large number of community-maintained connectors use the Kafka Connect |
| 71 | + framework to connect Apache Kafka to popular datastores like MongoDB, |
| 72 | + PostgreSQL, and MySQL. These connectors reduce the amount of boilerplate code |
| 73 | + you need to write and maintain to manage database connections, error handling, |
| 74 | + dead-letter queue integration, and other problems involved in connecting Apache Kafka |
| 75 | + with a datastore. |
| 76 | + - You have the option to use a managed Kafka Connect cluster from Confluent. |
| 77 | + |
| 78 | +Diagram |
| 79 | +~~~~~~~ |
| 80 | + |
| 81 | +The following diagram shows how information flows through an example data pipeline |
| 82 | +built with Apache Kafka and Kafka Connect. The example pipeline uses a MongoDB |
| 83 | +cluster as a data source, and a MongoDB cluster as a data sink. |
| 84 | + |
| 85 | +<TODO: Update the image to version that has gone through design department> |
| 86 | + |
| 87 | +.. figure:: /includes/figures/connect-data-flow.png |
| 88 | + :alt: Dataflow diagram of Kafka Connect deployment. |
| 89 | + |
| 90 | +All connectors and datastores in the example pipeline are optional, and you can |
| 91 | +swap them out for whatever connectors and datastores you need for your deployment. |
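
As a rough sketch of the example pipeline, the following Python snippet builds two connector definitions in the JSON shape that the Kafka Connect REST API accepts (``POST /connectors``): one for a MongoDB source connector and one for a MongoDB sink connector. The host names, connector names, database names, and collection names are hypothetical placeholders. The topic name assumes the source connector publishes to a topic named ``<database>.<collection>``, so check that assumption against your connector version and settings.

```python
import json

# Hypothetical connection strings; replace with your own deployment details.
SOURCE_URI = "mongodb://source-host:27017"
SINK_URI = "mongodb://sink-host:27017"


def source_connector(name, uri, database, collection):
    """Build a definition for a connector that reads from a MongoDB collection."""
    return {
        "name": name,
        "config": {
            "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
            "connection.uri": uri,
            "database": database,
            "collection": collection,
        },
    }


def sink_connector(name, uri, topic, database, collection):
    """Build a definition for a connector that writes a topic to a MongoDB collection."""
    return {
        "name": name,
        "config": {
            "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
            "connection.uri": uri,
            "topics": topic,
            "database": database,
            "collection": collection,
        },
    }


source = source_connector("mongo-source", SOURCE_URI, "shop", "orders")

# Assumption: the source connector names its topic <database>.<collection>.
topic = "shop.orders"
sink = sink_connector("mongo-sink", SINK_URI, topic, "analytics", "orders_copy")

# Print the payloads you could submit to a Kafka Connect worker.
print(json.dumps(source, indent=2))
print(json.dumps(sink, indent=2))
```

Swapping either datastore means replacing only the corresponding definition. The other half of the pipeline and the Apache Kafka topic between them stay the same.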