|
2 | 2 | <img alt="Datafold" src="https://user-images.githubusercontent.com/1799931/196497110-d3de1113-a97f-4322-b531-026d859b867a.png" width="50%" />
|
3 | 3 | </p>
|
4 | 4 |
|
5 |
| -# **data-diff** |
| 5 | +<h1 align="center"> |
| 6 | +data-diff |
| 7 | +</h1> |
| 8 | + |
| 9 | +<h2 align="center"> |
| 10 | +Develop dbt models faster by testing as you code. |
| 11 | +</h2> |
| 12 | +<h4 align="center"> |
| 13 | +See how every change to dbt code affects the data produced in the modified model and downstream. |
| 14 | +</h4> |
| 15 | +<br> |
6 | 16 |
|
7 | 17 | ## What is `data-diff`?
|
8 |
| -data-diff is a **free, open-source tool** that enables data professionals to detect differences in values between any two tables. It's fast, easy to use, and reliable, even at massive scale. |
9 | 18 |
|
10 |
| -## Documentation |
| 19 | +data-diff is an open source package that you can use to see the impact of your dbt code changes on your dbt models as you code. |
11 | 20 |
|
12 |
| -[**🗎 Documentation website**](https://docs.datafold.com/os_diff/about) - our detailed documentation has everything you need to start diffing. |
| 21 | +<div align="center"> |
13 | 22 |
|
14 |
| -### Databases we support |
| 23 | + |
15 | 24 |
|
16 |
| -- PostgreSQL >=10 |
17 |
| -- MySQL |
18 |
| -- Snowflake |
19 |
| -- BigQuery |
20 |
| -- Redshift |
21 |
| -- Oracle |
22 |
| -- Presto |
23 |
| -- Databricks |
24 |
| -- Trino |
25 |
| -- Clickhouse |
26 |
| -- Vertica |
27 |
| -- DuckDB >=0.6 |
28 |
| -- SQLite (coming soon) |
| 25 | +</div> |
29 | 26 |
|
30 |
| -For their corresponding connection strings, check out our [detailed table](https://docs.datafold.com/os_diff/databases_we_support). |
| 27 | +<br> |
31 | 28 |
|
32 |
| -#### Looking for a database not on the list? |
33 |
| -If a database is not on the list, we'd still love to support it. [Please open an issue](https://github.com/datafold/data-diff/issues) to discuss it, or vote on existing requests to push them up our todo list. |
34 |
| - |
35 |
| -## Use cases |
36 |
| - |
37 |
| -### Diff Tables Between Databases |
38 |
| -#### Quickly identify issues when moving data between databases |
39 |
| - |
40 |
| -<p align="center"> |
41 |
| - <img alt="diff2" src="https://user-images.githubusercontent.com/1799931/196754998-a88c0a52-8751-443d-b052-26c03d99d9e5.png" /> |
42 |
| -</p> |
43 |
| - |
44 |
| -### Diff Tables Within a Database |
45 |
| -#### Improve code reviews by identifying data problems you don't have tests for |
46 |
| -<p align="center"> |
47 |
| - <a href=https://www.loom.com/share/682e4b7d74e84eb4824b983311f0a3b2 target="_blank"> |
48 |
| - <img alt="Intro to Diff" src="https://user-images.githubusercontent.com/1799931/196576582-d3535395-12ef-40fd-bbbb-e205ccae1159.png" width="50%" height="50%" /> |
49 |
| - </a> |
50 |
| -</p> |
51 |
| - |
52 |
| - |
53 |
| - |
54 |
| - |
55 |
| -## Get started |
56 |
| - |
57 |
| -### Installation |
58 |
| - |
59 |
| -#### First, install `data-diff` using `pip`. |
| 29 | +## Getting Started |
60 | 30 |
|
| 31 | +**Install `data-diff`** |
61 | 32 | ```
|
62 | 33 | pip install data-diff
|
63 | 34 | ```
|
64 | 35 |
|
65 |
| -#### Then, install one or more driver(s) specific to the database(s) you want to connect to. |
66 |
| - |
67 |
| -- `pip install 'data-diff[mysql]'` |
68 |
| - |
69 |
| -- `pip install 'data-diff[postgresql]'` |
70 |
| - |
71 |
| -- `pip install 'data-diff[snowflake]'` |
72 |
| - |
73 |
| -- `pip install 'data-diff[presto]'` |
74 |
| - |
75 |
| -- `pip install 'data-diff[oracle]'` |
76 |
| - |
77 |
| -- `pip install 'data-diff[trino]'` |
78 |
| - |
79 |
| -- `pip install 'data-diff[clickhouse]'` |
80 |
| - |
81 |
| -- `pip install 'data-diff[vertica]'` |
82 |
| - |
83 |
| -- For BigQuery, see: https://pypi.org/project/google-cloud-bigquery/ |
84 |
| - |
85 |
| -_Some drivers have dependencies that cannot be installed using `pip` and still need to be installed manually._ |
86 |
| - |
87 |
| -### Run your first diff |
88 |
| - |
89 |
| -Once you've installed `data-diff`, you can run it from the command line. |
90 |
| - |
| 36 | +**Update a few lines in your `dbt_project.yml`** |
91 | 37 | ```
|
92 |
| -data-diff DB1_URI TABLE1_NAME DB2_URI TABLE2_NAME [OPTIONS] |
| 38 | +#dbt_project.yml |
| 39 | +vars: |
| 40 | + data_diff: |
| 41 | + prod_database: my_database |
| 42 | + prod_schema: my_default_schema |
93 | 43 | ```
|
94 | 44 |
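Before running a diff, it can help to confirm that both `data_diff` variables shown above are actually set. The following is a minimal sketch, not part of `data-diff` itself: the helper name `missing_data_diff_vars` is hypothetical, it assumes both keys from the snippet above are the ones you intend to set, and it works on a plain dictionary standing in for the parsed contents of `dbt_project.yml`.

```python
from typing import List

# Keys taken from the dbt_project.yml snippet above; treating both as
# required is an assumption made for this illustration.
REQUIRED_KEYS = ("prod_database", "prod_schema")


def missing_data_diff_vars(project: dict) -> List[str]:
    """Return the data_diff keys that are absent or empty in a parsed project file."""
    data_diff_vars = (project.get("vars") or {}).get("data_diff") or {}
    return [key for key in REQUIRED_KEYS if not data_diff_vars.get(key)]


# Stand-in for the result of parsing dbt_project.yml with a YAML library.
example_project = {
    "vars": {
        "data_diff": {
            "prod_database": "my_database",
            "prod_schema": "my_default_schema",
        }
    }
}

# Lists any keys still missing from the example configuration.
print(missing_data_diff_vars(example_project))
```

In practice you would load the real file with a YAML parser of your choice and pass the resulting dictionary to the helper.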
|
95 |
| -Be sure to read [the docs](https://docs.datafold.com/os_diff/how_to_use/how_to_use_with_command_line) for detailed instructions on how to build one of these commands for your database setup. |
96 |
| - |
97 |
| -#### Code Example: Diff Tables Between Databases |
98 |
| -Here's an example command for your copy/pasting, taken from the screenshot above when we diffed data between Snowflake and Postgres. |
| 45 | +**Run your first data diff!** |
99 | 46 |
|
100 | 47 | ```
|
101 |
| -data-diff \ |
102 |
| - postgresql://<username>:'<password>'@localhost:5432/<database> \ |
103 |
| - <table> \ |
104 |
| - "snowflake://<username>:<password>@<account>/<DATABASE>/<SCHEMA>?warehouse=<WAREHOUSE>&role=<ROLE>" \ |
105 |
| - <TABLE> \ |
106 |
| - -k activity_id \ |
107 |
| - -c activity \ |
108 |
| - -w "event_timestamp < '2022-10-10'" |
| 48 | +dbt run && data-diff --dbt |
109 | 49 | ```
|
110 | 50 |
|
111 |
| -#### Code Example: Diff Tables Within a Database |
| 51 | +We recommend you get started by walking through [our simple setup instructions](https://docs.datafold.com/development_testing/open_source) which contain examples and details. |
112 | 52 |
|
113 |
| -Here's a code example from [the video](https://www.loom.com/share/682e4b7d74e84eb4824b983311f0a3b2), where we compare data between two Snowflake tables within one database. |
| 53 | +Please reach out on the dbt Slack in [#tools-datafold](https://getdbt.slack.com/archives/C03D25A92UU) if you have any trouble whatsoever getting started! |
114 | 54 |
|
115 |
| -``` |
116 |
| -data-diff \ |
117 |
| - "snowflake://<username>:<password>@<account>/<DATABASE>/<SCHEMA_1>?warehouse=<WAREHOUSE>&role=<ROLE>" <TABLE_1> \ |
118 |
| - <SCHEMA_2>.<TABLE_2> \ |
119 |
| - -k org_id \ |
120 |
| - -c created_at -c is_internal \ |
121 |
| - -w "org_id != 1 and org_id < 2000" \ |
122 |
| - -m test_results_%t \ |
123 |
| - --materialize-all-rows \ |
124 |
| - --table-write-limit 10000 |
125 |
| -``` |
126 |
| - |
127 |
| -In both code examples, I've used `<>` angle brackets to mark values that **should be replaced with your values** in the database connection strings. For the flags (`-k`, `-c`, etc.), I opted for "real" values (`org_id`, `is_internal`) to give you a more realistic view of what your command will look like. |
128 |
| - |
129 |
| -### We're here to help! |
130 |
| - |
131 |
| -We know that in some cases, the data-diff command can become long and dense. And maybe you're new to the command line. |
| 55 | +<br><br> |
132 | 56 |
|
133 |
| -* We're here to help [on slack](https://locallyoptimistic.slack.com/archives/C03HUNGQV0S) if you have ANY questions as you use `data-diff` in your workflow. |
134 |
| -* You can also post a question in [GitHub Discussions](https://github.com/datafold/data-diff/discussions). |
| 57 | +### Diffing between databases |
135 | 58 |
|
| 59 | +Check out our [documentation](https://github.com/datafold/data-diff/blob/master/docs/supported-databases.md) if you're looking to compare data across databases (for example, between Postgres and Snowflake). |
136 | 60 |
|
137 |
| -To get a Slack invite - [click here](https://locallyoptimistic.com/community/) |
| 61 | +<br> |
138 | 62 |
|
139 |
| -## How to Use |
| 63 | +## Contributors |
140 | 64 |
|
141 |
| -* [How to use from the shell (or: command-line)](https://docs.datafold.com/os_diff/how_to_use/how_to_use_with_command_line) |
142 |
| -* [How to use from Python](https://docs.datafold.com/os_diff/how_to_use/how_to_use_with_python) |
143 |
| -* [How to use with TOML configuration file](https://docs.datafold.com/os_diff/how_to_use/how_to_use_with_toml) |
144 |
| -* [Usage Analytics & Data Privacy](https://docs.datafold.com/os_diff/usage_analytics_data_privacy) |
145 |
| - |
146 |
| -## How to Contribute |
147 |
| -* Feel free to open an issue or contribute to the project by working on an existing issue. |
148 |
| -* Please read the [contributing guidelines](https://github.com/datafold/data-diff/blob/master/CONTRIBUTING.md) to get started. |
149 |
| - |
150 |
| -Big thanks to everyone who contributed so far: |
| 65 | +We thank everyone who has contributed so far! |
151 | 66 |
|
152 | 67 | <a href="https://github.com/datafold/data-diff/graphs/contributors">
|
153 | 68 | <img src="https://contributors-img.web.app/image?repo=datafold/data-diff" />
|
154 | 69 | </a>
|
155 | 70 |
|
156 |
| -## Technical Explanation |
| 71 | +<br> |
| 72 | + |
| 73 | +## Analytics |
| 74 | + |
| 75 | +* [Usage Analytics & Data Privacy](https://github.com/datafold/data-diff/blob/master/docs/usage_analytics.md) |
157 | 76 |
|
158 |
| -Check out this [technical explanation](https://docs.datafold.com/os_diff/technical_explanation) of how data-diff works. |
| 77 | +<br> |
159 | 78 |
|
160 | 79 | ## License
|
161 | 80 |
|
|