This repository was archived by the owner on May 17, 2024. It is now read-only.

Commit 6b3d000

Merge pull request #551 from leoebfolsom/update-readme
focus readme on the dbt use case
2 parents cfd941f + 7a01ce9 commit 6b3d000

1 file changed: +35 -93 lines changed

README.md

Lines changed: 35 additions & 93 deletions
@@ -4,134 +4,76 @@
 
 # **data-diff**
 
-## What is `data-diff`?
-data-diff is a **free, open-source tool** that enables data professionals to detect differences in values between any two tables.
-
-## Documentation
-
-[**🗎 Documentation**](https://docs.datafold.com/guides/os_data_diff) - our detailed documentation has everything you need to start diffing.
+<h2 align="center">
+Develop dbt models faster by testing as you code.
+</h2>
+<h4 align="center">
+See how every change to dbt code affects the data produced in the modified model and downstream.
+</h4>
+<br>
 
-### Databases we support
+## What is `data-diff`?
 
-- PostgreSQL >=10
-- MySQL
-- Snowflake
-- BigQuery
-- Redshift
-- Oracle
-- Presto
-- Databricks
-- Trino
-- Clickhouse
-- Vertica
-- DuckDB >=0.6
-- SQLite (coming soon)
+data-diff is an open source package that you can use to see the impact of your dbt code changes on your dbt models as you code.
 
-For their corresponding connection strings, check out our [detailed table](https://github.com/datafold/data-diff/blob/master/docs/supported-databases.md).
+<div align="center">
 
-#### Looking for a database not on the list?
-If a database is not on the list, we'd still love to support it. [Please open an issue](https://github.com/datafold/data-diff/issues) to discuss it, or vote on existing requests to push them up our todo list.
+![development_testing_gif](https://user-images.githubusercontent.com/1799931/236354286-d1d044cf-2168-4128-8a21-8c8ca7fd494c.gif)
 
-## Get started
+</div>
 
-### Installation
+<br>
 
-#### First, install `data-diff` using `pip`.
+## Getting Started
 
+**Install `data-diff`**
 ```
 pip install data-diff
 ```
 
-#### Then, install one or more driver(s) specific to the database(s) you want to connect to.
-
-- `pip install 'data-diff[mysql]'`
-
-- `pip install 'data-diff[postgresql]'`
-
-- `pip install 'data-diff[snowflake]'`
-
-- `pip install 'data-diff[presto]'`
-
-- `pip install 'data-diff[oracle]'`
-
-- `pip install 'data-diff[trino]'`
-
-- `pip install 'data-diff[clickhouse]'`
-
-- `pip install 'data-diff[vertica]'`
-
-- For BigQuery, see: https://pypi.org/project/google-cloud-bigquery/
-
-_Some drivers have dependencies that cannot be installed using `pip` and still need to be installed manually._
-
-### Run your first diff
-
-Once you've installed `data-diff`, you can run it from the command line.
-
+**Update a few lines in your `dbt_project.yml`**
 ```
-data-diff DB1_URI TABLE1_NAME DB2_URI TABLE2_NAME [OPTIONS]
+#dbt_project.yml
+vars:
+  data_diff:
+    prod_database: my_database
+    prod_schema: my_default_schema
 ```
 
-Be sure to read [the docs](https://docs.datafold.com/reference/open_source/cli) for detailed instructions how to build one of these commands depending on your database setup.
-
-#### Code Example: Diff Tables Between Databases
-Here's an example command for your copy/pasting, taken from the screenshot above when we diffed data between Snowflake and Postgres.
+**Run your first data diff!**
 
 ```
-data-diff \
-postgresql://<username>:'<password>'@localhost:5432/<database> \
-<table> \
-"snowflake://<username>:<password>@<password>/<DATABASE>/<SCHEMA>?warehouse=<WAREHOUSE>&role=<ROLE>" \
-<TABLE> \
--k activity_id \
--c activity \
--w "event_timestamp < '2022-10-10'"
+dbt run && data-diff --dbt
 ```
 
-#### Code Example: Diff Tables Within a Database
+We recommend you get started by walking through [our simple setup instructions](https://docs.datafold.com/development_testing/open_source) which contain examples and details.
 
-```
-data-diff \
-"snowflake://<username>:<password>@<password>/<DATABASE>/<SCHEMA_1>?warehouse=<WAREHOUSE>&role=<ROLE>" <TABLE_1> \
-<SCHEMA_2>.<TABLE_2> \
--k org_id \
--c created_at -c is_internal \
--w "org_id != 1 and org_id < 2000" \
--m test_results_%t \
---materialize-all-rows \
---table-write-limit 10000
-```
-
-In both code examples, I've used `<>` carrots to represent values that **should be replaced with your values** in the database connection strings. For the flags (`-k`, `-c`, etc.), I opted for "real" values (`org_id`, `is_internal`) to give you a more realistic view of what your command will look like.
+Please reach out on the dbt Slack in [#tools-datafold](https://getdbt.slack.com/archives/C03D25A92UU) if you have any trouble whatsoever getting started!
 
-### We're here to help!
+<br><br>
 
-We're here to help! Please post any questions in [GitHub Discussions](https://github.com/datafold/data-diff/discussions).
+### Diffing between databases
 
-## How to Use
+Check out our [documentation](https://github.com/datafold/data-diff/blob/master/docs/supported-databases.md) if you're looking to compare data across databases (for example, between Postgres and Snowflake).
 
-* [Examples with dbt, joindiff, and hashdiff](https://docs.datafold.com/reference/open_source/cli#examples)
-* [Examples with Python](https://data-diff.readthedocs.io/en/latest/python-api.html)
-* [How to use with TOML configuration file](https://docs.datafold.com/reference/open_source/cli#toml-config-file)
+<br>
 
-## How to Contribute
-* Feel free to open an issue or contribute to the project by working on an existing issue.
-* Please read the [contributing guidelines](https://github.com/datafold/data-diff/blob/master/CONTRIBUTING.md) to get started.
-* To add a new database driver, check out [docs](https://github.com/datafold/data-diff/blob/master/docs/new-database-driver-guide.rst).
+## Contributors
 
-Big thanks to everyone who contributed so far:
+We thank everyone who contributed so far!
 
 <a href="https://github.com/datafold/data-diff/graphs/contributors">
 <img src="https://contributors-img.web.app/image?repo=datafold/data-diff" />
 </a>
 
-## Technical Explanation
-
-Check out this [technical explanation](https://github.com/datafold/data-diff/blob/master/docs/technical-explanation.md) of how data-diff works.
+<br>
 
 ## Analytics
+
 * [Usage Analytics & Data Privacy](https://github.com/datafold/data-diff/blob/master/docs/usage_analytics.md)
 
+<br>
+
 ## License
 
 This project is licensed under the terms of the [MIT License](https://github.com/datafold/data-diff/blob/master/LICENSE).
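
The cross-database workflow that this commit moves out of the README follows the form `data-diff DB1_URI TABLE1_NAME DB2_URI TABLE2_NAME [OPTIONS]`, with `-k` for the key column, `-c` for additional columns to compare, and `-w` for a filter. A minimal Python sketch of assembling such a command is given here. The helper `build_data_diff_command` and every connection value are hypothetical placeholders, and the Snowflake URI assumes the slot after `@` holds the account identifier; check the supported-databases documentation before relying on any of it.

```python
import shlex
from typing import List, Optional, Sequence


def build_data_diff_command(
    db1_uri: str,
    table1: str,
    db2_uri: str,
    table2: str,
    key_column: Optional[str] = None,
    extra_columns: Sequence[str] = (),
    where: Optional[str] = None,
) -> List[str]:
    """Assemble argv for `data-diff DB1_URI TABLE1_NAME DB2_URI TABLE2_NAME [OPTIONS]`.

    Hypothetical helper for illustration; it is not part of the data-diff package.
    """
    argv = ["data-diff", db1_uri, table1, db2_uri, table2]
    if key_column:
        argv += ["-k", key_column]  # key column, as in the README example
    for column in extra_columns:
        argv += ["-c", column]  # one -c flag per extra column to compare
    if where:
        argv += ["-w", where]  # filter applied to the rows being diffed
    return argv


# Placeholder connection values; replace them with your own.
command = build_data_diff_command(
    "postgresql://user:secret@localhost:5432/analytics",
    "activity",
    "snowflake://user:secret@my_account/ANALYTICS/PUBLIC?warehouse=WH&role=ROLE",
    "ACTIVITY",
    key_column="activity_id",
    extra_columns=["activity"],
    where="event_timestamp < '2022-10-10'",
)

# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would execute it, provided `data-diff` and the relevant database drivers are installed. Building an argument list rather than a single string avoids shell-quoting mistakes in the `-w` filter.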
