This repository was archived by the owner on May 17, 2024. It is now read-only.

Commit d958f69

Use proper name: pgsql,postgres -> PostgreSQL
SQLAlchemy uses "postgresql://" so now we do too.
1 parent 4b670fa commit d958f69
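
Because this commit renames the URI scheme, connection strings still written as `postgres://` would presumably no longer match. A minimal sketch, not part of this commit, of how a caller might normalize older spellings before passing a URI on; `normalize_scheme` and `LEGACY_SCHEMES` are hypothetical names introduced only for illustration:

```python
# Hypothetical alias table; data-diff itself does not ship this helper.
LEGACY_SCHEMES = {"postgres": "postgresql", "pgsql": "postgresql"}


def normalize_scheme(uri: str) -> str:
    """Rewrite a legacy scheme prefix to its current spelling.

    Only the text before the first "://" is touched, so credentials,
    host, and path are passed through unchanged.
    """
    scheme, sep, rest = uri.partition("://")
    if not sep:
        # No scheme separator found: leave the string alone.
        return uri
    return LEGACY_SCHEMES.get(scheme.lower(), scheme) + sep + rest
```

Plain string partitioning is used here rather than `urllib.parse`, so that forms such as `postgres:///` keep their empty host part intact.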

10 files changed, with 42 additions and 42 deletions.


README.md

Lines changed: 18 additions & 18 deletions
@@ -7,7 +7,7 @@ also find us in `#tools-data-diff` in the [Locally Optimistic Slack.][slack]**
 **data-diff** is a command-line tool and Python library to efficiently diff
 rows across two different databases.

-* ⇄ Verifies across [many different databases][dbs] (e.g. Postgres -> Snowflake)
+* ⇄ Verifies across [many different databases][dbs] (e.g. PostgreSQL -> Snowflake)
 * 🔍 Outputs [diff of rows](#example-command-and-output) in detail
 * 🚨 Simple CLI/API to create monitoring and alerts
 * 🔥 Verify 25M+ rows in <10s, and 1B+ rows in ~5min.
@@ -28,7 +28,7 @@ comparing every row.

 **†:** The implementation for downloading all rows that `data-diff` and
 `count(*)` is compared to is not optimal. It is a single Python multi-threaded
-process. The performance is fairly driver-specific, e.g. Postgres' performs 10x
+process. The performance is fairly driver-specific, e.g. PostgreSQL's performs 10x
 better than MySQL.

 ## Table of Contents
@@ -45,7 +45,7 @@ better than MySQL.
 ## Common use-cases

 * **Verify data migrations.** Verify that all data was copied when doing a
-  critical data migration. For example, migrating from Heroku Postgres to Amazon RDS.
+  critical data migration. For example, migrating from Heroku PostgreSQL to Amazon RDS.
 * **Verifying data pipelines.** Moving data from a relational database to a
   warehouse/data lake with Fivetran, Airbyte, Debezium, or some other pipeline.
 * **Alerting and maintaining data integrity SLOs.** You can create and monitor
@@ -63,13 +63,13 @@ better than MySQL.

 ## Example Command and Output

-Below we run a comparison with the CLI for 25M rows in Postgres where the
+Below we run a comparison with the CLI for 25M rows in PostgreSQL where the
 right-hand table is missing single row with `id=12500048`:

 ```
 $ data-diff \
-  postgres://postgres:password@localhost/postgres rating \
-  postgres://postgres:password@localhost/postgres rating_del1 \
+  postgresql://user:password@localhost/database rating \
+  postgresql://user:password@localhost/database rating_del1 \
   --bisection-threshold 100000 \ # for readability, try default first
   --bisection-factor 6 \ # for readability, try default first
   --update-column timestamp \
@@ -111,7 +111,7 @@ $ data-diff \

 | Database | Connection string | Status |
 |---------------|-----------------------------------------------------------------------------------------|--------|
-| Postgres | `postgres://user:password@hostname:5432/database` | 💚 |
+| PostgreSQL | `postgresql://user:password@hostname:5432/database` | 💚 |
 | MySQL | `mysql://user:password@hostname:5432/database` | 💚 |
 | Snowflake | `snowflake://user:password@account/database/SCHEMA?warehouse=WAREHOUSE&role=role` | 💚 |
 | Oracle | `oracle://username:password@hostname/database` | 💛 |
@@ -140,9 +140,9 @@ Requires Python 3.7+ with pip.

 ```pip install data-diff```

-or when you need extras like mysql and postgres
+or when you need extras like mysql and postgresql

-```pip install "data-diff[mysql,pgsql]"```
+```pip install "data-diff[mysql,postgresql]"```

 # How to use

@@ -185,7 +185,7 @@ logging.basicConfig(level=logging.INFO)

 from data_diff import connect_to_table, diff_tables

-table1 = connect_to_table("postgres:///", "table_name", "id")
+table1 = connect_to_table("postgresql:///", "table_name", "id")
 table2 = connect_to_table("mysql:///", "table_name", "id")

 for different_row in diff_tables(table1, table2):
@@ -201,11 +201,11 @@ In this section we'll be doing a walk-through of exactly how **data-diff**
 works, and how to tune `--bisection-factor` and `--bisection-threshold`.

 Let's consider a scenario with an `orders` table with 1M rows. Fivetran is
-replicating it contionously from Postgres to Snowflake:
+replicating it contionously from PostgreSQL to Snowflake:

 ```
 ┌─────────────┐ ┌─────────────┐
-│ Postgres │ │ Snowflake │
+│ PostgreSQL │ │ Snowflake │
 ├─────────────┤ ├─────────────┤
 │ │ │ │
 │ │ │ │
@@ -233,7 +233,7 @@ of the table. Then it splits the table into `--bisection-factor=10` segments of

 ```
 ┌──────────────────────┐ ┌──────────────────────┐
-│ Postgres │ │ Snowflake │
+│ PostgreSQL │ │ Snowflake │
 ├──────────────────────┤ ├──────────────────────┤
 │ id=1..100k │ │ id=1..100k │
 ├──────────────────────┤ ├──────────────────────┤
@@ -281,7 +281,7 @@ are the same except `id=100k..200k`:

 ```
 ┌──────────────────────┐ ┌──────────────────────┐
-│ Postgres │ │ Snowflake │
+│ PostgreSQL │ │ Snowflake │
 ├──────────────────────┤ ├──────────────────────┤
 │ checksum=0102 │ │ checksum=0102 │
 ├──────────────────────┤ mismatch! ├──────────────────────┤
@@ -306,7 +306,7 @@ and compare them in memory in **data-diff**.

 ```
 ┌──────────────────────┐ ┌──────────────────────┐
-│ Postgres │ │ Snowflake │
+│ PostgreSQL │ │ Snowflake │
 ├──────────────────────┤ ├──────────────────────┤
 │ id=100k..110k │ │ id=100k..110k │
 ├──────────────────────┤ ├──────────────────────┤
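
The README hunks above describe the approach in outline: split the key range into `--bisection-factor` segments, compare a checksum per segment, recurse into mismatching segments, and compare rows in memory once a segment is small enough. A minimal in-memory sketch of that idea follows. It is an assumption-laden illustration, not data-diff's implementation: tables are plain dicts, `checksum` stands in for a SQL checksum query, and `diff_segment` with its `"left"`/`"right"` labels is a hypothetical name and convention.

```python
import hashlib
from typing import Dict, Iterator, Tuple

Row = Tuple[int, str]


def checksum(table: Dict[int, str], lo: int, hi: int) -> str:
    """Hash all rows with lo <= id < hi (stand-in for a database-side checksum)."""
    h = hashlib.md5()
    for key in sorted(k for k in table if lo <= k < hi):
        h.update(f"{key}:{table[key]};".encode())
    return h.hexdigest()


def diff_segment(
    t1: Dict[int, str],
    t2: Dict[int, str],
    lo: int,
    hi: int,
    factor: int = 10,      # plays the role of --bisection-factor
    threshold: int = 100,  # plays the role of --bisection-threshold
) -> Iterator[Tuple[str, Row]]:
    """Yield ("left", row) or ("right", row) for rows present on one side only."""
    if checksum(t1, lo, hi) == checksum(t2, lo, hi):
        return  # segment matches, nothing to download
    if hi - lo <= max(threshold, 1):
        # Small enough: fetch both sides and compare in memory.
        rows1 = {(k, v) for k, v in t1.items() if lo <= k < hi}
        rows2 = {(k, v) for k, v in t2.items() if lo <= k < hi}
        for row in sorted(rows1 - rows2):
            yield ("left", row)
        for row in sorted(rows2 - rows1):
            yield ("right", row)
        return
    # Otherwise split into `factor` sub-segments and recurse into each.
    step = max(1, -(-(hi - lo) // factor))  # ceiling division
    for start in range(lo, hi, step):
        yield from diff_segment(t1, t2, start, min(start + step, hi), factor, threshold)
```

With two 1,000-row dicts that differ in a single id, only the segments containing that id are descended into; every other segment is dismissed after one checksum comparison.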
@@ -337,7 +337,7 @@ If you pass `--stats` you'll see e.g. what % of rows were different.
   queries.
 * Consider increasing the number of simultaneous threads executing
   queries per database with `--threads`. For databases that limit concurrency
-  per query, e.g. Postgres/MySQL, this can improve performance dramatically.
+  per query, e.g. PostgreSQL/MySQL, this can improve performance dramatically.
 * If you are only interested in _whether_ something changed, pass `--limit 1`.
   This can be useful if changes are very rare. This is often faster than doing a
   `count(*)`, for the reason mentioned above.
@@ -419,7 +419,7 @@ Now you can insert it into the testing database(s):
 ```shell-session
 # It's optional to seed more than one to run data-diff(1) against.
 $ poetry run preql -f dev/prepare_db.pql mysql://mysql:[email protected]:3306/mysql
-$ poetry run preql -f dev/prepare_db.pql postgres://postgres:[email protected]:5432/postgres
+$ poetry run preql -f dev/prepare_db.pql postgresql://postgres:[email protected]:5432/postgres

 # Cloud databases
 $ poetry run preql -f dev/prepare_db.pql snowflake://<uri>
@@ -430,7 +430,7 @@ $ poetry run preql -f dev/prepare_db.pql bigquery:///<project>
 **5. Run **data-diff** against seeded database**

 ```bash
-poetry run python3 -m data_diff postgres://postgres:Password1@localhost/postgres rating postgres://postgres:Password1@localhost/postgres rating_del1 --verbose
+poetry run python3 -m data_diff postgresql://postgres:Password1@localhost/postgres rating postgresql://postgres:Password1@localhost/postgres rating_del1 --verbose
 ```

 # License

data_diff/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -56,7 +56,7 @@ def diff_tables(
     """Efficiently finds the diff between table1 and table2.

     Example:
-        >>> table1 = connect_to_table('postgres:///', 'Rating', 'id')
+        >>> table1 = connect_to_table('postgresql:///', 'Rating', 'id')
         >>> list(diff_tables(table1, table1))
         []

data_diff/database.py

Lines changed: 11 additions & 11 deletions
@@ -40,8 +40,8 @@ def _inner():
     return dec


-@import_helper("pgsql")
-def import_postgres():
+@import_helper("postgresql")
+def import_postgresql():
     import psycopg2
     import psycopg2.extras
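
The argument to `import_helper` changes in step with the extras name in `pyproject.toml`, which suggests the decorator uses it to tell users which extra to install. The decorator body is not shown in this diff, so the following is only a sketch of how such a helper could work; the error wording and the `import_driver` demo function are assumptions:

```python
import importlib
from functools import wraps


def import_helper(package: str):
    """Wrap a lazy-import function so a missing driver names the extra to install."""

    def dec(f):
        @wraps(f)
        def _inner():
            try:
                return f()
            except ModuleNotFoundError as e:
                # Assumed message format; the real text may differ.
                raise ModuleNotFoundError(
                    f"{e}\n\nYou can install it using 'pip install data-diff[{package}]'."
                ) from e

        return _inner

    return dec


@import_helper("postgresql")
def import_driver():
    # Hypothetical module name, chosen so that the import fails in this demo.
    return importlib.import_module("driver_that_is_not_installed")
```

Calling `import_driver()` here raises `ModuleNotFoundError` with a message that mentions `data-diff[postgresql]`, which is why the decorator argument and the extras key need to stay identical.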

@@ -427,7 +427,7 @@ def close(self):
 TIMESTAMP_PRECISION_POS = 20 # len("2022-06-03 12:24:35.") == 20


-class Postgres(ThreadedDatabase):
+class PostgreSQL(ThreadedDatabase):
     DATETIME_TYPES = {
         "timestamp with time zone": TimestampTZ,
         "timestamp without time zone": Timestamp,
@@ -451,16 +451,16 @@ def __init__(self, host, port, user, password, *, database, thread_count, **kw):
         super().__init__(thread_count=thread_count)

     def _convert_db_precision_to_digits(self, p: int) -> int:
-        # Subtracting 2 due to wierd precision issues in Postgres
+        # Subtracting 2 due to wierd precision issues in PostgreSQL
         return super()._convert_db_precision_to_digits(p) - 2

     def create_connection(self):
-        postgres = import_postgres()
+        pg = import_postgresql()
         try:
-            c = postgres.connect(**self.args)
+            c = pg.connect(**self.args)
             # c.cursor().execute("SET TIME ZONE 'UTC'")
             return c
-        except postgres.OperationalError as e:
+        except pg.OperationalError as e:
             raise ConnectError(*e.args) from e

     def quote(self, s: str):
@@ -722,9 +722,9 @@ def _parse_type(
         return UnknownColType(type_repr)


-class Redshift(Postgres):
+class Redshift(PostgreSQL):
     NUMERIC_TYPES = {
-        **Postgres.NUMERIC_TYPES,
+        **PostgreSQL.NUMERIC_TYPES,
         "double": Float,
         "real": Float,
     }
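
The `Redshift` class body refers to its parent by name, which is why the class rename has to be repeated inside the dict literal. A small standalone sketch of the same pattern, using placeholder type classes and placeholder parent entries (the real contents of `PostgreSQL.NUMERIC_TYPES` are not shown in this diff):

```python
class Float:
    """Placeholder for the column-type class used in the diff."""


class Integer:
    """Placeholder type, assumed here only for illustration."""


class PostgreSQL:
    # Placeholder entries; the actual mapping is not part of this diff.
    NUMERIC_TYPES = {"double precision": Float, "integer": Integer}


class Redshift(PostgreSQL):
    # Dict unpacking builds a new mapping, so the parent's dict is not mutated.
    NUMERIC_TYPES = {
        **PostgreSQL.NUMERIC_TYPES,
        "double": Float,
        "real": Float,
    }
```

Inside a class body the parent's attributes are not in scope by bare name, so the explicit `PostgreSQL.NUMERIC_TYPES` reference is needed and must track any rename of the parent class.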
@@ -1005,7 +1005,7 @@ def match_path(self, dsn):


 MATCH_URI_PATH = {
-    "postgres": MatchUriPath(Postgres, ["database?"], help_str="postgres://<user>:<pass>@<host>/<database>"),
+    "postgresql": MatchUriPath(PostgreSQL, ["database?"], help_str="postgresql://<user>:<pass>@<host>/<database>"),
     "mysql": MatchUriPath(MySQL, ["database?"], help_str="mysql://<user>:<pass>@<host>/<database>"),
     "oracle": MatchUriPath(Oracle, ["database?"], help_str="oracle://<user>:<pass>@<host>/<database>"),
     "mssql": MatchUriPath(MsSQL, ["database?"], help_str="mssql://<user>:<pass>@<host>/<database>"),
@@ -1034,7 +1034,7 @@ def connect_to_uri(db_uri: str, thread_count: Optional[int] = 1) -> Database:
     Note: For non-cloud databases, a low thread-pool size may be a performance bottleneck.

     Supported schemes:
-    - postgresql
+    - postgres
     - mysql
     - mssql
     - oracle
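
Since `MATCH_URI_PATH` is keyed by scheme, renaming the key changes which URIs are accepted. A rough sketch of such a keyed lookup follows; `SUPPORTED_SCHEMES` and `help_for_uri` are hypothetical stand-ins, and the real table maps each scheme to a `MatchUriPath` object rather than a help string:

```python
# Hypothetical two-entry subset of the scheme table, holding only help strings.
SUPPORTED_SCHEMES = {
    "postgresql": "postgresql://<user>:<pass>@<host>/<database>",
    "mysql": "mysql://<user>:<pass>@<host>/<database>",
}


def help_for_uri(db_uri: str) -> str:
    """Return the help string for the URI's scheme, or raise ValueError."""
    scheme, sep, _rest = db_uri.partition("://")
    if not sep:
        raise ValueError(f"No scheme found in URI {db_uri!r}")
    try:
        return SUPPORTED_SCHEMES[scheme]
    except KeyError:
        known = ", ".join(sorted(SUPPORTED_SCHEMES))
        raise ValueError(
            f"Scheme {scheme!r} is not supported; expected one of: {known}"
        ) from None
```

In this sketch a `postgres://` URI is rejected with a message listing the accepted schemes, which is the kind of failure existing users of the old spelling would need to look out for.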

docs/index.rst

Lines changed: 4 additions & 4 deletions
@@ -11,7 +11,7 @@ Introduction
 **Data-diff** is a command-line tool and Python library to efficiently diff
 rows across two different databases.

-⇄ Verifies across many different databases (e.g. *Postgres* -> *Snowflake*) !
+⇄ Verifies across many different databases (e.g. *PostgreSQL* -> *Snowflake*) !

 🔍 Outputs diff of rows in detail

@@ -32,11 +32,11 @@ Requires Python 3.7+ with pip.

     pip install data-diff

-or when you need extras like mysql and postgres:
+or when you need extras like mysql and postgresql:

 ::

-    pip install "data-diff[mysql,pgsql]"
+    pip install "data-diff[mysql,postgresql]"


 How to use from Python
@@ -50,7 +50,7 @@ How to use from Python

     from data_diff import connect_to_table, diff_tables

-    table1 = connect_to_table("postgres:///", "table_name", "id")
+    table1 = connect_to_table("postgresql:///", "table_name", "id")
     table2 = connect_to_table("mysql:///", "table_name", "id")

     for sign, columns in diff_tables(table1, table2):

poetry.lock

Lines changed: 2 additions & 2 deletions
Some generated files are not rendered by default.

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ parameterized = "*"
 # When adding, update also: README + dev deps just above
 preql = ["preql"]
 mysql = ["mysql-connector-python"]
-pgsql = ["psycopg2"]
+postgresql = ["psycopg2"]
 snowflake = ["snowflake-connector-python"]
 presto = ["presto-python-client"]
 oracle = ["cx_Oracle"]

tests/common.py

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@
 logging.basicConfig(level=logging.INFO)

 TEST_MYSQL_CONN_STRING: str = "mysql://mysql:Password1@localhost/mysql"
-TEST_POSTGRES_CONN_STRING: str = None
+TEST_POSTGRESQL_CONN_STRING: str = None
 TEST_SNOWFLAKE_CONN_STRING: str = None
 TEST_BIGQUERY_CONN_STRING: str = None
 TEST_REDSHIFT_CONN_STRING: str = None
@@ -26,7 +26,7 @@
 CONN_STRINGS = {
     db.BigQuery: TEST_BIGQUERY_CONN_STRING,
     db.MySQL: TEST_MYSQL_CONN_STRING,
-    db.Postgres: TEST_POSTGRES_CONN_STRING,
+    db.PostgreSQL: TEST_POSTGRESQL_CONN_STRING,
     db.Snowflake: TEST_SNOWFLAKE_CONN_STRING,
     db.Redshift: TEST_REDSHIFT_CONN_STRING,
     db.Oracle: TEST_ORACLE_CONN_STRING,

tests/test_database.py

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ def test_md5_to_int(self):
 class TestConnect(unittest.TestCase):
     def test_bad_uris(self):
         self.assertRaises(ValueError, connect_to_uri, "p")
-        self.assertRaises(ValueError, connect_to_uri, "postgres:///bla/foo")
+        self.assertRaises(ValueError, connect_to_uri, "postgresql:///bla/foo")
         self.assertRaises(ValueError, connect_to_uri, "snowflake://erez:erez27Snow@bya42734/xdiffdev/TEST1")
         self.assertRaises(
             ValueError, connect_to_uri, "snowflake://erez:erez27Snow@bya42734/xdiffdev/TEST1?warehouse=ha&schema=dup"

tests/test_database_types.py

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@
 }

 DATABASE_TYPES = {
-    db.Postgres: {
+    db.PostgreSQL: {
         # https://www.postgresql.org/docs/current/datatype-numeric.html#DATATYPE-INT
         "int": [
             # "smallint", # 2 bytes

tests/test_normalize_fields.py

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@
 logger = logging.getLogger()

 DATE_TYPES = {
-    db.Postgres: ["timestamp({p}) with time zone", "timestamp({p}) without time zone"],
+    db.PostgreSQL: ["timestamp({p}) with time zone", "timestamp({p}) without time zone"],
     db.MySQL: ["datetime({p})", "timestamp({p})"],
     db.Snowflake: ["timestamp({p})", "timestamp_tz({p})", "timestamp_ntz({p})"],
     db.BigQuery: ["timestamp", "datetime"],
