@@ -7,7 +7,7 @@ also find us in `#tools-data-diff` in the [Locally Optimistic Slack.][slack]**
 **data-diff** is a command-line tool and Python library to efficiently diff
 rows across two different databases.
 
-* ⇄  Verifies across [many different databases][dbs] (e.g. Postgres -> Snowflake)
+* ⇄  Verifies across [many different databases][dbs] (e.g. PostgreSQL -> Snowflake)
 * 🔍 Outputs [diff of rows](#example-command-and-output) in detail
 * 🚨 Simple CLI/API to create monitoring and alerts
 * 🔥 Verify 25M+ rows in <10s, and 1B+ rows in ~5min.
@@ -28,7 +28,7 @@ comparing every row.
 
 **†:** The implementation for downloading all rows, against which `data-diff`
 and `count(*)` are compared, is not optimal. It is a single Python multi-threaded
-process. The performance is fairly driver-specific, e.g. Postgres' performs 10x
+process. The performance is fairly driver-specific, e.g. PostgreSQL's performs 10x
 better than MySQL.
 
 ## Table of Contents
@@ -45,7 +45,7 @@ better than MySQL.
 ## Common use-cases
 
 * **Verify data migrations.** Verify that all data was copied when doing a
-  critical data migration. For example, migrating from Heroku Postgres to Amazon RDS.
+  critical data migration. For example, migrating from Heroku PostgreSQL to Amazon RDS.
 * **Verifying data pipelines.** Moving data from a relational database to a
   warehouse/data lake with Fivetran, Airbyte, Debezium, or some other pipeline.
 * **Alerting and maintaining data integrity SLOs.** You can create and monitor
@@ -63,13 +63,13 @@ better than MySQL.
 
 ## Example Command and Output
 
-Below we run a comparison with the CLI for 25M rows in Postgres where the
+Below we run a comparison with the CLI for 25M rows in PostgreSQL where the
 right-hand table is missing a single row with `id=12500048`:
 
 ```
 $ data-diff \
-  postgres://postgres:password@localhost/postgres rating \
-  postgres://postgres:password@localhost/postgres rating_del1 \
+  postgresql://user:password@localhost/database rating \
+  postgresql://user:password@localhost/database rating_del1 \
   --bisection-threshold 100000 \ # for readability, try default first
   --bisection-factor 6 \ # for readability, try default first
   --update-column timestamp \
@@ -111,7 +111,7 @@ $ data-diff \
 
 | Database   | Connection string                                                                 | Status |
 |------------|-----------------------------------------------------------------------------------|--------|
-| Postgres   | `postgres://user:password@hostname:5432/database`                                 | 💚     |
+| PostgreSQL | `postgresql://user:password@hostname:5432/database`                               | 💚     |
 | MySQL      | `mysql://user:password@hostname:3306/database`                                    | 💚     |
 | Snowflake  | `snowflake://user:password@account/database/SCHEMA?warehouse=WAREHOUSE&role=role` | 💚     |
 | Oracle     | `oracle://username:password@hostname/database`                                    | 💛     |
@@ -140,9 +140,9 @@ Requires Python 3.7+ with pip.
 
 ```pip install data-diff```
 
-or when you need extras like mysql and postgres
+or when you need extras like mysql and postgresql
 
-```pip install "data-diff[mysql,pgsql]"```
+```pip install "data-diff[mysql,postgresql]"```
 
 # How to use
 
@@ -185,7 +185,7 @@ logging.basicConfig(level=logging.INFO)
 
 from data_diff import connect_to_table, diff_tables
 
-table1 = connect_to_table("postgres:///", "table_name", "id")
+table1 = connect_to_table("postgresql:///", "table_name", "id")
 table2 = connect_to_table("mysql:///", "table_name", "id")
 
 for different_row in diff_tables(table1, table2):
@@ -201,11 +201,11 @@ In this section we'll be doing a walk-through of exactly how **data-diff**
 works, and how to tune `--bisection-factor` and `--bisection-threshold`.
 
 Let's consider a scenario with an `orders` table with 1M rows. Fivetran is
-replicating it contionously from Postgres to Snowflake:
+replicating it continuously from PostgreSQL to Snowflake:
 
 ```
 ┌─────────────┐                ┌─────────────┐
-│  Postgres   │                │  Snowflake  │
+│ PostgreSQL  │                │  Snowflake  │
 ├─────────────┤                ├─────────────┤
 │             │                │             │
 │             │                │             │
@@ -233,7 +233,7 @@ of the table. Then it splits the table into `--bisection-factor=10` segments of
 
 ```
 ┌──────────────────────┐              ┌──────────────────────┐
-│       Postgres       │              │      Snowflake       │
+│      PostgreSQL      │              │      Snowflake       │
 ├──────────────────────┤              ├──────────────────────┤
 │      id=1..100k      │              │      id=1..100k      │
 ├──────────────────────┤              ├──────────────────────┤
@@ -281,7 +281,7 @@ are the same except `id=100k..200k`:
 
 ```
 ┌──────────────────────┐              ┌──────────────────────┐
-│       Postgres       │              │      Snowflake       │
+│      PostgreSQL      │              │      Snowflake       │
 ├──────────────────────┤              ├──────────────────────┤
 │    checksum=0102     │              │    checksum=0102     │
 ├──────────────────────┤  mismatch!   ├──────────────────────┤
@@ -306,7 +306,7 @@ and compare them in memory in **data-diff**.
 
 ```
 ┌──────────────────────┐              ┌──────────────────────┐
-│       Postgres       │              │      Snowflake       │
+│      PostgreSQL      │              │      Snowflake       │
 ├──────────────────────┤              ├──────────────────────┤
 │    id=100k..110k     │              │    id=100k..110k     │
 ├──────────────────────┤              ├──────────────────────┤
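The checksum bisection walked through above can be sketched in plain Python. This is a simplified illustration, not data-diff's actual implementation: the dict-backed "tables", function names, and parameters are hypothetical stand-ins, and each per-segment checksum here would really be a single aggregate query pushed down to the database.

```python
import hashlib

# Toy stand-ins for database tables: {id: row}. In data-diff these are real
# tables, and checksumming a segment is one query run inside the database.
def segment_checksum(table, lo, hi):
    """Checksum all rows whose key falls in [lo, hi)."""
    digest = hashlib.md5()
    for key in sorted(k for k in table if lo <= k < hi):
        digest.update(repr((key, table[key])).encode())
    return digest.hexdigest()

def diff_segment(t1, t2, lo, hi, bisection_factor=10, bisection_threshold=10000):
    """Return keys of differing rows in [lo, hi) via checksum bisection."""
    if segment_checksum(t1, lo, hi) == segment_checksum(t2, lo, hi):
        return []  # segment identical on both sides: nothing to download
    if hi - lo <= bisection_threshold:
        # Segment is small enough: "download" both sides and compare rows.
        keys = {k for k in t1 if lo <= k < hi} | {k for k in t2 if lo <= k < hi}
        return sorted(k for k in keys if t1.get(k) != t2.get(k))
    # Otherwise split into bisection_factor sub-segments and recurse.
    step = max(1, (hi - lo) // bisection_factor)
    diffs = []
    for start in range(lo, hi, step):
        diffs += diff_segment(t1, t2, start, min(start + step, hi),
                              bisection_factor, bisection_threshold)
    return diffs
```

With this shape, segments whose checksums match cost one query per side and are never downloaded; only the sub-segments that actually differ ever reach the row-by-row comparison.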
@@ -337,7 +337,7 @@ If you pass `--stats` you'll see e.g. what % of rows were different.
   queries.
 * Consider increasing the number of simultaneous threads executing
   queries per database with `--threads`. For databases that limit concurrency
-  per query, e.g. Postgres/MySQL, this can improve performance dramatically.
+  per query, e.g. PostgreSQL/MySQL, this can improve performance dramatically.
 * If you are only interested in _whether_ something changed, pass `--limit 1`.
   This can be useful if changes are very rare. This is often faster than doing a
   `count(*)`, for the reason mentioned above.
@@ -419,7 +419,7 @@ Now you can insert it into the testing database(s):
 ```shell-session
 # It's optional to seed more than one to run data-diff(1) against.
 $ poetry run preql -f dev/prepare_db.pql mysql://mysql:Password1@127.0.0.1:3306/mysql
-$ poetry run preql -f dev/prepare_db.pql postgres://postgres:Password1@127.0.0.1:5432/postgres
+$ poetry run preql -f dev/prepare_db.pql postgresql://postgres:Password1@127.0.0.1:5432/postgres
 
 # Cloud databases
 $ poetry run preql -f dev/prepare_db.pql snowflake://<uri>
@@ -430,7 +430,7 @@ $ poetry run preql -f dev/prepare_db.pql bigquery:///<project>
 **5. Run **data-diff** against seeded database**
 
 ```bash
-poetry run python3 -m data_diff postgres://postgres:Password1@localhost/postgres rating postgres://postgres:Password1@localhost/postgres rating_del1 --verbose
+poetry run python3 -m data_diff postgresql://postgres:Password1@localhost/postgres rating postgresql://postgres:Password1@localhost/postgres rating_del1 --verbose
 ```
 
 # License