
Dependencies: Stop bundling client libraries #526

Open

santi opened this issue Apr 4, 2024 · 8 comments

@santi
Collaborator

santi commented Apr 4, 2024

When installing some of our community modules through extras, we usually bundle a client library for interacting with the service provided by the container. In some cases this is nice, as there is only a single, official client library available for that service (e.g. qdrant-client).

However, in many cases there is no single client library that makes for the best choice. One example is sqlalchemy in combination with other drivers for connecting to databases, where you have the choice between multiple drivers (such as pyodbc and pymssql for SQL Server, pg8000 and psycopg for Postgres, and so on). Many use cases don't even use sqlalchemy for their database connection, and we are thus forcing one or more useless packages into their builds.

Therefore I suggest that we remove all client-library dependencies from our extras and put the responsibility for connecting to the containers, with whatever framework they choose, on the users of tc-python. We will still keep client libraries as part of our dev dependencies, so that we can provide usage examples in our doctests and other documentation, but simply installing testcontainers-python with some extras will no longer give you a large bundle of dependencies you might not need.

Where possible, we will still include and maintain methods such as get_connection_string (and variants, where we should probably standardize on a naming convention for consistency) for easy usage with client libraries. Example usage with client libraries also becomes more important, but in those examples we always show the imports of the client library used, so it should be quite self-explanatory that you need a client to connect to a service.
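
As an illustration of how such an example might look once no client is bundled (a minimal sketch: the choice of psycopg, the image tag, and the driver=None argument are my assumptions, not decisions from this issue):

# the user installs their preferred client themselves, e.g. `pip install psycopg[binary]`
import psycopg  # the client import is shown explicitly, as proposed above

from testcontainers.postgres import PostgresContainer

# driver=None is assumed to produce a plain postgresql:// URL that psycopg can parse
with PostgresContainer("postgres:16-alpine", driver=None) as postgres:
    with psycopg.connect(postgres.get_connection_url()) as connection:
        assert connection.execute("select 1").fetchone() == (1,)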

This work has already been partly started, and the existing work contains useful discussion around the issue. See here for the work done on removing sqlalchemy from Postgres.

@bearrito
Contributor

bearrito commented Apr 5, 2024

I had an approach for the recently added nats module; ultimately we just included the standard client. I'll throw it out as an idea: if you want to run the tests using the semi-canonical client, it works; otherwise they are skipped.

import pytest

NO_NATS_CLIENT = True
try:
    # optional import: only available if the user has installed the nats client
    from nats import connect as nats_connect
    from nats.aio.client import Client as NATSClient

    NO_NATS_CLIENT = False
except ImportError:
    pass


@pytest.mark.skipif(NO_NATS_CLIENT, reason="No NATS Client Available")
async def test_pubsub():
    ...  # elided

Couple other thoughts

  1. One thing I see the Postgres stuff doing is execing into the containers and running commands. It isn't obvious that all the services have this sort of ability, or, if they do, whether the required libraries are included in their respective containers.
  2. How do you test that the connection string is correct?
  3. Maybe folks have better tooling and dev workflows than me, but I think a contributor would be less likely to achieve good code coverage if the tests are implemented as doctests.

@santi
Collaborator Author

santi commented Apr 5, 2024

> I had an approach for the recently added nats module; ultimately we just included the standard client. I'll throw it out as an idea: if you want to run the tests using the semi-canonical client, it works; otherwise they are skipped.

I still think we should include the client libraries as dev dependencies, to make it easier to test in general and to test that common usage patterns are working as intended. However, all dev dependencies should be optional, so that they are not installed automatically when installing the testcontainers package itself.

> Couple other thoughts

> 1. One thing I see the Postgres stuff doing is execing into the containers and running commands. It isn't obvious that all the services have this sort of ability, or, if they do, whether the required libraries are included in their respective containers.

Yup, and if the images come bundled with tools intended for readiness checking, I think we can utilize those more to actually perform readiness checking. Parsing logs is okay, but not always ideal, especially if there is no log message that clearly indicates the container is ready.
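
For instance, a minimal sketch of leaning on a tool shipped inside the image as a readiness check (the exec return shape, the retry decorator usage, and the pg_isready flags are assumptions on my part):

from testcontainers.core.waiting_utils import wait_container_is_ready
from testcontainers.postgres import PostgresContainer


@wait_container_is_ready(AssertionError)
def wait_until_ready(container: PostgresContainer) -> None:
    # pg_isready ships with the official postgres images; exit code 0 means
    # "accepting connections" (see the --host discussion later in this thread)
    exit_code, _output = container.exec(["pg_isready", "--host", "localhost"])
    assert exit_code == 0


with PostgresContainer("postgres:16-alpine") as postgres:
    wait_until_ready(postgres)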

> 2. How do you test that the connection string is correct?

I think we misunderstood each other a little here. I still would like to include client libraries as dev dependencies, so that they can be used for tests.

> 3. Maybe folks have better tooling and dev workflows than me, but I think a contributor would be less likely to achieve good code coverage if the tests are implemented as doctests.

Agreed. Doctests, in my opinion, are only for showing usage of the classes/modules, and running them as doctests is only there to ensure that the examples actually work. With little syntax highlighting and editor support inside docstrings, the examples often end up invalid, and the doctests help prevent that.
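
For illustration, a minimal sketch of the kind of doctest-backed usage example meant here (the module, image tag, and wording are mine):

def example_usage():
    """Connect to the container with a client library installed by the user.

    >>> from testcontainers.postgres import PostgresContainer
    >>> with PostgresContainer("postgres:16-alpine") as postgres:
    ...     url = postgres.get_connection_url()
    >>> url.startswith("postgresql")
    True
    """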

alexanderankin added a commit that referenced this issue Apr 16, 2024
…ault wait timeout for `wait_for_logs` (#525)

Removes usage of `sqlalchemy`, as part of the work described in
#526.

- Adds a default timeout to the `wait_for_logs` waiting strategy, the same timeout used by default in the `wait_container_is_ready` strategy.
- Changes the wait strategy for the `mysql` container to wait for logs indicating that the DB engine is ready to accept connections (MySQL performs a restart as part of its startup procedure, so the logs will always appear twice).
- Adds more tests for different `mysql` and `mariadb` versions to ensure consistency in the wait strategy.
- Removes x86 emulation for ARM devices for MariaDB, as MariaDB images already support ARM architectures.
- Changes the wait strategy for `oracle-free`, as the images produce a consistent `DATABASE IS READY TO USE!` log message on startup.


Next steps will be to remove `sqlalchemy` as a bundled dependency entirely, but I have not included that in this PR, as I consider it a bigger change than just changing wait strategies as an internal implementation detail. I plan to do this as part of a bigger rework where I remove the `DbContainer` class and standardize configuration hooks and wait strategies across containers (not just DB containers, but all containers in need of a configuration and readiness step). See #527 for WIP.

---------

Co-authored-by: David Ankin <[email protected]>
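
For context, a minimal sketch of the `wait_for_logs` helper that the commit above gives a default timeout (the image, log pattern, and timeout value are illustrative assumptions, not taken from the commit):

from testcontainers.core.container import DockerContainer
from testcontainers.core.waiting_utils import wait_for_logs

container = DockerContainer("postgres:16-alpine").with_env("POSTGRES_PASSWORD", "test")
with container:
    # blocks until the pattern matches the container logs or the timeout expires;
    # note the caveat discussed later in this thread: postgres also prints this
    # message for its temporary init-time server, so a bare log match can fire early
    wait_for_logs(container, "database system is ready to accept connections", timeout=120)
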
@tharwan

tharwan commented Oct 21, 2024

On my machine (😅) the removal of the client library for testing the connection to the container makes the tests unreliable/broken.

I have two very simple fixtures:

@pytest.fixture(scope="module")
def db_container():
    postgres = PostgresContainer("postgres:14.7")
    # we set the time zone equal to the prod system
    postgres.with_env("TZ", "Europe/Berlin")
    postgres.with_env("PGTZ", "Europe/Berlin")

    with postgres:
        logger.info("setup db container")
        yield postgres
        logger.info("teardown db container")


@pytest.fixture(scope="module")
def db_engine(db_container: PostgresContainer):
    time.sleep(5)  # TODO why is this a problem?
    engine = create_engine(db_container.get_connection_url())
    setup_events()
    Base.metadata.create_all(engine)
    return engine

The tests fail almost 100% of the time without the sleep in the second fixture. They also don't fail if I set a breakpoint (i.e. do the sleep manually).

Since the script in the PostgresContainer._connect function is run entirely inside the container, I assume that on my setup it takes more time than usual for the network to become ready. So running things in the container works before it can be accessed externally.

I guess the best workaround is to have my own subclass that uses sqlalchemy in its own _connect implementation.
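
A rough sketch of that workaround, assuming the private _connect hook is still called during start() in your installed version (the class name and the retry decorator usage are mine):

import sqlalchemy
from sqlalchemy.exc import OperationalError
from testcontainers.core.waiting_utils import wait_container_is_ready
from testcontainers.postgres import PostgresContainer


class SqlAlchemyPostgresContainer(PostgresContainer):
    """Waits until an external SQLAlchemy connection succeeds, not just an in-container check."""

    @wait_container_is_ready(OperationalError)
    def _connect(self) -> None:
        engine = sqlalchemy.create_engine(self.get_connection_url())
        try:
            with engine.connect() as connection:
                connection.execute(sqlalchemy.text("select 1"))
        finally:
            engine.dispose()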

@alexanderankin
Member

alexanderankin commented Oct 21, 2024

@tharwan what version are you on? The latest version changed back to pg_isready over watching for logs, due to issues with localization. Previously there was also an issue where the Python implementation was watching for the ready message on either stdout or stderr; we changed it to look for it on both (the container first logs to stderr during the setup step, then logs to stdout on regular startup - the bug was that it accepted the message from the setup step, and then tests would fail because the container hadn't actually started up after setting up - the fix was to add an argument to watch for BOTH instead of OR).

As a side note, in the setup method I usually run migrations, so that the database is confirmed ready before tests begin, like Spring Boot does, for example.
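
A minimal sketch of that pattern on top of the fixture shape from above (Base stands in for the project's own metadata/migrations; the image tag is illustrative):

import pytest
from sqlalchemy import create_engine, text
from sqlalchemy.orm import declarative_base
from testcontainers.postgres import PostgresContainer

Base = declarative_base()  # in a real project this carries the application's tables


@pytest.fixture(scope="module")
def db_engine():
    with PostgresContainer("postgres:16-alpine") as postgres:
        engine = create_engine(postgres.get_connection_url())
        Base.metadata.create_all(engine)  # in a real project: run migrations / DDL here
        with engine.connect() as connection:
            # executing a query confirms the database is reachable from the host
            connection.execute(text("select 1"))
        yield engine
        engine.dispose()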

@tharwan

tharwan commented Oct 21, 2024

@alexanderankin you are correct, I was not on the latest, but I assumed I was. I was running 4.5.1.

@tharwan

tharwan commented Oct 21, 2024

But I am getting the same behavior with 4.8.2.

@alexanderankin
Member

alexanderankin commented Oct 21, 2024

Here is my reproducer, trying without sleep now.

steps
# mkdir tharwan-pg-issue
# cd !$
# python -m venv .venv && . .venv/bin/activate
# pip install pytest testcontainers sqlalchemy psycopg2-binary

import time
from logging import getLogger

import pytest
from sqlalchemy import Connection, Engine, create_engine
from sqlalchemy.sql import text
from testcontainers.postgres import PostgresContainer

logger = getLogger("test")


@pytest.fixture(scope="module")
def db_container():
    postgres = PostgresContainer("postgres:16-alpine")
    # we set the time zone equal to the prod system
    postgres.with_env("TZ", "Europe/Berlin")
    postgres.with_env("PGTZ", "Europe/Berlin")

    with postgres:
        logger.info("setup db container")
        yield postgres
        logger.info("teardown db container")


@pytest.fixture(scope="module")
def db_engine(db_container: PostgresContainer) -> Engine:
    time.sleep(5)  # TODO why is this a problem?
    engine = create_engine(db_container.get_connection_url())
    # setup_events()
    # Base.metadata.create_all(engine)
    return engine


def test_it(db_engine: Engine):
    connection: Connection = db_engine.engine.connect()
    connection.execute(text("create table if not exists example(id serial not null primary key, name varchar(150) unique);"))
    connection.execute(text("insert into example(name) values ('hello'), ('world');"))
    rows = connection.execute(text("select * from example"))
    rows = list(rows)
    assert len(rows) == 2
    for each_row in rows:
        print(each_row)

@alexanderankin
Member

alexanderankin commented Oct 21, 2024

Okay, so I inserted sleep statements into this file,

here - https://github.com/docker-library/postgres/blob/0f80628c99c56c96db59a6c774db4fca4b27f5cd/docker-entrypoint.sh#L334

and here - https://github.com/docker-library/postgres/blob/0f80628c99c56c96db59a6c774db4fca4b27f5cd/docker-entrypoint.sh#L351

Seems like pg_isready suffers from the same problem - and the log-based fix that was implemented for people using postgres variants and other locales breaks postgres startup detection again.

But we have already implemented the fix for pg_isready - we pass the --host argument to force it to talk over TCP. The temporary server only listens on sockets, so this distinguishes the first and second phases of startup.
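
A rough illustration of that distinction (the exec return shape and the exact flags are assumptions):

from testcontainers.postgres import PostgresContainer

with PostgresContainer("postgres:16-alpine") as postgres:
    # without --host, pg_isready checks the unix socket, which the temporary
    # init-time server already serves, so this can succeed too early
    socket_code, _ = postgres.exec(["pg_isready"])

    # with --host, pg_isready is forced onto TCP, which only the final server
    # listens on - this is what distinguishes the two startup phases
    tcp_code, _ = postgres.exec(["pg_isready", "--host", "localhost"])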

So I don't know - that is all my hypotheses tested, and no new changes seem to be necessary/possible.
