Skip to content

Commit 729e657

Browse files
authored
[ENH] Simplify auth and correctly overwrite singleton tenant+db (chroma-core#1970)
## Reviewer notes - I would recommend avoiding the diff and reviewing the end state as if it were fresh code. Trying to understand the diff will lead to madness. - Crossed-out notes left for posterity but outdated. - <s>`ServerAuthenticationProvider` returns `None` if it fails but `ServerAuthorizationProvider` raises an error -- should we standardize this behavior? - The FastAPI boilerplate in `chromadb/server/fastapi/__init__.py` is a little unwieldy. I'm fine with it since this file is mostly boilerplate and we very infrequently update it, but I'm open to ideas for how to make it cleaner. - However, everything should probably be async yeah? - `ServerAuthenticationProvider` now has the job of overwriting singleton tenant and database if the setting is specified and it's possible. This leads to a bit of weirdness where we have to call into this twice -- once in the authn/authz flow and then again in the core request flow once authn/authz is complete. Two questions about this: - Should the overwrite functionality live somewhere else? `ServerAuthenticationProvider` is the arbiter of user identity so it seems like a natural place in theory.. - Should `authenticate_and_authorize_or_raise()` just return the new tenant and db if they exist? That feels like a weird mix of responsibilities but would mean we only need to call the method once. The method is very very cheap though.</s> ## Description of changes ### Summary and end state - This PR takes away two minor functionalities from users. Other than these, it does not reduce user abilities. - If anyone overrode `ClientAuthProtocolAdapter` to periodically re-authenticate their clients (unlikely imo) they are no longer able to do so. If someone tells us they have actually been doing this it will be easy to add re-auth functionality back. - We don’t offer a baked-in auth scheme which requires re-authentication. - Users now must store auth info (e.g. usernames and passwords) in files instead of passing them directly as config to their `Client` or `Server`. I’m open to adding this back. - Client auth is simpler. It now all takes place in a single configurable `ClientAuthProvider` class. - Server authn is simpler. It now all takes place in a single configurable `ServerAuthnProvider`. - Server authz is simpler. It now all takes place in a single configurable `ServerAuthzProvider`. - Server authn and authz are now explicitly executed as part of each API handler. This is also where we overwrite request tenant and db if we configured to. See `authenticate_and_authorize_or_raise` in `chromadb/server/fastapi/__init__.py`. - Overwriting singleton tenant and db from auth is now much clearer and well-tested. ### Overall changes - Organized `Settings`. - Wrote or cleaned up docstrings for all auth-related classes. - Best-effort clean up of every formatting and linting error in every file I touched. - Combined `chroma_server_auth_token_transport_header` and `chroma_client_server_auth_transport_header` into `chroma_auth_token_transport_header`. - Deleted `AuthenticationError` as it was unused. - Deleted the registry (`registry.py`). It’s a pattern we should either have across our entire codebase or nowhere — as it stands, it only served to make the code harder to read while barely simplifying users’ configs. - Renamed everything server-side to “authn” or “authz”. Nothing server-side has only “auth” in it now unless it applies to both (e.g. `chroma_auth_token_transport_header`, `chroma_server_auth_ignore_paths`). - I eliminated all server middleware. I would like to use it, but that would require doing deep request inspection (especially once we enable e.g. collection-level auth). This would mean we essentially maintain a completely separate schema of paths and request fields — it seems best in the short, medium, and long runs to explicitly do authentication and authorization as part of handler bodies. ### Client-side - Deleted `ClientAuthConfigurationProvider` as it was unused. - Deleted `ClientAuthCredentialsProvider` as it was extremely small and only ever used by `ClientAuthProvider`. Folded its functionality into `ClientAuthProvider`. - Deleted `AuthInfoType` as it was doing nothing (we only supported `AuthInfoType.HEADER` in practice). - Deleted `ClientAuthResponse` as it was just a dict for headers. Replaced with `ClientAuthHeaders` type. - Deleted `ClientAuthProtocolAdapter` as it was just a complicated way to call `ClientAuthProvider.authenticate()` and inject headers. Folded this functionality into `ClientAuthProvider` and the FastAPI client directly. - Deleted `TokenAuthHeader` as it was only used in one place (`TokenAuthClientProvider.authenticate()`). Folded it into said place. ### Server-side authn - Configuration - Deleted `chroma_server_auth_configuration_provider` as it was unused. - Deleted `chroma_server_auth_configuration_file` as it was unused. - Deleted `ServerAuthConfigurationProvider` as it was unused. - This leaves no configuration besides the creds file. This is fine. There’s no configuration for our built-in authn providers. If users create their own, they can configure them as they’d like with files or env vars. - Credentials - Deleted `chroma_server_auth_credentials` — we now require users to set up auth through a credentials file. - Deleted `HtpasswdConfigurationServerAuthCredentialsProvider` and `TokenConfigServerAuthCredentialsProvider` as they are unused now that `chroma_server_auth_credentials` is gone. - With `HtpasswdConfigurationServerAuthCredentialsProvider` gone, the only subclass of `HtpasswdServerAuthCredentialsProvider` was `HtpasswdFileServerAuthCredentialsProvider` so I combined them. - User identity + request representation - Deleted `UserIdentity` as there was only one instance of it (`SimpleUserIdentity`). - Renamed `SimpleUserIdenity` to `UserIdentity` and made it a `dataclass`. - If anyone needs to, they can subclass this and add whatever they want. - Deleted `AuthorizationRequestContext`. It was just a wrapper around the request and nothing else. - Request and Response representation - Deleted `ServerAuthenticationRequest` as it was only ever used to wrap a dict lookup in several layers of indirection. Replaced with raw starlette `Headers`. - Deleted the `ServerAuthenticationResponse` abstraction as it was only ever used as a `SimpleServerAuthenticationResponse` and `FastAPIServerAuthenticationResponse`, two subclasses which contained the same data. - Replaced all instances of `ServerAuthenticationResponse`s with an `Optional[UserIdentity]` which is all they really were. - Middleware - Deleted `ChromaAuthMiddleware`, `FastAPIChromaAuthMiddleware` and `FastAPIChromaAuthMiddlewareWrapper` in favor of explicit authn within request handlers. - `ServerAuthProviders`, `AbstractCredentials`, and `ServerAuthCredentialsProviders` - We previously had two each of `SecretStrAbstractCreds`, `ServerAuthProvider`, and `ServerAuthCredentialsProvider`: one `Basic` and one `Token`. (So we had `BasicAuthCredentials` + `TokenAuthCredentials`, `BasicAuthServerProvider` + `TokenAuthServerProvider`, etc.) You could only ever use all `Basic` or all `Token`; they didn’t mix and match as they had a lot of shared assumptions about e.g. which keys existed in which dicts. - To make things extra confusing, the `BasicAuthServerCredentialsProvider` was actually called `HtpasswdServerAuthCredentialsProvider` and lived in `providers.py`. - I deleted all the `Credentials` classes and `AuthCredentialsProvider` classes. Folded all this functionality into two auth providers: `BasicAuthServerProvider` and `TokenAuthServerProvider`. - Deleted `chroma_server_auth_credentials_provider` since we no longer have separate `ServerAuthCredentialsProvider`s. - Added functionality to `ServerAuthenticationProvider` to get a user’s singleton tenant and database if they exist and if the relevant setting is `True`. ### Server-side authz - Deleted `chroma_server_authz_config` as it seems like nobody was using it. - Deleted `chroma_server_authz_ignore_paths` since it is pretty much the same as `chroma_server_auth_ignore_paths`. We now use `chroma_server_auth_ignore_paths` for both. - Deleted `AuthzUser` as it was identical to `UserIdentity` except for one unused field. - Deleted `ChromaAuthzMiddleware`, `FastAPIChromaAuthzMiddleware` and `FastAPIChromaAuthzMiddlewareWrapper` in favor of explicit authz within request handlers. - Deleted `AuthzResourceTypes` as it only ever contained data which was trivially derivable from the action being taken and was only used to print. - Modified `AuthzResource` to explicitly have all the fields it may need. We can extend it in the future, or users can override it. - Deleted `AuthzAction` as it was only ever used to print an error string and only ever contained information included in `AuthzResourceActions`. - Renamed `AuthzResourceActions` to `AuthzAction` since it’s just an enum representing the set of actions we might want to run authz checks on. - Deleted `AuthorizationContext` as it was just a wrapper around some args. We pass args directly now. - Deleted `LocalUserConfigAuthorizationConfigurationProvider` as all it did was load a config file and it was only ever used in `SimpleRBACAuthorizationProvider`. Folded its functionality (opening a file) into `SimpleRBACAuthorizationProvider`. - Deleted all the dynamic auth stuff. We now explicitly do authn and authz in request handlers. ### Testing - Added tests for the new `ServerAuthenticationProvider` behavior (deciding which paths to ignore for auth based on config, and finding a user’s singleton tenant and database if `chroma_overwrite_singleton_tenant_database_access_from_auth` = `True`. - Added tests for the newly supported multiple-usernames-and-passwords-in-htpasswd-file flow. - Split out everything to manage token + RBAC config for tests into `chromadb/test/auth/strategies.py`. Open to a different home for these utilities. - Simplified the token authn and rbac authz tests. - Wrote a new property test for `ServerAuthenticationProvider's` new functionality to specify a tenant and database to overwrite those passed by the user. We want to make sure this is done correctly on all current and future API endpoints. ## Test plan *How are these changes tested?* - [x] Tests pass locally with `pytest` for python, `yarn test` for js, `cargo test` for rust ## Documentation Changes *Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the [docs repository](https://github.com/chroma-core/docs)?*
1 parent f44d9e8 commit 729e657

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

61 files changed

+3014
-3547
lines changed

Diff for: .github/workflows/chroma-integration-test.yml

+1-1
Original file line numberDiff line numberDiff line change
@@ -37,4 +37,4 @@ jobs:
3737
- name: Install test dependencies
3838
run: python -m pip install -r requirements.txt && python -m pip install -r requirements_dev.txt
3939
- name: Integration Test
40-
run: bin/integration-test ${{ matrix.testfile }}
40+
run: bin/python-integration-test ${{ matrix.testfile }}

Diff for: .github/workflows/chroma-client-integration-test.yml renamed to .github/workflows/chroma-python-client-integration-test.yml

+2-2
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
name: Chroma Client Integration Tests
1+
name: Chroma Python Client Integration Tests
22

33
on:
44
push:
@@ -27,5 +27,5 @@ jobs:
2727
python-version: ${{ matrix.python }}
2828
- name: Install test dependencies
2929
run: python -m pip install -r requirements.txt && python -m pip install -r requirements_dev.txt
30-
- name: Test
30+
- name: Python Client Test
3131
run: clients/python/integration-test.sh
+21
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
name: Chroma Typescript Client Integration Tests
2+
3+
on:
4+
push:
5+
branches:
6+
- main
7+
pull_request:
8+
branches:
9+
- main
10+
- '**'
11+
workflow_dispatch:
12+
13+
jobs:
14+
test:
15+
timeout-minutes: 90
16+
runs-on: ubuntu-latest
17+
steps:
18+
- name: Checkout
19+
uses: actions/checkout@v3
20+
- name: TS Client Test
21+
run: bin/ts-integration-test

Diff for: .gitignore

+4
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,8 @@ go/bin/
66
go/**/testdata/
77
go/coordinator/bin/
88

9+
clients/js/package-lock.json
10+
911
*.log
1012

1113
**/data__nogit
@@ -19,6 +21,8 @@ index_data
1921
chroma/
2022
chroma_test_data/
2123
server.htpasswd
24+
server.authz
25+
server.authn
2226

2327
.venv
2428
venv

Diff for: .vscode/settings.json

+5-1
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,14 @@
11
{
22
"git.ignoreLimitWarning": true,
33
"editor.rulers": [
4-
88
4+
79
55
],
66
"editor.formatOnSave": true,
77
"python.formatting.provider": "black",
8+
"python.formatting.blackArgs": [
9+
"--line-length",
10+
"79"
11+
],
812
"files.exclude": {
913
"**/__pycache__": true,
1014
"**/.ipynb_checkpoints": true,

Diff for: bin/python-integration-test

+24
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,24 @@
1+
#!/usr/bin/env bash
2+
3+
set -e
4+
5+
export CHROMA_PORT=8000
6+
7+
function cleanup {
8+
docker compose -f docker-compose.test.yml down --rmi local --volumes
9+
}
10+
11+
trap cleanup EXIT
12+
13+
docker compose -f docker-compose.test.yml up --build -d
14+
15+
export CHROMA_INTEGRATION_TEST_ONLY=1
16+
export CHROMA_API_IMPL=chromadb.api.fastapi.FastAPI
17+
export CHROMA_SERVER_HOST=localhost
18+
export CHROMA_SERVER_HTTP_PORT=8000
19+
export CHROMA_SERVER_NOFILE=65535
20+
21+
echo testing: python -m pytest "$@"
22+
python -m pytest "$@"
23+
24+
docker compose down

Diff for: bin/integration-test renamed to bin/ts-integration-test

+10-23
Original file line numberDiff line numberDiff line change
@@ -2,10 +2,7 @@
22

33
set -e
44

5-
export CHROMA_PORT=8000
6-
75
function cleanup {
8-
docker compose -f docker-compose.test.yml down --rmi local --volumes
96
rm server.htpasswd .chroma_env
107
}
118

@@ -15,25 +12,22 @@ function setup_auth {
1512
basic)
1613
docker run --rm --entrypoint htpasswd httpd:2 -Bbn admin admin > server.htpasswd
1714
cat <<EOF > .chroma_env
18-
CHROMA_SERVER_AUTH_CREDENTIALS_FILE="/chroma/server.htpasswd"
19-
CHROMA_SERVER_AUTH_CREDENTIALS_PROVIDER="chromadb.auth.providers.HtpasswdFileServerAuthCredentialsProvider"
20-
CHROMA_SERVER_AUTH_PROVIDER="chromadb.auth.basic.BasicAuthServerProvider"
15+
CHROMA_SERVER_AUTHN_CREDENTIALS_FILE="/chroma/server.htpasswd"
16+
CHROMA_SERVER_AUTHN_PROVIDER="chromadb.auth.basic_authn.BasicAuthenticationServerProvider"
2117
EOF
2218
;;
2319
token)
2420
cat <<EOF > .chroma_env
25-
CHROMA_SERVER_AUTH_CREDENTIALS="test-token"
26-
CHROMA_SERVER_AUTH_TOKEN_TRANSPORT_HEADER="AUTHORIZATION"
27-
CHROMA_SERVER_AUTH_CREDENTIALS_PROVIDER="chromadb.auth.token.TokenConfigServerAuthCredentialsProvider"
28-
CHROMA_SERVER_AUTH_PROVIDER="chromadb.auth.token.TokenAuthServerProvider"
21+
CHROMA_AUTH_TOKEN_TRANSPORT_HEADER="AUTHORIZATION"
22+
CHROMA_SERVER_AUTHN_CREDENTIALS="test-token"
23+
CHROMA_SERVER_AUTHN_PROVIDER="chromadb.auth.token_authn.TokenAuthenticationServerProvider"
2924
EOF
3025
;;
3126
xtoken)
3227
cat <<EOF > .chroma_env
33-
CHROMA_SERVER_AUTH_CREDENTIALS="test-token"
34-
CHROMA_SERVER_AUTH_TOKEN_TRANSPORT_HEADER="X_CHROMA_TOKEN"
35-
CHROMA_SERVER_AUTH_CREDENTIALS_PROVIDER="chromadb.auth.token.TokenConfigServerAuthCredentialsProvider"
36-
CHROMA_SERVER_AUTH_PROVIDER="chromadb.auth.token.TokenAuthServerProvider"
28+
CHROMA_AUTH_TOKEN_TRANSPORT_HEADER="X_CHROMA_TOKEN"
29+
CHROMA_SERVER_AUTHN_CREDENTIALS="test-token"
30+
CHROMA_SERVER_AUTHN_PROVIDER="chromadb.auth.token_authn.TokenAuthenticationServerProvider"
3731
EOF
3832
;;
3933
*)
@@ -45,25 +39,18 @@ EOF
4539

4640
trap cleanup EXIT
4741

48-
docker compose -f docker-compose.test.yml up --build -d
49-
5042
export CHROMA_INTEGRATION_TEST_ONLY=1
5143
export CHROMA_API_IMPL=chromadb.api.fastapi.FastAPI
5244
export CHROMA_SERVER_HOST=localhost
45+
export CHROMA_PORT=8000
5346
export CHROMA_SERVER_HTTP_PORT=8000
5447
export CHROMA_SERVER_NOFILE=65535
5548

56-
echo testing: python -m pytest "$@"
57-
python -m pytest "$@"
58-
5949
cd clients/js
60-
6150
# moved off of yarn to npm to fix issues with jackspeak/cliui/string-width versions #1314
6251
npm install
63-
npm run test:run
64-
65-
docker compose down
6652
cd ../..
53+
6754
for auth_type in basic token xtoken; do
6855
echo "Testing $auth_type auth"
6956
setup_auth "$auth_type"

Diff for: chromadb/__init__.py

+3-3
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22
import logging
33
from chromadb.api.client import Client as ClientCreator
44
from chromadb.api.client import AdminClient as AdminClientCreator
5-
from chromadb.auth.token import TokenTransportHeader
5+
from chromadb.auth.token_authn import TokenTransportHeader
66
import chromadb.config
77
from chromadb.config import DEFAULT_DATABASE, DEFAULT_TENANT, Settings
88
from chromadb.api import AdminAPI, ClientAPI
@@ -246,9 +246,9 @@ def CloudClient(
246246
# Always use SSL for cloud
247247
settings.chroma_server_ssl_enabled = enable_ssl
248248

249-
settings.chroma_client_auth_provider = "chromadb.auth.token.TokenAuthClientProvider"
249+
settings.chroma_client_auth_provider = "chromadb.auth.token_authn.TokenAuthClientProvider"
250250
settings.chroma_client_auth_credentials = api_key
251-
settings.chroma_client_auth_token_transport_header = (
251+
settings.chroma_auth_token_transport_header = (
252252
TokenTransportHeader.X_CHROMA_TOKEN.name
253253
)
254254

Diff for: chromadb/api/fastapi.py

+8-23
Original file line numberDiff line numberDiff line change
@@ -33,8 +33,6 @@
3333
from chromadb.auth import (
3434
ClientAuthProvider,
3535
)
36-
from chromadb.auth.providers import RequestsClientAuthProtocolAdapter
37-
from chromadb.auth.registry import resolve_provider
3836
from chromadb.config import DEFAULT_DATABASE, DEFAULT_TENANT, Settings, System
3937
from chromadb.telemetry.opentelemetry import (
4038
OpenTelemetryClient,
@@ -114,33 +112,20 @@ def __init__(self, system: System):
114112
default_api_path=system.settings.chroma_server_api_default_path,
115113
)
116114

115+
self._session = requests.Session()
116+
117117
self._header = system.settings.chroma_server_headers
118-
if (
119-
system.settings.chroma_client_auth_provider
120-
and system.settings.chroma_client_auth_protocol_adapter
121-
):
122-
self._auth_provider = self.require(
123-
resolve_provider(
124-
system.settings.chroma_client_auth_provider, ClientAuthProvider
125-
)
126-
)
127-
self._adapter = cast(
128-
RequestsClientAuthProtocolAdapter,
129-
system.require(
130-
resolve_provider(
131-
system.settings.chroma_client_auth_protocol_adapter,
132-
RequestsClientAuthProtocolAdapter,
133-
)
134-
),
135-
)
136-
self._session = self._adapter.session
137-
else:
138-
self._session = requests.Session()
139118
if self._header is not None:
140119
self._session.headers.update(self._header)
141120
if self._settings.chroma_server_ssl_verify is not None:
142121
self._session.verify = self._settings.chroma_server_ssl_verify
143122

123+
if system.settings.chroma_client_auth_provider:
124+
self._auth_provider = self.require(ClientAuthProvider)
125+
_headers = self._auth_provider.authenticate()
126+
for header, value in _headers.items():
127+
self._session.headers[header] = value.get_secret_value()
128+
144129
@trace_method("FastAPI.heartbeat", OpenTelemetryGranularity.OPERATION)
145130
@override
146131
def heartbeat(self) -> int:

0 commit comments

Comments
 (0)