
Commit e4df29d

Add pre-commit with ruff, pyproject.toml, gh lint/test actions (#124)
* Add pre-commit with ruff and related gh action
* Split lint/unit test into separate jobs
* Fix gh action syntax
* Fix action version
* Remove un-needed packages, switch to pyproject.toml
* Tweak pyproject.toml
* Update README, run pre-commit
1 parent af5e438 commit e4df29d

16 files changed: +438, -305 lines

.github/workflows/gh-actions.yml (+46)

```diff
@@ -0,0 +1,46 @@
+name: gh-actions
+
+on:
+  push:
+    branches:
+      - main
+  pull_request:
+    branches:
+      - main
+
+jobs:
+  pre-commit:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Setup python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.12'
+
+      - name: Run pre-commit
+        uses: pre-commit/[email protected]
+
+  unit-tests:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Setup python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.12'
+          cache: 'pipenv'
+
+      - name: Install pipenv
+        run: curl https://raw.githubusercontent.com/pypa/pipenv/master/get-pipenv.py | python
+
+      - name: Install dependencies
+        run: pipenv install --system --deploy --dev
+
+      - name: Run pytest
+        run: |
+          pytest -v -s
```
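
The unit-tests job can be approximated locally before pushing. A minimal sketch, assuming pipenv is already installed on your machine; unlike CI, it installs into a virtualenv rather than the system interpreter, so the `--system --deploy` flags are dropped:

```python
# local_ci.py -- hypothetical helper mirroring the unit-tests job above
import subprocess

# Install runtime + dev dependencies from the Pipfile
# (CI additionally passes --system --deploy)
subprocess.run(["pipenv", "install", "--dev"], check=True)

# Run the test suite with the same flags as the workflow
subprocess.run(["pipenv", "run", "pytest", "-v", "-s"], check=True)
```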

.gitignore (-1)

```diff
@@ -174,4 +174,3 @@ data/
 
 # ignore local docker settings
 docker-compose-maas.yml
-
```

.pre-commit-config.yaml (+18)

```diff
@@ -0,0 +1,18 @@
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v5.0.0
+    hooks:
+      - id: trailing-whitespace
+      - id: end-of-file-fixer
+      - id: check-yaml
+      - id: check-added-large-files
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.9.9
+    hooks:
+      - id: ruff
+        name: lint with ruff
+      - id: ruff
+        name: sort imports with ruff
+        args: [--select, I, --fix]
+      - id: ruff-format
+        name: format with ruff
```
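
The `ruff` hook is listed twice on purpose: the first run lints, while the second restricts ruff to its isort-compatible `I` rules with `--fix` so import ordering is corrected automatically; `ruff-format` then takes over the job black used to do. A sketch of the import-sorting hook's effect, assuming ruff treats the repo's top-level `connectors` package as first-party:

```python
# Before the "sort imports with ruff" hook, imports might be interleaved:
#
#   import flask
#   import os
#   from connectors.llm import interface
#   import json
#
# After `ruff check --select I --fix`, they are grouped into stdlib,
# third-party, and first-party blocks, alphabetized within each group:
import json
import os

import flask

from connectors.llm import interface
```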

.vscode/launch.json (+1 -1)

```diff
@@ -37,4 +37,4 @@
             "justMyCode": false
         }
     ]
-}
\ No newline at end of file
+}
```

Dockerfile (-1)

```diff
@@ -53,4 +53,3 @@ EXPOSE 8000
 
 ENV PATH="$APP_ROOT/.venv/bin:$PATH"
 CMD ["flask", "run", "--host=0.0.0.0", "--port=8000"]
-
```

Pipfile (+2 -3)

```diff
@@ -30,11 +30,10 @@ pypdf2 = "*"
 scikit-learn = "*"
 
 [dev-packages]
-black = "*"
-isort = "*"
-flake8 = "*"
 ipython = "*"
 pytest = "*"
+ruff = "*"
+pre-commit = "*"
 
 [requires]
 python_version = "3.12"
```

Pipfile.lock (+300 -255)

Generated file; diff not rendered by default.

README.md (+56 -32)

````diff
@@ -20,6 +20,11 @@ Each agent is intended to answer questions related to a set of documents known a
 - [With Docker Compose](#with-docker-compose)
   - [Using huggingface text-embeddings-inference server to host embedding model (deprecated)](#using-huggingface-text-embeddings-inference-server-to-host-embedding-model-deprecated)
 - [Without Docker Compose](#without-docker-compose)
+- [Developer Guide](#developer-guide)
+  - [Install development packages](#install-development-packages)
+  - [Using pre-commit](#using-pre-commit)
+  - [Debugging in VSCode](#debugging-in-vscode)
+  - [Mac Development Tips](#mac-development-tips)
 - [Synchronizing Documents from S3](#synchronizing-documents-from-s3)
   - [Continuous synchronization](#continuous-synchronization)
 - [Deploying to OpenShift](#deploying-to-openshift)
@@ -142,52 +147,32 @@ A development/test environment can be set up with or without docker compose. In
 
 The docker compose file offers an easy way to spin up all components. [ollama](https://ollama.com) is used to host the LLM and embedding model. For utilization of your GPU, refer to the comments in the compose file to see which configurations to uncomment on the 'ollama' container. Postgres persists the data, and pgadmin allows you to query the database.
 
-You will need Docker version 27.5.1 on Fedora 40 and 41 to be able to use docker compose (not docker-compose) and for that You will need to reinstall latest docker version from the [fedora docker repo](https://docs.docker.com/engine/install/fedora/#install-using-the-repository) or follow the instructions here.
+1. First, install Docker: [Follow the official guide for your OS](https://docs.docker.com/engine/install/)
 
-Docker 27.5.1 is confirmed working with macOS 15.3.
+   - NOTE: Currently, the compose file does not work with `podman`.
 
-To get the correct version of docker, add the repo:
+2. On Linux, be sure to run through the [postinstall steps](https://docs.docker.com/engine/install/linux-postinstall/)
 
-```text
-sudo dnf -y install dnf-plugins-core
-sudo dnf-3 config-manager --add-repo https://download.docker.com/linux/fedora/docker-ce.repo
-```
-
-Install the packages:
-
-```text
-sudo dnf install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
-```
-
-Enable:
-
-```text
-sudo systemctl enable --now docker
-```
-
-Run through the postinstall steps https://docs.docker.com/engine/install/linux-postinstall/
-
-
-1. Create the directory which will house the local environment data:
+3. Create the directory which will house the local environment data:
 
    ```text
   mkdir data
   ```
 
-1. Invoke docker compose (postgres data will persist in `data/postgres`):
+4. Invoke docker compose (postgres data will persist in `data/postgres`):
 
   ```text
   docker compose up --build
   ```
 
-1. Pull the mistral LLM and nomic embedding model (data will persist in `data/ollama`):
+5. Pull the mistral LLM and nomic embedding model (data will persist in `data/ollama`):
 
   ```text
   docker exec tangerine-ollama ollama pull mistral
   docker exec tangerine-ollama ollama pull nomic-embed-text
   ```
 
-1. Access the API on port `8000`
+6. Access the API on port `8000`
 
   ```sh
   curl -XGET 127.0.0.1:8000/api/agents
@@ -196,7 +181,7 @@ Run through the postinstall steps https://docs.docker.com/engine/install/linux-p
   }
   ```
 
-1. (optional) Follow these steps to start the [tangerine-frontend](https://github.com/RedHatInsights/tangerine-frontend#with-docker-compose)
+7. (optional) Follow these steps to start the [tangerine-frontend](https://github.com/RedHatInsights/tangerine-frontend#with-docker-compose)
 
 Note: You can access pgadmin at localhost:5050.
 
@@ -315,14 +300,53 @@ to use this to test different embedding models that are not supported by ollama,
 
 1. (optional) Follow these steps to start the [tangerine-frontend](https://github.com/RedHatInsights/tangerine-frontend#without-docker-compose)
 
-## Debugging in VSCode
+## Developer Guide
+
+### Install development packages
+
+If desiring to make contributions, be sure to install the development packages:
+
+```sh
+pipenv install --dev
+```
+
+### Using pre-commit
+
+This project uses pre-commit to handle formatting and linting.
+
+- Before pushing a commit, you can run:
+
+  ```sh
+  pre-commit run --all
+  ```
+
+  and if it fails, check for changes the tool has made to your files.
+
+- Alternatively, you can add pre-commit as a git hook with:
+
+  ```sh
+  pre-commit install
+  ```
+
+  and pre-commit will automatically be invoked every time you create a commit.
+
+### Debugging in VSCode
 
 Run postgres and ollama either locally or in containers. Don't run the backend container. Click on "Run & Debug" in the left menu and then run the "Debug Tangerine Backend" debug target. You can now set breakpoints and inspect runtime state.
 
 There's a second debug target for the unit tests if you want to run those in a debugger.
 
 ## Mac Development Tips
-Ollama running in Docker on Apple Silicon cannot make use of hardware acceleration. That means the LLM will be very slow to respond running in Docker, even on a very capable machine. However, running the model locally does make use of acceleration and is quite fast. If you are working on a Mac the best setup is to run the model through ollama locally and then the other deps like the database in Docker. The way the compose file is set up, the networking is all seemless. If you stop the ollama container and then ollama serve locally it will all just work together. You'll have the best local development setup if you combine the model running locally and tangerine-backend running in a debugger in VSCode with postgres and pgadmin running in Docker!
+
+Ollama running in Docker on Apple Silicon cannot make use of hardware acceleration. That means the LLM will be very slow to respond running in Docker, even on a very capable machine.
+
+However, running ollama outside of Docker does make use of acceleration and is quite fast. If you are working on a Mac the best setup is to run the model through ollama locally and continue to run the other components (like the database) in Docker. The way the compose file is set up, the networking should allow this to work without issue.
+
+Comment out `ollama` from the compose file, or stop the ollama container. Invoke `ollama serve` in your shell. For an optimal developer experience:
+
+- run tangerine-backend in a debugger in VSCode
+- run ollama directly on your host
+- run postgres/pgadmin in Docker.
 
 ## Synchronizing Documents from S3
 
@@ -350,15 +374,15 @@ To do so you'll need to do the following:
   echo 'BUCKET=mybucket' >> .env
   ```
 
-5. Create an `s3.yaml` file that describes your agents and the documents they should ingest. See [s3-example.yaml](s3-example.yaml) for an example.
+1. Create an `s3.yaml` file that describes your agents and the documents they should ingest. See [s3-example.yaml](s3-example.yaml) for an example.
 
   If using docker compose, copy this config into your container:
 
   ```text
   docker cp s3.yaml tangerine-backend:/opt/app-root/src/s3.yaml
   ```
 
-6. Run the S3 sync job:
+1. Run the S3 sync job:
 
   - With docker compose:
````

connectors/llm/interface.py (+3 -3)

```diff
@@ -81,13 +81,13 @@ def _build_context(search_results: list[Document], content_char_limit: int = 0):
             }
         )
 
-        context += f"\n<<Search result {i+1}"
+        context += f"\n<<Search result {i + 1}"
         if "title" in metadata:
             title = metadata["title"]
             context += f", document title: '{title}'"
         limit = content_char_limit if content_char_limit else len(page_content)
         search_result = page_content[0:limit]
-        context += ">>\n\n" f"{search_result}\n\n" f"<<Search result {i+1} END>>\n"
+        context += f">>\n\n{search_result}\n\n<<Search result {i + 1} END>>\n"
 
     return context, search_metadata
 
@@ -185,7 +185,7 @@ def api_response_generator():
         for data in llm_response:
             yield f"data: {json.dumps(data)}\r\n"
         # final piece of content returned is the search metadata
-        yield f"data: {json.dumps({"search_metadata": search_metadata})}\r\n"
+        yield f"data: {json.dumps({'search_metadata': search_metadata})}\r\n"
 
     if stream:
         log.debug("streaming response...")
```
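
Both edits in this file are mechanical results of the new `ruff-format` hook (the pinned 0.9.x releases format expressions inside f-strings): spacing around operators in interpolations is normalized, implicit concatenation of adjacent f-strings is collapsed into one literal, and same-quote nesting is rewritten, since reusing the outer quote character inside an f-string is a syntax error before Python 3.12 (PEP 701). A standalone sketch of the before/after behavior:

```python
import json

search_metadata = {"pages": 3}  # stand-in value for illustration

# Single quotes inside a double-quoted f-string keep the line valid on
# Python < 3.12, where f"...{json.dumps({"key": ...})}..." would not parse:
print(f"data: {json.dumps({'search_metadata': search_metadata})}\r\n")

# Operator spacing inside interpolations is normalized ({i+1} -> {i + 1}),
# and adjacent f-string fragments are merged into a single f-string:
for i in range(2):
    print(f">>\n\nresult body\n\n<<Search result {i + 1} END>>\n")
```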

file_upload_cli.py (+2 -2)

```diff
@@ -41,9 +41,9 @@ def upload_files(source, directory_path, url, agent_id, html, bearer_token):
         )
 
         if response.status_code == 200:
-            print(f"Batch {i+1}/{num_batches} uploaded successfully.")
+            print(f"Batch {i + 1}/{num_batches} uploaded successfully.")
         else:
-            print(f"Error uploading batch {i+1}/{num_batches}: {response.text}")
+            print(f"Error uploading batch {i + 1}/{num_batches}: {response.text}")
 
 
 if __name__ == "__main__":
```

json/quality_detection_training.json (+1 -1)

```diff
@@ -48,4 +48,4 @@
   {"text": "[toc]", "label": "junk"},
   {"text": "", "label": "junk"}
 
-]
\ No newline at end of file
+]
```

pgadmin/pgadmin-servers.json (+1 -1)

```diff
@@ -12,4 +12,4 @@
       "ConnectNow": true
     }
   }
-}
\ No newline at end of file
+}
```

pgadmin/pgpassfile (+1 -1)

```diff
@@ -1 +1 @@
-postgres:5432:citrus:citrus:citrus
\ No newline at end of file
+postgres:5432:citrus:citrus:citrus
```

pyproject.toml (+7)

```diff
@@ -0,0 +1,7 @@
+[tool.pytest.ini_options]
+addopts = ["--ignore=data/"]
+
+[tool.ruff]
+line-length = 100
+indent-width = 4
+target-version = "py312"
```
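
pytest reads `[tool.pytest.ini_options]` from pyproject.toml automatically, which is what allows pytest.ini to be deleted below. A quick in-process sanity check, as a sketch (not part of the commit):

```python
# check_pytest_config.py -- hypothetical check that the pyproject.toml
# options are picked up: with addopts = ["--ignore=data/"], nothing under
# data/ should appear in the collected test list.
import pytest

exit_code = pytest.main(["--collect-only", "-q"])
raise SystemExit(exit_code)
```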

pytest.ini (-2)

This file was deleted.

setup.cfg (-2)

This file was deleted.
