Apache Polaris Starter Kit with LocalStack on k3s

This starter kit provides a complete development environment for Apache Polaris with LocalStack integration running on k3s Kubernetes. It includes automated setup of PostgreSQL metastore, S3 integration via LocalStack, and all necessary configurations for immediate development use. The kit uses Kustomize for Kubernetes deployments and provides utilities for secure key generation and credential management.

Key features:

  • Automated k3s cluster setup with k3d
  • Integrated LocalStack for AWS S3 emulation
  • PostgreSQL metastore configuration
  • Ansible Playbooks for setup and configuration
  • Use Trino to work with Iceberg tables

Prerequisites

Important Ensure the tools used throughout this tutorial (Docker, k3d, kubectl, Helm, Ansible, uv, and the Trino CLI) are installed and on your PATH before proceeding.
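
A quick sanity check that the required CLIs are available (these are the standard version subcommands for each tool):

# Verify the core tooling is on the PATH
docker --version
k3d version
kubectl version --client
helm version
ansible --version
uv --version
trino --version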

Get the Sources

Clone the repository:

git clone https://github.com/kameshsampath/trino-localstack
cd trino-localstack

Set up environment variables:

export PROJECT_HOME="$PWD"
export KUBECONFIG="$PWD/.kube/config"
export K3D_CLUSTER_NAME=trino-localstack
export K3S_VERSION=v1.32.1-k3s1
export FEATURES_DIR="$PWD/k8s"

Going forward, we will refer to the cloned sources folder as $PROJECT_HOME.

Python Environment Setup

Install the uv tool:

# Using pip
pip install uv

# Or using curl (Unix-like systems)
curl -LsSf https://astral.sh/uv/install.sh | sh

Set up Python environment:

# Pin python version
uv python pin 3.12
# Install and set up Python environment
uv venv
# On Unix-like systems
source .venv/bin/activate
# Install deps/packages
uv sync

Tip Use tools like direnv to make setting these environment variables easier.
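
For example, a minimal .envrc at the project root (assuming direnv is installed and you run direnv allow afterwards) could export the same variables used above:

# .envrc
export PROJECT_HOME="$PWD"
export KUBECONFIG="$PWD/.kube/config"
export K3D_CLUSTER_NAME=trino-localstack
export K3S_VERSION=v1.32.1-k3s1
export FEATURES_DIR="$PWD/k8s"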

DNSmasq (Optional)

For seamless access from the host to services running in the local k3s cluster, you might need to add entries to the host's /etc/hosts. Using dnsmasq is a much cleaner and neater approach.

Assuming you have dnsmasq installed, here is how to set it up on macOS:

echo "address=/.localstack/127.0.0.1" >> $(brew --prefix)/etc/dnsmasq.conf
cat <<EOF | sudo tee /etc/resolver/localstack
nameserver 127.0.0.1
EOF
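
After updating the configuration, restart dnsmasq and confirm that the wildcard resolves. This is a sketch for a Homebrew-managed dnsmasq; the exact restart command may differ on your setup:

# Restart dnsmasq so it picks up the new address entry
sudo brew services restart dnsmasq

# Any *.localstack hostname should now resolve to 127.0.0.1
dig +short localstack.localstack @127.0.0.1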

Prepare for Deployment

The following Ansible playbook generates the required sensitive files from templates:

ansible-playbook $PROJECT_HOME/polaris-forge-setup/prepare.yml
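
If you want to confirm what was generated, one of the files you should now see is the Polaris bootstrap credentials file referenced later in the Available Services table:

# Generated credentials; keep this file out of version control
ls -l $PROJECT_HOME/k8s/polaris/.bootstrap-credentials.env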

Create the Cluster

Run the cluster setup script:

$PROJECT_HOME/bin/setup.sh
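
Before moving on, you can confirm the cluster came up and is reachable with the KUBECONFIG exported earlier:

# The k3d cluster should be listed and its nodes Ready
k3d cluster list
kubectl get nodes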

Once the cluster is started, wait for the deployments to be ready:

ansible-playbook  $PROJECT_HOME/polaris-forge-setup/cluster_checks.yml --tags=bootstrap

The cluster will deploy LocalStack and PostgreSQL. You can verify them as shown below:
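
For example (LocalStack runs in the localstack namespace, PostgreSQL alongside Polaris in the polaris namespace):

# LocalStack pods
kubectl get pods -n localstack

# PostgreSQL pods
kubectl get pods -n polaris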

Deploy Apache Polaris

kubectl apply -k $PROJECT_HOME/k8s/polaris

Ensure all deployments and jobs have succeeded:

ansible-playbook  $PROJECT_HOME/polaris-forge-setup/cluster_checks.yml --tags polaris

Available Services

Service      URL                       Default Credentials
Polaris UI   http://localhost:18181    See $PROJECT_HOME/k8s/polaris/.bootstrap-credentials.env
Adminer      http://localhost:18080    PostgreSQL host is postgresql.polaris; check $FEATURES_DIR/postgresql.yaml for credentials
LocalStack   http://localhost:14566    Use test/test as the AWS credentials, with http://localhost:14566 as the endpoint URL

Setup Demo Catalog

The Polaris server does not yet have any catalogs. Run the commands below to set up your first catalog, principal, principal role, catalog role, and grants.

The catalog setup will do the following:

  • Create the S3 bucket
  • Create a Catalog named polardb
  • Create a Principal root with the Principal Role admin
  • Create a Catalog Role sudo and assign it to the Principal Role admin
  • Finally, grant the CATALOG_MANAGE_CONTENT privilege to the Catalog Role sudo, so that principals with the admin role can manage the catalog's content

Set up the environment variables and run the catalog setup playbook:

# just avoid colliding with existing AWS profiles
unset AWS_PROFILE
export AWS_ENDPOINT_URL=http://localstack.localstack:4566
export AWS_ACCESS_KEY_ID=test
export AWS_SECRET_ACCESS_KEY=test
export AWS_REGION=us-east-1
ansible-playbook $PROJECT_HOME/polaris-forge-setup/catalog_setup.yml
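
Once the playbook completes, the polardb bucket should exist in LocalStack. From the host you can check it through the exposed endpoint (localhost:14566, per the Available Services table):

# List buckets via the host-exposed LocalStack endpoint
# (reuses the test credentials exported above)
aws --endpoint-url http://localhost:14566 s3 ls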

Trino

Create the trino namespace and a polaris-env secret containing the Polaris principal credentials in it:

ansible-playbook $PROJECT_HOME/polaris-forge-setup/prepare.yml --tags=trino
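
You can confirm the namespace and secret exist before deploying Trino:

# The polaris-env secret should be present in the trino namespace
kubectl get secret polaris-env -n trino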

Deploy Trino

kubectl apply -f $PROJECT_HOME/k8s/trino/trino.yaml

Wait for the Trino pods to be ready:

ansible-playbook  $PROJECT_HOME/polaris-forge-setup/cluster_checks.yml --tags=trino

Verify Setup

Using Trino

trino --server http://localhost:18080 

List all catalogs:

show catalogs;
 Catalog 
---------
 iceberg 
 system  
 tpcds   
 tpch    
(4 rows)

Create a schema named demo_db in the iceberg catalog:

create schema iceberg.demo_db;

Ensure the schema is created:

show schemas from iceberg;

Create a table named fruits in iceberg.demo_db:

create table iceberg.demo_db.fruits (
  id int,
  name varchar,
  season varchar
);

Important This command fails with the following error:

Query 20250309_122744_00005_kzjfk failed: Server error: SdkClientException: Unable to execute HTTP request: Connect to polardb.localstack.localstack:4566 [polardb.localstack.localstack/127.0.0.1] failed: Connection refused

Using notebooks

Generate the Jupyter notebook to verify the setup:

ansible-playbook  $PROJECT_HOME/polaris-forge-setup/catalog_setup.yml --tags=verify

Run $PROJECT_HOME/notebooks/verify_setup.ipynb to make sure you can create the namespace and table, and insert some data.

To double-check that all the Iceberg files have been created and committed, open https://app.localstack.cloud/inst/default/resources/s3/polardb. You should see something like the screenshots below:

Important The LocalStack default instance URL is updated as shown in the screenshots.

(Screenshots: LocalStack default instance, the catalog, catalog metadata, and catalog data.)

Your local Apache Polaris environment is ready for use. Explore it further, or connect it with other query engines and tools like Apache Spark, Trino, RisingWave, etc.

Troubleshooting

Polaris Purge and Bootstrap

Whenever there is a need to clean up and bootstrap again, run the following sequence of commands. First, resume the suspended purge job:

kubectl patch job polaris-purge -n polaris -p '{"spec":{"suspend":false}}'

Wait for purge to complete:

kubectl logs -f -n polaris jobs/polaris-purge

Delete and re-create the bootstrap job:

kubectl delete -k k8s/polaris/job
kubectl apply -k k8s/polaris/job

Wait for bootstrap to complete successfully:

kubectl logs -f -n polaris jobs/polaris-bootstrap

A successful bootstrap will have the following text in the log:

...
Realm 'POLARIS' successfully bootstrapped.
Bootstrap completed successfully.
...

Checking the pods and services in the polaris namespace should show something like the output below.
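
A plain kubectl listing covers both:

kubectl get pods,svc -n polaris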

NAME                           READY   STATUS      RESTARTS   AGE
pod/polaris-694ddbb476-m2trm   1/1     Running     0          13m
pod/polaris-bootstrap-tpkh4    0/1     Completed   0          13m
pod/postgresql-0               1/1     Running     0          100m

NAME                    TYPE           CLUSTER-IP     EXTERNAL-IP             PORT(S)          AGE
service/polaris         LoadBalancer   10.43.202.93   172.19.0.3,172.19.0.4   8181:32181/TCP   13m
service/postgresql      ClusterIP      10.43.182.31   <none>                  5432/TCP         100m
service/postgresql-hl   ClusterIP      None           <none>                  5432/TCP         100m

Checking Component Logs

You can use kubectl logs to inspect the logs of various components:

Polaris Server

# Check Polaris server logs
kubectl logs -f -n polaris deployment/polaris

Bootstrap and Purge Jobs

# Check bootstrap job logs
kubectl logs -f -n polaris jobs/polaris-bootstrap

# Check purge job logs
kubectl logs -f -n polaris jobs/polaris-purge

Database

# Check PostgreSQL logs
kubectl logs -f -n polaris statefulset/postgresql

LocalStack

# Check LocalStack logs
kubectl logs -f -n localstack deployment/localstack

Common Issues

  1. If the Polaris server fails to start:

    # Check events in the namespace
    kubectl get events -n polaris --sort-by='.lastTimestamp'
    
    # Check Polaris pod status
    kubectl describe pod -n polaris -l app=polaris
  2. If LocalStack isn't accessible:

    # Check LocalStack service
    kubectl get svc -n localstack
    
    # Verify LocalStack endpoints
    kubectl exec -it -n localstack deployment/localstack -- aws --endpoint-url=http://localhost:4566 s3 ls
  3. If PostgreSQL connection fails:

    # Check PostgreSQL service
    kubectl get svc -n polaris postgresql-hl
    
    # Verify PostgreSQL connectivity
    kubectl exec -it -n polaris postgresql-0 -- pg_isready -h localhost

Cleanup

Clean up the Polaris catalog resources:

ansible-playbook $PROJECT_HOME/polaris-forge-setup/catalog_cleanup.yml

Delete the whole cluster:

$PROJECT_HOME/bin/cleanup.sh

Related Projects and Tools

Core Components

  • Apache Polaris - Data Catalog and Governance Platform
  • PyIceberg - Python library to interact with Apache Iceberg
  • LocalStack - AWS Cloud Service Emulator
  • k3d - k3s in Docker
  • k3s - Lightweight Kubernetes Distribution

Development Tools

  • Docker - Container Platform
  • Kubernetes - Container Orchestration
  • Helm - Kubernetes Package Manager
  • kubectl - Kubernetes CLI
  • uv - Python Packaging Tool

Documentation

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

About

Reproducer for Trino issue
