
Commit 2020228

Merge pull request #1 from k-edge/DT-1179-localstack
DT-1171, DT-1179 localstack in docker-compose.yml and awslocal in req…
2 parents 713f51f + 560f73e commit 2020228

5 files changed, +70 -6 lines changed


README.md

+47 -5
@@ -1,5 +1,47 @@
-* [] add option for transformation
-* json -> parquet
-* parquet -> json
-* [] add option for moving data
-* [] add option for copying data
+# Datalake Batch Processor
+
+## Development
+
+### LocalStack
+
+> **_NOTE:_**
+> For easier development, we recommend using [awscli-local](https://github.com/localstack/awscli-local).
+> It will automatically recognize the LocalStack instance and use it as the default AWS endpoint.
+> Therefore, it has been added to the `requirements.txt`. The documentation below assumes the existence of the `awslocal` command.
+
+You will find a `docker-compose.yml` file in the `localstack` directory.
+This file will start a LocalStack instance with a configured S3 bucket.
+
+Before you run it, make sure the `localstack/init-aws.sh` script is executable:
+(see: [LocalStack Init Hooks](https://docs.localstack.cloud/references/init-hooks/))
+
+```bash
+chmod +x ./localstack/init-aws.sh
+```
+
+To test if LocalStack with the S3 instance is running, you can run the following command:
+
+```bash
+awslocal s3api list-buckets
+```
+
+which should print a list of all existing S3 buckets in your local environment as JSON:
+
+```json
+{
+    "Buckets": [
+        {
+            "Name": "my-bucket",
+            "CreationDate": "2025-02-17T13:06:48+00:00"
+        }
+    ]
+}
+```
+
+You should also be able to list all the files in the bucket:
+
+```bash
+aws --endpoint-url=http://localhost:4566 s3 ls s3://my-bucket/
+```
+
+which should contain an example parquet file.
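
The README check above is a one-off command, and the ready hook may not have finished when it is first run. As a rough sketch only, the check could be wrapped in a small polling helper. `wait_for_bucket` is a hypothetical name that does not appear in this commit; the sketch assumes `awslocal` forwards the standard AWS CLI `--query` and `--output` options.

```bash
# Hypothetical helper (not part of the commit): poll LocalStack until the
# given bucket shows up in `awslocal s3api list-buckets`.
# Usage: wait_for_bucket <bucket-name> [max-attempts]
wait_for_bucket() {
    bucket="$1"
    attempts="${2:-30}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # --output text prints the bucket names separated by tabs,
        # so split them onto separate lines and look for an exact match.
        if awslocal s3api list-buckets --query 'Buckets[].Name' --output text 2>/dev/null \
            | tr '\t' '\n' | grep -qxF "$bucket"; then
            echo "bucket $bucket is ready"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "bucket $bucket not found after $attempts attempts" >&2
    return 1
}
```

Calling `wait_for_bucket my-bucket` after starting the container would then block for up to roughly 30 seconds before giving up with a non-zero exit status.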

docker-compose.yml

+14
@@ -0,0 +1,14 @@
+services:
+  localstack:
+    container_name: "${LOCALSTACK_DOCKER_NAME:-localstack-main}"
+    image: localstack/localstack
+    ports:
+      - "127.0.0.1:4566:4566" # LocalStack Gateway
+      - "127.0.0.1:4510-4559:4510-4559" # external services port range
+    environment:
+      - DEBUG=1
+    volumes:
+      - "./example.parquet:/etc/localstack/example.parquet"
+      - "./init-aws.sh:/etc/localstack/init/ready.d/init-aws.sh" # ready hook
+      - "${LOCALSTACK_VOLUME_DIR:-./volume}:/var/lib/localstack"
+      - "/var/run/docker.sock:/var/run/docker.sock"
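
The compose file relies on `${VAR:-default}` interpolation for the container name and the volume directory. Compose resolves this form the same way a POSIX shell does: the default is used when the variable is unset or empty. The sketch below only illustrates that resolution; `resolve_compose_defaults` is a hypothetical helper and is not part of the commit.

```bash
# Hypothetical helper (not part of the commit): print the values the two
# interpolated settings in docker-compose.yml would resolve to, using the
# same ${VAR:-default} expansion rules.
resolve_compose_defaults() {
    echo "container_name=${LOCALSTACK_DOCKER_NAME:-localstack-main}"
    echo "volume_dir=${LOCALSTACK_VOLUME_DIR:-./volume}"
}
```

With neither variable set, this prints `container_name=localstack-main` and `volume_dir=./volume`. Exporting `LOCALSTACK_DOCKER_NAME=localstack-dev` before running `docker compose up -d` should accordingly give the container that name instead.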

example.parquet

2.44 KB
Binary file not shown.

init-aws.sh

+7
@@ -0,0 +1,7 @@
+#!/bin/sh
+
+echo "Initializing localstack s3"
+awslocal s3 mb s3://my-bucket
+
+echo "Put example file"
+awslocal s3 cp /etc/localstack/example.parquet s3://my-bucket/
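
The committed hook runs both commands unconditionally and does not check their exit status. One possible variation, shown here purely as a sketch, checks for the bucket first and stops on the first failure. `ensure_bucket` and `seed_bucket` are hypothetical names that are not part of the commit; `s3api head-bucket` is the standard AWS CLI call, assumed to be forwarded unchanged by `awslocal`.

```bash
# Hypothetical variation of init-aws.sh (not part of the commit).

# Create the bucket only if head-bucket reports that it does not exist yet.
ensure_bucket() {
    bucket="$1"
    if awslocal s3api head-bucket --bucket "$bucket" >/dev/null 2>&1; then
        echo "bucket $bucket already exists"
    else
        echo "Initializing localstack s3"
        awslocal s3 mb "s3://$bucket" || return 1
    fi
}

# Make sure the bucket exists, then upload the example file into it.
seed_bucket() {
    bucket="$1"
    file="$2"
    ensure_bucket "$bucket" || return 1
    echo "Put example file"
    awslocal s3 cp "$file" "s3://$bucket/" || return 1
}
```

A hook built this way would end with `seed_bucket my-bucket /etc/localstack/example.parquet`, which mirrors the two commands in the committed script.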

requirements.txt

+2 -1
@@ -1 +1,2 @@
-pyspark
+pyspark
+awscli-local
