5 files changed: +70 -6 lines changed
- * [ ] add option for transformation
-   * json -> parquet
-   * parquet -> json
- * [ ] add option for moving data
- * [ ] add option for copying data
+ # Datalake Batch Processor
+
+ ## Development
+
+ ### LocalStack
+
+ > **_NOTE:_**
+ > For easier development, we recommend using [awscli-local](https://github.com/localstack/awscli-local).
+ > It will automatically recognize the LocalStack instance and use it as the default AWS endpoint.
+ > Therefore, it has been added to the `requirements.txt`. The documentation below assumes the existence of the `awslocal` command.
+
+ You will find a `docker-compose.yml` file in the `localstack` directory.
+ This file will start a LocalStack instance with a configured S3 bucket.
+
+ Before you run it, make sure the `localstack/init-aws.sh` script is executable:
+ (see: [LocalStack Init Hooks](https://docs.localstack.cloud/references/init-hooks/))
+
+ ```bash
+ chmod +x ./localstack/init-aws.sh
+ ```
+
+ To test if LocalStack with the S3 instance is running, you can run the following command:
+
+ ```bash
+ awslocal s3api list-buckets
+ ```
+
+ which should print a list of all existing S3 buckets in your local environment as JSON:
+
+ ```json
+ {
+     "Buckets": [
+         {
+             "Name": "my-bucket",
+             "CreationDate": "2025-02-17T13:06:48+00:00"
+         }
+     ]
+ }
+ ```
+
+ You should also be able to list all the files in the bucket:
+
+ ```bash
+ aws --endpoint-url=http://localhost:4566 s3 ls s3://my-bucket/
+ ```
+
+ which should contain the example Parquet file.
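
The README steps above rely on reading the `list-buckets` JSON by eye. A minimal sketch of how that output could be checked programmatically follows. It is not part of this change: the helper name `bucket_names` is hypothetical, and the sample string simply mirrors the JSON shown in the README.

```python
import json


def bucket_names(list_buckets_output: str) -> list[str]:
    """Return the bucket names found in `s3api list-buckets` JSON output."""
    payload = json.loads(list_buckets_output)
    # "Buckets" may be absent or empty when no bucket has been created yet.
    return [bucket["Name"] for bucket in payload.get("Buckets", [])]


if __name__ == "__main__":
    # Sample output mirroring the README; the real text would come from
    # running `awslocal s3api list-buckets` and capturing its stdout.
    sample = (
        '{"Buckets": [{"Name": "my-bucket", '
        '"CreationDate": "2025-02-17T13:06:48+00:00"}]}'
    )
    if "my-bucket" in bucket_names(sample):
        print("my-bucket is present")
    else:
        print("my-bucket is missing; check the init hook logs")
```

Parsing the JSON rather than searching the raw text avoids false matches on bucket names that merely contain `my-bucket` as a substring.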

+ services:
+   localstack:
+     container_name: "${LOCALSTACK_DOCKER_NAME:-localstack-main}"
+     image: localstack/localstack
+     ports:
+       - "127.0.0.1:4566:4566"            # LocalStack Gateway
+       - "127.0.0.1:4510-4559:4510-4559"  # external services port range
+     environment:
+       - DEBUG=1
+     volumes:
+       - "./example.parquet:/etc/localstack/example.parquet"
+       - "./init-aws.sh:/etc/localstack/init/ready.d/init-aws.sh"  # ready hook
+       - "${LOCALSTACK_VOLUME_DIR:-./volume}:/var/lib/localstack"
+       - "/var/run/docker.sock:/var/run/docker.sock"
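
The compose file binds the LocalStack gateway to `127.0.0.1:4566`. Before running any `awslocal` command, it can help to confirm that the port accepts TCP connections at all. The sketch below is one way to do that with the standard library only. `gateway_reachable` is a hypothetical helper, not part of this change, and a successful connect only shows that something is listening on the port, not that the S3 service or the init hook has finished.

```python
import socket


def gateway_reachable(host: str = "127.0.0.1", port: int = 4566,
                      timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if gateway_reachable():
        print("LocalStack gateway port is open")
    else:
        print("Nothing is listening on 127.0.0.1:4566; is the container up?")
```

The defaults match the port mapping in the compose file above; pass a different `host` or `port` if the mapping is changed.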

+ #!/bin/sh
+
+ echo "Initializing localstack s3"
+ awslocal s3 mb s3://my-bucket
+
+ echo "Put example file"
+ awslocal s3 cp /etc/localstack/example.parquet s3://my-bucket/

- pyspark
+ pyspark
+ awscli-local