
Commit 8e1b1c4

Merge pull request #27 from andre-marcos-perez/develop

2 parents 56af031 + 8d1db28, commit 8e1b1c4
15 files changed: +715 −109 lines

Diff for: .github/ISSUE_TEMPLATE/bug_report.md (+17 −6)

@@ -9,27 +9,38 @@ assignees: 'andre-marcos-perez'
 ## Introduction
 
-Hi there, thanks for helping the project! We are doing our best to help the community to learn and practice parallel computing in distributed environments through our projects. :sparkles:
+Hi there, thanks for helping the project! We are doing our best to help the community to learn and practice
+parallel computing in distributed environments through our projects. :sparkles:
 
 ## Bug
 
+Please fill the template below.
+
 ### Expected behaviour
 
+*Describe the expected behaviour*
+
 ### Current behaviour
 
+*Describe the current behaviour*
+
 ### Steps to reproduce
 
-1. Step 1
-2. Step 2
-3. Step 3
+1. *Step 1*
+2. *Step 2*
+3. *Step 3*
 
 ### Possible solutions (optional)
 
+*Add some solutions, if any*
+
 ### Comments (optional)
 
+*Add some comments, if any*
+
 ### Checklist
 
 Please provide the following:
 
-- [] Docker Engine version:
-- [] Docker Compose version:
+- [] Docker Engine version: *Can be found using `docker version`, e.g.: 19.03.6*
+- [] Docker Compose version: *Can be found using `docker-compose version`, e.g.: 1.21.0*
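The new checklist items point reporters at `docker version` and `docker-compose version`. Their output format varies by release; as a minimal sketch of pulling the bare version number out of a captured output line (the sample line below is illustrative, not real command output):

```shell
# Sample line in the style printed by `docker version` (illustrative only).
sample_line='Version:           19.03.6'

# Extract just the version number for the checklist.
printf '%s\n' "$sample_line" | sed -E 's/.*Version:[[:space:]]*([0-9][0-9.]*).*/\1/'
# prints: 19.03.6
```

On a machine with Docker installed, the same `sed` filter can be applied directly to the live command output.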

Diff for: .github/ISSUE_TEMPLATE/feature_request.md (+8 −1)

@@ -9,10 +9,17 @@ assignees: 'andre-marcos-perez'
 ## Introduction
 
-Hi there, thanks for helping the project! We are doing our best to help the community to learn and practice parallel computing in distributed environments through our projects. :sparkles:
+Hi there, thanks for helping the project! We are doing our best to help the community to learn and practice
+parallel computing in distributed environments through our projects. :sparkles:
 
 ## Feature
 
+Please fill the template below.
+
 ### Description
 
+*Describe your feature request*
+
 ### Comments (optional)
+
+*Add some comments, if any*

Diff for: .github/pull_request_template.md (+12 −5)

@@ -1,20 +1,27 @@
 ## Introduction
 
-Hi there, thanks for helping the project! We are doing our best to help the community to learn and practice parallel computing in distributed environments through our projects. :sparkles:
+Hi there, thanks for helping the project! We are doing our best to help the community to learn and practice
+parallel computing in distributed environments through our projects. :sparkles:
 
 ## Pull Request
 
-### Description
+### Issue
+
+- *Issue number with link, e.g.: [#22](https://github.com/andre-marcos-perez/spark-standalone-cluster-on-docker/issues/22)*
 
 ### Changes
 
-- Change 1
-- Change 2
+- *High level description of change 1*
+- *High level description of change 2*
+- *...*
 
 ### Comments (optional)
 
+*Add some comments, if any*
+
 ### Checklist
 
 Please make sure to check the following:
 
-- [] I have followed the steps in the [CONTRIBUTING.md](../CONTRIBUTING.md) file.
+- [] I have followed the steps in the [CONTRIBUTING.md](../CONTRIBUTING.md) file.
+- [] I am aware that pull requests that do not follow the rules will be automatically rejected.

Diff for: .github/workflows/ci.yml (+4 −1)

@@ -3,7 +3,7 @@ name: build
 on:
 
   schedule:
-    - cron: '0 0/12 * * *'
+    - cron: '0 0 * * *'
 
   push:
     branches: [ master ]

@@ -195,6 +195,7 @@ jobs:
       cd ${GITHUB_WORKSPACE}/build
       docker build \
        --build-arg build_date="$(date -u +'%Y-%m-%d')" \
+       --build-arg scala_version="${SCALA_VERSION}" \
        --build-arg spark_version="${SPARK_VERSION}" \
        --build-arg jupyterlab_version="${JUPYTERLAB_VERSION}" \
        -f docker/jupyterlab/Dockerfile \

@@ -212,6 +213,7 @@ jobs:
       cd ${GITHUB_WORKSPACE}/build
       docker build \
        --build-arg build_date="$(date -u +'%Y-%m-%d')" \
+       --build-arg scala_version="${SCALA_VERSION}" \
        --build-arg spark_version="${SPARK_VERSION}" \
        --build-arg jupyterlab_version="${JUPYTERLAB_VERSION}" \
        -f docker/jupyterlab/Dockerfile \

@@ -227,6 +229,7 @@ jobs:
       cd ${GITHUB_WORKSPACE}/build
       docker build \
        --build-arg build_date="$(date -u +'%Y-%m-%d')" \
+       --build-arg scala_version="${SCALA_VERSION}" \
        --build-arg spark_version="${SPARK_VERSION}" \
        --build-arg jupyterlab_version="${JUPYTERLAB_VERSION}" \
        -f docker/jupyterlab/Dockerfile \
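The three workflow hunks above thread a new `scala_version` build argument into each JupyterLab `docker build` call. A minimal sketch of how such version variables typically flow into an image tag (the values are illustrative, and the tag layout mirrors the `<jupyterlab-version>-spark-<spark-version>` scheme in the README diff below; the workflow's exact wiring may differ):

```shell
# Illustrative values; the real workflow reads these from its build config.
SCALA_VERSION="2.12"
SPARK_VERSION="3.0.0"
JUPYTERLAB_VERSION="2.1.4"

# Compose the image tag the same way the published images are tagged.
tag="${JUPYTERLAB_VERSION}-spark-${SPARK_VERSION}"
echo "jupyterlab:${tag}"
# prints: jupyterlab:2.1.4-spark-3.0.0

# Each version then rides into the Dockerfile as a build arg, e.g. (not run here):
#   docker build \
#     --build-arg scala_version="${SCALA_VERSION}" \
#     --build-arg spark_version="${SPARK_VERSION}" \
#     --build-arg jupyterlab_version="${JUPYTERLAB_VERSION}" \
#     -f docker/jupyterlab/Dockerfile -t jupyterlab:"${tag}" .
```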

Diff for: CHANGELOG.md (+44)

@@ -0,0 +1,44 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+## [v1.1.0](https://github.com/andre-marcos-perez/spark-standalone-cluster-on-docker/releases/tag/v1.1.0) (2020-08-09)
+
+### Features
+
+- Scala kernel for JupyterLab;
+- Jupyter notebook with Spark Scala API example.
+
+### Repository
+
+- Docs general improvements;
+- Pull request template refactored.
+
+## [v1.0.0](https://github.com/andre-marcos-perez/spark-standalone-cluster-on-docker/releases/tag/v1.0.0) (2020-07-30)
+
+### Tech Stack
+
+- **Infra**
+  - Python 3.7
+  - Scala 2.12
+  - Docker Engine 1.13.0+
+  - Docker Compose 1.10.0+
+
+- **Apps**
+  - JupyterLab 2.1.4
+  - Apache Spark 2.4.0, 2.4.4 and 3.0.0
+
+### Features
+
+- Docker compose file to build the cluster from your own machine;
+- Docker compose file to build the cluster from Docker Hub;
+- GitHub Workflow CI with Docker Hub to build the cluster daily.
+
+### Repository
+
+- Contributing rules;
+- GitHub templates for Bug Issue, Feature Request and Pull Request.
+
+### Community
+
+- Article on [Medium](https://towardsdatascience.com/apache-spark-cluster-on-docker-ft-a-juyterlab-interface-418383c95445).

Diff for: CONTRIBUTING.md (+12 −11)

@@ -1,24 +1,25 @@
 # Contributing
 
-Hi there, thanks for helping the project! We are doing our best to help the community to learn and practice distributed
-and parallel computing through our projects. Please follow the template bellow to contribute.
+Hi there, thanks for helping the project! We are doing our best to help the community to learn and practice
+parallel computing in distributed environments through our projects. :sparkles:
 
 ### Steps to contribute
 
-1. Fork the project;
-2. Create your feature branch, we use [gitflow](https://github.com/nvie/gitflow);
-3. Do your magic :rainbow:;
-4. Commit your changes;
-5. Push to your feature branch;
-6. Create a new pull request on the **develop** branch.
+1. Create an [issue](https://github.com/andre-marcos-perez/spark-standalone-cluster-on-docker/issues) to discuss features and bugs;
+2. Fork the project;
+3. Create your feature branch, we use [gitflow](https://github.com/nvie/gitflow);
+4. Do your magic :rainbow:;
+5. Commit your changes;
+6. Push to your feature branch;
+7. Create a new pull request from your the **develop** branch.
 
 ### Contributions ideas
 
 - [] Microsoft Windows build script;
-- [x] DockerHub CI/CD integration;
+- [x] Docker Hub CI/CD integration;
 - [] Spark submit support;
-- [] JupyterLab Scala kernel;
-- [] Jupyter notebook with Apache Spark Scala API examples;
+- [x] JupyterLab Scala kernel;
+- [x] Jupyter notebook with Apache Spark Scala API examples;
 - [] JupyterLab R kernel;
 - [] Jupyter notebook with Apache Spark R API examples;
 - [] Test coverage.
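The contribution steps above (fork, gitflow-style feature branch, pull request against `develop`) can be sketched with plain git. The branch and commit names below are illustrative, and a throwaway local repository stands in for a real fork so the commands run anywhere:

```shell
# Stand-in for a cloned fork: a throwaway local repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity
git config user.name "Your Name"
git commit -q --allow-empty -m "initial commit"

# gitflow-style: feature branches are cut from (and merged back into) develop.
git checkout -q -b develop
git checkout -q -b feature/my-feature     # branch name is illustrative

git branch --show-current
# prints: feature/my-feature
```

After committing and pushing the feature branch to the fork, the pull request is opened against the upstream **develop** branch, as step 7 describes.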

Diff for: README.md (+56 −54)

@@ -1,15 +1,18 @@
 # Apache Spark Standalone Cluster on Docker
 > The project just got its [own article](https://towardsdatascience.com/apache-spark-cluster-on-docker-ft-a-juyterlab-interface-418383c95445) at Towards Data Science Medium blog! :sparkles:
 
-This project gives you an out-of-the-box **Apache Spark** cluster in standalone mode with a **JupyterLab** interface and a simulated **Apache Hadoop Distributed File System**, all built on top of **Docker**. Learn Apache Spark through its Python API, **PySpark**, by running the [Jupyter notebooks](build/workspace/pyspark.ipynb) with examples on how to read, process and write data.
+This project gives you an **Apache Spark** cluster in standalone mode with a **JupyterLab** interface built on top of **Docker**.
+Learn Apache Spark through its Scala and Python API (PySpark) by running the Jupyter [notebooks](build/workspace/) with examples on how to read, process and write data.
 
 <p align="center"><img src="docs/image/cluster-architecture.png"></p>
 
 ![build](https://github.com/andre-marcos-perez/spark-standalone-cluster-on-docker/workflows/build/badge.svg?branch=master)
 ![jupyterlab-latest-version](https://img.shields.io/docker/v/andreper/jupyterlab/2.1.4-spark-3.0.0?color=yellow&label=jupyterlab-latest)
 ![spark-latest-version](https://img.shields.io/docker/v/andreper/spark-master/3.0.0-hadoop-2.7?color=yellow&label=spark-latest)
 ![docker-version](https://img.shields.io/badge/docker-v1.13.0%2B-blue)
-![docker-compose-version](https://img.shields.io/badge/docker--compose-v3.0%2B-blue)
+![docker-compose-file-version](https://img.shields.io/badge/docker--compose-v1.10.0%2B-blue)
+![spark-scala-api](https://img.shields.io/badge/spark%20api-scala-red)
+![spark-pyspark-api](https://img.shields.io/badge/spark%20api-pyspark-red)
 
 ## TL;DR

@@ -21,105 +24,104 @@ docker-compose up
 ## Contents
 
 - [Quick Start](#quick-start)
-- [Tech Stack Version](#tech-stack-version)
-- [Contributing](#contributing)
+- [Tech Stack](#tech-stack)
 - [Docker Hub Metrics](#docker-hub-metrics)
+- [Contributing](#contributing)
 - [Contributors](#contributors)
 
 ## <a name="quick-start"></a>Quick Start
 
 ### Cluster overview
 
-| Application                | URL                                      | Description                                                |
-| -------------------------- | ---------------------------------------- | ---------------------------------------------------------- |
-| **JupyterLab**             | [localhost:8888](http://localhost:8888/) | Cluster interface with PySpark built-in notebook           |
-| **Apache Spark Master**    | [localhost:8080](http://localhost:8080/) | Spark Master node                                          |
-| **Apache Spark Worker I**  | [localhost:8081](http://localhost:8081/) | Spark Worker node with 1 core and 512m of memory (default) |
-| **Apache Spark Worker II** | [localhost:8082](http://localhost:8082/) | Spark Worker node with 1 core and 512m of memory (default) |
+| Application            | URL                                      | Description                                                 |
+| ---------------------- | ---------------------------------------- | ----------------------------------------------------------- |
+| JupyterLab             | [localhost:8888](http://localhost:8888/) | Cluster interface with Scala and PySpark built-in notebooks |
+| Apache Spark Master    | [localhost:8080](http://localhost:8080/) | Spark Master node                                           |
+| Apache Spark Worker I  | [localhost:8081](http://localhost:8081/) | Spark Worker node with 1 core and 512m of memory (default)  |
+| Apache Spark Worker II | [localhost:8082](http://localhost:8082/) | Spark Worker node with 1 core and 512m of memory (default)  |
+
+### Prerequisites
+
+- Install [Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://docs.docker.com/compose/install/), check **infra** [supported versions](#tech-stack)
 
-### Build from DockerHub
+### Build from Docker Hub
 
-1. Install [Docker and Docker Compose](https://docs.docker.com/get-docker/), check **infra** [supported versions](#tech-stack-version);
-2. Download the source code or clone the repository;
-3. Edit the [docker compose](docker-compose.yml) file with your favorite tech stack version, check **apps** [supported versions](#tech-stack-version);
-4. Build the cluster;
+1. Download the source code or clone the repository;
+2. Edit the [docker compose](docker-compose.yml) file with your favorite tech stack version, check **apps** [supported versions](#tech-stack);
+3. Build the cluster;
 
 ```bash
 docker-compose up
 ```
 
-5. Run Apache Spark code using the provided [Jupyter notebook](build/workspace/pyspark.ipynb) with PySpark examples.
+4. Run Apache Spark code using the provided Jupyter [notebooks](build/workspace/) with Scala and PySpark examples;
+5. Stop the cluster by typing `ctrl+c`.
 
 ### Build from your local machine
 
 > **Note**: Local build is currently only supported on Linux OS distributions.
 
-1. Install [Docker and Docker Compose](https://docs.docker.com/get-docker/), check **infra** [supported versions](#tech-stack-version);
-2. Download the source code or clone the repository;
-3. Move to the build directory;
+1. Download the source code or clone the repository;
+2. Move to the build directory;
 
 ```bash
 cd build
 ```
 
-4. Edit the [build.yml](build/build.yml) file with your favorite tech stack version;
-5. Match those version on the [docker compose](build/docker-compose.yml) file;
-6. Make the build script executable;
-
-```bash
-chmod +x build.sh
-```
-
-7. Build the images;
+3. Edit the [build.yml](build/build.yml) file with your favorite tech stack version;
+4. Match those version on the [docker compose](build/docker-compose.yml) file;
+5. Build the images;
 
 ```bash
-./build.sh
+chmod +x build.sh ; ./build.sh
 ```
 
-8. Build the cluster;
+6. Build the cluster;
 
 ```bash
 docker-compose up
 ```
 
-9. Run Apache Spark code using the provided [Jupyter notebook](build/workspace/pyspark.ipynb) with PySpark examples.
+7. Run Apache Spark code using the provided Jupyter [notebooks](build/workspace/) with Scala and PySpark examples;
+8. Stop the cluster by typing `ctrl+c`.
 
-## <a name="tech-stack-version"></a>Tech Stack Version
+## <a name="tech-stack"></a>Tech Stack
 
 - Infrastructure
 
-| App                | Version |
-| ------------------ | ------- |
-| **Docker**         | 1.13.0+ |
-| **Docker Compose** | 3.0+    |
+| Component      | Version |
+| -------------- | ------- |
+| Docker Engine  | 1.13.0+ |
+| Docker Compose | 1.10.0+ |
+| Python         | 3.7     |
+| Scala          | 2.12    |
+
+- Jupyter Kernels
+
+| Component | Version | Provider                        |
+| --------- | ------- | ------------------------------- |
+| Python    | 2.1.4   | [Jupyter](https://jupyter.org/) |
+| Scala     | 0.10.0  | [Almond](https://almond.sh/)    |
 
 - Applications
 
-| App               | Version                 | Latest |
-| ----------------- | ----------------------- | ------ |
-| **Apache Spark**  | 2.4.0 \| 2.4.4 \| 3.0.0 | 3.0.0  |
-| **Apache Hadoop** | 2.7                     | 2.7    |
-| **JupyterLab**    | 2.1.4                   | 2.1.4  |
+| Component    | Version                 | Docker Tag                                           |
+| ------------ | ----------------------- | ---------------------------------------------------- |
+| Apache Spark | 2.4.0 \| 2.4.4 \| 3.0.0 | **\<spark-version>**-hadoop-2.7                      |
+| JupyterLab   | 2.1.4                   | **\<jupyterlab-version>**-spark-**\<spark-version>** |
 
-- Tech
-
-| App        | Version |
-| ---------- | ------- |
-| **Python** | 3.7     |
-| **Scala**  | 2.12    |
+## <a name="docker-hub-metrics"></a>Docker Hub Metrics
+
+| Image                                                          | Latest Version Size                                                                   | Downloads                                                                 |
+| -------------------------------------------------------------- | ------------------------------------------------------------------------------------- | ------------------------------------------------------------------------- |
+| [JupyterLab](https://hub.docker.com/r/andreper/jupyterlab)     | ![docker-size](https://img.shields.io/docker/image-size/andreper/jupyterlab/latest)   | ![docker-pull](https://img.shields.io/docker/pulls/andreper/jupyterlab)   |
+| [Spark Master](https://hub.docker.com/r/andreper/spark-master) | ![docker-size](https://img.shields.io/docker/image-size/andreper/spark-master/latest) | ![docker-pull](https://img.shields.io/docker/pulls/andreper/spark-master) |
+| [Spark Worker](https://hub.docker.com/r/andreper/spark-worker) | ![docker-size](https://img.shields.io/docker/image-size/andreper/spark-worker/latest) | ![docker-pull](https://img.shields.io/docker/pulls/andreper/spark-worker) |
 
 ## <a name="contributing"></a>Contributing
 
 We'd love some help. To contribute, please read [this file](CONTRIBUTING.md).
 
-## <a name="docker-hub-metrics"></a>Docker Hub Metrics
-
-| Image                                                              | Latest Version Size                                                                   | Pulls                                                                     |
-| ------------------------------------------------------------------ | ------------------------------------------------------------------------------------- | ------------------------------------------------------------------------- |
-| **[JupyterLab](https://hub.docker.com/r/andreper/jupyterlab)**     | ![docker-size](https://img.shields.io/docker/image-size/andreper/jupyterlab/latest)   | ![docker-pull](https://img.shields.io/docker/pulls/andreper/jupyterlab)   |
-| **[Spark Master](https://hub.docker.com/r/andreper/spark-master)** | ![docker-size](https://img.shields.io/docker/image-size/andreper/spark-master/latest) | ![docker-pull](https://img.shields.io/docker/pulls/andreper/spark-master) |
-| **[Spark Worker](https://hub.docker.com/r/andreper/spark-worker)** | ![docker-size](https://img.shields.io/docker/image-size/andreper/spark-worker/latest) | ![docker-pull](https://img.shields.io/docker/pulls/andreper/spark-worker) |
-
 ## <a name="contributors"></a>Contributors
 
 - **André Perez** - [dekoperez](https://twitter.com/dekoperez) - [email protected]
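The cluster overview table in the README diff notes that each Spark worker defaults to 1 core and 512m of memory. As a hedged sketch of where such limits usually live (Spark standalone workers read the standard `SPARK_WORKER_CORES` and `SPARK_WORKER_MEMORY` environment variables; the project's actual docker-compose.yml may wire this differently), a compose service could expose them like so:

```yaml
# Hypothetical excerpt; the service name and image tag follow the README's
# conventions, but the repository's real compose file may differ.
spark-worker-1:
  image: andreper/spark-worker:3.0.0-hadoop-2.7
  environment:
    - SPARK_WORKER_CORES=1      # defaults shown in the cluster overview table
    - SPARK_WORKER_MEMORY=512m
  ports:
    - "8081:8081"
```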

0 commit comments