#### README.md (+23 −19)

````diff
@@ -1,8 +1,9 @@
 # Apache Spark Standalone Cluster on Docker
+
 > The project just got its [own article](https://towardsdatascience.com/apache-spark-cluster-on-docker-ft-a-juyterlab-interface-418383c95445) at Towards Data Science Medium blog! :sparkles:
 
 This project gives you an **Apache Spark** cluster in standalone mode with a **JupyterLab** interface built on top of **Docker**.
-Learn Apache Spark through its Scala and Python API (PySpark) by running the Jupyter [notebooks](build/workspace/) with examples on how to read, process and write data.
+Learn Apache Spark through its Scala, Python (PySpark) and R (SparkR) APIs by running the Jupyter [notebooks](build/workspace/) with examples on how to read, process and write data.
@@ … @@
 | Apache Spark Worker I  | [localhost:8081](http://localhost:8081/) | Spark Worker node with 1 core and 512m of memory (default) |
+| Apache Spark Worker II | [localhost:8082](http://localhost:8082/) | Spark Worker node with 1 core and 512m of memory (default) |
 
 ### Prerequisites
````
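With the second worker added, the table lists two worker UIs. One quick way to confirm that both registered with the master is the standalone master's JSON endpoint, which mirrors the Web UI above; a minimal Python sketch (the `/json/` route and its field names are standard for the standalone master UI, but treat them as assumptions for your Spark version):

```python
import json
import urllib.request

# The standalone master serves a JSON view of the cluster state
# alongside the Web UI at http://localhost:8080/.
with urllib.request.urlopen("http://localhost:8080/json/") as resp:
    state = json.load(resp)

# Both workers should appear here once they register with the master.
for worker in state["workers"]:
    print(worker["id"], "-", worker["cores"], "cores,", worker["memory"], "MB")
```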
````diff
@@ -54,7 +56,7 @@ docker-compose up
 docker-compose up
 ```
 
-4. Run Apache Spark code using the provided Jupyter [notebooks](build/workspace/) with Scalaand PySpark examples;
+4. Run Apache Spark code using the provided Jupyter [notebooks](build/workspace/) with Scala, PySpark and SparkR examples;
 5. Stop the cluster by typing `ctrl+c`.
 
 ### Build from your local machine
````
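As a concrete companion to step 4: the bundled notebooks follow a read, process, write pattern. A minimal PySpark sketch of that flow, assuming the compose master is reachable at `spark://spark-master:7077` (the service name is an assumption; check `docker-compose.yml`) and using a hypothetical `data.csv` with a `category` column:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Connect to the standalone master started by docker-compose;
# "spark-master" is an assumed compose service name.
spark = (SparkSession.builder
         .appName("readme-walkthrough")
         .master("spark://spark-master:7077")
         .getOrCreate())

# Read: data.csv is a hypothetical file in the shared workspace.
df = spark.read.csv("data.csv", header=True, inferSchema=True)

# Process: row counts per value of the hypothetical "category" column.
counts = df.groupBy("category").agg(F.count("*").alias("n"))

# Write: persist the result back to the workspace.
counts.write.mode("overwrite").csv("category-counts")

spark.stop()
```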
````diff
@@ -82,7 +84,7 @@ chmod +x build.sh ; ./build.sh
 docker-compose up
 ```
 
-7. Run Apache Spark code using the provided Jupyter [notebooks](build/workspace/) with Scalaand PySpark examples;
+7. Run Apache Spark code using the provided Jupyter [notebooks](build/workspace/) with Scala, PySpark and SparkR examples;
 8. Stop the cluster by typing `ctrl+c`.
 
 ## <a name="tech-stack"></a>Tech Stack
````
````diff
@@ -114,18 +116,20 @@ docker-compose up
 
+> The Apache Spark R API (SparkR) is only supported on version **2.4.4**. The full list of SparkR releases can be found [here](https://cran.r-project.org/src/contrib/Archive/SparkR/).
````
#### build/workspace/pyspark.ipynb (+14 −7)
````diff
@@ -4,7 +4,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-   "# **PySpark**: The Spark Python API"
+   "# **PySpark**: The Apache Spark Python API"
   ]
  },
  {
@@ -33,7 +33,7 @@
    "\n",
    "+ **appName:** application name displayed at the [Spark Master Web UI](http://localhost:8080/);\n",
    "+ **master:** Spark Master URL, same used by Spark Workers;\n",
-   "+ **config:** must be less than or equals to docker compose SPARK_WORKER_MEMORY config."
+   "+ **spark.executor.memory:** must be less than or equal to the docker-compose SPARK_WORKER_MEMORY config."
   ]
  },
  {
````
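To make the renamed bullet concrete: the memory setting travels through the builder as a plain config key. A sketch of the session those cells build, with 512m assumed to match the compose default quoted in the README:

```python
from pyspark.sql import SparkSession

# spark.executor.memory must stay at or below SPARK_WORKER_MEMORY
# from docker-compose; 512m matches the default worker size.
spark = (SparkSession.builder
         .appName("pyspark-notebook")
         .master("spark://spark-master:7077")
         .config("spark.executor.memory", "512m")
         .getOrCreate())
```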
````diff
@@ -52,6 +52,13 @@
    " getOrCreate()"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "metadata": {},
+  "source": [
+   "More configurations for the SparkSession object in standalone mode can be added using the **config** method. Check out the API docs [here](https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.SparkSession)."
+  ]
+ },
````
"+ **appName:** application name displayed at the [Spark Master Web UI](http://localhost:8080/);\n",
93
93
"+ **master:** Spark Master URL, same used by Spark Workers;\n",
94
-
"+ **config:** must be less than or equals to docker compose SPARK_WORKER_MEMORY config."
94
+
"+ **spark.executor.memory:** must be less than or equals to docker compose SPARK_WORKER_MEMORY config."
95
95
]
96
96
},
97
97
{
@@ -110,6 +110,13 @@
110
110
" getOrCreate()"
111
111
]
112
112
},
113
+
{
114
+
"cell_type": "markdown",
115
+
"metadata": {},
116
+
"source": [
117
+
"More confs for SparkSession object in standalone mode can be added using the **config** method. Checkout the API docs [here](https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/SparkSession.html)."