Framework for simpler Spark Pipelines
For credentials:
Set up your $HOME/.pypirc file like this (replace the password with your real PyPI token):
[pypi]
username = __token__
password = pypi-AgEIcHlwaS5vcmcCJDU5YTg1ZDZjLTVhOWItNGZmMi1hMTBhLTgzZjVhMzBlYmJhOAACJXsicGVybWlzc2lvbnMiOiAidXNlciIsICJ2ZXJzaW9uIjogMX0AAAYgUAfdyImgcqvyNbLihu22g4Wp_2SYZvvJDx7iYNJpEUg
Then:
- Increment the version in the VERSION file (a small helper sketch follows this list)
- Run `make package`
- Run `make devsetup`
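If you want to script the version bump instead of editing the file by hand, here is a minimal sketch; it assumes VERSION holds a single `major.minor.patch` string, which this README does not spell out.

```python
from pathlib import Path

# Read the current version (assumes VERSION contains a single "major.minor.patch" line)
version_file = Path("VERSION")
major, minor, patch = version_file.read_text().strip().split(".")

# Bump the patch number and write it back
new_version = f"{major}.{minor}.{int(patch) + 1}"
version_file.write_text(new_version + "\n")
print(f"VERSION bumped to {new_version}")
```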
Install the following PyCharm plugins:
To set up a local Spark environment:

- Install SDKMan: `curl -s "https://get.sdkman.io" | bash`
- List the available Java versions: `sdk list java`
- Install the Java SDK: `sdk install java 11.0.8.hs-adpt`
- Close and reopen your terminal window so the environment updates
- Check your install: `echo $JAVA_HOME`, `java -version`, `javac -version`
- Install Scala: `sdk list scala`, then `sdk install scala 2.12.12`
- Install Spark manually, since the newest version currently available through sdkman is still 2.x:
- Install Homebrew: `/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"`
- Install wget: `brew install wget`
- Download and unpack Spark:

      # download Spark 3.0.0 with Hadoop 3.2
      wget http://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop3.2.tgz
      # recreate a clean /usr/local/opt/spark
      mkdir -p /usr/local/opt/spark
      rm -r /usr/local/opt/spark/
      mkdir -p /usr/local/opt/spark
      # unpack, move the contents up one level, and remove the versioned folder
      tar -zxvf spark-3.0.0-bin-hadoop3.2.tgz -C /usr/local/opt/spark
      cp -a /usr/local/opt/spark/spark-3.0.0-bin-hadoop3.2/ /usr/local/opt/spark/
      rm -r /usr/local/opt/spark/spark-3.0.0-bin-hadoop3.2
- Update your ~/.bash_profile to include the SPARK_HOME path:

      export SPARK_HOME="/usr/local/opt/spark"
      export PATH="$SPARK_HOME/bin:$PATH"
- Reload your bash_profile: `source ~/.bash_profile`
- Test the Spark install: `spark-submit --class org.apache.spark.examples.SparkPi /usr/local/opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar` (a PySpark equivalent is sketched below)
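Since this framework is used from Python, you can also sanity-check the install from PySpark. This is a minimal sketch, assuming the `pyspark` package is already installed in your environment (for example by `make devsetup`); the SPARK_HOME value printed should match the path used in the steps above.

```python
import os

from pyspark.sql import SparkSession

# Confirm the environment variable set in ~/.bash_profile is visible
print("SPARK_HOME =", os.environ.get("SPARK_HOME"))

# Start a local session and run a trivial job to confirm Spark itself works
spark = SparkSession.builder.master("local[*]").appName("install-check").getOrCreate()
row_count = spark.range(1000).count()
print("Row count:", row_count)  # expect 1000
spark.stop()
```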