Description
Bug
Expected behaviour
Should be able to create a Spark context using the code from the given examples
Current behaviour
The spark-master logs show the following error when trying to connect to the cluster from the notebook:
ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() for one-way message.
java.io.InvalidClassException: org.apache.spark.deploy.ApplicationDescription; local class incompatible: stream classdesc serialVersionUID = 6543101073799644159, local class serialVersionUID = 1574364215946805297
The root cause is a mismatch between the Scala versions: Spark 3.0.0 is pre-built with Scala 2.12.10, while the containers are bundled with Scala 2.12.11. This is a known issue.
Reference:
- https://stackoverflow.com/questions/45755881/unable-to-connect-to-remote-spark-master-ror-transportrequesthandler-error-wh
- https://almond.sh/docs/usage-spark
ammonite-spark handles loading Spark in a clever way, and does not rely on a specific Spark distribution. Because of that, you can use it with any Spark 2.x version. The only limitation is that the Scala version of Spark and the running Almond kernel must match, so make sure your kernel uses the same Scala version as your Spark cluster. Spark 2.0.x - 2.3.x requires Scala 2.11. Spark 2.4.x supports both Scala 2.11 and 2.12.
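The two serialVersionUID values in the error above can be inspected directly. Below is a minimal sketch (assuming spark-core is on the classpath on both sides, which it is in the notebook after the `import $ivy` line from the repro steps below and inside bin/spark-shell on spark-master); running it in each environment should print the differing values seen in the error.

```scala
import java.io.ObjectStreamClass

// ApplicationDescription is private[spark], so load it reflectively by name
val cls = Class.forName("org.apache.spark.deploy.ApplicationDescription")

// Prints the serialVersionUID computed from the classes on the local classpath;
// the notebook kernel and spark-master report different values, which is exactly
// the incompatibility in the InvalidClassException above
println(ObjectStreamClass.lookup(cls).getSerialVersionUID)
```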
Steps to reproduce
- Use docker-compose up -d to bring up the cluster
- Create a new Scala notebook and create a Spark context using the following code:
import $ivy.`org.apache.spark::spark-sql:2.4.4`;
import org.apache.log4j.{Level, Logger};
Logger.getLogger("org").setLevel(Level.OFF);
import org.apache.spark.sql._
val spark = {
  NotebookSparkSession.builder()
    .master("spark://spark-master:7077")
    .config("spark.executor.instances", "4")
    .config("spark.executor.memory", "2g")
    .getOrCreate()
}
Steps to check Scala version
- The JupyterLab Launcher shows the Scala version installed
- Launch a terminal on spark-master and fire up the Spark shell using the command: bin/spark-shell
From my spark-master node:
# bin/spark-shell
20/09/05 10:07:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://d8b69de92ccc:4040
Spark context available as 'sc' (master = local[*], app id = local-1599300457142).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.0
      /_/
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_265)
Type in expressions to have them evaluated.
Type :help for more information.
scala> ^C
# scala
Welcome to Scala 2.12.11 (OpenJDK 64-Bit Server VM, Java 1.8.0_265).
Type in expressions for evaluation. Or try :help.
scala> ^C
#
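For completeness, the kernel-side Scala version can also be checked from a notebook cell (a small sketch using only the standard library, no extra dependencies assumed):

```scala
// Scala version the running Almond kernel was built against;
// on the bundled containers this prints 2.12.11, while spark-shell above reports 2.12.10
println(scala.util.Properties.versionNumberString)
```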
Possible solutions (optional)
Bundle the Spark 3.0.0 containers with Scala 2.12.10 and check for similar issues for other versions of Spark as well.
Comments (optional)
I can help resolve this issue with a PR.
Checklist
Please provide the following:
- Docker Engine version: 19.03.8
- Docker Compose version: 1.25.5