[BUG] unable to create spark context from jupyterlab #35

Closed
@sdabhi23

Description

Bug

Expected behaviour

Should be able to create a Spark context using the code from the given examples.

Current behaviour

The spark-master logs show the following error when trying to connect to the cluster from the notebook:

ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() for one-way message.

java.io.InvalidClassException: org.apache.spark.deploy.ApplicationDescription; local class incompatible: stream classdesc serialVersionUID = 6543101073799644159, local class serialVersionUID = 1574364215946805297

The root cause is a mismatch between the Scala versions: Spark 3.0.0 is pre-built with Scala 2.12.10, while the containers are bundled with Scala 2.12.11. This is a known issue.

References:

  1. https://stackoverflow.com/questions/45755881/unable-to-connect-to-remote-spark-master-ror-transportrequesthandler-error-wh
  2. https://almond.sh/docs/usage-spark

    ammonite-spark handles loading Spark in a clever way, and does not rely on a specific Spark distribution. Because of that, you can use it with any Spark 2.x version. The only limitation is that the Scala version of Spark and the running Almond kernel must match, so make sure your kernel uses the same Scala version as your Spark cluster. Spark 2.0.x - 2.3.x requires Scala 2.11. Spark 2.4.x supports both Scala 2.11 and 2.12.
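
Based on that note, the kernel-side Scala version can also be checked directly from a notebook cell. Below is a minimal sketch; scala.util.Properties is part of the standard library, and the expected value "2.12.10" is simply taken from the spark-shell output further down, so treat the comparison target as an assumption for illustration:

// Hypothetical guard cell: compare the kernel's Scala version with the one the cluster's Spark build uses.
val kernelScala  = scala.util.Properties.versionNumberString // e.g. "2.12.11" in these containers
val clusterScala = "2.12.10"                                 // reported by bin/spark-shell on spark-master
require(kernelScala == clusterScala,
  s"Scala version mismatch: kernel is on $kernelScala, cluster Spark was built with $clusterScala")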

Steps to reproduce

  1. Use docker-compose up -d to bring up the cluster
  2. Create a new Scala notebook and create a Spark context using the following code (a minimal smoke-test cell is sketched after the snippet):
import $ivy.`org.apache.spark::spark-sql:2.4.4`;

import org.apache.log4j.{Level, Logger};
Logger.getLogger("org").setLevel(Level.OFF); // silence Spark's verbose logging

import org.apache.spark.sql._

// NotebookSparkSession comes from the almond kernel's Spark integration (almond-spark)
val spark = {
  NotebookSparkSession.builder()
    .master("spark://spark-master:7077")
    .config("spark.executor.instances", "4")
    .config("spark.executor.memory", "2g")
    .getOrCreate()
}
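
For reference, a trivial follow-up cell like the one below is what I would expect to run once the context is up. It is only a smoke test for illustration, not part of the repo's examples; the numbers are arbitrary:

// Hypothetical smoke test: runs a small job on the executors once the session is created.
val df = spark.range(0, 100).toDF("n")
println(df.count()) // expected: 100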

Steps to check Scala version

  1. The JupyterLab launcher shows the Scala version installed
  2. Launch a terminal on spark-master and start the Spark shell with the command: bin/spark-shell

From my spark-master node:

# bin/spark-shell
20/09/05 10:07:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://d8b69de92ccc:4040
Spark context available as 'sc' (master = local[*], app id = local-1599300457142).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.0
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_265)
Type in expressions to have them evaluated.
Type :help for more information.

scala> ^C
# scala
Welcome to Scala 2.12.11 (OpenJDK 64-Bit Server VM, Java 1.8.0_265).
Type in expressions for evaluation. Or try :help.

scala> ^C
#

Possible solutions (optional)

Bundle the Spark 3.0.0 containers with Scala 2.12.10, and check the containers for other Spark versions for similar mismatches as well.

Comments (optional)

I can help resolve this issue with a PR

Checklist

Please provide the following:

  • Docker Engine version: 19.03.8
  • Docker Compose version: 1.25.5
