import wget fails in jupyter notebook (not installed) #65

Closed
@ToddG

Description

Bug

Expected behaviour

import wget succeeds in the notebook.

Current behaviour

import wget fails with:

ModuleNotFoundError: No module named 'wget'
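
A quick way to confirm the diagnosis from inside the notebook is to ask the kernel's interpreter what it can import. The sketch below is an illustration, not part of the repository: the helper names import_report and pip_install_command are hypothetical, and it uses only the Python standard library. If pip is also reported as missing, the printed install command will not work until pip is added to the image.

```python
import importlib.util
import sys


def import_report(modules=("wget", "pip")):
    """Map each module name to True/False depending on whether the
    interpreter running this kernel can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this kernel,
    so the package lands in the environment the notebook actually uses."""
    return [sys.executable, "-m", "pip", "install", package]


print("interpreter:", sys.executable)
print("importable:", import_report())
print("install with:", " ".join(pip_install_command("wget")))
```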

Steps to reproduce

  1. Step 1

git clone (this repo)
docker-compose up

  2. Step 2

Open JupyterLab at localhost:8888.

  3. Step 3

Follow the notebook instructions:

In [1]:

from pyspark.sql import SparkSession

spark = SparkSession.\
        builder.\
        appName("pyspark-notebook").\
        master("spark://spark-master:7077").\
        config("spark.executor.memory", "512m").\
        getOrCreate()

Learn and practice Apache Spark using PySpark
In [2]:

import wget

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
wget.download(url)

import wget fails with

ModuleNotFoundError: No module named 'wget'
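
As a stopgap until wget is available in the image, the same file can probably be fetched with the standard library alone. This is a minimal sketch, assuming the kernel has outbound network access; fetch is a hypothetical helper, not something the notebook or the wget package provides.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def fetch(url, dest_dir="."):
    """Download `url` into `dest_dir` using only the standard library
    and return the path of the saved file."""
    # Name the local file after the last path segment of the URL.
    filename = Path(urlparse(url).path).name or "download"
    target = Path(dest_dir) / filename
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


# Usage in the notebook (needs network access):
# url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
# fetch(url)
```

Unlike wget.download, this sketch overwrites an existing file with the same name.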

Possible solutions (optional)

  • Add apt install -y python3-pip (the Debian package that provides pip3) to the base Dockerfile, so that wget can be installed with pip3.
  • I tried the above, but I am getting errors pulling down the scala deb from https://www.lightbend.com/

Which brings me to another question: why are you using a bespoke scala image stuffed on some random server? Attempts to rebuild docker/base/Dockerfile are failing because (I think) the scala deb is no longer there:

Processing triggers for libc-bin (2.28-10) ...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   243    0   243    0     0    682      0 --:--:-- --:--:-- --:--:--   680

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

Reading package lists...E: Sub-process Popen returned an error code (2)
E: Encountered a section with no Package: header
E: Problem with MergeList /scala.deb
E: The package lists or status file could not be parsed or opened.

The command '/bin/sh -c mkdir -p ${shared_workspace}/data &&     mkdir -p /usr/share/man/man1 &&     apt-get update -y &&     apt-get install -y curl python3 r-base &&     ln -s /usr/bin/python3 /usr/bin/python &&     curl https://downloads.lightbend.com/scala/${scala_version}/scala-${scala_version}.deb -k -o scala.deb &&     apt install -y ./scala.deb &&     rm -rf scala.deb /var/lib/apt/lists/*' returned a non-zero code: 100
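
The 243 bytes reported by curl are far too small for a Scala package, so the file saved as scala.deb is most likely an error or redirect response rather than a real .deb. One way to check a downloaded file before handing it to apt is to look for the ar archive magic that every .deb starts with. A minimal sketch, with looks_like_deb as a hypothetical helper name:

```python
AR_MAGIC = b"!<arch>\n"  # a .deb is an ar archive and starts with these 8 bytes


def looks_like_deb(path):
    """Return True if the file starts with the ar archive magic.

    This is only a cheap sanity check: it does not validate the
    package contents, it just rules out HTML or XML error pages.
    """
    with open(path, "rb") as fh:
        return fh.read(len(AR_MAGIC)) == AR_MAGIC


# Usage: run against the downloaded file before `apt install ./scala.deb`.
# looks_like_deb("scala.deb")
```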

Checklist

Please provide the following:

  • Docker Engine version: 20.10.1
  • Docker Compose version: 1.25.0

Client: Docker Engine - Community
 Version:           20.10.1
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        831ebea
 Built:             Tue Dec 15 04:34:58 2020
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.1
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       f001486
  Built:            Tue Dec 15 04:32:52 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker-compose version 1.25.0, build unknown
docker-py version: 4.1.0
CPython version: 3.8.5
OpenSSL version: OpenSSL 1.1.1f 31 Mar 2020
