1. The problem

(With Spark installed on Alpine, running pyspark fails:)

bash-4.4# pyspark
Python 3.6.5 (default, May 30 2019, 09:48:14)
[GCC 6.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "/spark/python/pyspark/shell.py", line 30, in <module>
    import pyspark
  File "/spark/python/pyspark/__init__.py", line 41, in <module>
    from pyspark.context import SparkContext
  File "/spark/python/pyspark/context.py", line 28, in <module>
    from pyspark import accumulators
  File "/spark/python/pyspark/accumulators.py", line 98, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File "/spark/python/pyspark/serializers.py", line 58, in <module>
    import zlib
ModuleNotFoundError: No module named 'zlib'
>>>
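The traceback shows the failure happens inside pyspark's own imports, so the problem lies with the interpreter itself: this Python 3.6.5 build was almost certainly compiled without the zlib extension module. A minimal check, run directly in that same python3 and independent of Spark (the try/except wrapper is only for illustration):

# Paste into the same python3 interpreter that pyspark launched.
# A build without the zlib extension raises the same ModuleNotFoundError;
# a normal build prints the bundled zlib version instead.
try:
    import zlib
    print("zlib available:", zlib.ZLIB_VERSION)
except ModuleNotFoundError as err:
    print("interpreter built without zlib:", err)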

 

2. The fix

First update Alpine's package index, then install pip (installing pip itself has nothing to do with this error; the fix works because the py-pip package pulls in python2 as a dependency, and pyspark then runs on that Python 2.7 interpreter, which ships with zlib):

bash-4.4# apk add --update py-pip

or

apk update
apk add py-pip

 

fetch http://dl-cdn.alpinelinux.org/alpine/v3.8/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.8/community/x86_64/APKINDEX.tar.gz
(1/6) Installing libbz2 (1.0.6-r6)
(2/6) Installing expat (2.2.5-r0)
(3/6) Installing gdbm (1.13-r1)
(4/6) Installing python2 (2.7.15-r1)
(5/6) Installing py-setuptools (39.1.0-r0)
(6/6) Installing py2-pip (10.0.1-r0)
Executing busybox-1.28.4-r0.trigger
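Note the side effect in the output above: python2 2.7.15 was installed as a dependency of py-pip. Before re-running pyspark, it is easy to confirm that this interpreter does include zlib. A quick sketch, assuming `python` now resolves to the freshly installed Python 2.7:

# Run in the newly installed Python 2.7 interpreter.
# If zlib is compiled in, this prints the bundled library version
# instead of raising an ImportError.
import zlib
print(zlib.ZLIB_VERSION)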

 

Check: pyspark now starts normally:

bash-4.4# pyspark
Python 2.7.15 (default, Aug 16 2018, 14:17:09)
[GCC 6.4.0] on linux2
...........................................
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.6.2
      /_/

Using Python version 2.7.15 (default, Aug 16 2018 14:17:09)
SparkContext available as sc, HiveContext available as sqlContext.
>>>
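With zlib available, the shell comes up with a working SparkContext. A minimal sanity check one might type at the `>>>` prompt above; these are standard RDD calls in Spark 1.6, nothing specific to this setup:

>>> # distribute 0..99 and sum it across the executors; 0+1+...+99 = 4950
>>> sc.parallelize(range(100)).sum()
4950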

 
