Why does installing Pandas on Alpine Linux take so much time?

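I've noticed that installing Pandas and Numpy (its dependency) in a Docker container takes far longer with Alpine as the base OS than with CentOS or Debian. I created a small test below to demonstrate the time difference. Aside from the few seconds Alpine spends updating and downloading the build dependencies needed to install Pandas and Numpy, why does the setup.py step take about 70 times longer than the Debian install?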

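Is there any way to speed up the install with Alpine as the base image, or is there another base image of comparable size to Alpine that is better suited to packages like Pandas and Numpy?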

FROM python:3.6.4-slim-jessie

RUN pip install pandas

[PandasDockerTest] time docker build -t debian-pandas -f Dockerfile.debian . --no-cache

Sending build context to Docker daemon 3.072kB

Step 1/2 : FROM python:3.6.4-slim-jessie

---> 43431c5410f3

Step 2/2 : RUN pip install pandas

---> Running in 2e4c030f8051

Collecting pandas

Downloading pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl (26.2MB)

Collecting numpy>=1.9.0 (from pandas)

Downloading numpy-1.14.1-cp36-cp36m-manylinux1_x86_64.whl (12.2MB)

Collecting pytz>=2011k (from pandas)

Downloading pytz-2018.3-py2.py3-none-any.whl (509kB)

Collecting python-dateutil>=2 (from pandas)

Downloading python_dateutil-2.6.1-py2.py3-none-any.whl (194kB)

Collecting six>=1.5 (from python-dateutil>=2->pandas)

Downloading six-1.11.0-py2.py3-none-any.whl

Installing collected packages: numpy, pytz, six, python-dateutil, pandas

Successfully installed numpy-1.14.1 pandas-0.22.0 python-dateutil-2.6.1 pytz-2018.3 six-1.11.0

Removing intermediate container 2e4c030f8051

---> a71e1c314897

Successfully built a71e1c314897

Successfully tagged debian-pandas:latest

docker build -t debian-pandas -f Dockerfile.debian . --no-cache 0.07s user 0.06s system 0% cpu 13.605 total

FROM python:3.6.4-alpine3.7

RUN apk --update add --no-cache g++

RUN pip install pandas

[PandasDockerTest] time docker build -t alpine-pandas -f Dockerfile.alpine . --no-cache

Sending build context to Docker daemon 16.9kB

Step 1/3 : FROM python:3.6.4-alpine3.7

---> 4b00a94b6f26

Step 2/3 : RUN apk --update add --no-cache g++

---> Running in 4b0c32551e3f

fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/main/x86_64/APKINDEX.tar.gz

fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/main/x86_64/APKINDEX.tar.gz

fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/community/x86_64/APKINDEX.tar.gz

fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/community/x86_64/APKINDEX.tar.gz

(1/17) Upgrading musl (1.1.18-r2 -> 1.1.18-r3)

(2/17) Installing libgcc (6.4.0-r5)

(3/17) Installing libstdc++ (6.4.0-r5)

(4/17) Installing binutils-libs (2.28-r3)

(5/17) Installing binutils (2.28-r3)

(6/17) Installing gmp (6.1.2-r1)

(7/17) Installing isl (0.18-r0)

(8/17) Installing libgomp (6.4.0-r5)

(9/17) Installing libatomic (6.4.0-r5)

(10/17) Installing pkgconf (1.3.10-r0)

(11/17) Installing mpfr3 (3.1.5-r1)

(12/17) Installing mpc1 (1.0.3-r1)

(13/17) Installing gcc (6.4.0-r5)

(14/17) Installing musl-dev (1.1.18-r3)

(15/17) Installing libc-dev (0.7.1-r0)

(16/17) Installing g++ (6.4.0-r5)

(17/17) Upgrading musl-utils (1.1.18-r2 -> 1.1.18-r3)

Executing busybox-1.27.2-r7.trigger

OK: 184 MiB in 50 packages

Removing intermediate container 4b0c32551e3f

---> be26c3bf4e42

Step 3/3 : RUN pip install pandas

---> Running in 36f6024e5e2d

Collecting pandas

Downloading pandas-0.22.0.tar.gz (11.3MB)

Collecting python-dateutil>=2 (from pandas)

Downloading python_dateutil-2.6.1-py2.py3-none-any.whl (194kB)

Collecting pytz>=2011k (from pandas)

Downloading pytz-2018.3-py2.py3-none-any.whl (509kB)

Collecting numpy>=1.9.0 (from pandas)

Downloading numpy-1.14.1.zip (4.9MB)

Collecting six>=1.5 (from python-dateutil>=2->pandas)

Downloading six-1.11.0-py2.py3-none-any.whl

Building wheels for collected packages: pandas, numpy

Running setup.py bdist_wheel for pandas: started

Running setup.py bdist_wheel for pandas: still running...

Running setup.py bdist_wheel for pandas: still running...

Running setup.py bdist_wheel for pandas: still running...

Running setup.py bdist_wheel for pandas: still running...

Running setup.py bdist_wheel for pandas: still running...

Running setup.py bdist_wheel for pandas: still running...

Running setup.py bdist_wheel for pandas: finished with status 'done'

Stored in directory: /root/.cache/pip/wheels/e8/ed/46/0596b51014f3cc49259e52dff9824e1c6fe352048a2656fc92

Running setup.py bdist_wheel for numpy: started

Running setup.py bdist_wheel for numpy: still running...

Running setup.py bdist_wheel for numpy: still running...

Running setup.py bdist_wheel for numpy: still running...

Running setup.py bdist_wheel for numpy: finished with status 'done'

Stored in directory: /root/.cache/pip/wheels/9d/cd/e1/4d418b16ea662e512349ef193ed9d9ff473af715110798c984

Successfully built pandas numpy

Installing collected packages: six, python-dateutil, pytz, numpy, pandas

Successfully installed numpy-1.14.1 pandas-0.22.0 python-dateutil-2.6.1 pytz-2018.3 six-1.11.0

Removing intermediate container 36f6024e5e2d

---> a93c59e6a106

Successfully built a93c59e6a106

Successfully tagged alpine-pandas:latest

docker build -t alpine-pandas -f Dockerfile.alpine . --no-cache 0.54s user 0.33s system 0% cpu 16:08.47 total
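As a quick sanity check on the "70 times" figure, the two wall-clock totals reported by `time` above can be converted to seconds and divided. This is only a small sketch; the helper name `parse_wall_clock` is made up for illustration.

```python
def parse_wall_clock(total: str) -> float:
    """Convert 'SS.sss', 'MM:SS.sss' or 'HH:MM:SS.sss' into seconds."""
    seconds = 0.0
    for field in total.split(":"):
        # Each extra colon-separated field shifts the running total by a factor of 60.
        seconds = seconds * 60 + float(field)
    return seconds


# Totals copied from the two `time docker build` runs above.
debian_seconds = parse_wall_clock("13.605")
alpine_seconds = parse_wall_clock("16:08.47")

ratio = alpine_seconds / debian_seconds
print(f"Debian: {debian_seconds:.1f}s, Alpine: {alpine_seconds:.1f}s, ratio: {ratio:.1f}x")
```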

Answer:

On Debian-based images, pip simply installs prebuilt packages in .whl format:

  Downloading pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl (26.2MB)

Downloading numpy-1.14.1-cp36-cp36m-manylinux1_x86_64.whl (12.2MB)

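The WHL (wheel) format was developed as a quicker and more reliable way of installing Python software than rebuilding from source every time. A WHL file only has to be moved to the correct location on the target system to be installed, whereas a source distribution requires a build step before installation.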
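The wheel file name itself records which interpreter, ABI and platform the binary was built for. A minimal sketch of splitting a name into those tags, following the naming convention from PEP 427 (the helper `parse_wheel_filename` is hypothetical, not part of pip):

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split '{dist}-{version}[-{build}]-{python}-{abi}-{platform}.whl' into its tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        # Drop the optional build tag so the remaining fields line up.
        del parts[2]
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel file name layout: {filename!r}")
    return WheelTags(*parts)


tags = parse_wheel_filename("pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl")
print(tags.python_tag, tags.abi_tag, tags.platform_tag)
```

The `manylinux1_x86_64` platform tag is the relevant part here: it marks a binary built against glibc, while a pure-Python wheel such as `six-1.11.0-py2.py3-none-any.whl` carries the platform tag `any`.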

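The pandas and numpy wheel packages are not supported on images based on the Alpine platform. That is why, when pip installs them during the build, it always compiles them from the source files on Alpine: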

  Downloading pandas-0.22.0.tar.gz (11.3MB)

Downloading numpy-1.14.1.zip (4.9MB)

We can see the following processes running inside the container during the image build:

/ # ps aux

PID USER TIME COMMAND

1 root 0:00 /bin/sh -c pip install pandas

7 root 0:04 {pip} /usr/local/bin/python /usr/local/bin/pip install pandas

21 root 0:07 /usr/local/bin/python -c import setuptools, tokenize;__file__='/tmp/pip-build-en29h0ak/pandas/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n

496 root 0:00 sh

660 root 0:00 /bin/sh -c gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -DTHREAD_STACK_SIZE=0x100000 -fPIC -Ibuild/src.linux-x86_64-3.6/numpy/core/src/pri

661 root 0:00 gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -DTHREAD_STACK_SIZE=0x100000 -fPIC -Ibuild/src.linux-x86_64-3.6/numpy/core/src/private -Inump

662 root 0:00 /usr/libexec/gcc/x86_64-alpine-linux-musl/6.4.0/cc1 -quiet -I build/src.linux-x86_64-3.6/numpy/core/src/private -I numpy/core/include -I build/src.linux-x86_64-3.6/numpy/core/includ

663 root 0:00 ps aux

If we modify the Dockerfile a little:

FROM python:3.6.4-alpine3.7

RUN apk add --no-cache g++ wget

RUN wget https://pypi.python.org/packages/da/c6/0936bc5814b429fddb5d6252566fe73a3e40372e6ceaf87de3dec1326f28/pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl

RUN pip install pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl

we get the following error:

Step 4/4 : RUN pip install pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl

---> Running in 0faea63e2bda

pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl is not a supported wheel on this platform.

The command '/bin/sh -c pip install pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl' returned a non-zero code: 1
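The rejection comes from the platform tag: manylinux wheels are built against glibc, while Alpine uses musl. The sketch below is a deliberately simplified model of that check, not pip's real logic (it ignores the Python, ABI and architecture tags), and `likely_installable` is a hypothetical helper. `platform.libc_ver()` is from the standard library and typically reports `glibc` on Debian-based images.

```python
import platform


def wheel_platform_tag(filename: str) -> str:
    """Return the last dash-separated field of a wheel file name (the platform tag)."""
    return filename[: -len(".whl")].split("-")[-1]


def likely_installable(filename: str, libc_name: str) -> bool:
    """Simplified assumption: 'any' wheels install everywhere, manylinux needs glibc."""
    tag = wheel_platform_tag(filename)
    if tag == "any":
        return True
    if tag.startswith("manylinux"):
        return libc_name == "glibc"
    # Other platform tags are outside the scope of this sketch.
    return False


libc_name, libc_version = platform.libc_ver()
wheel = "pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl"
print(f"libc reported here: {libc_name or 'unknown'} {libc_version}")
print(f"{wheel} likely installable: {likely_installable(wheel, libc_name)}")
```

Note that later packaging standards added a separate `musllinux` platform tag (PEP 656), which postdates the pip and pandas versions shown in this thread.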

Unfortunately, the only way to install pandas on an Alpine image is to wait until the build finishes.

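Of course, if you want to use an Alpine image with pandas, for example in CI, the best approach is to compile it once, push it to any registry, and use it as a base image for your needs.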
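A related variation on "compile once, reuse many times" is to build the wheels a single time with `pip wheel` and install from that directory afterwards. This is a different technique from pushing a base image to a registry, sketched here only under stated assumptions: the directory `/wheels` and the helper names are made up, and the commands are printed rather than executed.

```python
import shlex
import sys
from typing import List


def build_wheels_cmd(packages: List[str], wheel_dir: str) -> List[str]:
    """Command that compiles the packages once and stores the resulting wheels."""
    return [sys.executable, "-m", "pip", "wheel", "--wheel-dir", wheel_dir, *packages]


def install_from_wheels_cmd(packages: List[str], wheel_dir: str) -> List[str]:
    """Command that installs only from the prebuilt wheels, without contacting PyPI."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index", "--find-links", wheel_dir, *packages,
    ]


# '/wheels' is an assumed location, e.g. a cached or mounted directory.
build_cmd = build_wheels_cmd(["pandas"], "/wheels")
install_cmd = install_from_wheels_cmd(["pandas"], "/wheels")

# Print the commands instead of running them; pass them to subprocess.run to execute.
print(" ".join(shlex.quote(arg) for arg in build_cmd))
print(" ".join(shlex.quote(arg) for arg in install_cmd))
```

The slow compilation then happens only in the first step; later installs on the same platform reuse the stored wheels.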

If you want to use an Alpine image with pandas, you can pull my nickgryg/alpine-pandas docker image. It is a python image with pandas pre-compiled on the Alpine platform. It should save you time.

