How can I run Scala code in a Spark container using Docker?

I created a Spark container with the following Dockerfile:

FROM ubuntu:16.04

RUN apt-get update -y && apt-get install -y \
        default-jdk \
        nano \
        wget && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN useradd --create-home --shell /bin/bash ubuntu

ENV HOME /home/ubuntu
ENV SPARK_VERSION 2.4.3
ENV HADOOP_VERSION 2.6
ENV MONGO_SPARK_VERSION 2.2.0
ENV SCALA_VERSION 2.11

WORKDIR ${HOME}

ENV SPARK_HOME ${HOME}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}
ENV PATH ${PATH}:${SPARK_HOME}/bin

COPY files/times.json /home/ubuntu/times.json
COPY files/README.md /home/ubuntu/README.md
COPY files/examples.scala /home/ubuntu/examples.scala
COPY files/initDocuments.scala /home/ubuntu/initDocuments.scala

RUN chown -R ubuntu:ubuntu /home/ubuntu/*

USER ubuntu

# get spark
RUN wget http://apache.mirror.digitalpacific.com.au/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz && \
    tar xvf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz

RUN rm -fv spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz

I also have two files written in the Scala programming language, which is new to me. The problem is that the container only has Java installed and no other tools. Is there a way to run this Scala code without installing anything else in the container?

The file names are examples.scala and initDocuments.scala. Here is the initDocuments.scala file:

import com.mongodb.spark._
import com.mongodb.spark.config._
import org.bson.Document

val rdd = MongoSpark.load(sc)
if (rdd.count < 1) {
  val t = sc.textFile("times.json")
  val converted = t.map((tuple) => Document.parse(tuple))
  converted.saveToMongoDB(WriteConfig(Map("uri" -> "mongodb://mongodb/spark.times")))
  println("Documents inserted.")
} else {
  println("Database 'spark' collection 'times' is not empty. Maybe you've loaded a data into the collection previously ? skipping process. ")
}
System.exit(0)

I also tried the following, but it does not work.

spark-shell --conf "spark.mongodb.input.uri=mongodb://mongodb:27017/spark.times" --conf "spark.mongodb.output.uri=mongodb://mongodb/spark.output" --packages org.mongodb.spark:mongo-spark-connector_${SCALA_VERSION}:${MONGO_SPARK_VERSION} -i ./initDocuments.scala

Ivy Default Cache set to: /home/ubuntu/.ivy2/cache
The jars for the packages stored in: /home/ubuntu/.ivy2/jars
:: loading settings :: url = jar:file:/home/ubuntu/spark-2.4.3-bin-hadoop2.6/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.mongodb.spark#mongo-spark-connector_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-d0f95242-e9b9-4d49-8dde-42afc7c55e9a;1.0
        confs: [default]
You probably access the destination server through a proxy server that is not well configured.
You probably access the destination server through a proxy server that is not well configured.
You probably access the destination server through a proxy server that is not well configured.
You probably access the destination server through a proxy server that is not well configured.
:: resolution report :: resolve 40879ms :: artifacts dl 0ms
        :: modules in use:
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   1   |   0   |   0   |   0   ||   0   |   0   |
        ---------------------------------------------------------------------

:: problems summary ::
:::: WARNINGS
        Host repo1.maven.org not found. url=https://repo1.maven.org/maven2/org/mongodb/spark/mongo-spark-connector_2.11/2.2.0/mongo-spark-connector_2.11-2.2.0.pom
        Host repo1.maven.org not found. url=https://repo1.maven.org/maven2/org/mongodb/spark/mongo-spark-connector_2.11/2.2.0/mongo-spark-connector_2.11-2.2.0.jar
        Host dl.bintray.com not found. url=https://dl.bintray.com/spark-packages/maven/org/mongodb/spark/mongo-spark-connector_2.11/2.2.0/mongo-spark-connector_2.11-2.2.0.pom
        Host dl.bintray.com not found. url=https://dl.bintray.com/spark-packages/maven/org/mongodb/spark/mongo-spark-connector_2.11/2.2.0/mongo-spark-connector_2.11-2.2.0.jar
        module not found: org.mongodb.spark#mongo-spark-connector_2.11;2.2.0

        ==== local-m2-cache: tried
          file:/home/ubuntu/.m2/repository/org/mongodb/spark/mongo-spark-connector_2.11/2.2.0/mongo-spark-connector_2.11-2.2.0.pom
          -- artifact org.mongodb.spark#mongo-spark-connector_2.11;2.2.0!mongo-spark-connector_2.11.jar:
          file:/home/ubuntu/.m2/repository/org/mongodb/spark/mongo-spark-connector_2.11/2.2.0/mongo-spark-connector_2.11-2.2.0.jar

        ==== local-ivy-cache: tried
          /home/ubuntu/.ivy2/local/org.mongodb.spark/mongo-spark-connector_2.11/2.2.0/ivys/ivy.xml
          -- artifact org.mongodb.spark#mongo-spark-connector_2.11;2.2.0!mongo-spark-connector_2.11.jar:
          /home/ubuntu/.ivy2/local/org.mongodb.spark/mongo-spark-connector_2.11/2.2.0/jars/mongo-spark-connector_2.11.jar

        ==== central: tried
          https://repo1.maven.org/maven2/org/mongodb/spark/mongo-spark-connector_2.11/2.2.0/mongo-spark-connector_2.11-2.2.0.pom
          -- artifact org.mongodb.spark#mongo-spark-connector_2.11;2.2.0!mongo-spark-connector_2.11.jar:
          https://repo1.maven.org/maven2/org/mongodb/spark/mongo-spark-connector_2.11/2.2.0/mongo-spark-connector_2.11-2.2.0.jar

        ==== spark-packages: tried
          https://dl.bintray.com/spark-packages/maven/org/mongodb/spark/mongo-spark-connector_2.11/2.2.0/mongo-spark-connector_2.11-2.2.0.pom
          -- artifact org.mongodb.spark#mongo-spark-connector_2.11;2.2.0!mongo-spark-connector_2.11.jar:
          https://dl.bintray.com/spark-packages/maven/org/mongodb/spark/mongo-spark-connector_2.11/2.2.0/mongo-spark-connector_2.11-2.2.0.jar

                ::::::::::::::::::::::::::::::::::::::::::::::
                ::          UNRESOLVED DEPENDENCIES         ::
                ::::::::::::::::::::::::::::::::::::::::::::::
                :: org.mongodb.spark#mongo-spark-connector_2.11;2.2.0: not found
                ::::::::::::::::::::::::::::::::::::::::::::::

:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: org.mongodb.spark#mongo-spark-connector_2.11;2.2.0: not found]
        at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1306)
        at org.apache.spark.deploy.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:54)
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:315)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I tried to change the proxy settings with the command below, but I do not think there is a working proxy for my setup. I would appreciate it if anyone could help me set up a properly configured proxy so the download problem goes away.

export JAVA_OPTS="$JAVA_OPTS -Dhttp.proxyHost=yourserver -Dhttp.proxyPort=8080 -Dhttp.proxyUser=username -Dhttp.proxyPassword=password"
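If a proxy really is reachable from the container, one thing that may be worth trying is passing the JVM proxy properties straight to the driver with spark-shell's --driver-java-options flag, so that Ivy's HTTPS downloads also go through it. This is only a sketch; the host, port, and credentials below are placeholders, not values from my environment:

spark-shell \
  --driver-java-options "-Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080 -Dhttps.proxyHost=proxy.example.com -Dhttps.proxyPort=8080" \
  --packages org.mongodb.spark:mongo-spark-connector_${SCALA_VERSION}:${MONGO_SPARK_VERSION} \
  -i ./initDocuments.scala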

Answer:

Based on the error message below:

:: org.mongodb.spark#mongo-spark-connector_2.11;2.2.0: not found

This indicates that the package is missing. If you check the currently available MongoDB Connector for Spark packages, you can confirm that this version is no longer available (it has been superseded by the patched release v2.2.6).
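In practice that means pointing the build at a connector release that is still published. A minimal sketch of the corresponding Dockerfile change, assuming you settle on the v2.2.6 release mentioned above:

# previously: ENV MONGO_SPARK_VERSION 2.2.0
ENV MONGO_SPARK_VERSION 2.2.6
ENV SCALA_VERSION 2.11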

You can find an updated example of using the MongoDB Spark connector with Docker at sindbach/mongodb-spark-docker.

Additional information: spark-shell is a REPL (Read-Evaluate-Print Loop) tool. It is an interactive shell that programmers use to interact with the framework, and you do not need to run an explicit build step to execute code with it. When you specify the --packages argument to spark-shell, it automatically fetches the packages and includes them in your shell environment.
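Putting that together, a sketch of the original invocation with an available connector version substituted in (2.2.6 here, per the note above; a 2.4.x connector release may be a closer match for the Spark 2.4.3 build in the Dockerfile):

spark-shell \
  --conf "spark.mongodb.input.uri=mongodb://mongodb:27017/spark.times" \
  --conf "spark.mongodb.output.uri=mongodb://mongodb/spark.output" \
  --packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.6 \
  -i ./initDocuments.scala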

