Connecting to a remote Spark master - Java / Scala

I have set up a 3-node Apache Spark cluster (1 master, 2 workers) on AWS. I can submit jobs to the cluster from the master itself, but I cannot get it to work remotely.

/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "/usr/local/spark/README.md" // Should be some file on your system
    val conf = new SparkConf()
      .setAppName("Simple Application")
      .setMaster("spark://ec2-54-245-111-320.compute-1.amazonaws.com:7077")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")
    sc.stop()
  }
}

From the master's web UI I can see:

Spark Master at spark://ip-171-13-22-125.ec2.internal:7077
URL: spark://ip-171-13-22-125.ec2.internal:7077
REST URL: spark://ip-171-13-22-125.ec2.internal:6066 (cluster mode)

So when I run SimpleApp.scala from my local machine, it fails to connect to the Spark master:

2017-02-04 19:59:44,074 INFO  [appclient-register-master-threadpool-0] client.StandaloneAppClient$ClientEndpoint (Logging.scala:54)  [] - Connecting to master spark://ec2-54-245-111-320.compute-1.amazonaws.com:7077...
2017-02-04 19:59:44,166 WARN  [appclient-register-master-threadpool-0] client.StandaloneAppClient$ClientEndpoint (Logging.scala:87)  [] - Failed to connect to spark://ec2-54-245-111-320.compute-1.amazonaws.com:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) ~[spark-core_2.10-2.0.2.jar:2.0.2]
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) ~[spark-core_2.10-2.0.2.jar:2.0.2]
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) ~[scala-library-2.10.0.jar:?]
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) ~[spark-core_2.10-2.0.2.jar:2.0.2]

I know that if I set the master to local it works, because then everything runs locally, but I want my client to connect to this remote master. How can I accomplish that? I can even telnet to that public DNS and port, and I have configured /etc/hosts with the public DNS and hostname of each EC2 instance. I want to be able to submit jobs to this remote master; what am I missing?

Answer:

To bind the master to a hostname/IP, go to the conf directory of your Spark installation (spark-2.0.2-bin-hadoop2.7/conf) and create the spark-env.sh file with the following command:

cp spark-env.sh.template spark-env.sh

Open spark-env.sh in an editor such as vi and add the following line with the master's hostname/IP:

SPARK_MASTER_HOST=ec2-54-245-111-320.compute-1.amazonaws.com
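If needed, the same file can also pin the master's RPC port and web UI port. SPARK_MASTER_HOST, SPARK_MASTER_PORT and SPARK_MASTER_WEBUI_PORT are standard standalone-mode variables; the port values below are just the defaults.

# spark-env.sh -- standalone master settings
SPARK_MASTER_HOST=ec2-54-245-111-320.compute-1.amazonaws.com
SPARK_MASTER_PORT=7077         # default RPC port; must match the URL used by the client
SPARK_MASTER_WEBUI_PORT=8080   # default web UI port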

Stop and restart Spark with stop-all.sh and start-all.sh. Now you can connect to the remote master like this:
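As a minimal sketch of the restart, run the standalone scripts on the master node, assuming Spark is installed under /usr/local/spark (adjust the path to your installation):

cd /usr/local/spark/sbin
./stop-all.sh     # stops the master and all workers listed in conf/slaves
./start-all.sh    # starts them again, now binding to SPARK_MASTER_HOST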

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("SparkSample")
  .master("spark://ec2-54-245-111-320.compute-1.amazonaws.com:7077")
  .getOrCreate()
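Alternatively, if you package the application into a jar, a spark-submit from the local machine would look roughly like this; the jar path and class name below are assumptions based on the SimpleApp example above:

# run from the Spark installation directory on the client machine
./bin/spark-submit \
  --class SimpleApp \
  --master spark://ec2-54-245-111-320.compute-1.amazonaws.com:7077 \
  /path/to/simple-app_2.10-1.0.jar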

For more information about setting environment variables, see http://spark.apache.org/docs/latest/spark-standalone.html#cluster-launch-scripts
