How do you run Spark on YARN in an HA setup, with Tachyon as the in-memory file system?
Test Environment
OS:      Ubuntu 14.04 LTS x64
Tachyon: tachyon-0.7.1-bin.tar.gz
Hadoop:  hadoop-2.7.1.tar.gz
Spark:   spark-1.5.2-bin-hadoop2.6.tgz
Maven:   apache-maven-3.3.9-bin.tar.gz
Scala:   scala-2.11.7.tgz

hostname        IP               role
spark-master    192.168.108.20   master & worker
spark-slave1    192.168.108.21   worker
spark-slave2    192.168.108.22   worker

Note: unless stated otherwise, all commands are run as root.
Scala Installation
Scala environment variables
# Apply the following configuration on every host
vim /etc/profile

export SCALA_HOME=/home/jabo/software/scala-2.11.7
export PATH=${SCALA_HOME}/bin:$PATH

source /etc/profile
Test Scala
scala -version
Scala code runner version 2.11.7 -- Copyright 2002-2013, LAMP/EPFL
Java Installation
See: Installing the JDK on Ubuntu
ZooKeeper Cluster Installation
See: Setting up a ZooKeeper cluster
Tachyon Cluster Installation
See: Tachyon cluster deployment
Hadoop 2.x Cluster Installation
See: Setting up a Hadoop cluster
Tachyon Cluster High Availability
See: Tachyon cluster High Availability
Spark Cluster Installation
Download Spark
Download from: the official Spark download page
Before downloading, check which Tachyon and Spark versions are compatible with each other.
Spark environment variables
vim /etc/profile

export SPARK_HOME=/home/jabo/software/spark-1.5.2-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

source /etc/profile
Directory permissions
sudo chmod -R 775 spark-1.5.2-bin-hadoop2.6/
The spark-env.sh configuration file
cp ./conf/spark-env.sh.template ./conf/spark-env.sh
vim ./conf/spark-env.sh

export SCALA_HOME=/home/jabo/software/scala-2.11.7
export JAVA_HOME=/usr/lib/jvm/java
export SPARK_MASTER_IP=spark-master
export SPARK_WORKER_MEMORY=1G
export SPARK_WORKER_PORT=7077
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_JAVA_OPTS=" -Dtachyon.zookeeper.address=spark-master:2181,spark-slave1:2181,spark-slave2:2181 -Dtachyon.usezookeeper=true $SPARK_JAVA_OPTS "
Configure slaves
cp ./conf/slaves.template ./conf/slaves
vim ./conf/slaves

spark-master
spark-slave1
spark-slave2
Create core-site.xml
vim ./conf/core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.tachyon-ft.impl</name>
    <value>tachyon.hadoop.TFSFT</value>
  </property>
</configuration>
Distribute the Spark Directory
Copy the Spark directory to every host, for example:
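A minimal sketch, assuming the same /home/jabo/software path is used on every host (rsync works just as well as scp):

# Run on spark-master; paths and hostnames are taken from the test environment above
scp -r /home/jabo/software/spark-1.5.2-bin-hadoop2.6 root@spark-slave1:/home/jabo/software/
scp -r /home/jabo/software/spark-1.5.2-bin-hadoop2.6 root@spark-slave2:/home/jabo/software/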
Start the ZooKeeper Cluster
# Run on every host
zkServer.sh start
Start the Hadoop Cluster
# Run on spark-master
./sbin/start-all.sh
Start the Tachyon Cluster
# Run on spark-master
./bin/tachyon format
./bin/tachyon-start.sh all NoMount
Start the Spark Cluster
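A minimal sketch of this step, assuming the standalone scripts bundled with Spark are used from $SPARK_HOME:

# Run on spark-master; starts the Master plus one Worker on each host listed in conf/slaves
./sbin/start-all.sh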
Check the status of each cluster
root@spark-master: jps
115633 Jps
93392 JournalNode
95446 TachyonMaster
92948 NameNode
93756 ResourceManager
115442 Worker
115246 Master
3072 QuorumPeerMain
93107 DataNode
93932 NodeManager
93642 DFSZKFailoverController
95643 TachyonWorker

root@spark-slave1: jps
85099 Worker
66267 JournalNode
3021 QuorumPeerMain
65967 NameNode
66621 NodeManager
85448 Jps
67496 TachyonWorker
66448 DFSZKFailoverController
66088 DataNode

root@spark-slave2: jps
11296 NodeManager
13100 Worker
11172 JournalNode
11050 DataNode
2976 QuorumPeerMain
13166 Jps
11794 TachyonWorker
On spark-master:

//Hadoop address
http://spark-master:9000/
This is the RPC port, so a browser request only returns:
It looks like you are making an HTTP request to a Hadoop IPC port. This is not the correct port for the web interface on this daemon.

//YARN address
http://spark-master:8188
In the left sidebar, Nodes shows the node status and Applications shows the status of running applications.

//HDFS address
http://spark-master:50070
The Datanodes tab shows the node status.

//Tachyon address
http://spark-master:19999
Workers shows the node status; Browse File System browses the files.

//Spark address
http://spark-master:8080/
Shows the node status.
Spark with Tachyon Test
# Copy a local test file into Tachyon
tachyon tfs copyFromLocal /home/test.txt /test

# Launch spark-shell against the standalone master
# (the master URL is assumed to be spark://spark-master:7077)
MASTER=spark://spark-master:7077 ./bin/spark-shell

// In the shell: read from Tachyon and write the result back
// (activeHost stands for whichever Tachyon master is currently active)
val s = sc.textFile("tachyon-ft://spark-master:19998/test")
s.saveAsTextFile("tachyon-ft://activeHost:19998/test_done")
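To confirm the write, the output directory can be inspected with the Tachyon shell; a minimal sketch (the /test_done path comes from the saveAsTextFile call above, and the exact part-file name may differ):

# List the saved RDD output in Tachyon
tachyon tfs ls /test_done
# Print one of the part files
tachyon tfs cat /test_done/part-00000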
Spark on YARN Test
cluster mode
# While the job runs, its progress can be followed under Applications in the YARN web UI
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    lib/spark-examples*.jar \
    2

16/01/21 11:06:15 INFO yarn.Client:
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: 192.168.108.20
	 ApplicationMaster RPC port: 0
	 queue: default
	 start time: 1453345452519
	 final status: SUCCEEDED
	 tracking URL: http:
	 user: root
16/01/21 11:06:15 INFO util.ShutdownHookManager: Shutdown hook called
16/01/21 11:06:15 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-edf14341-3117-47c7-a96d-9741db4824bf
client mode
./bin/spark-shell --master yarn --deploy-mode client
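Once the shell comes up, a small job confirms that the YARN client-mode session works; a minimal sketch, reusing the /test file copied into Tachyon earlier (the tachyon-ft URI assumes the fault-tolerant scheme registered in core-site.xml above):

// Simple sanity check that runs tasks on the YARN executors
sc.parallelize(1 to 1000).sum()

// Read the test file back through the fault-tolerant Tachyon scheme
val s = sc.textFile("tachyon-ft://spark-master:19998/test")
s.count()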