Author: 櫰木
1 Install Spark on YARN (on every node)
cd /root/bigdata/
tar -xzvf spark-3.3.1-bin-hadoop3.tgz -C /opt/
ln -s /opt/spark-3.3.1-bin-hadoop3 /opt/spark
chown -R spark:spark /opt/spark-3.3.1-bin-hadoop3
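A quick sanity check after unpacking can catch a broken symlink or a partial extract before any configuration is done. The sketch below is one possible way to do that. The function name check_spark_home and the two files it looks for are choices made here for illustration, not part of the original steps.

```shell
# Check that a Spark install directory looks complete.
# $1: path to check (the article uses /opt/spark, a symlink)
check_spark_home() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir"
        return 1
    fi
    # Both scripts are used later in this walkthrough.
    for f in bin/spark-submit sbin/start-history-server.sh; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f"
            return 1
        fi
    done
    echo "ok: $dir"
}

# Usage on a real node:
#   check_spark_home /opt/spark
```

Because [ -d ] follows symlinks, the same call covers both the link and the directory it points to.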
2 Configure environment variables and edit the configuration files
cat /etc/profile.d/bigdata.sh
export SPARK_HOME=/opt/spark
export SPARK_CONF_DIR=/opt/spark/conf
Load the variables into the current shell:
source /etc/profile
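The cat command above only displays the file, so the two export lines still have to be added to it by hand. A minimal sketch of an idempotent way to do that follows, assuming the file may already hold other variables that should not be overwritten. The helper name append_if_missing is hypothetical.

```shell
# Append a line to a file only if that exact line is not already present.
# $1: line to add, $2: target file
append_if_missing() {
    line=$1
    file=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage on a real node (needs root to write under /etc/profile.d):
#   append_if_missing 'export SPARK_HOME=/opt/spark' /etc/profile.d/bigdata.sh
#   append_if_missing 'export SPARK_CONF_DIR=/opt/spark/conf' /etc/profile.d/bigdata.sh
```

Running it twice leaves a single copy of each line, so it is safe to repeat on every node.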
In YARN's capacity-scheduler.xml, switch the resource calculator so that scheduling accounts for both CPU and memory (on every YARN node):
<property>
<name>yarn.scheduler.capacity.resource-calculator</name>
<!-- <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value> -->
<value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
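Since the old value is left in the file as a comment, it is easy to misread which calculator is active. The following sketch prints the effective value of a property. It assumes one tag per line, as in the snippet above, and simply skips any line containing a comment opener. The function name hadoop_prop is made up for this example.

```shell
# Print the active <value> of a property in a Hadoop-style XML file.
# Assumes one tag per line; lines that contain "<!--" are skipped.
# $1: property name, $2: XML file
hadoop_prop() {
    awk -v key="$1" '
        index($0, "<name>" key "</name>") { found = 1; next }
        found && /<\/property>/            { found = 0 }
        found && /<!--/                    { next }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$2"
}

# Usage on a real node:
#   hadoop_prop yarn.scheduler.capacity.resource-calculator \
#       "$HADOOP_HOME/etc/hadoop/capacity-scheduler.xml"
```

This is a line-oriented shortcut, not an XML parser, so it will not cope with multi-line comments or several tags on one line.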
Enable log aggregation in yarn-site.xml:
<property>
<description>Whether to enable log aggregation</description>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://master:19888/jobhistory/logs</value>
</property>
Edit mapred-site.xml (on every YARN node):
<property>
<name>mapreduce.jobhistory.address</name>
<value>hd1.dtstack.com:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hd1.dtstack.com:19888</value>
</property>
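The snippets above use master as the host in yarn.log.server.url but hd1.dtstack.com for the JobHistory addresses. That works only if both names resolve to the same machine. The sketch below makes the comparison explicit, assuming the usual layout in which the log server URL is the JobHistory web address followed by /jobhistory/logs. The function name check_log_url is hypothetical.

```shell
# Compare yarn.log.server.url with the URL implied by
# mapreduce.jobhistory.webapp.address.
# $1: webapp address (host:port), $2: configured log server URL
check_log_url() {
    expected="http://$1/jobhistory/logs"
    if [ "$2" = "$expected" ]; then
        echo "ok"
    else
        echo "mismatch: expected $expected, got $2"
    fi
}

# With the values from the snippets above:
check_log_url hd1.dtstack.com:19888 http://master:19888/jobhistory/logs
```

A reported mismatch is not necessarily an error, since a host alias would also explain it, but it is worth confirming before debugging broken log links.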
cd /opt/spark/conf
Spark configuration file (on every Spark node):
cat spark-defaults.conf
spark.eventLog.dir=hdfs:///user/spark/applicationHistory
spark.eventLog.enabled=true
spark.yarn.historyServer.address=http://hd1.dtstack.com:18018
spark.history.kerberos.enabled=true
spark.history.kerberos.principal=hdfs/hd1.dtstack.com@DTSTACK.COM
spark.history.kerberos.keytab=/etc/security/keytab/hdfs.keytab
Spark environment file (on every Spark node):
cat spark-env.sh
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18018 -Dspark.history.fs.logDirectory=hdfs:///user/spark/applicationHistory"
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
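- The history server has to read the event log files on HDFS, which is why spark-defaults.conf uses the hdfs keytab.
Create the matching HDFS directory and change its owner: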
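Applications write event logs to spark.eventLog.dir, and the history server reads them from spark.history.fs.logDirectory, so the two settings have to name the same location. Here is a sketch of how the two values could be pulled out and compared. Both helper names are invented for this example, and the parsing covers only the simple key=value and key value forms.

```shell
# Print the value of a key from spark-defaults.conf.
# Handles both "key=value" and "key value"; comment lines are skipped.
# $1: key, $2: conf file
spark_conf_get() {
    awk -v key="$1" '
        /^[ \t]*#/ { next }
        {
            line = $0
            sub(/^[ \t]+/, "", line)
            if (index(line, key) != 1) next
            rest = substr(line, length(key) + 1)
            if (rest !~ /^[ \t=]/) next
            sub(/^[ \t]*=?[ \t]*/, "", rest)
            print rest
            exit
        }
    ' "$2"
}

# Print the spark.history.fs.logDirectory value embedded in a
# SPARK_HISTORY_OPTS string.
# $1: the options string
history_log_dir() {
    printf '%s\n' "$1" |
        sed -n 's/.*-Dspark\.history\.fs\.logDirectory=\([^ ]*\).*/\1/p'
}

# Usage on a real node, after sourcing spark-env.sh:
#   a=$(spark_conf_get spark.eventLog.dir "$SPARK_CONF_DIR/spark-defaults.conf")
#   b=$(history_log_dir "$SPARK_HISTORY_OPTS")
#   [ "$a" = "$b" ] || echo "event log dir mismatch: $a vs $b"
```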
hdfs dfs -mkdir -p /user/spark/applicationHistory
hdfs dfs -chown -R spark /user/spark/
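If jobs are submitted by users other than spark, those users also need write access to the event log directory. One common arrangement, used here as an assumption and not something the steps above require, is a world-writable directory with the sticky bit set (mode 1777). The sketch wraps the commands in a hypothetical function prepare_event_log_dir with a dry-run switch, so they can be reviewed before being run.

```shell
# Create the event log directory, set its owner, and open it to all users.
# Set RUN=echo to print the commands instead of running them.
# $1: HDFS directory, $2: owner
prepare_event_log_dir() {
    dir=$1
    owner=$2
    ${RUN:-} hdfs dfs -mkdir -p "$dir"
    ${RUN:-} hdfs dfs -chown -R "$owner" "$dir"
    # 1777 = world-writable with the sticky bit (assumption: multi-user cluster)
    ${RUN:-} hdfs dfs -chmod 1777 "$dir"
}

# Dry run on any machine:
#   RUN=echo prepare_event_log_dir /user/spark/applicationHistory spark
```

On a single-user cluster where only spark submits jobs, the chown from the original steps is enough and the chmod can be dropped.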
Submit a test job:
cd /opt/spark
./bin/spark-submit --master yarn --deploy-mode client --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.12-3.3.1.jar
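In client mode the driver runs locally, so the SparkPi result line ("Pi is roughly ...") appears on the console among the log output. A small filter such as the one below can pick it out. The function name extract_pi is hypothetical.

```shell
# Read spark-submit output on stdin and print only the value from the
# "Pi is roughly ..." line, if one is present.
extract_pi() {
    sed -n 's/^Pi is roughly \(.*\)$/\1/p'
}

# Usage on a real node:
#   ./bin/spark-submit --master yarn --deploy-mode client \
#       --class org.apache.spark.examples.SparkPi \
#       examples/jars/spark-examples_2.12-3.3.1.jar 2>/dev/null | extract_pi
```

An empty result means the job did not reach the final print, and the YARN application logs are the next place to look.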
3 Start the Spark history server
cd /opt/spark
Start the history server:
./sbin/start-history-server.sh

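4 Check the result
1) Open the YARN management page, find the Spark on YARN application, and click History, as shown in the figure below:
Alternatively, access the history server directly: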
http://ip:18018
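The history server also exposes a REST API under /api/v1, which gives a scriptable way to confirm that the test job was recorded. The sketch below only builds the URL. The function name history_api_url is hypothetical, and the default port is the 18018 configured earlier.

```shell
# Build the history server REST URL that lists recorded applications.
# $1: host, $2: port (defaults to 18018, as set in SPARK_HISTORY_OPTS)
history_api_url() {
    printf 'http://%s:%s/api/v1/applications\n' "$1" "${2:-18018}"
}

# Usage on a machine that can reach the server:
#   curl -s "$(history_api_url hd1.dtstack.com)"
```

A JSON array containing the SparkPi run indicates that event logging and the history server are reading from the same directory.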