目錄

  • 條件
  • 安裝
  • scala
  • 發到虛擬機上,解壓
  • 配置環境變量
  • 配置SCALA_HOME,然後在PATH變量後加上`:$SCALA_HOME/bin`
  • 驗證
  • spark
  • 下載
  • 解壓
  • 配置環境變量
  • 配置SPARK_HOME,在PATH後加上`:$SPARK_HOME/bin`
  • 配置文件
  • copy一份 slaves.template到slaves中,並進行修改
  • 測試
  • 啓動Spark Shell命令行窗口
  • 啓動spark
  • 訪問http://192.168.199.100:8080/
  • 怎麼查看job
  • 啓動歷史服務器,需要指定hdfs目錄(hdfs目錄連不上或者不存在,那就無法啓動)
  • 訪問
  • 配置,不想每次都輸入hdfs目錄
  • spark-submit測試

條件

jdk
hadoop

安裝

scala

https://www.scala-lang.org/download/spark2支持hint嗎_spark2支持hint嗎

發到虛擬機上,解壓
tar -zxvf ./scala-2.13.2.tgz
mv scala-2.13.2 /usr/local/
配置環境變量
vim /etc/profile
source /etc/profile
配置SCALA_HOME,然後在PATH變量後加上:$SCALA_HOME/bin
#scala
export SCALA_HOME=/usr/local/scala-2.13.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin
驗證
scala -version
出現
Scala code runner version 2.13.2 -- Copyright 2002-2020, LAMP/EPFL and Lightbend, Inc.

spark

下載

https://www.apache.org/dyn/closer.lua/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgzspark2支持hint嗎_#hadoop_02

解壓
tar -zxvf ./spark-2.4.5-bin-hadoop2.7.tgz 
mv ./spark-2.4.5-bin-hadoop2.7 /usr/local/bigdata/
配置環境變量
vi /etc/profile
source /etc/profile
配置SPARK_HOME,在PATH後加上:$SPARK_HOME/bin
export SPARK_HOME=/usr/local/bigdata/spark-2.4.5-bin-hadoop2.7
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin
配置文件

spark配置給了一些模板,我們可以copy一份改名,修改一下就可以用了

cd ./spark-2.4.5-bin-hadoop2.7/conf/

copy一份spark-env.sh.template到spark-env.sh中,並進行修改

cp ./spark-env.sh.template spark-env.sh

加上

export SCALA_HOME=/usr/local/scala-2.13.2
export JAVA_HOME=/usr/local/jdk1.8
export SPARK_HOME=/usr/local/bigdata/spark-2.4.5-bin-hadoop2.7
export SPARK_MASTER_IP=localhost
export SPARK_EXECUTOR_MEMORY=1G
copy一份 slaves.template到slaves中,並進行修改
cp    slaves.template    slaves
vim slaves

spark2支持hint嗎_#spark_03

測試

跑一個自帶的demo

./bin/run-example   SparkPi   10

spark2支持hint嗎_spark2支持hint嗎_04

啓動Spark Shell命令行窗口
spark-shell
啓動spark
./sbin/start-all.sh

你看start-all.sh 和hadoop命令是不是衝突了
我們改一下名字

mv start-all.sh  start-spark-all.sh 
mv stop-all.sh  stop-spark-all.sh
訪問http://192.168.199.100:8080/

spark2支持hint嗎_#spark_05

jps

10577 Master
10657 Worker
8153 SparkSubmit
10715 Jps

怎麼查看job

每次spark運行完,就無法查看spark application

啓動歷史服務器,需要指定hdfs目錄(hdfs目錄連不上或者不存在,那就無法啓動)

./sbin/start-history-server.sh hdfs://standalone:9000/spark/history

訪問

http://192.168.199.100:18080/?showIncomplete=falsespark2支持hint嗎_#spark_06
大概就是沒有權限的意思
spark2支持hint嗎_spark2支持hint嗎_07

hadoop fs -chgrp -R root/spark

配置,不想每次都輸入hdfs目錄

./run-example org.apache.spark.examples.SparkPi這種是不顯示在spark-submit裏面的

vim spark-defaults.conf

spark.eventLog.enabled  true
spark.eventLog.dir      hdfs://standalone:9000/spark/history
spark.eventLog.compress true
vim spark-env.sh

export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=20 -Dspark.history.fs.logDirectory=hdfs://standalone:9000/spark/history"

spark.history.ui.port=18080 調整WEBUI訪問的端口號為18080

spark.history.fs.logDirectory=hdfs://hadoop000:8020/directory 配置了該屬性後,在start-history-server.sh時就無需再顯示的指定路徑

spark.history.retainedApplications=20 指定保存Application歷史記錄的個數,如果超過這個值,舊的應用程序信息將被刪除

直接啓動就可以

./sbin/start-history-server.sh

spark-submit測試

./bin/run-example SparkPi 10是不會記錄在Application中的
跑一個spark-submit試一下

spark-submit \
--master local[2] \
--class RddTest \
--executor-memory 2G \
--total-executor-cores 2 \
/home/hht/sparksubmitdemo-1.0-SNAPSHOT.jar
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>hht</groupId>
    <artifactId>sparksubmitdemo</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <spark.version>2.3.0</spark.version>
        <scala.version>2.11</scala.version>
        <hadoop.version>2.6.0</hadoop.version>

        <mysql.version>5.1.38</mysql.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>${mysql.version}</version>
        </dependency>

    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.7</source>
                    <target>1.7</target>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.3</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
import org.apache.spark
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkContext, sql}
import org.apache.spark.sql.SparkSession

object RddTest {
  def main(args: Array[String]): Unit = {
    val sparkSession = new spark.sql.SparkSession
    .Builder()
      .appName("wc")
      .getOrCreate()

    val sc: SparkContext = sparkSession.sparkContext

    val list = List("rose is beautiful", "jennie is beautiful", "lisa is beautiful", "jisoo is beautiful")

    val list_rdd: RDD[String] = sc.parallelize(list)
    list_rdd.foreach(println)

    //把一句話拆分成單詞
    val flat_rdd = list_rdd.flatMap(_.split(" "))
    flat_rdd.foreach(println)

    //將每個單詞變成kv結構
    val map_rdd = flat_rdd.map(word => (word,1))
    map_rdd.foreach(println)

    //聚合
    val reduce_rdd = map_rdd.reduceByKey(_+_)
    reduce_rdd.foreach(println)  }
}