Based on the Hadoop cluster created in HDP 2.4 Installation (5): Cluster and Component Installation, change the default configuration so that HBase stores its data in Azure Blob Storage.

Contents:

  • Overview
  • Configuration
  • Verification
  • FAQ

Overview:


  • hadoop-azure provides the integration between Hadoop and Azure Blob Storage; the hadoop-azure.jar must be deployed on the cluster
  • Once configured, all data that is read and written is stored in the Azure Blob Storage account
  • Multiple Azure Blob Storage accounts can be configured; the module implements the standard Hadoop FileSystem interface
  • File system paths are referenced with URLs using the wasb scheme (see the example after this list)
  • Tested on both Linux and Windows, and tested at scale
  • Azure Blob Storage involves three concepts:
  1. Storage Account: all access is done through a storage account
  2. Container: a container is a grouping of multiple blobs. A storage account may have multiple containers. In Hadoop, an entire file system hierarchy is stored in a single container. It is also possible to configure multiple containers, effectively presenting multiple file systems that can be referenced using distinct URLs
  3. Blob: a file of any type and size. In Hadoop, files are stored in blobs. The internal implementation also uses blobs to persist the file system hierarchy and other metadata
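A wasb URL has the form wasb://<container>@<account>.blob.core.chinacloudapi.cn/<path> (Azure China; the global suffix is blob.core.windows.net). As a minimal sketch using the account and container configured later in this article, a path can be listed like this, assuming hadoop-azure.jar and its Azure Storage SDK dependency are already on the Hadoop classpath:

hdfs dfs -ls wasb://localhbase@localhbase.blob.core.chinacloudapi.cn/
# Once fs.defaultFS points at the wasb container, plain paths resolve to the same place:
hdfs dfs -ls /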

Configuration:


  • In the Azure China portal, create a blob storage account; as shown in the figure below, it is named localhbase
  • In the local Hadoop core-site.xml, configure the access key for the Azure Blob Storage account and switch the default file system to it, as follows:
<property>
  <name>fs.defaultFS</name>
  <value>wasb://localhbase@localhbase.blob.core.chinacloudapi.cn</value>
</property>
<property>
  <name>fs.azure.account.key.localhbase.blob.core.chinacloudapi.cn</name>
  <value>YOUR ACCESS KEY</value>
</property>
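The value for fs.azure.account.key.* is one of the storage account's access keys. It can be copied from the portal; as an alternative sketch, if the Azure CLI (az) is available, it can be fetched like this (the resource group name my-rg is a placeholder):

az cloud set --name AzureChinaCloud
az login
az storage account keys list --account-name localhbase --resource-group my-rg --query "[0].value" -o tsv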
  • In most Hadoop clusters the core-site.xml file is world-readable. For better security, the key can be stored in encrypted form and decrypted at runtime by a program you configure; in that case the configuration looks like the following (an optional, security-oriented setup):
<property>
  <name>fs.azure.account.keyprovider.localhbase.blob.core.chinacloudapi.cn</name>
  <value>org.apache.hadoop.fs.azure.ShellDecryptionKeyProvider</value>
</property>
<property>
  <name>fs.azure.account.key.localhbase.blob.core.chinacloudapi.cn</name>
  <value>YOUR ENCRYPTED ACCESS KEY</value>
</property>
<property>
  <name>fs.azure.shellkeyprovider.script</name>
  <value>PATH TO DECRYPTION PROGRAM</value>
</property>
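ShellDecryptionKeyProvider runs the configured script with the encrypted key appended as the last argument and treats the script's stdout as the decrypted key. A minimal sketch of such a program, assuming (purely for illustration) that the key was encrypted with openssl using a passphrase file readable only by the Hadoop service users:

#!/bin/bash
# decrypt_key.sh -- invoked by ShellDecryptionKeyProvider as: decrypt_key.sh <encrypted key>
# Prints the decrypted storage account key to stdout.
ENCRYPTED_KEY="$1"
echo "$ENCRYPTED_KEY" | openssl enc -d -aes-256-cbc -a -pass file:/etc/hadoop/conf/.wasb_pass

fs.azure.shellkeyprovider.script then points to this file, which must be executable by the users running the Hadoop services.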
  • The Azure Blob Storage interface for Hadoop supports two kinds of blobs, block blobs and page blobs; block blobs are the default kind
  • Page blob handling in hadoop-azure was introduced to support HBase log files. Page blobs can be written any number of times, whereas block blobs can only be appended to 50,000 times
  • Page blobs can be up to 1 TB in size, larger than the maximum 200 GB size for block blobs
  • To have the files you create be page blobs, you must set the configuration variable fs.azure.page.blob.dir to the folders that should hold them:
<property>
   <name>fs.azure.page.blob.dir</name>
   <value>/hbase/WALs,/hbase/oldWALs,/mapreducestaging,/hbase/MasterProcWALs,/atshistory,/tezstaging,/ams/hbase</value>
</property>

Verification:


  • All of the parameters above are configured in Ambari; after changing them, restart the services that depend on them
  • hdfs dfs -ls /hbase/data/default
  • See HBase (3): Importing Azure HDInsight HBase Table Data into a Local HBase to import the test table data; when the import finishes it looks like the figure below
  • Command: ./hbase hbck -repair -ignorePreCheckPermission
  • Command: hbase shell
  • Inspect the data in hbase shell (a sketch follows this list); if it looks like the figure below, everything is OK
  • Verify the data with our own query tool, as shown in the figure below; the development of this tool is covered in the next chapter
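As a concrete sketch of the hbase shell check above (the table name test_table is hypothetical; substitute the table imported from HDInsight):

hbase shell
> list                              # the imported tables should appear
> scan 'test_table', {LIMIT => 5}   # sample a few rows
> count 'test_table'                # row count should match the source table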

FAQ


  • Do not place the Ambari Metrics Collector on the same machine as a RegionServer
  • HA must be configured before changing the data directory to wasb
  • Add the following to the Hadoop core-site.xml, otherwise the MapReduce2 component will fail to start (note that "impl" is lowercase):
<property>         
  <name>fs.AbstractFileSystem.wasb.impl</name>                           
  <value>org.apache.hadoop.fs.azure.Wasb</value> 
</property>
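To confirm that MapReduce2 actually works against wasb after this change, a simple smoke test is to run one of the bundled example jobs (the jar path below is the usual HDP location and may differ on other layouts):

yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar pi 2 10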
  • On a locally built cluster with HA configured, after switching the cluster FS to wasb and copying the original HBase cluster's physical files to the newly created blob storage, inserting data into an indexed table through Phoenix fails. Modify hbase-site.xml as follows (and restart the RegionServers afterwards):
<property>         
  <name>hbase.regionserver.wal.codec</name>                           
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value> 
</property>
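A quick way to confirm the fix is to create an indexed table through Phoenix and upsert a row; without IndexedWALEditCodec this is exactly the operation that fails. The sqlline.py path and the ZooKeeper quorum below are assumptions for a typical unsecured HDP layout:

cat > /tmp/index_test.sql <<'EOF'
CREATE TABLE IF NOT EXISTS INDEX_TEST (ID VARCHAR PRIMARY KEY, VAL VARCHAR);
CREATE INDEX IF NOT EXISTS INDEX_TEST_VAL ON INDEX_TEST (VAL);
UPSERT INTO INDEX_TEST VALUES ('row1', 'value1');
EOF
/usr/hdp/current/phoenix-client/bin/sqlline.py zk1.example.com:2181:/hbase-unsecure /tmp/index_test.sql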