Automatic Google Indexing Submission for a Halo Blog

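Preface

  • The earlier post "Scheduled Baidu page submission for a Halo blog" implemented active page submission to Baidu. Google does not really need an equivalent feature: its sitemap-based crawling already works well, and although Google does offer the Indexing API for active submission, I can only reach it through a proxy
  • Still, for the sake of completeness, and because my Raspberry Pi can go through the proxy directly, I decided to implement automatic submission to Google on top of the Indexing API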

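Preparation

  • Google's SEO documentation for site owners is extensive; this post only covers the parts needed to use the Indexing API, and only briefly
  • Every setup step below requires working access to Google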

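Obtaining an access token

  • The Indexing API authenticates with OAuth 2.0, so every request must carry an access token. The first step is therefore to set things up in Google Cloud Platform
  • Open the Service Accounts page and create a project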

    image-20220404164603288

  • Click Create service account

    image-20220404164916395

  • Just click Done; the two optional sections can be skipped

    image-20220404164954814

  • Create a private key, making sure to choose the JSON key type

    image-20220404165211467

    image-20220404165232614

    • Once the key is created, the private key file is downloaded to your machine
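
To make the role of the access token concrete, here is a minimal sketch of the HTTP request that a publish call boils down to. It only builds the request and never sends it. The class name, the placeholder token, and the example post URL are made up for illustration; the project in this post uses Google's Java client library rather than hand-built requests.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, for illustration only: builds (but does not send)
// an Indexing API publish request that carries an OAuth 2.0 bearer token.
final class PublishRequestSketch {

    // Publish endpoint of the Indexing API (v3).
    static final String ENDPOINT =
        "https://indexing.googleapis.com/v3/urlNotifications:publish";

    // Minimal JSON body. Assumes pageUrl contains no characters
    // that would need JSON escaping.
    static String body(String pageUrl) {
        return "{\"url\":\"" + pageUrl + "\",\"type\":\"URL_UPDATED\"}";
    }

    static HttpRequest build(String accessToken, String pageUrl) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
            // The access token travels in the Authorization header.
            .header("Authorization", "Bearer " + accessToken)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body(pageUrl)))
            .build();
    }

    public static void main(String[] args) {
        String page = "https://blog.demoli.xyz/archives/example"; // made-up post URL
        HttpRequest request = build("PLACEHOLDER_TOKEN", page);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body(page));
    }
}
```

In practice the token is short-lived and has to be derived from the downloaded private key, which is why the rest of this post leaves that work to the Google auth library.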

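Adding the site to Search Console

  • Adding a site to Search Console is really a matter of verifying that you own it. There are several ways to do this; see Google's "Verify your site ownership" documentation
  • I used the domain name provider method, which is fairly simple. The screenshot below shows a successful verification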

    image-20220404165656898

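Granting the service account owner status

  • This step grants the service account created in the first step ownership of the site added in the second step
  • Visit Webmaster Central, open the entry for the site, and click Add an owner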

    image-20220404170104678

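  • You are asked for the service account's email address, which can be found in the client_email field of the private key file downloaded in the first step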
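
If you want to double-check the address, the sketch below pulls client_email out of the text of a key file. It uses a simple regular expression instead of a JSON parser purely to stay dependency-free; the class name and the sample values are made up, and in the actual project Gson (already a dependency) would be the more robust choice.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class ServiceAccountEmailSketch {

    // Matches "client_email": "<value>" anywhere in the key file text.
    private static final Pattern CLIENT_EMAIL =
        Pattern.compile("\"client_email\"\\s*:\\s*\"([^\"]+)\"");

    static Optional<String> clientEmail(String keyJson) {
        Matcher matcher = CLIENT_EMAIL.matcher(keyJson);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Trimmed-down, made-up key file; a real one has more fields.
        String sample = "{\n"
            + "  \"type\": \"service_account\",\n"
            + "  \"client_email\": \"demo@example-project.iam.gserviceaccount.com\"\n"
            + "}";
        System.out.println(clientEmail(sample).orElse("client_email not found"));
    }
}
```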

Building the project

  • Create a Gradle project with the following build file

    plugins {
        id 'java'
        id 'application'
    }
    
    group 'xyz.demoli'
    version '1.0-SNAPSHOT'
    
    // Target Java 11
    sourceCompatibility = JavaVersion.VERSION_11
    mainClassName = "xyz.demoli.Main"
    
    repositories {
        mavenCentral()
    }
    
    application {
        // Log timestamps in UTC+8 regardless of the host time zone
        applicationDefaultJvmArgs = ['-Duser.timezone=GMT+8']
    }
    
    dependencies {
        testImplementation 'org.junit.jupiter:junit-jupiter-api:5.8.1'
        testRuntimeOnly 'org.junit.jupiter:junit-jupiter-engine:5.8.1'
        // 'implementation' replaces the 'compile' configuration, which was removed in Gradle 7
        implementation 'com.google.api-client:google-api-client:1.33.0'
        implementation 'com.google.auth:google-auth-library-oauth2-http:1.3.0'
        implementation 'com.google.apis:google-api-services-indexing:v3-rev20200804-1.32.1'
        // https://mvnrepository.com/artifact/com.squareup.okhttp3/okhttp
        implementation group: 'com.squareup.okhttp3', name: 'okhttp', version: '4.9.3'
        implementation 'com.google.code.gson:gson:2.9.0'
        // https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-api
        implementation group: 'org.apache.logging.log4j', name: 'log4j-api', version: '2.14.1'
        // https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-core
        implementation group: 'org.apache.logging.log4j', name: 'log4j-core', version: '2.14.1'
        // https://mvnrepository.com/artifact/org.projectlombok/lombok
        compileOnly group: 'org.projectlombok', name: 'lombok', version: '1.18.22'
        annotationProcessor group: 'org.projectlombok', name: 'lombok', version: '1.18.22'
    }
    
    test {
        useJUnitPlatform()
    }
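    • The line annotationProcessor group: 'org.projectlombok', name: 'lombok', version: '1.18.22' makes sure Lombok annotations are processed correctly in a Gradle project
    • applicationDefaultJvmArgs is set so that, once the service runs in a container, log timestamps are printed in UTC+8 rather than the container's default time zone
  • The configuration file config.properties is as follows: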

    prefix=https://blog.demoli.xyz
    postAPI=%s/api/content/posts?api_access_key=%s&page=%d
    apiAccessKey=***
    proxyURL=192.168.0.137
    proxyPort=7890
    • apiAccessKey is the key configured in the Halo blog settings

      image-20220327184136467

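    • prefix is the URL of the Halo blog's home page
    • proxyURL and proxyPort are the host and port of the proxy
  • The logging configuration is as follows (a rough setup):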

    <?xml version="1.0" encoding="utf-8" ?>
    
    <configuration status="INFO">
        <appenders>
            <console name="console" target="SYSTEM_OUT">
                <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n"/>
            </console>
        </appenders>
    
        <loggers>
            <root level="INFO">
                <appender-ref ref="console"/>
            </root>
        </loggers>
    </configuration>
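  • Put the private key obtained during preparation into the project's resources directory and rename it to cred.json
  • The whole project has only two core classes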

    • PostScrap

      import com.google.gson.Gson;
      import com.google.gson.JsonArray;
      import com.google.gson.JsonElement;
      import com.google.gson.JsonObject;
      import java.io.IOException;
      import java.io.InputStream;
      import java.util.ArrayList;
      import java.util.HashSet;
      import java.util.List;
      import java.util.Properties;
      import java.util.Set;
      import java.util.stream.Collectors;
      import okhttp3.OkHttpClient;
      import okhttp3.Request;
      import okhttp3.Response;
      
      /**
       * Fetches post links through the Halo API
       */
      public class PostScrap {
      
          static private String postAPI;
          static private String apiAccessKey;
          static private String prefix;
          // Cache of links that have already been returned
          static private final Set<String> links = new HashSet<>();
      
          // Note: string values in a properties file are written without quotes
          static {
              try (InputStream stream = PostScrap.class.getResourceAsStream("/config.properties")) {
                  Properties properties = new Properties();
                  properties.load(stream);
                  apiAccessKey = properties.getProperty("apiAccessKey");
                  prefix = properties.getProperty("prefix");
                  postAPI = properties.getProperty("postAPI");
              } catch (IOException e) {
                  e.printStackTrace();
              }
          }
      
          /**
           * Calls the API and collects the links of all posts
           * @return the links that were not returned by an earlier call
           */
          public static List<String> getPosts() {
      
              List<String> res = new ArrayList<>();
      
              OkHttpClient client = new OkHttpClient();
              Request initialRequest = new Request.Builder().get().url(String.format(postAPI,prefix,apiAccessKey,0)).build();
      
              try (Response response = client.newCall(initialRequest).execute()) {
                  res = handlePage(response, client);
              } catch (IOException e) {
                  e.printStackTrace();
              }
              return res;
          }
      
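          /**
           * Handles pagination
           * @param initialResponse response for the first page (page 0)
           * @param client HTTP client reused for the remaining pages
           * @return the links that were not returned by an earlier call
           * @throws IOException if the body of the first response cannot be read
           */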
          private static List<String> handlePage(Response initialResponse, OkHttpClient client) throws IOException {
      
              JsonObject jsonObject = new Gson().fromJson(initialResponse.body().string(), JsonObject.class);
              JsonArray array = jsonObject.get("data").getAsJsonObject().get("content").getAsJsonArray();
              int pages = jsonObject.get("data").getAsJsonObject().get("pages").getAsInt();
      
              // Convert the JsonArray into a List
              List<String> posts = new ArrayList<>();
              for (JsonElement element: array) {
                  posts.add(element.getAsJsonObject().get("fullPath").getAsString());
              }
      
              // Fetch the remaining pages
              for(int i = 1; i < pages; i++) {
                  Request request = new Request.Builder().get().url(String.format(postAPI,prefix,apiAccessKey,i)).build();
                  try (Response response = client.newCall(request).execute()) {
                      jsonObject = new Gson().fromJson(response.body().string(), JsonObject.class);
                      array = jsonObject.get("data").getAsJsonObject().get("content").getAsJsonArray();
                      for (JsonElement element: array) {
                          posts.add(element.getAsJsonObject().get("fullPath").getAsString());
                      }
                  } catch (IOException e) {
                      e.printStackTrace();
                  }
              }
      
              // Filter through the cache: Set.add returns false for links seen before
              return posts.stream().map(content -> prefix + content).filter(links::add).collect(
                  Collectors.toList());
          }
      }
    • GoogleSubmitter

      import com.google.api.client.googleapis.GoogleUtils;
      import com.google.api.client.http.HttpRequestInitializer;
      import com.google.api.client.http.HttpResponse;
      import com.google.api.client.http.HttpTransport;
      import com.google.api.client.http.javanet.NetHttpTransport;
      import com.google.api.client.json.JsonFactory;
      import com.google.api.client.json.gson.GsonFactory;
      import com.google.api.services.indexing.v3.Indexing;
      import com.google.api.services.indexing.v3.model.UrlNotification;
      import com.google.auth.http.HttpCredentialsAdapter;
      import com.google.auth.oauth2.GoogleCredentials;
      import java.io.IOException;
      import java.io.InputStream;
      import java.net.InetSocketAddress;
      import java.net.Proxy;
      import java.security.GeneralSecurityException;
      import java.util.List;
      import java.util.Properties;
      import lombok.extern.log4j.Log4j2;
      
      /**
       * Submits links to Google for indexing
       */
      @Log4j2
      public class GoogleSubmitter {
      
      
          private static GoogleCredentials googleCredentials;
          private static String proxyURL;
          private static Integer proxyPort;
      
          static {
              // Load the service account key
              try (InputStream stream = PostScrap.class.getResourceAsStream("/cred.json")) {
                  googleCredentials = GoogleCredentials.fromStream(stream);
              } catch (IOException e) {
                  e.printStackTrace();
              }
              // Load the configuration file
              try (InputStream config = PostScrap.class.getResourceAsStream("/config.properties")) {
                  Properties properties = new Properties();
                  properties.load(config);
                  proxyURL = properties.getProperty("proxyURL");
                  proxyPort = Integer.parseInt(properties.getProperty("proxyPort"));
              } catch (IOException e) {
                  e.printStackTrace();
              }
          }
      
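          /**
           * Builds an HTTP transport that goes through the local proxy
           *
           * @return a transport that trusts Google's certificates and uses the configured proxy
           * @throws GeneralSecurityException if the trust store cannot be initialized
           * @throws IOException if the trust store cannot be loaded
           */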
          static HttpTransport newProxyTransport() throws GeneralSecurityException, IOException {
              NetHttpTransport.Builder builder = new NetHttpTransport.Builder();
              builder.trustCertificates(GoogleUtils.getCertificateTrustStore());
              builder.setProxy(new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyURL, proxyPort)));
              return builder.build();
          }
      
      
          /**
           * Submits the links
           */
          public static void submit() {
      
              // Get the links waiting to be submitted
              List<String> urls = PostScrap.getPosts();
              if (urls.size() == 0) {
                  log.info("No new posts");
                  return;
              }
      
              try {
                  // Build the Indexing service
                  // HttpTransport httpTransport = GoogleNetHttpTransport.newTrustedTransport();
                  HttpTransport httpTransport = newProxyTransport();
                  JsonFactory jsonFactory = GsonFactory.getDefaultInstance();
                  HttpRequestInitializer requestInitializer =
                      new HttpCredentialsAdapter(googleCredentials);
                  Indexing indexing = new Indexing(httpTransport, jsonFactory, requestInitializer);
                  Indexing.UrlNotifications notifications = indexing.urlNotifications();
      
                  int count = 0;
                  for (String url : urls) {
                      UrlNotification notification = new UrlNotification();
                      notification.setUrl(url);
                      // The type is either URL_UPDATED or URL_DELETED
                      notification.setType("URL_UPDATED");
                      Indexing.UrlNotifications.Publish publish = notifications.publish(notification);
                      HttpResponse response = publish.executeUnparsed();
                      if (response.getStatusCode() != 200) {
                          log.error("Submission failed: {}", url);
                      } else {
                          log.info("Submitted: {}", url);
                          count++;
                      }
                  }
                  log.info("Successfully submitted {} link(s)", count);
              } catch (GeneralSecurityException | IOException e) {
                  e.printStackTrace();
              }
          }
      }
  • Main

    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    
    public class Main {
    
        public static void main(String[] args) {
            Executors.newScheduledThreadPool(1)
                .scheduleWithFixedDelay(GoogleSubmitter::submit, 0, 12, TimeUnit.HOURS);
        }
    }
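
PostScrap leans on two small ideas: expanding the postAPI template once per page, and filtering results through a Set so that a link is only handed out the first time it is seen. The standalone sketch below isolates both, with no HTTP involved. The class name, the example.com prefix, and the sample paths are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-in for the paging and caching logic in PostScrap.
final class PagingAndCacheSketch {

    // Same template as postAPI in config.properties.
    static final String POST_API = "%s/api/content/posts?api_access_key=%s&page=%d";

    // Remembers every link handed out so far (in memory only).
    private final Set<String> seen = new HashSet<>();

    // One request URL per page; page numbers start at 0.
    static List<String> pageUrls(String prefix, String apiAccessKey, int pages) {
        List<String> urls = new ArrayList<>();
        for (int page = 0; page < pages; page++) {
            urls.add(String.format(POST_API, prefix, apiAccessKey, page));
        }
        return urls;
    }

    // Turns paths into absolute links and keeps only those not seen before.
    List<String> onlyNew(String prefix, List<String> paths) {
        return paths.stream()
            .map(path -> prefix + path)
            .filter(seen::add) // Set.add returns false for an element already present
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String prefix = "https://blog.example.com"; // made-up blog address
        pageUrls(prefix, "KEY", 2).forEach(System.out::println);

        PagingAndCacheSketch cache = new PagingAndCacheSketch();
        System.out.println(cache.onlyNew(prefix, List.of("/archives/a", "/archives/b")));
        // Second run: /archives/b is filtered out, only /archives/c is new.
        System.out.println(cache.onlyNew(prefix, List.of("/archives/b", "/archives/c")));
    }
}
```

Two consequences of this design are worth keeping in mind. The cache lives only in memory, so restarting the process submits every post again. A link also enters the cache as soon as it is returned, so it will not be retried in the same process even if its submission fails later.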

Deployment

  • Run gradle build -x test in the project root
  • Copy build/distributions/GoogleSubmit-1.0-SNAPSHOT.tar to a server that has Java installed

    tar xf GoogleSubmit-1.0-SNAPSHOT.tar
    cd GoogleSubmit-1.0-SNAPSHOT
    nohup bin/GoogleSubmit > nohup.out &
  • Run tail -f nohup.out to follow the log

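Addendum

  • I am a huge fan of Docker containers because they keep the host environment "clean", so here is how to deploy the service in a container as well
  • First copy the package produced by the build, build/distributions/GoogleSubmit-1.0-SNAPSHOT.tar, to the server, then extract it, rename it, and create a Dockerfile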

    tar xf GoogleSubmit-1.0-SNAPSHOT.tar
    mkdir -p blogSubmitter/googleSubmitter
    mv GoogleSubmit-1.0-SNAPSHOT blogSubmitter/googleSubmitter/google
    cd blogSubmitter/googleSubmitter
    touch Dockerfile
  • The Dockerfile is as follows:

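    FROM openjdk:11
    COPY . /submitter
    WORKDIR /submitter
    # Set the container time zone to Asia/Shanghai
    RUN rm -rf /etc/localtime
    RUN ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
    # Run in the foreground; in exec form "nohup" and "&" have no effect.
    # The start script is bin/GoogleSubmit, as in the deployment section above.
    CMD ["google/bin/GoogleSubmit"]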
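  • Create a YAML file in the blogSubmitter directory and use Docker Compose to build the service

    # go from blogSubmitter/googleSubmitter back up to blogSubmitter
    cd ..
    touch submitter.yaml
    • The content of submitter.yaml is as follows:

      version: '3.1'
      services:
        blog-google-submitter:
          build: ./googleSubmitter
          container_name: blogGoogleSubmitter
          restart: unless-stopped
  • Run docker-compose -f submitter.yaml up -d to create the service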

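Notes

  • After changing the source code, the image has to be rebuilt. I did this by deleting the old image first, which is clumsy; running docker-compose -f submitter.yaml up -d --build rebuilds the image without the manual deletion, and mounting the application through a volume is another option worth trying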
