A Simple Multi-threaded PHP Crawler with QueryList in Practice (for ThinkPHP 5+)


08:07 AM · November 4, 2025

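PHP is admittedly weak when it comes to multi-threaded crawling, but a workable, easy-to-implement approach does exist.

Framework used: thinkphp5

Two packages need to be installed:

jaeger/querylist: provides the selector syntax commonly used for scraping web pages (the original mentions xPath; the example in this post uses a CSS selector)
jaeger/querylist-curl-multi: a package for issuing network requests concurrently (the "multi-threaded" part)
The advantages of querylist are that it installs easily, has no pitfalls, and can be used both from the command line and inside an API endpoint.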
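
As a rough illustration of what "concurrent requests" means here, the sketch below uses PHP's built-in curl_multi_* functions directly. The package name suggests that the plugin builds on the same mechanism, but this is an assumption and not the plugin's actual implementation. fetchAll() and its result layout are hypothetical names chosen for this example only.

```php
<?php
// Sketch only: fetch several URLs concurrently with PHP's curl_multi_* API.
// fetchAll() is a hypothetical helper, not part of QueryList or its plugins.
function fetchAll(array $urls, int $timeoutSeconds = 10): array
{
    if ($urls === []) {
        return [];
    }

    $multi = curl_multi_init();
    $handles = [];

    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body instead of printing it
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none of them is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for activity rather than busy-looping
        }
    } while ($running > 0 && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $url => $ch) {
        $results[$url] = [
            // HTTP status is 0 when the request never got a response
            'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
            'body'   => curl_multi_getcontent($ch),
        ];
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);

    return $results;
}
```

All requests run inside a single PHP process and are multiplexed by curl, so no real threads are involved. That is why this style of "multi-threading" works in a stock PHP installation.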

Related documentation:
http://www.querylist.cc/docs/...
http://www.querylist.cc/docs/...

Steps:

1. Install the packages:

composer require jaeger/querylist
composer require jaeger/querylist-curl-multi

2. The PHP file:

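use QL\QueryList;
use QL\Ext\CurlMulti;
use think\Log;  // ThinkPHP 5.0; on 5.1 use think\facade\Log instead
// SpiderService is the application's own service class; import it from wherever it lives

// Crawl the list pages
public function spider(){
  $urlPool = [];
  $startPage = 1;  // page to start crawling from
  $workerNum = 10;  // number of concurrent requests
  $host = 'https://xxxxxx?page=';
  $nowPage = $startPage;  // running page counter used during the crawl
  // Runs until a request fails (the error callback throws); add your own stop condition if needed
  while(1){

      // Build the links to crawl: each pass queues $workerNum pages
      for($i=1;$i<=$workerNum;$i++){
          $urlPool[] = $host.$nowPage;
          $nowPage++;
      }

      $ql = QueryList::use(CurlMulti::class);
      $ql->curlMulti($urlPool)

      // Called each time a task completes successfully
      ->success(function (QueryList $ql,CurlMulti $curl,$r){

          // Extract the data here with a selector (this one is a CSS selector)
          // Other ways of extracting the data are available too; see the docs
          $data = $ql->find('#hits-list > div:nth-child(n) > div.header > div > a:nth-child(1)')->texts();

          // Log the current URL and the parsed data
          Log::write('Current url:'.$r['info']['url']);
          Log::write($data->all());

          // For more complex logic, hand the data off to another method
          SpiderService::getInstance()->insertToDb($data->all());
      })

      // Called each time a task fails
      ->error(function ($errorInfo,CurlMulti $curl){
          echo "Current url:{$errorInfo['info']['url']} \r\n";
          print_r($errorInfo['error']);

          // Stop on error and break out of the loop
          throw new \Exception("Stopped because of an error");
      })

      ->start([
          // maximum concurrency
          'maxThread' => $workerNum,
          // number of retries on error
          'maxTry' => 3,
      ]);

      // Reset the URL pool after each pass
      $urlPool = [];
  }
}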
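
The loop above only ends when a request fails. If the last page number is known in advance, the URL batches can be generated with an upper bound instead. The following is a minimal sketch under that assumption: pageBatches() and the $lastPage parameter are hypothetical and do not appear in the original code.

```php
<?php
// Hypothetical helper: yields batches of page URLs and stops at $lastPage.
function pageBatches(string $host, int $startPage, int $lastPage, int $batchSize): Generator
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('batchSize must be at least 1');
    }

    for ($page = $startPage; $page <= $lastPage; $page += $batchSize) {
        $batch = [];
        $end = min($page + $batchSize - 1, $lastPage); // the last batch may be shorter
        for ($p = $page; $p <= $end; $p++) {
            $batch[] = $host . $p;
        }
        yield $batch;
    }
}

// Usage: each $urlPool would be passed to $ql->curlMulti($urlPool) as in the code above.
foreach (pageBatches('https://xxxxxx?page=', 1, 25, 10) as $urlPool) {
    echo count($urlPool), ' URLs, first: ', $urlPool[0], PHP_EOL;
}
```

With 25 pages and a batch size of 10 this yields three batches of 10, 10 and 5 URLs. Because every batch is built fresh, no manual reset of the URL pool is needed.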

php , thinkphp , thinkphp5 , php7
