1. What Is a Proxy Pool

A proxy pool is a repository holding a large number of proxy server IP addresses. When we send HTTP requests with the Requests library, routing them through different proxies hides our real IP address.
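In Requests this is done through the `proxies` argument; a minimal sketch (the proxy address below is a placeholder, not a live server):

```python
import requests

def build_proxies(proxy_url):
    """Build the mapping Requests expects: one entry per URL scheme."""
    return {'http': proxy_url, 'https': proxy_url}

# Placeholder address; substitute a proxy you actually control.
proxies = build_proxies('http://10.10.1.10:3128')

# The request below would be routed through the proxy instead of your own IP:
# requests.get('http://httpbin.org/ip', proxies=proxies, timeout=5)
```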

2. Why Use a Proxy Pool

  1. Hide your real IP: large-scale crawling is hard to sustain from one address; routing through proxies keeps your real IP hidden.
  2. Survive IP bans: when one IP is blocked, you can switch to another proxy from the pool and keep going.
  3. Spread the load: with many proxies in the pool, requests are distributed instead of all coming from a single address.
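The second point, rotating away from a banned IP, can be sketched like this (`fetch` is a hypothetical callable supplied by the caller; any exception is treated as a dead or banned proxy):

```python
import random

def rotate_on_failure(pool, fetch, url, max_attempts=3):
    """Try up to max_attempts proxies, evicting each one that fails."""
    pool = list(pool)  # work on a copy so the caller's list is untouched
    for _ in range(max_attempts):
        if not pool:
            break
        proxy = random.choice(pool)
        try:
            return fetch(url, proxy)
        except Exception:
            pool.remove(proxy)  # looks banned or dead; do not pick it again
    return None
```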

3. Installing the requests Library

pip install requests

4. Hands-On Implementation

A Simple Proxy Pool Implementation

First we need a supply of proxy IPs; the pool can be built from public proxy lists or private (paid) proxies.

Next we write a small pool handler:

import requests
import random

# Proxy IP pool (replace these with proxies that work in your environment)
proxy_list = [
    'http://10.10.1.10:3128',
    'http://10.10.1.11:1080',
    'http://211.135.30.151:3128',
    'http://183.131.76.73:8888',
    'http://116.209.68.130:8080',
]

def get_random_proxy():
    """Randomly select a proxy from the pool"""
    return random.choice(proxy_list)

def fetch_with_proxy(url):
    """Fetch website content using proxy"""
    try:
        # Set up proxy
        proxy = get_random_proxy()
        proxies = {
            'http': proxy,
            'https': proxy,
        }
        
        # Send request with headers
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
        }
        
        response = requests.get(url, proxies=proxies, headers=headers, timeout=5)
        print(f'Status Code: {response.status_code}')
        print(f'Using Proxy: {proxy}')
        return response.text
        
    except requests.exceptions.RequestException as e:
        print(f'Request failed: {e}')
        return None

if __name__ == '__main__':
    url = 'http://httpbin.org/ip'
    result = fetch_with_proxy(url)
    print(result)
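Before trusting a proxy from the pool, it is worth verifying it can actually complete a request. A minimal checker sketch (`http://httpbin.org/ip` just echoes the caller's IP; any fast endpoint works):

```python
import requests

def check_proxy(proxy, test_url='http://httpbin.org/ip', timeout=5):
    """Return True if the proxy can complete a request to test_url."""
    try:
        response = requests.get(
            test_url,
            proxies={'http': proxy, 'https': proxy},
            timeout=timeout,
        )
        return response.status_code == 200
    except requests.exceptions.RequestException:
        return False

def filter_working(proxy_list):
    """Keep only the proxies that pass the check."""
    return [p for p in proxy_list if check_proxy(p)]
```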

Using Third-Party Proxy Services

Requests also works with free and paid third-party proxy services, for example:

  • Free proxy lists: https://www.proxy-list.download/
  • Paid proxy services: Bright Data, Oxylabs, etc.
A simple round-robin rotation with itertools.cycle:

import requests
from itertools import cycle

# Proxy pool (rotate through these to spread requests; hostnames are placeholders)
proxies = [
    'http://proxy1.com:8080',
    'http://proxy2.com:8080',
    'http://proxy3.com:8080',
]

proxy_pool = cycle(proxies)

# Issue the requests one by one, rotating through the proxies
def batch_requests(urls):
    for url in urls:
        proxy = next(proxy_pool)
        proxy_dict = {'http': proxy, 'https': proxy}
        try:
            response = requests.get(url, proxies=proxy_dict, timeout=5)
            print(f'Fetched {url} with proxy {proxy} ({response.status_code})')
        except requests.exceptions.RequestException as e:
            print(f'Error fetching {url}: {e}')

if __name__ == '__main__':
    urls = [
        'http://example.com',
        'http://example.com/page1',
        'http://example.com/page2',
    ]
    batch_requests(urls)
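For larger batches, the same round-robin pairing can run in parallel; a sketch using a thread pool (the function names here are mine, not a standard API):

```python
import requests
from itertools import cycle
from concurrent.futures import ThreadPoolExecutor

def fetch_one(url, proxy, timeout=5):
    """Fetch one URL through the given proxy; report (url, succeeded)."""
    try:
        requests.get(url, proxies={'http': proxy, 'https': proxy},
                     timeout=timeout)
        return url, True
    except requests.exceptions.RequestException:
        return url, False

def batch_requests_concurrent(urls, proxy_list, max_workers=4):
    """Pair each URL with the next proxy, then fetch the pairs in parallel."""
    pool = cycle(proxy_list)
    pairs = [(url, next(pool)) for url in urls]
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(lambda pair: fetch_one(*pair), pairs))
```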

5. Handling Common Problems

Unavailable Proxies

Many proxies found on public lists are dead or unreliable, so we need to handle failures:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def requests_retry_session(
    retries=3,
    backoff_factor=0.3,
    status_forcelist=(500, 502, 504),
    session=None,
):
    session = session or requests.Session()
    retry = Retry(
        total=retries,
        read=retries,
        connect=retries,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session

def fetch_with_retry(url, proxies):
    try:
        response = requests_retry_session().get(
            url,
            proxies=proxies,
            timeout=5
        )
        return response.text
    except requests.exceptions.RequestException as e:
        print(f'Failed after retries: {e}')
        return None
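Retries help with flaky proxies, but permanently dead ones are best evicted from the pool; a sketch (the function name is illustrative):

```python
import requests

def fetch_and_prune(url, proxy_list, timeout=5):
    """Walk the pool in order, dropping proxies that fail.

    Mutates proxy_list so dead proxies are not retried on later calls.
    Returns the body of the first successful response, or None.
    """
    for proxy in list(proxy_list):  # iterate over a snapshot
        try:
            response = requests.get(
                url,
                proxies={'http': proxy, 'https': proxy},
                timeout=timeout,
            )
            return response.text
        except requests.exceptions.RequestException:
            proxy_list.remove(proxy)  # evict the failing proxy
    return None
```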

6. Best Practices

  1. Check proxy availability periodically and evict dead proxies from the pool.
  2. Handle timeouts and connection errors: wrap every request in proper exception handling.
  3. Set a realistic User-Agent so your requests are not immediately flagged as a bot.
  4. Respect each site's robots.txt and avoid putting excessive load on the server.
  5. Use proxy sources you trust: an untrusted proxy can read and tamper with your traffic.
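The first practice can be automated with a small pool object that re-checks its members; a minimal sketch, assuming `http://httpbin.org/ip` as the check endpoint (call `refresh()` periodically, e.g. from a scheduler):

```python
import requests

class ProxyPool:
    """Keeps a master list of proxies plus the subset that last checked out."""

    def __init__(self, proxies, check_url='http://httpbin.org/ip'):
        self.all_proxies = list(proxies)
        self.alive = list(proxies)
        self.check_url = check_url

    def _is_alive(self, proxy):
        try:
            response = requests.get(
                self.check_url,
                proxies={'http': proxy, 'https': proxy},
                timeout=5,
            )
            return response.status_code == 200
        except requests.exceptions.RequestException:
            return False

    def refresh(self):
        """Re-test every known proxy and rebuild the alive list."""
        self.alive = [p for p in self.all_proxies if self._is_alive(p)]
        return self.alive
```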