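This is a Python script for downloading images from a website named "xiezhen". It uses the requests library to send HTTP requests to the site and the lxml library to parse the HTML of the responses.
The script defines two functions: download_image and process_page. download_image takes a URL and a directory path, requests the URL to get the image content, and saves that content into the directory. process_page takes a page number, requests that page of the site, parses the HTML to extract the URLs of the image pages, and then calls download_image for every image URL found on each of those pages.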
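download_image names each saved file after the last `/`-separated segment of its URL. The sketch below isolates that step. It assumes a hypothetical helper, `target_path`, which is not part of the original script, and it swaps in `urllib.parse.urlsplit` so that a query string does not end up in the file name (the script's `url.split('/')[-1]` would keep it).

```python
import os
from urllib.parse import urlsplit


def target_path(url: str, img_dir: str) -> str:
    """Return the local path an image URL would be saved to (hypothetical helper)."""
    # Take the last path segment, ignoring any query string or fragment.
    name = os.path.basename(urlsplit(url).path)
    if not name:
        raise ValueError(f"no file name in URL: {url!r}")
    return os.path.join(img_dir, name)


# Example URL and directory are placeholders, not taken from the site.
print(target_path("https://example.com/a/b/photo.jpg?size=large", "images"))
```

Rejecting URLs without a file name is a design choice of this sketch; the original script does not check for that case.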
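The script also uses the concurrent.futures module to create thread pools, so that multiple requests and image downloads run in parallel, which can speed up downloading.
In the main block, the script creates a thread pool and submits one task per page to download the images from all pages of the site. Finally, it waits for all tasks to complete before exiting.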
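The submit-and-wait pattern described above can be sketched without any network access. In this sketch, `fetch` is a hypothetical stand-in for the real page task and `run_all` is an illustrative name; neither comes from the original script. Unlike a loop that only iterates over `as_completed`, this version calls `future.result()`, which re-raises any exception from the worker so that failures are not silently dropped.

```python
import concurrent.futures
import time


def fetch(page: int) -> str:
    """Stand-in for a network task (hypothetical); fails on page 3 to show error handling."""
    time.sleep(0.01)
    if page == 3:
        raise RuntimeError(f"page {page} failed")
    return f"page {page} ok"


def run_all(pages, max_workers=4):
    """Submit one task per page and collect results and errors as tasks finish."""
    results, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its page so failures can be attributed.
        future_to_page = {executor.submit(fetch, p): p for p in pages}
        for future in concurrent.futures.as_completed(future_to_page):
            page = future_to_page[future]
            try:
                results[page] = future.result()  # re-raises the task's exception, if any
            except Exception as exc:
                errors[page] = exc
    return results, errors


if __name__ == "__main__":
    results, errors = run_all(range(1, 6))
    print(sorted(results), sorted(errors))
```

Setting `max_workers` explicitly is also an assumption of this sketch; it caps how many requests are in flight at once, which may be kinder to the target site than the default pool size.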
import time
import requests
from lxml import etree
import os
import concurrent.futures

# Browser-like User-Agent sent with every request.
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}


def download_image(url, img_path):
    """Download one image and save it in img_path, named after the URL's last segment."""
    response = requests.get(url, headers=HEADERS)
    img_name = url.split('/')[-1]
    with open(os.path.join(img_path, img_name), 'wb') as f:
        f.write(response.content)
    print(f'Image {img_path}/{img_name} downloaded!')


def process_page(page):
    """Fetch one listing page, then download every image from each page it links to."""
    url = f'https://www.xiezhen.xyz/page/{page}'
    response = requests.get(url, headers=HEADERS)
    html = etree.HTML(response.content)
    # Double quotes outside so the single-quoted class name inside the XPath is valid.
    mail_url = html.xpath("//div[@class='excerpts']/article/a/@href")
    for url in mail_url:
        response = requests.get(url, headers=HEADERS)
        html = etree.HTML(response.content)
        sub_url = html.xpath('//article/p/img')
        # The part of the page title before the first '-' becomes the folder name.
        img_title = html.xpath('//title/text()')[0].split('-')[0]
        img_path = f'J:/xiezhen/{img_title}'
        # exist_ok avoids a race when several threads create the same folder.
        os.makedirs(img_path, exist_ok=True)
        # Download all images on this page in parallel.
        with concurrent.futures.ThreadPoolExecutor() as executor:
            futures = []
            for s_url in sub_url:
                img_url = s_url.attrib['src']
                futures.append(executor.submit(download_image, img_url, img_path))
            for future in concurrent.futures.as_completed(futures):
                pass
        time.sleep(0.5)


if __name__ == '__main__':
    # Process listing pages 1 through 572 in parallel.
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = []
        for page in range(1, 573):
            futures.append(executor.submit(process_page, page))
        for future in concurrent.futures.as_completed(futures):
            pass