How to loop over multiple values and save the results as CSV (Python, Selenium)

小能豆

How can I feed multiple values into a loop and save the results to a CSV file using Python and Selenium?


2024-12-12

1 answer

小能豆

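If you want to use Selenium to loop over multiple values (for example search keywords or product links) and save the result of each iteration to a CSV file, you can follow the steps below.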

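Example: scraping prices for multiple products and saving them as CSV

Steps

Define a list of input values: the values to loop over (for example product URLs).
Scrape the data with Selenium: visit each URL in turn and extract the data you need.
Save the results to CSV: write the data extracted in each iteration to a CSV file.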
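The three steps above can be sketched independently of Selenium, which makes the loop-and-save pattern easier to see and test. This is a minimal sketch, assuming a hypothetical scrape function that takes one input value and returns a dict of column values; the names scrape_to_csv and fake_scrape are illustrative and not part of any library. It uses csv.DictWriter (instead of csv.writer) so columns are matched by name.

```python
import csv
from typing import Callable, Iterable


def scrape_to_csv(values: Iterable[str], scrape: Callable[[str], dict],
                  path: str, fieldnames: list) -> int:
    """Call scrape() on each input value and write one CSV row per value.

    The first field name holds the input value itself.
    Returns the number of data rows written (header excluded).
    """
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for value in values:
            try:
                row = scrape(value)
            except Exception as exc:
                # Record the failure and continue with the next value
                print(f"Error scraping {value}: {exc}")
                row = {name: "Error" for name in fieldnames}
            row[fieldnames[0]] = value  # first column always holds the input
            writer.writerow(row)
            count += 1
    return count


# Hypothetical stand-in for a Selenium-based scraper
def fake_scrape(url: str) -> dict:
    if "missing" in url:
        raise ValueError("page not found")
    return {"Price": "$1.00", "Title": "Title for " + url}


n = scrape_to_csv(
    ["https://example.com/a", "https://example.com/missing"],
    fake_scrape,
    "demo.csv",
    ["Product URL", "Price", "Title"],
)
print(n)  # 2
```

Replacing fake_scrape with a function that calls driver.get() and find_element() gives the same structure as the full Selenium code.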

Full code

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
import time
import csv

# List of product URLs to loop over
product_urls = [
    "https://www.amazon.com/Automate-Boring-Stuff-Python-2nd-ebook/dp/B07VSXS4NK/",
    "https://www.amazon.com/Python-Crash-Course-2nd-Edition/dp/1593279280/",
    # Add more URLs...
]

# Initialize the Selenium WebDriver
options = webdriver.ChromeOptions()
options.add_argument("--headless")  # Headless mode (optional)
driver = webdriver.Chrome(options=options)

try:
    # Open the CSV file for writing
    with open("amazon_prices.csv", "w", newline="", encoding="utf-8") as csvfile:
        csvwriter = csv.writer(csvfile)
        # Write the header row
        csvwriter.writerow(["Product URL", "Price", "Title"])

        for url in product_urls:
            try:
                driver.get(url)
                time.sleep(3)  # Fixed wait for the page to load

                # Get the product price
                # (element IDs vary between Amazon page layouts; adjust as needed)
                try:
                    price_element = driver.find_element(By.ID, "kindle-price")
                    price = price_element.text.strip()
                except NoSuchElementException:
                    price = "Price not found"

                # Get the product title
                try:
                    title_element = driver.find_element(By.ID, "productTitle")
                    title = title_element.text.strip()
                except NoSuchElementException:
                    title = "Title not found"

                # Print the result (for debugging)
                print(f"Scraped {url} -> Price: {price}, Title: {title}")

                # Write one row to the CSV file
                csvwriter.writerow([url, price, title])
            except Exception as e:
                # Any other failure: log it and record an error row
                print(f"Error scraping {url}: {e}")
                csvwriter.writerow([url, "Error", "Error"])
finally:
    # Always close the WebDriver, even if an unexpected error occurs
    driver.quit()

Code explanation

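Input value list (product_urls):
Contains the product URLs to scrape.
Scraping loop:
For each URL, driver.get(url) loads the page.
Selenium then locates the price and title elements.
If an element is not found, the exception is caught and a "not found" placeholder is recorded.
Saving to CSV:
Python's csv module writes the data.
Each scraped result is written as one row.
Error handling:
If loading or scraping a page fails, the exception is caught and a row marked "Error" is written to the CSV file.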
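The input values do not have to be hard-coded in product_urls. A minimal sketch, assuming a hypothetical plain-text file urls.txt with one value per line (blank lines and lines starting with # are skipped); the helper name load_values is illustrative:

```python
from pathlib import Path


def load_values(path: str) -> list:
    """Read one input value per line, skipping blank lines and '#' comments."""
    values = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            values.append(line)
    return values


# Usage: create a small example file, then load it
Path("urls.txt").write_text(
    "# one URL per line\n"
    "https://example.com/a\n"
    "\n"
    "https://example.com/b\n",
    encoding="utf-8",
)
product_urls = load_values("urls.txt")
print(product_urls)  # ['https://example.com/a', 'https://example.com/b']
```

The returned list can be used directly in the for url in product_urls loop.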

Format of the generated CSV file (illustrative values)

Product URL,Price,Title
[URL 1],$19.99,Automate the Boring Stuff with Python
[URL 2],$25.99,Python Crash Course
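To check the output, the file can be read back with csv.DictReader, which keys each row by the header. This is a minimal sketch; the sample file name sample_prices.csv and its contents are placeholders. It also shows why the csv module is preferable to joining strings by hand: a title containing a comma is quoted on write and comes back intact.

```python
import csv


def read_results(path: str) -> list:
    """Read the scraped CSV back into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Usage: write a tiny sample file with the same header, then read it back
with open("sample_prices.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Product URL", "Price", "Title"])
    writer.writerow(["https://example.com/a", "Price not found",
                     "Some title, with a comma"])

rows = read_results("sample_prices.csv")
for row in rows:
    print(row["Product URL"], "|", row["Price"], "|", row["Title"])
```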

Notes

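Wait time (time.sleep):
Some pages load slowly, so the fixed wait may need adjusting.
A more robust option is Selenium's WebDriverWait combined with an expected condition (for example expected_conditions.presence_of_element_located), which waits only until the element actually appears.
Anti-scraping measures:
Sites such as Amazon may detect frequent automated requests and block them. Slowing down requests, setting a realistic user agent, or using proxies can make the traffic look more like a normal user's, but none of these is guaranteed to work.
File location:
The CSV file is saved in the current working directory. To save it elsewhere, change the path passed to open().
Dynamically loaded content:
Selenium runs the page's JavaScript, but content such as prices may appear only after driver.get() returns, and its element ID can differ between page layouts. You may need to wait for the element explicitly or inspect the page to find the right locator.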
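Two of the notes above (file location and request frequency) can be handled with small standard-library helpers. This is a minimal sketch; the helper names output_path and polite_pause, the directory name output, and the 2 to 5 second range are assumptions for illustration. A random pause lowers the request rate but does not guarantee that a site will not block the scraper.

```python
import random
import time
from pathlib import Path
from typing import Callable


def output_path(directory: str, filename: str) -> Path:
    """Build the CSV path inside `directory`, creating the directory if needed."""
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)
    return folder / filename


def polite_pause(min_s: float = 2.0, max_s: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep) -> float:
    """Sleep for a random interval between min_s and max_s seconds; return it."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay


# Usage: pass csv_path to open() instead of the bare file name,
# and call polite_pause() between driver.get() calls in the loop.
csv_path = output_path("output", "amazon_prices.csv")
print(csv_path)
```

The sleep parameter is injectable so the helper can be tested without actually waiting.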

This approach lets you scrape data in bulk and save it as a structured CSV file.

2024-12-12