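I'm trying to use Python requests to scrape a county website through several search stages. Basically, I run a search, filter the results (that code isn't written yet), and then go to a result's page. I get the ASP.NET error message "The session based SearchQueue is empty." My code so far may look long, but I've included all the form data I use in the requests. I'm just trying to search for the name "Smith".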
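Basically, I make an empty request and grab __VIEWSTATE and the other hidden values, then make a search request, which works fine. I then grab __VIEWSTATE and friends again from the search results page and try to follow a search result using what I believe is the hdLink value (though I'm not sure). Do you think I might be missing something with __EVENTTARGET? I'm going crazy because I don't know what to look for here. I've also posted a picture of the error page. Thanks to anyone who can offer some knowledge.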
test.py:
```python
import CountyFormDataList
import requests
import json
from scrapy import Selector

url = "http://property.franklincountyauditor.com/_web/search/CommonSearch.aspx?mode=OWNER"

r = requests.post(url)
scriptManager = Selector(text=r.text).xpath('//*[@id="ScriptManager1_TSM"]/@value').get()
viewState = Selector(text=r.text).xpath('//*[@id="__VIEWSTATE"]/@value').get()
viewStateGenerator = Selector(text=r.text).xpath('//*[@id="__VIEWSTATEGENERATOR"]/@value').get()
eventValidation = Selector(text=r.text).xpath('//*[@id="__EVENTVALIDATION"]/@value').get()

payload = json.loads(
    "{"
    + CountyFormDataList.formDataList["CommonSearchASPX"]["search"]["ownerSearch"].format(
        scriptManager, viewState, viewStateGenerator, eventValidation, "SMITH"
    )
    + "}"
)
cookies = CountyFormDataList.formDataList["CommonSearchASPX"]["cookies"]
headers = CountyFormDataList.formDataList["CommonSearchASPX"]["headers"]

r = requests.post(url, data=payload, cookies=cookies, headers=headers)
scriptManager = Selector(text=r.text).xpath('//*[@id="ScriptManager1_TSM"]/@value').get()
viewState = Selector(text=r.text).xpath('//*[@id="__VIEWSTATE"]/@value').get()
viewStateGenerator = Selector(text=r.text).xpath('//*[@id="__VIEWSTATEGENERATOR"]/@value').get()
eventValidation = Selector(text=r.text).xpath('//*[@id="__EVENTVALIDATION"]/@value').get()

payload = json.loads(
    "{"
    + CountyFormDataList.formDataList["CommonSearchASPX"]["result"]["resultJSON"].format(
        scriptManager, viewState, viewStateGenerator, eventValidation, "SMITH", "sIndex=0&idx=1"
    )
    + "}"
)

r = requests.post(url, data=payload, cookies=cookies, headers=headers)

f = open("ohioOutput.html", "w")
f.write(r.text)
f.close()
```
CountyFormDataList.py:
```python
formDataList = {
    "CommonSearchASPX": {
        # from commonsearch aspx websites, example:
        # http://property.franklincountyauditor.com/_web/search/CommonSearch.aspx?mode=OWNER
        "cookies": {
            # cookies for search to accept disclaimer
            'DISCLAIMER': '1'
        },
        "headers": {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
            "Accept-Encoding": "gzip, deflate, br",
            "Accept-Language": "en-US,en;q=0.9",
            "Cache-Control": "max-age=0",
            "Connection": "keep-alive",
            "Content-Length": "4348",
            "Content-Type": "application/x-www-form-urlencoded",
            "Host": "auditor.ashtabulacounty.us",
            "Origin": "https://auditor.ashtabulacounty.us",
            "Referer": "https://auditor.ashtabulacounty.us/PT/search/CommonSearch.aspx?mode=OWNER",
            "Sec-Fetch-Dest": "document",
            "Sec-Fetch-Mode": "navigate",
            "Sec-Fetch-Site": "same-origin",
            "Upgrade-Insecure-Requests": "1",
            "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36"
        },
        "search": {
            "ownerSearch": """
                "ScriptManager1_TSM" : "{}",
                "__EVENTTARGET" : "btSearch",
                "__EVENTARGUMENT" : "",
                "__VIEWSTATE" : "{}",
                "__VIEWSTATEGENERATOR" : "{}",
                "__EVENTVALIDATION" : "{}",
                "PageNum": 1,
                "SortBy" : "PARID",
                "SortDir": "asc",
                "PageSize": 100,
                "hdAction" : "Search",
                "hdIndex": 0,
                "sIndex": -1,
                "hdListType" : "PA",
                "hdJur" : "",
                "hdSelectAllChecked" : "false",
                "inpOwner" : "{}",
                "selSortBy" : "PARID",
                "selSortDir": "asc",
                "selPageSize": 100,
                "searchOptions$hdBeta" : "",
                "btSearch" : "",
                "hdLink" : "",
                "AkaCfgResults$hdPins" : "",
                "ReportsListParIDs" : "",
                "RadWindow_NavigateUrl_ClientState" : "",
                "mode" : "OWNER",
                "mask" : "",
                "param1" : "",
                "searchimmediate" : ""
            """
        },
        "result": {
            # result page, found by clicking a result item on search page
            "resultJSON": """
                "ScriptManager1_TSM" : "{}",
                "__EVENTTARGET" : "",
                "__EVENTARGUMENT" : "",
                "__VIEWSTATE" : "{}",
                "__VIEWSTATEGENERATOR" : "{}",
                "__EVENTVALIDATION" : "{}",
                "PageNum":1,
                "SortBy" : "TAXID",
                "SortDir" : "+asc",
                "PageSize":100,
                "hdAction" : "Link",
                "hdIndex":1,
                "sIndex":-1,
                "hdListType" : "PA",
                "hdJur" : "",
                "hdSelectAllChecked" : "false",
                "inpOwner" : "{}",
                "selSortBy" : "TAXID",
                "selSortDir" : "+asc",
                "selPageSize":100,
                "searchOptions$hdBeta" : "",
                "hdLink" : "../Datalets/Datalet.aspx?{}",
                "AkaCfgResults$hdPins" : "",
                "ReportsListParIDs" : "",
                "RadWindow_NavigateUrl_ClientState" : "",
                "mode" : "OWNER",
                "mask" : "",
                "param1" : "",
                "searchimmediate" : ""
            """
        }
    }
}
```
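From your description, you have successfully fetched the initial page and extracted the hidden field values such as __VIEWSTATE and __EVENTVALIDATION. The follow-up request, however, fails with "The session based SearchQueue is empty". This usually means that the session state was lost, or that a key form field (such as __EVENTTARGET or hdLink) was not supplied correctly.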
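Possible causes and their fixes:
ASP.NET sites usually rely on session state to track a user's interactions, so the same session has to be kept for the whole scrape. In the code above, every call to requests.post is independent and sends only the DISCLAIMER cookie, so any session cookie the server sets in one response is never sent back with the next request.
Fix: use requests.Session so that cookies carry over between requests. For example: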
```python
session = requests.Session()

# Initial request
r = session.post(url)
```
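Some hidden fields may be filled in by JavaScript on the client instead of being present in the served HTML. If such a field is missing or wrong, the server may reject the request.
Fix: inspect the hidden fields in the page source and make sure every required field is extracted and passed along in the follow-up request.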
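One way to avoid missing a field is to collect every named input on the page rather than picking fields one by one. The following is a minimal sketch using only the standard library; `FormFieldCollector`, `collect_form_fields`, and the sample HTML are illustrative names and data, not taken from the site.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs from every <input> tag in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # An input without a value attribute is treated as an empty string.
            self.fields[name] = attributes.get("value") or ""


def collect_form_fields(html):
    """Return a dict of all named <input> fields found in the HTML text."""
    parser = FormFieldCollector()
    parser.feed(html)
    return parser.fields


# Hypothetical sample markup, only to show the shape of the result.
sample_html = """
<form>
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="hdLink" id="hdLink" />
  <input type="text" name="inpOwner" value="" />
</form>
"""
fields = collect_form_fields(sample_html)
print(fields)
```

The resulting dict can serve as the base payload, with only the fields you intend to change overwritten afterwards. Note that this sketch also picks up submit buttons and unchecked checkboxes, which a browser would not always send, and it ignores select and textarea elements.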
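__EVENTTARGET and hdLink are key fields that control page navigation and actions. Make sure these values are extracted correctly according to the page's HTML structure.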
An improved extraction:
```python
scriptManager = Selector(text=r.text).xpath('//*[@id="ScriptManager1_TSM"]/@value').get()
viewState = Selector(text=r.text).xpath('//*[@id="__VIEWSTATE"]/@value').get()
viewStateGenerator = Selector(text=r.text).xpath('//*[@id="__VIEWSTATEGENERATOR"]/@value').get()
eventValidation = Selector(text=r.text).xpath('//*[@id="__EVENTVALIDATION"]/@value').get()
hdLink = Selector(text=r.text).xpath('//input[@id="hdLink"]/@value').get()
```
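If the hdLink input comes back empty (the question's captured search form shows `"hdLink" : ""`, which suggests the browser fills it in when a row is clicked), the value has to be built by hand. The sketch below assembles the link-follow fields in the shape shown in the question's resultJSON template. `build_link_payload` is a hypothetical helper, and treating hdIndex as equal to idx is an assumption based on that one captured example.

```python
def build_link_payload(form_fields, s_index, idx):
    """Return a copy of form_fields with the fields for following one result row.

    form_fields: hidden fields scraped from the search results page.
    s_index, idx: the query-string values seen in the browser's hdLink,
                  e.g. sIndex=0&idx=1 in the question.
    """
    payload = dict(form_fields)  # copy so the caller's dict is not modified
    payload.update({
        "__EVENTTARGET": "",
        "__EVENTARGUMENT": "",
        "hdAction": "Link",
        # Assumption: hdIndex mirrors idx, as in the question's captured form data.
        "hdIndex": str(idx),
        "hdLink": f"../Datalets/Datalet.aspx?sIndex={s_index}&idx={idx}",
    })
    return payload


scraped = {"__VIEWSTATE": "abc123", "hdAction": "Search", "hdLink": ""}
link_payload = build_link_payload(scraped, s_index=0, idx=1)
print(link_payload["hdLink"])
```

The values for a given row should be confirmed against what the browser actually sends when that row is clicked.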
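Use a traffic-capture tool (such as Fiddler or the browser's developer tools) to inspect the real POST request, and make sure the fields in your request, and their order, match it.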
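Comparing the two requests by eye is error-prone with this many fields, so a small diff helper can do it. This is a sketch: `diff_form_data` is a hypothetical name, and the two dicts are shortened stand-ins for the form data copied from the browser and the payload built by the script.

```python
def diff_form_data(browser_form, script_form):
    """Report fields that are missing, extra, or different in the script's payload."""
    browser_keys = set(browser_form)
    script_keys = set(script_form)
    different = {
        key: (browser_form[key], script_form[key])
        for key in sorted(browser_keys & script_keys)
        # Compare as strings because form values are sent as text.
        if str(browser_form[key]) != str(script_form[key])
    }
    return {
        "missing": sorted(browser_keys - script_keys),
        "extra": sorted(script_keys - browser_keys),
        "different": different,
    }


# Shortened, illustrative data only.
browser_form = {"hdAction": "Link", "hdIndex": "1", "mode": "OWNER", "PageSize": "100"}
script_form = {"hdAction": "Link", "hdIndex": 0, "PageSize": 100}

report = diff_form_data(browser_form, script_form)
print(report)
```

Fields such as __VIEWSTATE and __EVENTVALIDATION are expected to differ between captures, so they can be ignored in the "different" section.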
Add logging that prints the values extracted after each request, so you can confirm they are correct.
```python
print("ScriptManager:", scriptManager)
print("ViewState:", viewState)
print("ViewStateGenerator:", viewStateGenerator)
print("EventValidation:", eventValidation)
print("hdLink:", hdLink)
```
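Beyond printing the fields, it can help to fail fast when a response contains the error text, so the failing step is obvious. A minimal sketch, where `check_response` is a hypothetical helper that takes the step name, status code, and body (for a requests response these would be `r.status_code` and `r.text`):

```python
ERROR_MARKER = "The session based SearchQueue is empty"


def check_response(step, status_code, body):
    """Raise RuntimeError if the step failed; otherwise return the body unchanged."""
    if status_code != 200:
        raise RuntimeError(f"{step}: unexpected HTTP status {status_code}")
    if ERROR_MARKER in body:
        raise RuntimeError(f"{step}: server reported an empty SearchQueue")
    return body


# Illustrative bodies only, standing in for real response text.
ok_body = check_response("search", 200, "<html>results</html>")
try:
    check_response("result", 200, "<html>The session based SearchQueue is empty.</html>")
except RuntimeError as error:
    print(error)
```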
An improved version of the code:
```python
import requests
from scrapy import Selector

url = "http://property.franklincountyauditor.com/_web/search/CommonSearch.aspx?mode=OWNER"

session = requests.Session()
# Accept the disclaimer, as the original cookies dict did
session.cookies.set("DISCLAIMER", "1")

# Initial request: grab the hidden fields
r = session.post(url)
scriptManager = Selector(text=r.text).xpath('//*[@id="ScriptManager1_TSM"]/@value').get()
viewState = Selector(text=r.text).xpath('//*[@id="__VIEWSTATE"]/@value').get()
viewStateGenerator = Selector(text=r.text).xpath('//*[@id="__VIEWSTATEGENERATOR"]/@value').get()
eventValidation = Selector(text=r.text).xpath('//*[@id="__EVENTVALIDATION"]/@value').get()

# Submit the search request
payload = {
    "ScriptManager1_TSM": scriptManager,
    "__EVENTTARGET": "btSearch",
    "__EVENTARGUMENT": "",
    "__VIEWSTATE": viewState,
    "__VIEWSTATEGENERATOR": viewStateGenerator,
    "__EVENTVALIDATION": eventValidation,
    "inpOwner": "SMITH",
    "hdAction": "Search",
}
r = session.post(url, data=payload)
print(r.text)  # confirm the search results page is correct

# Go to the result page, using the fresh values from the search results page
viewState = Selector(text=r.text).xpath('//*[@id="__VIEWSTATE"]/@value').get()
viewStateGenerator = Selector(text=r.text).xpath('//*[@id="__VIEWSTATEGENERATOR"]/@value').get()
eventValidation = Selector(text=r.text).xpath('//*[@id="__EVENTVALIDATION"]/@value').get()
hdLink = Selector(text=r.text).xpath('//input[@id="hdLink"]/@value').get()

payload = {
    "ScriptManager1_TSM": scriptManager,
    "__EVENTTARGET": "",
    "__EVENTARGUMENT": "",
    "__VIEWSTATE": viewState,
    "__VIEWSTATEGENERATOR": viewStateGenerator,
    "__EVENTVALIDATION": eventValidation,
    "hdLink": hdLink,
    "hdAction": "Link",
}
r = session.post(url, data=payload)
print(r.text)  # check that the result page is correct
```
If the problem persists, please share the specific error log or the captured traffic so it can be debugged further.