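The code generates a series of URLs and searches each of them for a specific string. Because the website requires login information:
Here is the code: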

    url = 'https://www.aug.ipp.mpg.de/cgibin/sfread_only/isis?'
    shotn = shot_a  # shot_a, shot_z, diag and param are user inputs

    while (shotn <= shot_z):
        values = {'shot': shotn,
                  'exp': 'AUGD',
                  'diag': diag,
                  'action': 'SignalDetails',
                  'signal': param}
        data = urllib.parse.urlencode(values)
        data = data.encode('utf-8')
        req = urllib.request.Request(url, data)
        resp = urllib.request.urlopen(req)
        # The line above is line 42, where the first error occurs
        respData = resp.read()
        shotn += 1

The expected result is a .txt file on the computer that lists the shotn's for which that specific statement was found at the corresponding URL.
Here is the actual result:

    Traceback (most recent call last):
      File "C:/Users/lenovo/PycharmProjects/ url/venv/Final.py", line 42, in <module>
        resp = urllib.request.urlopen(req)
      File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 222, in urlopen
        return opener.open(url, data, timeout)
      File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 531, in open
        response = meth(req, response)
      File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 641, in http_response
        'http', request, response, code, msg, hdrs)
      File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 563, in error
        result = self._call_chain(*args)
      File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 503, in _call_chain
        result = func(*args)
      File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 734, in http_error_302
        new = self.redirect_request(req, fp, code, msg, headers, newurl)
      File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 672, in redirect_request
        raise HTTPError(req.full_url, code, msg, headers, fp)
    urllib.error.HTTPError: HTTP Error 307: Temporary Redirect

    Process finished with exit code 1

The error you’re seeing (HTTP Error 307: Temporary Redirect) means the server answered your POST with a temporary redirect to another URL. The traceback shows urllib raising inside redirect_request: in Python 3.7, urllib follows a 307 automatically only for GET and HEAD requests, so a redirected POST surfaces as an HTTPError. A redirect like this is often caused by a login or session management mechanism on the server that expects certain cookies, headers, or authentication tokens.
Solution:
- Session Management: Use a cookie-aware opener in urllib (http.cookiejar together with HTTPCookieProcessor) or a Session object in requests to handle cookies and maintain the session between requests.
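For the urllib route, a minimal sketch might look like the following. It assumes the server's 307 target is trustworthy enough to receive the same form data again; the names `Post307RedirectHandler` and `build_opener_with_cookies` are hypothetical and chosen only for illustration. The handler re-sends the POST body on a 307, which the standard handler refuses to do, and the cookie jar keeps any session cookies the server sets.

```python
import http.cookiejar
import urllib.request


class Post307RedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: re-send a POST to the new URL on a 307."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if code == 307 and req.get_method() == "POST":
            # Assumption: the redirect target may receive the same body.
            return urllib.request.Request(
                newurl,
                data=req.data,
                headers=req.headers,
                origin_req_host=req.origin_req_host,
                unverifiable=True,
                method="POST",
            )
        # Every other case keeps the standard library behaviour.
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def build_opener_with_cookies():
    """Return an opener that stores cookies and follows 307 for POST."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar),
        Post307RedirectHandler(),
    )
    return opener, jar


# Usage inside the loop from the question (not executed here):
#   opener, jar = build_opener_with_cookies()
#   resp = opener.open(url, data)   # data is the url-encoded bytes
#   respData = resp.read()
```

If the redirect leads to a login page, the response body will be that page rather than the data, so it is worth inspecting `resp.geturl()` before parsing.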
Here’s an example using requests to handle the session:
```python
import requests

# Create a session so cookies persist across requests
session = requests.Session()

# URL for the form submission
url = 'https://www.aug.ipp.mpg.de/cgibin/sfread_only/isis?'

shotn = shot_a  # shot_a, shot_z, diag, param are user inputs

while shotn <= shot_z:
    values = {
        'shot': shotn,
        'exp': 'AUGD',
        'diag': diag,
        'action': 'SignalDetails',
        'signal': param,
    }

    # Send the POST request through the session. Passing the dict lets
    # requests form-encode it and set the Content-Type header.
    response = session.post(url, data=values)

    if response.status_code == 200:
        # If the response is successful, handle the data
        respData = response.text
        # Process respData (e.g., save to a file)
    else:
        print(f"Failed to retrieve data for shot {shotn}, "
              f"status code: {response.status_code}")

    shotn += 1
```
In this example:
- Session Handling: requests.Session() persists cookies and default headers across requests, so a login performed through the session stays active for later calls.
- POST Request: Use session.post instead of urllib to send the data. This simplifies handling of form data, cookies, and redirects; requests follows a 307 by re-sending the POST to the new URL.
- Response Handling: Check response.status_code to see whether the request succeeded.
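The "Process respData" step could be sketched as follows, based on the goal stated in the question (a .txt file listing the shot numbers whose page contains the statement). The helper name `record_if_found`, the `phrase` argument, and the output file name are assumptions made for illustration.

```python
from pathlib import Path


def record_if_found(page_text, phrase, shotn, out_path):
    """Append shotn to out_path when phrase occurs in page_text.

    Returns True on a match, False otherwise.
    """
    if phrase not in page_text:
        return False
    # Append mode keeps the hits from earlier shots in the same file.
    with Path(out_path).open("a", encoding="utf-8") as fh:
        fh.write(f"{shotn}\n")
    return True


# Usage inside the loop (phrase and file name are placeholders):
#   record_if_found(respData, "text to look for", shotn, "matching_shots.txt")
```

Appending one line per match keeps partial results on disk if a later request fails.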
Since you mentioned that the website is on an “internal network” rather than the public internet, this could also be a factor:
Solution:
- Check Network Access: Ensure your script has proper access to the internal network. If you reach the internal network through a VPN or a proxy, you will need to replicate that in your code (e.g., through the proxy configuration in requests).
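As a rough illustration of the proxy case, a requests session exposes a `proxies` mapping that can be filled in once and then applies to every request made through that session. The helper name `apply_proxy` and the proxy address in the usage comment are hypothetical; the real address would come from your network administrator.

```python
def apply_proxy(session, proxy_url):
    """Route HTTP and HTTPS traffic of a requests session through one proxy."""
    session.proxies.update({"http": proxy_url, "https": proxy_url})
    return session


# Usage (the proxy address is a placeholder, not a real host):
#   import requests
#   session = apply_proxy(requests.Session(), "http://proxy.example.internal:3128")
```

A VPN usually needs no code changes, since it works below the application level; the script only has to run on a machine where the VPN is connected.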
The server may also expect browser-like request headers such as User-Agent.
Solution:
- Mimic the headers from your browser request using the requests library.
Example to add headers:

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
        # Add any other necessary headers here
    }
    response = session.post(url, data=values, headers=headers)

You can find the required headers by inspecting the network traffic in the browser’s developer tools (F12 > Network tab).
By default, requests follows redirects automatically, while urllib follows only some of them (as the traceback above shows, not a 307 on a POST). If you want to handle redirects manually, or if automatic redirection doesn’t work for some reason, you can control it like this:

    # allow_redirects=True is the default; set it to False to handle redirects manually
    response = session.post(url, data=values, allow_redirects=True)

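Handling redirects manually could look roughly like the sketch below. It assumes `session` is a requests.Session (or anything with a compatible `post` method); the function name `post_with_manual_redirects` and the `max_hops` limit are hypothetical. Only 307 and 308 are followed, because those are the status codes that ask the client to repeat the same method and body; any other response is handed back to the caller.

```python
from urllib.parse import urljoin

# Status codes that require repeating the same method and body.
METHOD_PRESERVING_REDIRECTS = {307, 308}


def post_with_manual_redirects(session, url, data, max_hops=5):
    """POST data and follow 307/308 redirects by hand, up to max_hops times."""
    for _ in range(max_hops + 1):
        response = session.post(url, data=data, allow_redirects=False)
        location = response.headers.get("Location")
        if response.status_code not in METHOD_PRESERVING_REDIRECTS or not location:
            return response
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError(f"gave up after more than {max_hops} redirects")
```

Checking where each hop points is useful here: if the target is a login page, the redirect means the session is not authenticated and re-sending the form data will not help.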
If you still encounter issues after these adjustments, you may need to check how the site is managing sessions and authentication or whether additional network-specific configurations (like VPNs or proxies) are required for access.