小能豆

HTTP Error 307: Temporary Redirect in Python 3 - INTRANET

py

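The code generates a series of URLs and searches each one for a specific string. Because the website requires login credentials:

  • I logged in to the website through the browser.
  • As a further check, I tried a single complete URL (same website) without any values or encoding, and it worked fine. So I assumed the login was not the problem.
  • I did try adding the login information in code, but since that raised a series of errors of its own, I wanted to find out whether it is really necessary. Maybe there is another solution that doesn't require logging in.
  • Recently I realized that the link is on an "intranet" rather than the "internet". Could that be the problem?

Here is the code: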

    import urllib.parse
    import urllib.request

    url = 'https://www.aug.ipp.mpg.de/cgibin/sfread_only/isis?'

    # shot_a, shot_z, diag and param are user inputs.
    shotn = shot_a
    while shotn <= shot_z:
        values = {'shot': shotn,
                  'exp': 'AUGD',
                  'diag': diag,
                  'action': 'SignalDetails',
                  'signal': param}

        data = urllib.parse.urlencode(values)
        data = data.encode('utf-8')
        req = urllib.request.Request(url, data)
        resp = urllib.request.urlopen(req)
        # The line above is line 42 of the full script - the first error
        respData = resp.read()
        shotn += 1

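The expected result is a .txt file on the computer listing the shotn values for which that specific string was found at the corresponding URL.

Here is the actual result: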

Traceback (most recent call last):
  File "C:/Users/lenovo/PycharmProjects/ url/venv/Final.py", line 42, in <module>
    resp = urllib.request.urlopen(req)
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 563, in error
    result = self._call_chain(*args)
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 734, in http_error_302
    new = self.redirect_request(req, fp, code, msg, headers, newurl)
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 672, in redirect_request
    raise HTTPError(req.full_url, code, msg, headers, fp)
urllib.error.HTTPError: HTTP Error 307: Temporary Redirect

Process finished with exit code 1

2024-12-24

1 answer

小能豆

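The error you're seeing (HTTP Error 307: Temporary Redirect) means the server is temporarily redirecting your request to another URL. The traceback shows why this surfaces as an exception: urllib's HTTPRedirectHandler.redirect_request follows a 307 automatically only for GET and HEAD requests, and because passing data to Request turns it into a POST, it raises HTTPError instead. On sites that require a login, such redirects are often triggered by session management that expects certain cookies, headers, or authentication tokens.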
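If you would rather stay with urllib, one option is to override its redirect policy. The following is a minimal sketch, not a confirmed fix for this site: PostRedirectHandler and post_form are names made up for illustration, and it assumes the redirect target is one you trust to receive the form data a second time.

```python
import urllib.parse
import urllib.request


class PostRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Re-send a POST to the new location when the server answers 307."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if code == 307 and req.get_method() == "POST":
            # Repeat the request unchanged (same method, same body) at newurl.
            return urllib.request.Request(
                newurl,
                data=req.data,
                headers=req.headers,
                origin_req_host=req.origin_req_host,
                unverifiable=True,
                method="POST",
            )
        # Every other case keeps urllib's default behaviour.
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def post_form(url, values):
    """POST form values (hypothetical helper) through the handler above."""
    opener = urllib.request.build_opener(PostRedirectHandler)
    data = urllib.parse.urlencode(values).encode("utf-8")
    with opener.open(urllib.request.Request(url, data)) as resp:
        return resp.read()
```

Calling post_form(url, values) in place of the Request/urlopen pair would then follow the 307 instead of raising. If the redirect leads to a login page, the response body will be that page rather than the data, which is itself a useful diagnostic.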

1. Redirect and Cookies Handling

  • A 307 Temporary Redirect tells the client to repeat the same request, with the same method and body, at another URL. On sites that require a login, this is often a redirect to a login page or part of post-login redirection. Logging in through the browser does not help the script: the browser's session cookies are not shared with Python, so the site may still treat the script's requests as unauthenticated.

Solution:
- Session Management: Use a requests.Session (or, with urllib, an opener built with HTTPCookieProcessor) so that cookies are kept between requests.

Here’s an example using requests to handle the session:

```python
import requests

# Create a session so cookies set by the server persist across requests
session = requests.Session()

# URL for the form submission
url = 'https://www.aug.ipp.mpg.de/cgibin/sfread_only/isis?'

# shot_a, shot_z, diag, param are user inputs
shotn = shot_a

while shotn <= shot_z:
    values = {
        'shot': shotn,
        'exp': 'AUGD',
        'diag': diag,
        'action': 'SignalDetails',
        'signal': param,
    }

    # Send the POST request through the session; requests form-encodes the dict
    response = session.post(url, data=values)

    if response.status_code == 200:
        # If the response is successful, handle the data
        respData = response.text
        # Process respData (e.g., save to a file)
    else:
        print(f"Failed to retrieve data for shot {shotn}, Status Code: {response.status_code}")

    shotn += 1
```

In this example:
- Session Handling: requests.Session() stores the cookies the server sets and sends them back on later requests made through the same session. It does not pick up the login from your browser, so if the site requires authentication, the script has to log in through the session as well.
- POST Request: session.post replaces the urllib calls. requests follows redirects by default and, for a 307, repeats the POST with the same body at the new URL.
- Response Handling: Check response.status_code to see whether the request succeeded.
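The "process respData" step is left open in the example. As a minimal sketch of the goal described in the question (a .txt file listing the shots whose page contains a given string), something like the following could be used. The function name, the (shot, text) pair format, and the output path are assumptions made for illustration.

```python
def record_matching_shots(pages, needle, out_path):
    """Write the shot numbers whose page text contains `needle` to out_path.

    `pages` is an iterable of (shot_number, page_text) pairs.
    Returns the list of matching shot numbers.
    """
    matches = [shot for shot, text in pages if needle in text]
    with open(out_path, "w", encoding="utf-8") as fh:
        for shot in matches:
            fh.write(f"{shot}\n")
    return matches
```

Inside the loop you would collect (shotn, response.text) pairs for the successful responses and pass them to this function once the loop finishes.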

2. Internal Network (Intranet) Issues

Since you mentioned that the website is on an “internal network” rather than the public internet, this could also be a factor:

  • Access Restrictions: The website might be accessible only from within the internal network or from IPs that are authenticated through a VPN or other internal infrastructure. If you are outside of the internal network, the request might be blocked or redirected to a login page.

Solution:
- Check Network Access: Make sure the machine running the script can reach the internal network. If your browser reaches the site through a VPN or a proxy, the script needs the same route (for example, via the proxies argument in requests).
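One way to compare the script's route with the browser's is to ask Python which proxy settings it detects. This is only a diagnostic sketch using the standard library; describe_proxy_route is a made-up helper name, and the result says nothing about VPN connectivity itself.

```python
import urllib.request


def describe_proxy_route(host):
    """Report the proxies Python detects and whether `host` bypasses them."""
    # getproxies() reads proxy settings from the environment or the OS.
    proxies = urllib.request.getproxies()
    # proxy_bypass() is truthy when `host` should be contacted directly.
    bypass = bool(urllib.request.proxy_bypass(host))
    return {"proxies": proxies, "bypass": bypass}
```

If the browser is configured with a proxy that does not show up here, that difference is a likely reason the same URL behaves differently from the script. requests also reads the proxy environment variables by default.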

3. Check Headers and Authentication

  • Even though you can access the site through the browser, your script might be missing headers (like User-Agent) or other authentication tokens that the website expects.

Solution:
- Mimic the headers from your browser request using the requests library. You can inspect the headers using your browser’s developer tools and replicate them in the script.

Example to add headers:

```python
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    # Add any other necessary headers here
}
response = session.post(url, data=values, headers=headers)
```

You can find the required headers by inspecting the network traffic in the browser’s developer tools (F12 > Network tab).

4. Redirection Handling

requests follows redirects automatically by default, including a 307 on a POST, whereas urllib follows a 307 only for GET and HEAD requests. If you want to handle redirects manually, or the automatic redirection doesn't work for some reason, you can control it like this:

```python
# allow_redirects=True is the default; set it to False to handle redirects manually
response = session.post(url, data=values, allow_redirects=True)
```
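Handling the redirect by hand also shows where the server is sending the request. The sketch below assumes `session` behaves like a requests.Session; the function name and the max_hops limit are illustrative choices, not part of requests.

```python
from urllib.parse import urljoin


def post_with_manual_redirects(session, url, values, max_hops=5):
    """POST `values`, following 307/308 redirects by hand.

    Returns (final_response, list_of_urls_visited).
    """
    visited = [url]
    for _ in range(max_hops):
        response = session.post(url, data=values, allow_redirects=False)
        if response.status_code not in (307, 308):
            return response, visited
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, response.headers["Location"])
        visited.append(url)
    raise RuntimeError(f"gave up after {max_hops} redirects: {visited}")
```

Printing the visited list tells you whether the 307 points at a login page, which would suggest that authentication rather than redirect handling is the real obstacle.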

Summary of Steps:

  1. Use requests.Session() to maintain login state and cookies.
  2. Make sure the 307 is followed with the POST preserved, either by letting requests do it or by handling the redirect yourself.
  3. Mimic browser headers to ensure the server doesn’t block or misinterpret the request.
  4. Verify internal network access if the site is on an intranet.

If you still encounter issues after these adjustments, you may need to check how the site is managing sessions and authentication or whether additional network-specific configurations (like VPNs or proxies) are required for access.

2024-12-24