I’m trying to scrape a page for data, but its login process has me stumped. As it’s an ASP.NET site, my searches have led me to include __VIEWSTATE and __VIEWSTATEGENERATOR, but I cannot find __EVENTTARGET or __EVENTVALIDATION, and I’m not sure whether they can sometimes be missing.
The website’s login page has this form (personal data gets prefilled, so I have replaced those values with *):
<form method="get" action="./login.aspx" id="validateSubmitForm" autocomplete="off" novalidate="">
  <div class="aspNetHidden">
    <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="*****long viewstate here****" />
  </div>
  <div class="aspNetHidden">
    <input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="******" />
  </div>
  <div class="row">
    <div class="form-group col-md-12 mb-4">
      <!-- <input type="email" class="form-control input-lg" id="email" aria-describedby="emailHelp" placeholder="email"> -->
      <input name="TextBox1N" type="text" value="*******" id="TextBox1N" title="Username" class="form-control input-lg" placeholder="Username" />
    </div>
    <div class="form-group col-md-12 ">
      <!-- <input type="password" class="form-control input-lg" id="password" placeholder="Password"> -->
      <input name="TextBox2N" type="password" id="TextBox2N" class="form-control input-lg" placeholder="Password" value="******" />
    </div>
    <div class="form-group col-md-12 ">
    </div>
    <div class="col-md-12">
      <div class="d-flex justify-content-between mb-3">
        <div class="custom-control custom-checkbox mr-3 mb-3">
          <!-- <input type="checkbox" class="custom-control-input" id="customCheck2"> <label class="custom-control-label" for="customCheck2">Remember me</label> -->
          <input id="CheckBox1N" type="checkbox" name="CheckBox1N" checked="checked" />
          <span id="remember_meN" for="CheckBox1N">Remember me</span>
        </div>
        <a class="text-color" href="remember.aspx"> Remember </a>
      </div>
      <!-- <button type="submit" class="btn btn-primary btn-pill mb-4" style="width:100% !important">Sign In</button> -->
      <input type="submit" name="Button1N" value="Sign in" id="Button1N" class="btn btn-primary btn-pill mb-4" style="width:100% !important" />
      <p>
        Don't have an account yet ?
        <a class="text-blue" href="registrati.aspx"> Sign</a>
        <!-- <input type="submit" name="Button2N" value="Sign up" id="Button2N" class="text-blue" /> -->
      </p>
    </div>
  </div>
</form>
What I’ve cobbled together so far is (url and login info masked):
from bs4 import BeautifulSoup
import requests

#Session Setup
s = requests.Session()
s.headers.update({"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"})
uName='******'
pwd ='******'

#Load page
url='http://***/login.aspx'
r = s.get(url)
soup = BeautifulSoup(r.text, 'html.parser')

#Set params
paramsPost = {"TextBox1N": uName, "TextBox2N": pwd, "CheckBox1N": "on", "Button1N": "Sign in" }

#Add __VIEWSTATE params
paramsPost['__VIEWSTATE'] = soup.find('input', id='__VIEWSTATE')['value']
paramsPost['__VIEWSTATEGENERATOR'] = soup.find('input', id='__VIEWSTATEGENERATOR')['value']

#Login to a GET form
req = requests.Request('GET', url, data=paramsPost)
prep = req.prepare()
pUrl = prep.url+'?'+prep.body #this was mostly done so I could print the full url and verify against a browser generated one
r = s.get(url)
For posterity I have also tried the following:
r = s.post(url, data=paramsPost)
print(r.url)
Both ways just send me to the ./error.aspx page.
Logging in with a browser and inspecting the network traffic shows that a GET request was made, with __VIEWSTATE, __VIEWSTATEGENERATOR, TextBox1N, TextBox2N, CheckBox1N and Button1N added to the request URL. It returned status 302 and then redirected to ./dashboardAssets.aspx
Interestingly, the __VIEWSTATE my code returns is shorter than the __VIEWSTATE my browser returns. Is this related?
Everything I Google or Search on SO points to __EVENT params, but I can’t locate them, so not sure this site needs them.
Any other ideas I can try?
Based on the HTML form you provided, the form is submitted with a GET request, so its fields are sent in the URL's query string. Your code builds a requests.Request with the GET method but passes the fields as data=, which puts them in the request body rather than in the URL. It then builds pUrl by hand but never uses it: the final call is s.get(url) with the bare URL, so none of the parameters are actually sent. Pass the fields with params= instead and request the prepared URL with s.get(pUrl).
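To make the difference between data= and params= concrete, here is a minimal sketch that only prepares two requests and never touches the network. The example.com URL and the field values are placeholders, not the real site's.

```python
import requests

# Placeholder URL and values, for illustration only.
url = "http://example.com/login.aspx"
fields = {"TextBox1N": "user", "Button1N": "Sign in"}

# data= form-encodes the fields into the request body, even for a GET.
with_data = requests.Request("GET", url, data=fields).prepare()

# params= form-encodes the fields into the URL's query string.
with_params = requests.Request("GET", url, params=fields).prepare()

print(with_data.url)     # URL is unchanged
print(with_data.body)    # the encoded fields ended up here
print(with_params.url)   # URL now carries the query string
print(with_params.body)  # no body at all
```

Passing the same dictionary straight to s.get(url, params=fields) should build the same URL without the manual prepare step.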
Here’s a modified version of your code:
from bs4 import BeautifulSoup
import requests

# Session Setup
s = requests.Session()
s.headers.update({"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"})
uName = '******'
pwd = '******'

# Load page
url = 'http://***/login.aspx'
r = s.get(url)
soup = BeautifulSoup(r.text, 'html.parser')

# Set params
paramsPost = {"TextBox1N": uName, "TextBox2N": pwd, "CheckBox1N": "on", "Button1N": "Sign in" }

# Add __VIEWSTATE params
paramsPost['__VIEWSTATE'] = soup.find('input', id='__VIEWSTATE')['value']
paramsPost['__VIEWSTATEGENERATOR'] = soup.find('input', id='__VIEWSTATEGENERATOR')['value']

# Prepare the GET request URL
req = requests.Request('GET', url, params=paramsPost)
prep = req.prepare()
pUrl = prep.url

# Make the GET request
r = s.get(pUrl)

# Print the final URL and check the response
print(pUrl)
print(r.url)
print(r.text)
This should mimic the browser's behavior by appending the parameters to the URL. Check the response (r.text) for error messages or clues about why the login is failing. Also note that in more complex scenarios a successful login may require additional headers or JavaScript-driven actions.
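Since it is unclear whether this site uses __EVENTTARGET or __EVENTVALIDATION, one option is to copy whatever hidden inputs the page actually serves instead of hard-coding their names. The sketch below is an assumption-laden illustration using only the standard library. The HiddenInputCollector class, the collect_hidden_fields helper, and the sample HTML are all hypothetical, not taken from the real site.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <input ... /> are routed here as well.
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def collect_hidden_fields(html):
    """Hypothetical helper: return all hidden form fields found in html."""
    parser = HiddenInputCollector()
    parser.feed(html)
    return parser.fields


# Made-up stand-in for the login page; the values are not real.
sample_html = """
<form method="get" action="./login.aspx">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc+/=" />
  <input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="C2EE9ABB" />
  <input name="TextBox1N" type="text" value="" />
</form>
"""

fields = collect_hidden_fields(sample_html)
# Add the visible fields the browser would also submit.
fields.update({"TextBox1N": "user", "TextBox2N": "secret",
               "CheckBox1N": "on", "Button1N": "Sign in"})

# Query string a GET form submission would carry; note that the
# base64 characters +, / and = in the view state get percent-encoded.
query = urlencode(fields)
print(query)
```

With requests, the resulting dictionary could be passed as params= so that any extra hidden fields the server adds are forwarded automatically.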