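Ai Zifei's Blog: Python Crawler Notes, Part 3: Weibo Login (Died Before the Campaign Was Won, Leaving Heroes in Tears Ever After)

Tutorial I followed: https://www.cnblogs.com/xiao-apple36/articles/8768270.html

Full article: https://www.cnblogs.com/xiao-apple36/p/8747351.html

Here is what I worked through:

Request details: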

https://login.sina.com.cn/sso/prelogin.php

# General
Request URL: https://login.sina.com.cn/sso/prelogin.php?entry=weibo&callback=sinaSSOController.preloginCallBack&su=d2hiZXN0c29mdCU0MDE2My5jb20%3D&rsakt=mod&checkpin=1&client=ssologin.js(v1.4.19)&_=1629269267519
Request Method: GET
Status Code: 200 OK
Remote Address: 58.63.236.212:443
Referrer Policy: strict-origin-when-cross-origin

# Response Headers
Cache-Control: no-cache, must-revalidate
Connection: keep-alive
Content-Type: application/javascript; charset=utf-8
Date: Wed, 18 Aug 2021 06:47:46 GMT
DPOOL_HEADER: dryad61
Expires: Sat, 26 Jul 1997 05:00:00 GMT
P3P: CP="CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR"
Pragma: no-cache
Pragma: no-cache
Server: nginx/1.6.1
Transfer-Encoding: chunked

# Request Headers
Accept: */*
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
Connection: keep-alive
Host: login.sina.com.cn
Referer: https://weibo.com/
sec-ch-ua: "Chromium";v="92", " Not A;Brand";v="99", "Microsoft Edge";v="92"
sec-ch-ua-mobile: ?0
Sec-Fetch-Dest: script
Sec-Fetch-Mode: no-cors
Sec-Fetch-Site: cross-site
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 Edg/92.0.902.73

# Query String Parameters
entry: weibo
callback: sinaSSOController.preloginCallBack
su: d2hiZXN0c29mdCU0MDE2My5jb20=
rsakt: mod
checkpin: 1
client: ssologin.js(v1.4.19)
_: 1629269267519
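The prelogin request above can be sketched in two small steps: the `su` parameter is the URL-quoted user name encoded as Base64, and the response is JSON wrapped in the `sinaSSOController.preloginCallBack(...)` callback. The helper names `encode_su` and `parse_prelogin`, the user name, and the sample response values below are all made up for illustration; only the field names come from the code in this post.

```python
import base64
import json
import re
from urllib.parse import quote_plus


def encode_su(username: str) -> str:
    """URL-quote the user name, then Base64-encode it (the `su` parameter)."""
    quoted = quote_plus(username)
    return base64.b64encode(quoted.encode("utf-8")).decode("ascii")


def parse_prelogin(body: str) -> dict:
    """Strip the JSONP callback wrapper and return the JSON payload as a dict."""
    match = re.search(r"sinaSSOController\.preloginCallBack\((.*)\)", body, re.S)
    if match is None:
        raise ValueError("unexpected prelogin response")
    return json.loads(match.group(1))


# Hypothetical inputs, for illustration only.
su = encode_su("user@example.com")
sample_body = (
    'sinaSSOController.preloginCallBack({"servertime":1629269272,'
    '"nonce":"ABC123","pubkey":"EB2A38","rsakv":"1330428213","exectime":5})'
)
fields = parse_prelogin(sample_body)
print(su, fields["servertime"], fields["nonce"], fields["rsakv"])
```

The `servertime`, `nonce`, `rsakv`, and `pubkey` values parsed here are the ones the login POST needs later.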

https://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.19)

# General
Request URL: https://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.19)
Request Method: POST
Status Code: 200 OK
Remote Address: 58.63.236.212:443
Referrer Policy: strict-origin-when-cross-origin

# Response Headers
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: https://weibo.com
Connection: keep-alive
Content-Encoding: gzip
Content-Type: text/html
Date: Wed, 18 Aug 2021 06:47:50 GMT
DPOOL_HEADER: dryad52
P3P: CP="CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR"
Pragma: no-cache
Server: nginx/1.6.1
Transfer-Encoding: chunked
Vary: Accept-Encoding

# Request Headers
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
Cache-Control: max-age=0
Connection: keep-alive
Content-Length: 636
Content-Type: application/x-www-form-urlencoded
Host: login.sina.com.cn
Origin: https://weibo.com
Referer: https://weibo.com/
sec-ch-ua: "Chromium";v="92", " Not A;Brand";v="99", "Microsoft Edge";v="92"
sec-ch-ua-mobile: ?0
Sec-Fetch-Dest: iframe
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: cross-site
Sec-Fetch-User: ?1
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 Edg/92.0.902.73

# Query String Parameters
client: ssologin.js(v1.4.19)

# Form Data
entry: weibo
gateway: 1
from:
savestate: 7
qrcode_flag: false
useticket: 1
pagerefer:
vsnf: 1
su: d2hiZXN0c29mdCU0MDE2My5jb20=
service: miniblog
servertime: 1629269272
nonce: XL9ZA7
pwencode: rsa2
rsakv: 1330428213
sp: c7aeb0d1a4212ec69daa2943c1eef5ecae9bf04c490657b18b7759a62cfb193667446933d75af0c96e69a4328e63b842256bd9ed6a2fff933caf5ac7bc6d2b91d4fa3d5ed8e609690d8cfb7074ddecd71ebcec8050797b037ac1ecd1f3ee64f52ed721df53434e2993da42fd66248af2137a3de26bbc956371e24e38a214dcb7
sr: 2048*1152
encoding: UTF-8
prelt: 28
url: https://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack
returntype: META
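The `sp` form field in the login POST is the RSA-encrypted password. The complete code in this post builds it with the third-party `rsa` package from the string `servertime + '\t' + nonce + '\n' + password`, using public exponent 65537. The sketch below redoes that step with only the standard library, on the assumption that `rsa.encrypt` applies PKCS#1 v1.5 padding. The function names and the toy modulus (a product of two Mersenne primes, not a real Weibo key) are for illustration only.

```python
import secrets


def pkcs1_v15_pad(message: bytes, key_len: int) -> bytes:
    """Return 0x00 0x02 <nonzero random bytes> 0x00 <message>, key_len bytes long."""
    pad_len = key_len - len(message) - 3
    if pad_len < 8:
        raise ValueError("message too long for this key size")
    padding = bytes(secrets.randbelow(255) + 1 for _ in range(pad_len))
    return b"\x00\x02" + padding + b"\x00" + message


def encrypt_sp(password: str, servertime: int, nonce: str,
               modulus: int, exponent: int = 65537) -> str:
    """Encrypt 'servertime<TAB>nonce<LF>password' and return it as hex."""
    key_len = (modulus.bit_length() + 7) // 8
    plain = "{}\t{}\n{}".format(servertime, nonce, password).encode("utf-8")
    padded = pkcs1_v15_pad(plain, key_len)
    cipher = pow(int.from_bytes(padded, "big"), exponent, modulus)
    return cipher.to_bytes(key_len, "big").hex()


# Toy key for illustration only: the real modulus comes from the `pubkey`
# field of the prelogin response, parsed with int(pubkey, 16).
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q

sp = encrypt_sp("mypassword", 1629269272, "XL9ZA7", N)
print(len(sp), sp[:16])
```

Because the padding is random, the same password gives a different `sp` on every run, so a captured value cannot be compared byte for byte.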

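This is the response. The flow probably cannot go any further from here, because Weibo now adds an SMS or QR-code verification step. The good news is that every step above checked out.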

<html><head><title>新浪通行证</title><meta http-equiv="refresh" content="0; url=&#39;https://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack&sudaref=weibo.com&display=0&retcode=2071&reason=%C7%EB%CA%B9%D3%C3%C9%A8%C2%EB%B5%C7%C2%BC&protection_url=https%3A%2F%2Fpassport.weibo.com%2Fprotection%2Findex%3Ftoken%3D2OTFhHLTcAFCwq_w1anCKzI5t8F1gYph1CnByb3RlY3Rpb24.&#39;"/><meta http-equiv="Content-Type" content="text/html; charset=GBK" /></head><body bgcolor="#ffffff" text="#000000" link="#0000cc" vlink="#551a8b" alink="#ff0000"><script type="text/javascript" language="javascript">location.replace("https://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack&sudaref=weibo.com&display=0&retcode=2071&reason=%C7%EB%CA%B9%D3%C3%C9%A8%C2%EB%B5%C7%C2%BC&protection_url=https%3A%2F%2Fpassport.weibo.com%2Fprotection%2Findex%3Ftoken%3D2OTFhHLTcAFCwq_w1anCKzI5t8F1gYph1CnByb3RlY3Rpb24.");</script></body></html>
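The response above carries its result in the URL passed to `location.replace(...)`: a `retcode` and, when extra verification is required, a percent-encoded `protection_url`. A minimal sketch of pulling both out with the standard library follows. The helper name `parse_login_redirect` and the shortened sample HTML (with a placeholder token) are illustrative, not the exact response.

```python
import re
from urllib.parse import parse_qs, urlsplit


def parse_login_redirect(html: str) -> dict:
    """Extract the redirect URL, retcode and protection_url from the login page."""
    match = re.search(r"location\.replace\(['\"](.*?)['\"]\)", html)
    if match is None:
        raise ValueError("no location.replace() redirect found")
    url = match.group(1)
    query = parse_qs(urlsplit(url).query)  # percent-decodes each value
    return {
        "url": url,
        "retcode": query.get("retcode", [""])[0],
        "protection_url": query.get("protection_url", [""])[0],
    }


# Shortened, hypothetical sample of the response body.
sample_html = (
    '<html><body><script type="text/javascript">'
    'location.replace("https://weibo.com/ajaxlogin.php?framelogin=1'
    '&callback=parent.sinaSSOController.feedBackUrlCallBack&retcode=2071'
    '&protection_url=https%3A%2F%2Fpassport.weibo.com%2Fprotection%2Findex'
    '%3Ftoken%3DEXAMPLE");</script></body></html>'
)
result = parse_login_redirect(sample_html)
print(result["retcode"], result["protection_url"])
```

A `retcode` other than `0` means the login did not complete. In the response captured above it is `2071`, and the `protection_url` points at the extra verification page.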

 https://weibo.com/ajaxlogin.php

# General
Request URL: https://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack&sudaref=weibo.com&display=0&retcode=101&reason=%B5%C7%C2%BC%C3%FB%BB%F2%C3%DC%C2%EB%B4%ED%CE%F3
Request Method: GET
Status Code: 200
Remote Address: 180.149.153.187:443
Referrer Policy: strict-origin-when-cross-origin

# Response Headers
cache-control: no-cache, must-revalidate
content-encoding: gzip
content-security-policy: block-all-mixed-content;
content-type: text/html; charset=utf-8
date: Wed, 18 Aug 2021 06:47:51 GMT
dpool_header: mapi-weibocom-ug-1-79db94d59-zv67v
expires: Mon, 26 Jul 1997 05:00:00 GMT
last-modified: Wed, 18 Aug 2021 06:47:51 GMT
lb: 180.149.153.187
pramga: no-cache
server: nginx
ssl_node: mweibo-172-16-138-207.yf.intra.weibo.cn
vary: Accept-Encoding

# Request Headers
:authority: weibo.com
:method: GET
:path: /ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack&sudaref=weibo.com&display=0&retcode=101&reason=%B5%C7%C2%BC%C3%FB%BB%F2%C3%DC%C2%EB%B4%ED%CE%F3
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
accept-encoding: gzip, deflate, br
accept-language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
cookie: SUB=_2AkMWQCjGf8NxqwJRmf0XyG7ib4lwzw3EieKgHNkdJRMxHRl-yj8XqlBatRB6PcAGLUW3iSNyJZOD3zft19l_o65mAXlU
if-modified-since: Wed, 18 Aug 2021 06:43:50 GMT
referer: https://login.sina.com.cn/
sec-ch-ua: "Chromium";v="92", " Not A;Brand";v="99", "Microsoft Edge";v="92"
sec-ch-ua-mobile: ?0
sec-fetch-dest: iframe
sec-fetch-mode: navigate
sec-fetch-site: cross-site
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 Edg/92.0.902.73

# Query String Parameters
framelogin: 1
callback: parent.sinaSSOController.feedBackUrlCallBack
sudaref: weibo.com
display: 0
retcode: 101
reason: (unable to decode value)
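The browser tools cannot decode the `reason` parameter because it is percent-encoded GBK rather than UTF-8 (the login page itself declares `charset=GBK`). A short sketch of decoding it with `urllib.parse.unquote` follows. The helper name `decode_reason` is made up, and the two inputs are the `reason` values captured in this post.

```python
from urllib.parse import quote, unquote


def decode_reason(encoded: str) -> str:
    """Decode a percent-encoded GBK string such as the `reason` parameter."""
    return unquote(encoded, encoding="gbk")


REASON_101 = "%B5%C7%C2%BC%C3%FB%BB%F2%C3%DC%C2%EB%B4%ED%CE%F3"
REASON_2071 = "%C7%EB%CA%B9%D3%C3%C9%A8%C2%EB%B5%C7%C2%BC"

wrong_password = decode_reason(REASON_101)
need_qrcode = decode_reason(REASON_2071)
print(wrong_password)
print(need_qrcode)
```

If the decoding is right, the `retcode=101` text reads 登录名或密码错误 (wrong login name or password) and the `retcode=2071` text reads 请使用扫码登录 (please log in by scanning the QR code).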

Complete code

#!/usr/bin/env python
import re
import requests
import time
import urllib3
import base64
import json
import rsa
from binascii import b2a_hex
from urllib.parse import quote_plus, unquote_plus
from bs4 import BeautifulSoup


class Weibo_login():
    def __init__(self, user, pwd):
        urllib3.disable_warnings()  # silence the insecure-request warnings
        self.session = requests.Session()
        self.session.verify = False  # skip certificate verification
        # Default headers, copied from the prelogin request. 'Host' is left out:
        # requests derives it from each URL, and a fixed value would also be
        # sent to weibo.com later on.
        self.session.headers = {
            'Accept': '*/*',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
            'Connection': 'keep-alive',
            'Referer': 'https://weibo.com/',
            'sec-ch-ua': '"Chromium";v="92", " Not A;Brand";v="99", "Microsoft Edge";v="92"',
            'sec-ch-ua-mobile': '?0',
            'Sec-Fetch-Dest': 'script',
            'Sec-Fetch-Mode': 'no-cors',
            'Sec-Fetch-Site': 'cross-site',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 Edg/92.0.902.73',
        }
        # Headers for the login POST
        self.session.headers2 = {
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
            'Cache-Control': 'max-age=0',
            'Connection': 'keep-alive',
            # 'Content-Length': '636',
            'Content-Type': 'application/x-www-form-urlencoded',
            'Origin': 'https://weibo.com',
            'Referer': 'https://weibo.com/',
            'sec-ch-ua': '"Chromium";v="92", " Not A;Brand";v="99", "Microsoft Edge";v="92"',
            'sec-ch-ua-mobile': '?0',
            'Sec-Fetch-Dest': 'iframe',
            'Sec-Fetch-Mode': 'navigate',
            'Sec-Fetch-Site': 'cross-site',
            'Sec-Fetch-User': '?1',
            'Upgrade-Insecure-Requests': '1',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 Edg/92.0.902.73',
        }
        # Headers for the redirects to weibo.com
        self.session.headers3 = {
            # ':authority': 'weibo.com',
            # ':method': 'GET',
            # ':path': '/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack&sudaref=weibo.com&display=0&retcode=101&reason=%B5%C7%C2%BC%C3%FB%BB%F2%C3%DC%C2%EB%B4%ED%CE%F3',
            # ':scheme': 'https',
            'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
            # 'cookie': 'SUB=_2AkMWQCjGf8NxqwJRmf0XyG7ib4lwzw3EieKgHNkdJRMxHRl-yj8XqlBatRB6PcAGLUW3iSNyJZOD3zft19l_o65mAXlU',
            'if-modified-since': 'Wed, 18 Aug 2021 06:43:50 GMT',
            'referer': 'https://login.sina.com.cn/',
            'sec-ch-ua': '"Chromium";v="92", " Not A;Brand";v="99", "Microsoft Edge";v="92"',
            'sec-ch-ua-mobile': '?0',
            'sec-fetch-dest': 'iframe',
            'sec-fetch-mode': 'navigate',
            'sec-fetch-site': 'cross-site',
            'upgrade-insecure-requests': '1',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 Edg/92.0.902.73',
        }
        self.user = user
        self.pwd = pwd

    def get_Time(self):
        '''
        Return the current time in milliseconds as a string.
        :return:
        '''
        return str(int(time.time() * 1000))

    def get_server_data(self):
        '''
        Call the prelogin URL and store servertime, nonce, rsakv and pubkey.
        :return:
        '''
        data_dict = {
            'entry': 'weibo',
            'callback': 'sinaSSOController.preloginCallBack',
            'su': self.get_username(),
            'rsakt': 'mod',
            'checkpin': '1',
            'client': 'ssologin.js(v1.4.19)',
            '_': self.get_Time()  # e.g. 1629268542865
        }
        pre_login_url = 'https://login.sina.com.cn/sso/prelogin.php?'
        response = self.session.get(pre_login_url, headers=self.session.headers, params=data_dict, verify=self.session.verify)
        # print(response.text)
        if response.status_code == 200:
            html = response.text
            if html:
                # Match the payload inside sinaSSOController.preloginCallBack(...)
                json_data = re.findall(r'sinaSSOController.preloginCallBack\((.*?)\)', html)
                json_dict = json.loads(json_data[0])  # JSON string -> dict
                # print(json_dict)
                self.servertime = json_dict['servertime']
                self.nonce = json_dict['nonce']
                self.rsakv = json_dict['rsakv']
                self.exectime = json_dict['exectime']
                self.pubkey = json_dict['pubkey']
                print('get_server_data servertime={} nonce={} rsakv={}'.format(self.servertime, self.nonce, self.rsakv))
            else:
                print('data is null')
        else:
            print('get_server_data response html error !!!')

    def login(self):
        """
        Log in to Weibo.
        :return:
        """
        # preloginTimeStart = int(time.time()*1000)
        # temp_url = 'https://passport.weibo.com/visitor/visitor?entry=miniblog&a=enter&url=https%3A%2F%2Fweibo.com%2F&domain=.weibo.com&ua=php-sso_sdk_client-0.6.23&_rand=1523284754.9734'
        # parse_url = quote_plus(temp_url)  # URL-encode the address
        # print(parse_url)
        # preloginTime = abs((int(time.time()*1000) - preloginTimeStart - self.exectime))  # compute prelt
        login_url = 'https://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.19)'  # login URL
        username = self.get_username()  # Base64-encoded user name
        print('username base64=', username)
        pwd = self.get_pwd()
        print('pwd rsa =', pwd)
        data_dict = {
            'entry': 'weibo',
            'gateway': '1',
            'from': '',
            'savestate': '7',
            'qrcode_flag': 'false',
            'useticket': '1',
            # 'pagerefer':parse_url,
            'vsnf': '1',
            'su': username,
            'service': 'miniblog',
            'servertime': self.servertime,
            'nonce': self.nonce,
            'pwencode': 'rsa2',
            'rsakv': self.rsakv,
            'sp': pwd,
            'sr': '2048*1152',
            'encoding': 'UTF-8',
            'prelt': 28,
            'url': 'https://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack',
            'returntype': 'META'
        }
        logining_page = self.session.post(login_url, data=data_dict, headers=self.session.headers2)
        # logining_page.encoding = 'GBK'
        # print(logining_page.content.decode('GBK'))
        # <title>新浪通行证</title>
        login_loop = logining_page.content.decode('GBK')
        pa = r'location\.replace\([\'"](.*?)[\'"]\)'
        loc = re.findall(pa, login_loop)
        if not loc:
            print('no redirect URL found in the login response')
            return
        ret = re.findall(r'retcode=(\d+)', loc[0])
        if len(ret) > 0 and ret[0] != '0':
            # The reason text is percent-encoded GBK, which is why the browser
            # tools show "(unable to decode value)".
            reason = re.findall(r'reason=([^&]+)', loc[0])
            print('login rejected, retcode={} reason={}'.format(ret[0], unquote_plus(reason[0], encoding='GBK') if reason else ''))
            protection = re.findall(r'protection_url=([^&]+)', loc[0])
            if protection:
                print('Re-verification URL: ' + unquote_plus(protection[0]))
            # Probably cannot go further: Weibo now adds an SMS or QR-code
            # verification step. Everything above this point checked out.
            return
        login_html = self.session.get(loc[0], headers=self.session.headers3)
        login_content = login_html.content.decode('GBK')  # "正在登录 ..." (Signing in ...)
        if '正在登录' in login_content or 'Signing in' in login_content:
            pa = r'location\.replace\([\'](.*?)[\']\)'
            print('Signing in')
            cross_loc = re.findall(pa, login_content)
            # print(loc1)
            cross_html = self.session.get(cross_loc[0], headers=self.session.headers3)
            cross_data = cross_html.content.decode('GBK')
            pa = r'parent.sinaSSOController\.feedBackUrlCallBack\((.*?)\)'
            feedback_data = json.loads(re.findall(pa, cross_data)[0])
            print(feedback_data)
            if feedback_data['result']:
                print("return result True")
                uniqueid = feedback_data['userinfo']['uniqueid']
                # print(uniqueid)
                main_html = self.session.get('https://weibo.com/u/{}/home'.format(uniqueid), verify=False).content.decode()
                soup = BeautifulSoup(main_html, 'lxml')
                main_title = soup.title.string
                print(main_title)  # 我的首页 微博-随时随地发现新鲜事 (the home page title)
        else:
            print('User login failed')

    def get_username(self):
        """
        Return the Base64-encoded user name; the result must be a str.
        :return:
        """
        username_quote = quote_plus(str(self.user))
        username_base64 = base64.b64encode(username_quote.encode('utf-8'))  # Base64-encode
        return username_base64.decode('utf-8')

    def get_pwd(self):
        """
        Return the RSA-encrypted password as a hex string; the result must be a str.
        :return:
        """
        rsa_publickey = int(self.pubkey, 16)  # parse the hexadecimal public-key string into an integer
        key = rsa.PublicKey(rsa_publickey, 65537)
        message = str(self.servertime) + '\t' + str(self.nonce) + '\n' + str(self.pwd)
        message = message.encode('utf-8')
        passwd = rsa.encrypt(message, key)
        passwd = b2a_hex(passwd).decode()  # convert to hexadecimal
        return passwd


if __name__ == '__main__':
    user_name = 'myuserid'  # use your own user name and password
    pwd = 'mypassword'
    wo = Weibo_login(user_name, pwd)
    wo.get_server_data()
    wo.login()

This is the only result I can get:

 https://passport.weibo.com/protection/index?token=2YzlhHMfQAFDPgDZJ289RL-k7NgViHxkjCnByb3RlY3Rpb24.
