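2. Install the third-party libraries: requests, BeautifulSoup, selenium
Open CMD or PowerShell as administrator and run the following commands:
pip install requests
pip install beautifulsoup4
pip install selenium
You can also download the source packages and install them with python setup.py install. I won't go through the installation process in detail; if a package really won't install, search Baidu for the error.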
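3. Install the matching browser and browser driver
I used the latest version of Firefox at the time of writing, 59.0.2 (64-bit). Why not Chrome? Because the latest chromedriver did not support the latest version of Chrome. Try to download a recent driver so that it supports your browser; downloading the driver may require a proxy.
1. geckodriver (the Firefox driver), download: https://github.com/mozilla/geckodriver/releases/
2. chromedriver, download: https://code.google.com/p/chromedriver/downloads/list
If you are interested you can also use chrome+chromedriver; make sure the versions match and change the code accordingly. See this page: (click to open link)
If that sounds like too much trouble, just use firefox+geckodriver.
Save the code below as run.py and run python run.py. After you enter the requested information, the script opens the browser automatically and starts working through the course.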
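Packet capture after finishing one lesson video:
When a video finishes, the browser POSTs two requests, saveCacheIntervalTime and saveDatabaseIntervalTime, to the server to save the progress. So we only need to forge these two requests and send them to the server to complete a lesson instantly. To forge them we need three things: cookie, headers, data.
We use selenium to simulate logging in to the site and obtain the cookies and the page. The cookies go to requests for building the requests, and the page goes to BeautifulSoup for extracting parameters.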
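The cookie hand-off described above can be sketched with the standard library alone. This is a minimal sketch, not the script's own method: the helper name selenium_cookies_to_jar and the sample cookie values are made up for illustration, and the input is assumed to be a list of dicts shaped like the output of selenium's get_cookies().

```python
from http.cookiejar import Cookie, CookieJar


def selenium_cookies_to_jar(selenium_cookies):
    """Build a CookieJar from dicts shaped like selenium's get_cookies() output."""
    jar = CookieJar()
    for item in selenium_cookies:
        domain = item.get('domain', '')
        expiry = item.get('expiry')  # absent for session cookies
        jar.set_cookie(Cookie(
            version=0, name=item['name'], value=item['value'],
            port=None, port_specified=False,
            domain=domain, domain_specified=bool(domain),
            domain_initial_dot=domain.startswith('.'),
            path=item.get('path', '/'), path_specified=True,
            secure=bool(item.get('secure', False)),
            expires=expiry, discard=expiry is None,
            comment=None, comment_url=None, rest={},
        ))
    return jar


# Hypothetical sample data standing in for browser.get_cookies()
sample = [
    {'name': 'csrftoken', 'value': 'abc123', 'domain': 'study.zhihuishu.com', 'path': '/'},
    {'name': 'SERVERID', 'value': 'node-1', 'domain': 'study.zhihuishu.com', 'path': '/'},
]
jar = selenium_cookies_to_jar(sample)
for cookie in jar:
    print(cookie.name, cookie.value, cookie.domain, cookie.path)
```

Because each cookie keeps its domain and path, a jar built this way can be loaded into a requests session with session.cookies.update(jar) without patching individual cookies afterwards.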
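The headers can be read with a packet-capture tool or the browser's F12 developer tools. To avoid being noticed by the server, I send every header the browser sends.
Parameters sent by saveCacheIntervalTime:
Parameters sent by saveDatabaseIntervalTime:
Most of the parameters can be taken from the cookies or the page. Setting learnTime and studyTotalTime to the length of the video is enough to mark it as watched, but two parameters, __learning_token__ and ev, can only be obtained by analyzing the JS.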
1) __learning_token__
Analyzing videoList.js shows that __learning_token__ is the Base64 encoding of studiedId (Base64 is an encoding rather than encryption).
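As a small illustration of that step, the sketch below Base64-encodes a studiedId with Python's base64 module. The function name learning_token and the value '12345' are made up for the example; a real studiedId comes from the server.

```python
import base64


def learning_token(studied_id):
    """Base64-encode a studiedId given as a string of digits."""
    return base64.b64encode(studied_id.encode('utf-8')).decode('ascii')


token = learning_token('12345')  # '12345' is a made-up studiedId
print(token)  # MTIzNDU=
print(base64.b64decode(token).decode('utf-8'))  # 12345
```

Decoding the token returns the original id, which is why this step offers no secrecy of its own.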
Searching the JS for studiedId shows that it is taken from the response to the POST request prelearningNote.
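A minimal sketch of pulling the id out of such a response with a regular expression follows. The pattern is the one the script below uses; the sample response body and its field names are hypothetical, since the real response is not shown here.

```python
import re

# Pattern used by the script: the digits after "id": that are followed by ,"is
STUDIED_ID_PATTERN = re.compile(r'"id":([0-9]*),"is')


def extract_studied_id(response_text):
    """Return the matched id as a string, or None if the pattern is absent."""
    match = STUDIED_ID_PATTERN.search(response_text)
    return match.group(1) if match else None


# Hypothetical response body; the real field names may differ
sample = '{"studiedLessonDto":{"id":123456,"isStudiedLesson":0}}'
print(extract_studied_id(sample))
print(extract_studied_id('{}'))
```

Returning None for a missing match makes a changed response format show up as an explicit check rather than an AttributeError on .group().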

Parameters needed to send prelearningNote:

2) ev

The figure above shows that ev is produced by passing [rid, lessonId, lessonVideoId == null ? 0 : lessonVideoId, videoId] to the function D24444.Z(). Next we locate e.min.js, the file that defines D24444.Z().

This JS uses the 3 strings above to obfuscate its method names. Decoding them gives:

Then translate this JS code into Python. At this point we have every parameter, and all that remains is to send the requests in a loop with requests in Python.
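The Python translation in the script below boils down to joining the values with ';' and XOR-ing each character with a repeating key, written as two hex digits per character. Here is a standalone sketch of that scheme with a matching decoder. The function names, the key 'ab', and the sample values are made up; the script reads its real key from a page parameter named copyright. The decoder assumes every character fits in two hex digits.

```python
def xor_hex_encode(values, key):
    """Join values with ';', XOR with the repeating key, emit 2 hex digits per char."""
    text = ';'.join(values)
    return ''.join(format(ord(ch) ^ ord(key[i % len(key)]), '02x')
                   for i, ch in enumerate(text))


def xor_hex_decode(encoded, key):
    """Reverse xor_hex_encode for input made of two-digit hex pairs."""
    codes = [int(encoded[i:i + 2], 16) for i in range(0, len(encoded), 2)]
    text = ''.join(chr(code ^ ord(key[i % len(key)])) for i, code in enumerate(codes))
    return text.split(';')


key = 'ab'  # made-up key for the example
encoded = xor_hex_encode(['1', '2'], key)
print(encoded)  # 505953
print(xor_hex_decode(encoded, key))  # ['1', '2']
```

XOR with the same key is its own inverse, so the decoder is handy for checking a translated encoder against values captured from the browser.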
If the site changes and the script stops working, you can update the script the same way, by analyzing the site's JS.
# Author : TimoQAQ
# Author's QQ : 2190778650
# Version : 1.1
# Date : 2018-4-20
# Third-party libraries required: requests, BeautifulSoup, selenium + the Firefox driver
import re
import time
import base64
import requests
from http import cookiejar
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.firefox.options import Options

temp = ''       # __learning_token__ of the current video, set by saveCache()
copyright = ''  # key string used by encode(), set by saveCache()
speed = 0.2     # interval between videos in seconds (default 200 ms)
# Switch to the most recently opened window
def convert():
    time.sleep(3)
    windows = browser.window_handles
    browser.switch_to.window(windows[-1])
# Load the site, collect the cookies, and return the video page
def load(u, p, n):
    url = 'https://passport.zhihuishu.com/login?service=http://online.zhihuishu.com/onlineSchool/'
    browser.get(url)
    browser.implicitly_wait(8)  # implicit wait for the page to load
    # Login page (find_element_by_* is the Selenium 3 API)
    try:
        browser.find_element_by_id('lUsername').send_keys(u)  # type the user name
        browser.find_element_by_id('lPassword').send_keys(p)  # type the password
        browser.find_element_by_css_selector('.wall-sub-btn').click()  # click the login button
        print('Login succeeded...')
    except Exception:
        print('Login failed!')
        exit()
    # Pick a course
    try:
        time.sleep(3)  # can be increased
        browser.find_elements_by_css_selector('.courseImgs')[n].click()  # click the chosen course
        convert()
        print('Entered the video page...')
    except Exception:
        # Close the pop-up and try again
        try:
            browser.find_element_by_id('close_windowa').click()  # close the pop-up
            time.sleep(1)  # can be increased
            print('Closed the pop-up...')
            browser.find_elements_by_css_selector('.courseImgs')[n].click()  # click the chosen course
            convert()
            print('Entered the video page...')
        except Exception:
            print('Failed to enter the video page!')
            exit()
    # Copy the browser cookies into the requests session
    cookies = {item['name']: item['value'] for item in browser.get_cookies()}
    session.cookies = requests.utils.cookiejar_from_dict(cookies, cookiejar=None, overwrite=True)
    # I did not know how to convert selenium's cookies into a cookiejar that keeps
    # the domain, so csrftoken and SERVERID are re-added with the domain set
    token = session.cookies['csrftoken']
    serverid = session.cookies['SERVERID']
    session.cookies.set('csrftoken', None)  # a value of None removes the cookie
    session.cookies.set('SERVERID', None)
    session.cookies.set('csrftoken', token, path='/', domain='study.zhihuishu.com')
    session.cookies.set('SERVERID', serverid, path='/', domain='study.zhihuishu.com')
    print('Cookies obtained...')
    print('Returning the page...')
    return browser.page_source
# Takes the merged parameter dict and returns studiedId
def studiedId(params):
    t = int(round(time.time() * 1000))
    url = 'http://study.zhihuishu.com/json/learning/prelearningNote?time=' + str(t)
    header = {
        'Host': 'study.zhihuishu.com',
        'Connection': 'keep-alive',
        'Accept': '*/*',
        'Origin': 'http://study.zhihuishu.com',
        'X-Requested-With': 'XMLHttpRequest',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0',
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'Referer': 'http://study.zhihuishu.com/learning/videoList?courseId=' + params['courseId'] + '&rid=' + params['rid'],
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2'
    }
    data = {
        'rid': params['rid'],
        'studentCount': '1',
        'lessonId': params['lessonId'],
        'PCourseId': params['PCourseId'],
        'chapterId': params['chapterId'],
        'lessonVideoId': '',
        'userId': params['userId'],
        'videoId': params['videoId'],
        'studyStatus': ''
    }
    response = session.post(url, headers=header, data=data)  # send the POST request
    return re.search(r'"id":([0-9]*),"is', response.text).group(1)  # extract studiedId from the response
# Save to cache
def saveCache(params):
    t = int(round(time.time() * 1000))
    url = 'http://study.zhihuishu.com/json/learning/saveCacheIntervalTime?time=' + str(t)
    if params['lessonvideoid'] is not None:
        lessonVideoId = params['lessonvideoid']
    else:
        lessonVideoId = '0'
    global temp, copyright
    # Base64-encode studiedId to get __learning_token__
    temp = base64.b64encode(studiedId(params).encode('utf-8')).decode('ascii')
    copyright = params['copyright']
    header = {
        'Host': 'study.zhihuishu.com',
        'Connection': 'keep-alive',
        'Accept': '*/*',
        'X-Requested-With': 'XMLHttpRequest',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0',
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'Referer': 'http://study.zhihuishu.com/learning/videoList?courseId=' + params['courseId'] + '&rid=' + params['rid'],
        #'Referer': 'http://study.zhihuishu.com/learning/videoList;jsessionid='+session.cookies['JSESSIONID']+'?courseId=' + params['courseId'] + '&rid=' + params['rid'],
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2'
    }
    data = {
        'rid': params['rid'],
        'chapterId': params['chapterId'],
        'courseId': params['courseId'],
        'lessonId': params['lessonId'],
        'learnTime': params['videosize'],
        'studyTotalTime': studyTime(params['videosize']),
        '__learning_token__': temp,
        'studyStatus': '',
        'videoId': params['videoId'],
        #'watchPoint':'0%2C1%2C93%2C93',
        'ev': encode([params['rid'], params['lessonId'], lessonVideoId, params['videoId']]),
        'csrfToken': session.cookies['csrftoken'],
        'lessonVideoId': params['lessonvideoid']
    }
    response = session.post(url, headers=header, data=data)
    print('saveCache:' + params['name'] + ' ', response.status_code)
# Save to database
def saveDatabase(params):
    t = int(round(time.time() * 1000))
    url = 'http://study.zhihuishu.com/json/learning/saveDatabaseIntervalTime?time=' + str(t)
    if params['lessonvideoid'] is not None:
        lessonVideoId = params['lessonvideoid']
    else:
        lessonVideoId = '0'
    header = {
        'Host': 'study.zhihuishu.com',
        'Connection': 'keep-alive',
        'Accept': '*/*',
        'X-Requested-With': 'XMLHttpRequest',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0',
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'Referer': 'http://study.zhihuishu.com/learning/videoList?courseId=' + params['courseId'] + '&rid=' + params['rid'],
        #'Referer': 'http://study.zhihuishu.com/learning/videoList;jsessionid='+session.cookies['JSESSIONID']+'?courseId=' + params['courseId'] + '&rid=' + params['rid'],
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2'
    }
    data = {
        '__learning_token__': temp,
        'studiedLessonDto.learnTime': params['videosize'],
        'studiedLessonDto.studyTotalTime': studyTime(params['videosize']),
        'studiedLessonDto.playTimes': '35',
        'studiedLessonDto.recruitId': params['rid'],
        'studiedLessonDto.lessonVideoId': '',
        'studiedLessonDto.lessonId': params['lessonId'],
        'studiedLessonDto.videoId': params['videoId'],
        'studyStatus': '',
        'studiedLessonDto.sourseType': '1',
        'ev': encode([params['rid'], params['lessonId'], lessonVideoId, params['videoId'], '1']),
        'csrfToken': session.cookies['csrftoken']
    }
    response = session.post(url, headers=header, data=data)
    print('saveDatabase:' + params['name'] + ' ', response.status_code)
# Takes the video length as 'HH:MM:SS' and returns it in seconds.
# Parsing the fields directly avoids depending on the machine's time zone.
def studyTime(videosize):
    hours, minutes, seconds = (int(part) for part in videosize.strip().split(':'))
    return hours * 3600 + minutes * 60 + seconds
# Takes an HTML page and returns a dict of its hidden inputs {id: value}
def const_param(page):
    nums = {}
    soup = BeautifulSoup(page, 'html.parser')
    inputs = soup.find_all('input', attrs={'type': 'hidden'})  # every <input> whose type is hidden
    # Build the dict of constants
    for each in inputs:
        nums[each.get('id')] = each.get('value')
    return nums
# Takes an HTML page and returns a nested dict of video parameters {videoId: {...}, videoId: {...}}
def li_param(page):
    nums = {}
    soup = BeautifulSoup(page, 'html.parser')
    li = soup.find_all('li', id=re.compile('video-[0-9]{4,8}'))  # the <li> tag of each video
    # Add the video length and build the dict of videos
    for each in li:
        span = each.find('span', attrs={'class': 'time fl'}).text
        nums[each.get('_videoid')] = {
            'videoId': each.get('_videoid'),
            'name': each.get('_name'),
            'watchstate': each.get('watchstate'),
            'chapterId': each.get('_chapterid'),
            'lessonId': each.get('_lessonid'),
            'videosize': span,
            'lessonvideoid': each.get('_lessonvideoid')
        }
    return nums
# Encoding function translated from the site's JS; returns the ev parameter
def encode(param):
    key = copyright  # key string set by saveCache()
    text = ';'.join(param)  # join the values with ';'
    # XOR each character with the key (repeated cyclically), two hex digits per character
    return ''.join(format(ord(ch) ^ ord(key[i % len(key)]), '02x') for i, ch in enumerate(text))
# Main routine
def action(page, rate):
    consts = const_param(page)  # constants from the hidden inputs
    li = li_param(page)  # per-video parameters from the <li> tags
    num = int(len(li) * rate / 100)  # convert the target progress into a number of videos
    print('Starting...')
    for each in li.values():  # go through every video
        # Stop once the requested number of videos has been handled
        if num == 0:
            break
        num -= 1
        if each['watchstate'] != '1':  # anything other than 1 means not watched or not finished
            params = dict(consts)
            params.update(each)  # merge the video parameters into the constants
            # Send the two requests
            saveCache(params)
            saveDatabase(params)
            time.sleep(speed)  # pause between videos
    print('Finished!')
# Entry point
if __name__ == '__main__':
    userid = input('Enter your account: ')  # account (phone number)
    password = input('Enter your password: ')  # password
    number = int(input('Enter the course number: ')) - 1  # enter 1 for the first course in the list
    try:
        rate = int(input('Enter the target progress in percent (default 50): '))  # progress to reach
    except ValueError:
        rate = 50
    options = Options()
    #options.add_argument('-headless')
    options.add_argument('--disable-gpu')
    browser = webdriver.Firefox(firefox_options=options)  # Selenium 3 keyword; Selenium 4 uses options=options
    session = requests.Session()  # create the session
    html = load(userid, password, number)  # call load() to get the cookies and the HTML page
    action(html, rate)  # pass in the page and the target progress
    # Close the session
    session.close()
    # Close the browser
    browser.quit()
    # Verification function
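The earlier version of the code had some selenium compatibility problems and could fail to open the video page. This has now been changed. If you run into problems running it, you can contact me at 2190778650@qq.com.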