

Code for downloading Baidu Wenku documents with Python

First, download a file called SWFToImage.dll.
Then create a .bat file with the following contents and run it:


 
COPY SWFToImage.dll %windir%\system32 
regsvr32 %windir%\system32\SWFToImage.dll 

Then run the following Python script, passing the URL of the document as the first command-line argument:

 
# Code for downloading Baidu Wenku documents with Python.
# Adapt it to your needs; see the hints in the comments below.
# http://www.cnblogs.com/dearplain/
# code by plain
# Written for Python 2 (urllib2, print statements) with pywin32 installed.
import urllib2
import win32com.client
import os
import sys

if __name__=='__main__':
    #os.system('');
    os.chdir(r'D:\my project\pywenku')  # folder to save the downloads into
    # COM object provided by the registered SWFToImage.dll
    SWFToImage=win32com.client.Dispatch("{479A1AAC-C148-40BB-9868-A9773DA66AF9}")
    '''
    allfile=os.listdir(".")
    findrecord=0
    for file in allfile:
        if file==".record":
            record=open(file,'rw')
            findrecord=1
            break
    if findrecord==0:
        record=open('.record','w')
    '''
    #url="http://wenku.baidu.com/view/8d3ed840be1e650e52ea9938.html?from=rec&pos=1&weight=2&lastweight=2&count=5"
    #url="http://wenku.baidu.com/view/f2fe7a3987c24028915fc37a.html?from=related&hasrec=1"
    # url is the address of the document you want to download
    url=sys.argv[1]
    if url.find("http://")!=0:
        print "error! the url is not correct"
        sys.exit()
    print "downloading %s"%url
    try:
        # scheme and host, sent later as the Referer header
        urlReferer=url[url.index('http'):url.index('/v')]
        print urlReferer
        #urlbody=url[url.index('/v')-1:]
        # document id: the part between 'view/' and '.htm'
        urlnum=url[url.index('ew/')+3:url.index('.htm')]
    except ValueError:
        print "parse url error"
        sys.exit()
    #print urlnum
    wenku='wenku.baidu.com'
    reurl='/play/'
    pagefrom='?pn='
    downnum='&rn='
    # try to get the title and make a directory named after it
    req=urllib2.Request(url)
    res=urllib2.urlopen(req)
    data=res.read()
    res.close()
    try:
        sfrom=data.index('<title>')+len('<title>')
        #print sfrom
        sbefore=sfrom+data[sfrom:].index('</title>')
        #print sbefore
        title=data[sfrom:sbefore]
        # drop the site suffix after the last underscore
        title=title[:title.rindex('_')]
        print 'downloading '+title
    except ValueError:
        print "get title error"
        sys.exit()
    allfile=os.listdir(".")
    if title not in allfile:
        os.mkdir(title)
    os.chdir('./'+title)
    # get the first swf
    req=urllib2.Request('http://wenku.baidu.com'+reurl+urlnum+pagefrom+'1'+downnum+'1')
    req.add_header("Referer", urlReferer)
    res=urllib2.urlopen(req)
    data=res.read()
    res.close()
    # the page count is the first quoted value in the response header
    head=data[0:45]
    pagenum=0
    sfrom=head.index('\":\"')+len('\":\"')
    sbefore=sfrom+head[sfrom:].index('\"')
    pagenum=int(head[sfrom:sbefore])
    print 'pagenum:'+str(pagenum)
    if pagenum<=0 or pagenum>2000:
        print "error!!!pagenum<=0 or pagenum>2000"
        sys.exit()
    # skip the 106-byte header; the rest is the SWF data
    data=data[106:]

    swf=open("1.pywenku",'wb')
    swf.write(data)
    swf.close()
    i=1
    # render frame 1 of the SWF to a JPEG
    SWFToImage.InputSWFFileName="%d.pywenku"%i
    SWFToImage.ImageOutputType = 1
    SWFToImage.ImageWidth=1048
    SWFToImage.ImageHeight=1478
    SWFToImage.Execute_Begin()
    SWFToImage.FrameIndex = 1
    SWFToImage.Execute_GetImage()
    SWFToImage.SaveToFile("%d.jpg"%i)
    SWFToImage.Execute_End()
    # os.rename fails on Windows if the target exists (e.g. on a re-run)
    if os.path.exists("%d.swf"%i):
        os.remove("%d.swf"%i)
    os.rename("%d.pywenku"%i,"%d.swf"%i)
    allfile=os.listdir(".")
    # download from the second page to the last page
    for i in range(2,pagenum+1):
        # a page that already has a .swf file was finished in an earlier run
        if '%d.swf'%i in allfile:
            continue
        req=urllib2.Request('http://wenku.baidu.com'+reurl+urlnum+pagefrom+str(i)+downnum+'1')
        res=urllib2.urlopen(req)
        data=res.read()
        res.close()
        data=data[106:]
        swf=open("%d.pywenku"%i,'wb')
        swf.write(data)
        swf.close()
        SWFToImage.InputSWFFileName="%d.pywenku"%i
        SWFToImage.ImageOutputType = 1
        SWFToImage.Execute_Begin()
        SWFToImage.FrameIndex = 1
        SWFToImage.Execute_GetImage()
        SWFToImage.SaveToFile("%d.jpg"%i)
        SWFToImage.Execute_End()
        os.rename("%d.pywenku"%i,"%d.swf"%i)
    print 'task complete'
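The URL and title handling in the script relies on string slicing, which is easy to get wrong. As a rough sketch only, the same two steps could be written as small pure functions. This version targets Python 3 (the script itself is Python 2), the function names parse_view_url and extract_title are made up for illustration, and it assumes the address has the /view/<id>.html shape seen in the commented sample URLs, which is slightly stricter than the original slicing.

```python
from urllib.parse import urlsplit


def parse_view_url(url):
    """Split a Wenku view URL into (referer, doc_id).

    The referer is the scheme plus host, and the document id is the
    file name under /view/ without its '.html' suffix.
    """
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.netloc:
        raise ValueError("not an http URL: %r" % url)
    prefix, suffix = "/view/", ".html"
    if not (parts.path.startswith(prefix) and parts.path.endswith(suffix)):
        raise ValueError("not a /view/<id>.html URL: %r" % url)
    doc_id = parts.path[len(prefix):-len(suffix)]
    if not doc_id:
        raise ValueError("empty document id: %r" % url)
    return "%s://%s" % (parts.scheme, parts.netloc), doc_id


def extract_title(html):
    """Return the <title> text up to its last underscore.

    Raises ValueError if the tags or the underscore are missing,
    matching the error handling in the script.
    """
    start = html.index("<title>") + len("<title>")
    end = html.index("</title>", start)
    title = html[start:end]
    return title[:title.rindex("_")]
```

Both functions raise ValueError on unexpected input, so a caller can keep the script's existing try/except structure around them.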

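The download loop depends on three details: the page count sits in the first 45 bytes of the response as the first quoted value after a `":"` marker, the SWF data starts after a 106-byte header, and pages that already have a .swf file are skipped. The following Python 3 sketch isolates those details so they can be checked without network access. The helper names are hypothetical, the byte offsets are simply the ones the script uses, and the `&rn=` separator in the page URL is an assumption about the query string format.

```python
HEAD_PEEK = 45     # bytes the script inspects for the page count
HEADER_LEN = 106   # bytes the script skips before the SWF data
MAX_PAGES = 2000   # sanity limit used by the script


def split_play_response(data):
    """Return (page_count, swf_bytes) from a raw /play/ response body."""
    head = data[:HEAD_PEEK]
    marker = b'":"'
    start = head.index(marker) + len(marker)
    end = head.index(b'"', start)
    page_count = int(head[start:end])
    if page_count <= 0 or page_count > MAX_PAGES:
        raise ValueError("implausible page count: %d" % page_count)
    return page_count, data[HEADER_LEN:]


def page_url(doc_id, page):
    """Build the address of one page (assumed query format)."""
    return "http://wenku.baidu.com/play/%s?pn=%d&rn=1" % (doc_id, page)


def pages_to_fetch(page_count, existing_files):
    """List pages 2..page_count that have no '<n>.swf' file yet."""
    existing = set(existing_files)
    return [i for i in range(2, page_count + 1)
            if "%d.swf" % i not in existing]
```

In the script, existing_files would be the result of os.listdir(".") taken after the first page has been saved.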