Python Web Scraping: Using XPath to Locate Key Tags for Automated Reply-Thread Giveaway Comments (Part 2)

1. Analyzing the link

For the previous article, see Part 1.

When we take part in reply-thread giveaways on a website, we usually don't enter just one. We join several at once.

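That means we need to work out how the comment link tells one thread apart from another. Take the reply link used in the previous article, which has the following format: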

https://club.hihonor.com/cn/forum.php?mod=post&action=reply&fid=154&tid=21089001&extra=page%3D1&replysubmit=yes&infloat=yes&handlekey=fastpost&inajax=1

The key that distinguishes threads is tid. Look back at the comment link in the previous post and you will find the same number, 21089001.

In my tests, every part of the site's comment POST URL other than tid stayed exactly the same and needed no change. So to comment on a different thread, we only need to replace the value of tid.
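Only tid changes, so one way to derive the reply URL for another thread is to parse the query string and replace that single parameter. The following is a minimal sketch using the standard library's `urllib.parse`. The helper name `with_tid` is hypothetical, and the code assumes the query parameters are separated by `&` as in the link above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_tid(reply_url: str, tid: str) -> str:
    """Return reply_url with its tid query parameter replaced (hypothetical helper)."""
    parts = urlsplit(reply_url)
    # parse_qsl keeps parameter order; values come back percent-decoded
    params = [(k, tid if k == "tid" else v)
              for k, v in parse_qsl(parts.query, keep_blank_values=True)]
    # urlencode re-encodes the values, so "page=1" becomes "page%3D1" again
    return urlunsplit(parts._replace(query=urlencode(params)))


base = ("https://club.hihonor.com/cn/forum.php?mod=post&action=reply&fid=154&tid=21089001"
        "&extra=page%3D1&replysubmit=yes&infloat=yes&handlekey=fastpost&inajax=1")
print(with_tid(base, "26194745"))
```

Parsing the query is a little more work than string concatenation. In exchange, it keeps working if the parameters appear in a different order.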

2. Splitting the link to extract tid

Open any thread on the site and you will usually get a thread link of the following form:

https://club.hihonor.com/cn/thread-26194745-1-1.html

Here we use basic string splitting to pull the long number 26194745 out of the link. The code is as follows:

import re

# Build the reply (POST) URL for every thread that should be commented on
url_start = "https://club.hihonor.com/cn/forum.php?mod=post&action=reply&fid=4515&tid="
url_end = "&extra=page%3D1&replysubmit=yes&infloat=yes&handlekey=fastpost&inajax=1"

url = []      # reply URLs
txt_url = []  # thread links as listed in the text file (different format)
with open("隨機帖子.txt", "r", encoding="utf-8") as f:
    for line in f:
        # keep only lines that look like an http(s) link
        if re.match(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', line):
            txt_url.append(line.strip())

for i in txt_url:
    # "https://club.hihonor.com/cn/thread-26194745-1-1.html" -> "26194745"
    url.append(url_start + i.split("-")[1] + url_end)

Here I put all the links to be commented on into a text file and read it line by line, using a regular expression to check that each line is a valid link.

Then I loop over the links, split each one to get the thread's number, and concatenate it into the actual POST comment URL.
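`split("-")[1]` assumes the thread number is the second hyphen-separated field. That assumption breaks if any earlier part of the URL contains a hyphen. A slightly more defensive sketch matches the `thread-<tid>-<page>-<n>.html` pattern directly. The helper `extract_tid` is hypothetical, and the sketch assumes every link has the shape shown above.

```python
import re
from typing import Optional

# thread-<tid>-<page>-<n>.html, e.g. thread-26194745-1-1.html
THREAD_RE = re.compile(r"thread-(\d+)-\d+-\d+\.html")


def extract_tid(link: str) -> Optional[str]:
    """Return the numeric thread id from a thread link, or None if it does not match."""
    m = THREAD_RE.search(link)
    return m.group(1) if m else None


links = [
    "https://club.hihonor.com/cn/thread-26194745-1-1.html",
    "not a thread link",
]
print([extract_tid(u) for u in links])  # ['26194745', None]
```

Returning `None` for non-matching lines lets the caller skip them. The original code would instead raise an `IndexError` or produce a wrong tid.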

3. Randomly picking the comment text

Most sites running reply-thread events check for duplicate content. An account that posts the same text repeatedly will almost certainly be banned from commenting for a while.

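So we need to vary what we post. For example, if the site asks participants to praise a phone's performance to enter the draw, we prepare a pool of comments for the program to pick from at random.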

The comments are kept in a txt file and read with the following code:

# Load the comment texts to reply with (one per line, blank lines skipped)
txt_contents = []
with open("回帖文案.txt", "r", encoding="utf-8") as f:
    for line in f:
        if line.strip() != "":
            txt_contents.append(line.strip())
print(txt_contents)
count = len(txt_contents)  # number of available comments

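Suppose we were joining a reply-thread event on a gaming forum. We would fill the file with game-related comments and let the program draw from them at random. The more samples there are, the less repetition.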
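The `readline()` loop can be written more compactly by iterating over the file object. The sketch below assumes one comment per line in a UTF-8 file, and the helper name `load_lines` is hypothetical. It uses `random.choice` instead of indexing with `random.randint(0, count - 1)`; both pick uniformly from the same list.

```python
import os
import random
import tempfile
from typing import List


def load_lines(path: str) -> List[str]:
    """Read a UTF-8 text file and return its non-blank lines, stripped."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# Usage: write a small sample file, then draw one comment at random
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write("First comment\n\nSecond comment\n   \nThird comment\n")
    sample_path = tmp.name

comments = load_lines(sample_path)
print(comments)  # ['First comment', 'Second comment', 'Third comment']
picked = random.choice(comments)
print(picked)    # one of the three, chosen at random
os.remove(sample_path)
```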

4. Posting replies for the giveaway

Sites that run frequent events generally require you to log in. Each site's CAPTCHA works differently, so logging in automatically is often the crux.

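To recognize CAPTCHAs, one option is the text-recognition APIs offered by Baidu, Tencent, or Alibaba Cloud. In my tests, none of them could guarantee success, and the best accuracy was under 50%.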

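Writing your own machine-learning recognizer requires a huge amount of labeled data, as anyone who has studied machine learning knows. Even if you pull it off, the site may well have switched to a different verification method by then.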

CAPTCHAs and the techniques used against them keep evolving. Pouring effort into labeling CAPTCHA samples usually wastes a lot of time, and by the end the site may have changed its scheme anyway.

So my advice is to enter the CAPTCHA by hand: that single step is manual and everything else is automated. The complete code is as follows:

import random
import re
import time

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

# Load the comment texts to reply with (one per line, blank lines skipped)
txt_contents = []
with open("回帖文案.txt", "r", encoding="utf-8") as f:
    for line in f:
        if line.strip() != "":
            txt_contents.append(line.strip())
print(txt_contents)
count = len(txt_contents)

# Build the reply (POST) URL for every thread that should be commented on
url_start = "https://club.hihonor.com/cn/forum.php?mod=post&action=reply&fid=4515&tid="
url_end = "&extra=page%3D1&replysubmit=yes&infloat=yes&handlekey=fastpost&inajax=1"

url = []      # reply URLs
txt_url = []  # thread links as listed in the text file (different format)
with open("隨機帖子.txt", "r", encoding="utf-8") as f:
    for line in f:
        # keep only lines that look like an http(s) link
        if re.match(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', line):
            txt_url.append(line.strip())

for i in txt_url:
    # ".../thread-26194745-1-1.html" -> "26194745"
    url.append(url_start + i.split("-")[1] + url_end)

datas = []    # one form payload per account
headers = []  # one set of request headers (with that account's cookies) per account

# Load the account names (one per line)
usernames = []
with open("賬號.txt", "r", encoding="utf-8") as f:
    for line in f:
        if line.strip():
            usernames.append(line.strip())

for name in usernames:
    browser = webdriver.Chrome()
    browser.implicitly_wait(10)
    browser.get("https://club.hihonor.com/cn/")
    time.sleep(5)
    login_text = browser.find_element(By.XPATH, "//*[@id='loginandreg']/a[1]")
    login_text.click()
    username = browser.find_element(
        By.XPATH, "/html/body/div[1]/div[2]/div/div/div[1]/div[3]/span/div[1]/span/div[2]/div[2]/div/input")
    password = browser.find_element(
        By.XPATH, "/html/body/div[1]/div[2]/div/div/div[1]/div[3]/span/div[1]/span/div[3]/div/div/div/input")
    username.send_keys(name)
    # Give every account the same password so the txt file only needs one account name per line
    password.send_keys("your_password")
    sign = browser.find_element(
        By.XPATH, "/html/body/div[1]/div[2]/div/div/div[1]/div[3]/span/div[1]/span/div[6]/div/div/span/span")
    # Wait 10 seconds so the person running the script can type the CAPTCHA
    time.sleep(10)
    sign.click()
    time.sleep(2)
    # Turn the logged-in session's cookies into a Cookie header string
    cookie = [item["name"] + "=" + item["value"] for item in browser.get_cookies()]
    cookiestr = "; ".join(cookie)
    # Any thread page carries posttime and formhash once logged in
    url2 = "https://club.hihonor.com/cn/thread-26183971-1-1.html"
    time.sleep(2)
    browser.get(url2)
    posttime = browser.find_element(By.ID, "posttime").get_attribute("value")
    formhash = browser.find_element(By.NAME, "formhash").get_attribute("value")
    browser.quit()  # quit() also ends the chromedriver process
    data = {
        "formhash": formhash,
        "posttime": posttime,
        "usesig": "1",
        "message": txt_contents[0],
    }
    # Content-Length is left out: requests computes it from the request body
    header = {
        "accept": "application/json, text/javascript, */*; q=0.01",
        # no "br": requests only decodes Brotli when a Brotli package is installed
        "Accept-Encoding": "gzip, deflate",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "sec-ch-ua": '"Google Chrome";v="87", "\"Not;A\\Brand";v="99", "Chromium";v="87"',
        "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Mobile Safari/537.36",
        "Cookie": cookiestr,
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        "X-Requested-With": "XMLHttpRequest",
    }
    datas.append(data)
    headers.append(header)

# Assumed: run posting rounds until 22:00:00 (the comparison operator is garbled in the source)
while int(time.strftime("%H%M%S")) <= 220000:
    z = 0
    url_num = random.sample(range(len(url)), len(url))  # visit the threads in random order
    for i in url_num:
        j = 1
        for data, header in zip(datas, headers):
            data["message"] = random.choice(txt_contents)  # a random comment for each post
            res = requests.post(url=url[i], data=data, headers=header)
            if "回复发布成功" in res.text:  # the site's "reply posted successfully" message
                print("Account {0} replied successfully".format(j))
            else:
                print(res.text)
            j += 1
            z += 1
        time.sleep(5)
        print("{0} comments posted so far".format(z))

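As the code shows, the accounts are also managed through a txt file, which lets several accounts post in the same run. Most sites won't award prizes to several entries from the same IP address; readers can handle this with proxies.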

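In fact, once you are logged in, every thread page carries the two values posttime and formhash. You can obtain them with ordinary scraping techniques by opening any thread (url2).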
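Reading posttime and formhash only needs the page HTML, so the same values could in principle be parsed from a fetched page instead of through Selenium. The following is a minimal sketch using the standard library's `html.parser` in place of the XPath lookups used in the article. It assumes both fields are `<input>` elements identified by `id="posttime"` and `name="formhash"`, as the Selenium calls above suggest. The class name `HiddenFieldParser` is hypothetical, and the sample HTML and values are made up.

```python
from html.parser import HTMLParser
from typing import Dict


class HiddenFieldParser(HTMLParser):
    """Collect the value attribute of <input> tags, keyed by their id and by their name."""

    def __init__(self) -> None:
        super().__init__()
        self.by_id: Dict[str, str] = {}
        self.by_name: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        a = dict(attrs)
        value = a.get("value") or ""  # attributes without a value come back as None
        if a.get("id"):
            self.by_id[a["id"]] = value
        if a.get("name"):
            self.by_name[a["name"]] = value


# Made-up sample markup shaped like the two fields the article reads
sample_html = """
<form id="fastpostform">
  <input type="hidden" name="formhash" value="abc123" />
  <input type="hidden" id="posttime" name="posttime" value="1610000000" />
</form>
"""

parser = HiddenFieldParser()
parser.feed(sample_html)
print(parser.by_id.get("posttime"), parser.by_name.get("formhash"))  # 1610000000 abc123
```

Self-closing `<input ... />` tags still reach `handle_starttag`, because `HTMLParser`'s default `handle_startendtag` calls it.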

That concludes Part 2 of "Python Web Scraping: Using XPath to Locate Key Tags for Automated Reply-Thread Giveaway Comments". For more on automating reply-thread giveaways with Python, search the earlier articles on 腳本之家.
