Building the NBA data scraper
Working out the approach
Here we take our data from Baidu Sports. Searching for "NBA" on Baidu leads straight to a page where Baidu already provides the relevant data.
Clicking through, we land on a very clean page.
The address bar, however, shows a URL with no obvious pattern: https://tiyu.baidu.com/live/detail/576O5Zu955S35a2Q6IGM5Lia56%2Bu55CD6IGU6LWbI2Jhc2tldGJhbGwjMjAyMS0wNi0xMyPniLXlo6t2c%2BWspritq%2BiIuQ%3D%3D/from/baidu_aladdin
So we keep looking. Clicking the tab itself, we find that it points to an ordinary URL (https://tiyu.baidu.com/match/NBA/, the address used in the code below), which makes things much easier.
All we want here is the per-quarter data and the final score of each game; if users need anything more, we simply send them to the Baidu page.
The scraping plan is roughly this: first fetch the main page, then visit today's games listed on it, and finally return the game results.
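The three-step plan can be sketched as a tiny pipeline. This is a minimal sketch, not the author's code: `crawl_games` and the injected `fetch`, `parse_index` and `parse_game` callables are hypothetical names, passed in so the flow can be exercised without touching the network.

```python
from typing import Callable, Dict, List

# Hypothetical helper illustrating the plan; not part of the article's code.
def crawl_games(
    index_url: str,
    fetch: Callable[[str], str],
    parse_index: Callable[[str], List[str]],
    parse_game: Callable[[str], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Fetch the index page, then each game page it links to, and collect the results."""
    index_html = fetch(index_url)                  # step 1: the main page
    results = []
    for game_url in parse_index(index_html):       # step 2: each game listed there
        results.append(parse_game(fetch(game_url)))
    return results                                 # step 3: hand the results back

# Offline usage with made-up "pages" stored in a dict
pages = {
    "index": "game-1 game-2",
    "game-1": "page of game 1",
    "game-2": "page of game 2",
}
games = crawl_games("index", fetch=pages.__getitem__,
                    parse_index=str.split, parse_game=lambda html: {"page": html})
print(games)  # [{'page': 'page of game 1'}, {'page': 'page of game 2'}]
```

In the article's final code every field turns out to be readable from the main page, so the second fetch is never actually made; the skeleton only mirrors the plan as stated.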
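Writing the code
First, we fetch the page with requests.
As it turns out, Baidu imposes no restrictions: a plain request, with no extra headers, already returns the content.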
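The article calls `requests.get` with no options. A timeout and a browser-like User-Agent are cheap insurance against a request that hangs or a site that later starts filtering clients. The sketch below swaps in the standard library's `urllib.request` so it has no third-party dependency; the header value and the 10-second timeout are assumptions, not something Baidu was found to require.

```python
import urllib.request

# Assumed header value: the article found Baidu served the page without one.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; NBAReporter-sketch)"}

def build_request(url: str) -> urllib.request.Request:
    """Prepare a GET request that carries a browser-like User-Agent."""
    return urllib.request.Request(url, headers=DEFAULT_HEADERS, method="GET")

def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it.

    urlopen raises HTTPError on 4xx/5xx responses; network failures and
    timeouts surface as URLError or a socket timeout.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling `fetch_html("https://tiyu.baidu.com/match/NBA/")` returns the page text. With requests, the equivalent is `requests.get(url, headers=DEFAULT_HEADERS, timeout=10)` followed by `raise_for_status()` on the response.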
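Next, we parse the response with a parsing library (BeautifulSoup).
First, we narrow the program down to the <main> tag.
That takes a single line, nba_soup.main; printing the result shows that the output is now much shorter than the full page.
With the main part of the page in hand, we use find (and find_all) to drill down further.
We have now located the main section; next we start scraping the recent games and their detail pages.
We change the code to fetch all of the game entries at once.
While testing, I found that Baidu actually uses AJAX: the page source returned by a single request may contain only five entries, and the rest is loaded by further requests. That suits our program fine, since we don't need that much data anyway.
Here we look up each date by searching for the div with class date and take its text with .string. Because Baidu pads the date with whitespace, we append strip(), which removes the whitespace on both sides. The game URLs are obtained the same way.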
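To see what "find every `div` with class `date`, then strip its text" amounts to, here is a standard-library stand-in built on `html.parser`. It is a hypothetical illustration, not how BeautifulSoup is implemented: it matches the `class` attribute as an exact string, and the sample HTML is made up.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Hypothetical stand-in for find_all(tag, class_=...) + .strip(); not bs4 itself.
class ClassTextCollector(HTMLParser):
    """Collect the text of every <tag> whose class attribute equals class_value exactly."""

    def __init__(self, tag: str, class_value: str) -> None:
        super().__init__()
        self.tag = tag
        self.class_value = class_value
        self.depth = 0                  # > 0 while inside a matching element
        self.buffer: List[str] = []
        self.results: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if self.depth:
            if tag == self.tag:         # nested tag of the same name: track depth
                self.depth += 1
            return
        if tag == self.tag and dict(attrs).get("class") == self.class_value:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag: str) -> None:
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                # strip() trims whitespace on both ends, like .string.strip()
                self.results.append("".join(self.buffer).strip())

    def handle_data(self, data: str) -> None:
        if self.depth:
            self.buffer.append(data)

def find_texts(html: str, tag: str, class_value: str) -> List[str]:
    collector = ClassTextCollector(tag, class_value)
    collector.feed(html)
    collector.close()
    return collector.results

# Made-up markup shaped like the schedule list
sample = """
<main>
  <div class="date">
      06-13
  </div>
  <div class="list-num c-color">1</div>
  <div class="date"> 06-11 </div>
</main>
"""
print(find_texts(sample, "div", "date"))  # ['06-13', '06-11']
```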
Here we build the final detail-page URL by string concatenation, prefixing "https:" to each href.
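Prefixing `"https:"` works only if every `href` is protocol-relative (starts with `//`); that is inferred from the code, not stated in the article. The standard library's `urllib.parse.urljoin` resolves protocol-relative, root-relative and absolute links alike, so it is a somewhat sturdier alternative. The example paths are made up, and `absolute_url` is a hypothetical helper name.

```python
from urllib.parse import urljoin

BASE_URL = "https://tiyu.baidu.com/match/NBA/"

# Hypothetical helper: resolve an href against the page it was found on.
def absolute_url(href: str, base: str = BASE_URL) -> str:
    return urljoin(base, href)

print(absolute_url("//tiyu.baidu.com/live/detail/abc"))  # https://tiyu.baidu.com/live/detail/abc
print(absolute_url("/live/detail/abc"))                  # https://tiyu.baidu.com/live/detail/abc
print(absolute_url("https://example.com/x"))             # https://example.com/x
print("https:" + "/live/detail/abc")                     # https:/live/detail/abc (broken)
```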
# Program name: NBAReporter
# Written: June 13, 2021
# Environment: Windows 10
import requests
from bs4 import BeautifulSoup
# Basic definitions
baidu_nba_url = "https://tiyu.baidu.com/match/NBA/"
request_url = "https:"
nba_dict = {}
# Fetch the page
nba_res = requests.get(baidu_nba_url)
# print(nba_res.text)
# Parse the HTML
nba_soup = BeautifulSoup(nba_res.text, "html.parser")
nba_main = nba_soup.main
# print(nba_main)
nba_div = nba_main.find_all("div", class_ = "wa-match-schedule-list-wrapper")
for i in nba_div:
    # Game date (the text is padded with whitespace, hence strip())
    nba_time = i.find("div", class_ = "date").string.strip()
    print(nba_time)
    # Number of games on that date
    nba_times = i.find("div", class_ = "list-num c-color").string
    print(nba_times)
    # Detail-page URL of each game
    nba_href = i.find_all("div", class_ = "wa-match-schedule-list-item c-line-bottom")
    for url_nba in nba_href:
        url_nba = url_nba.a
        url_href = url_nba["href"]
        real_url = request_url + url_href
        print(real_url)
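Next we parse the rest: some of the detail information has not been scraped yet, so we go after it now.
We continue writing the code along the same lines,
then extract the values inside each entry.
After getting the scores, we build two lists: one holds a 12-field row per game in the layout the GUI will need, the other stores the date of each game; the function returns both.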
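The 12-field row is easier to follow when written as a function. `make_row` is a hypothetical helper (not in the article) that reproduces the order the code below uses: date, game count, two blanks, start time, stage, two team names, two blanks, two scores. The blanks are placeholders that the GUI skips, so each value lands in a fixed cell of a 3 x 4 grid. All values in the example are made up.

```python
from typing import List

# Hypothetical helper mirroring the row built inside NBAReporter().
def make_row(date: str, game_count: str, start_time: str, stage: str,
             team_1: str, team_2: str, score_1: str, score_2: str) -> List[str]:
    """Return one 12-field row: 3 rows x 4 columns, "" marks an empty cell."""
    return [date, game_count, "", "",
            start_time, stage, team_1, team_2,
            "", "", score_1, score_2]

rows: List[List[str]] = []
dates: List[str] = []
# Made-up values, for illustration only
row = make_row("06-13", "1", "08:00", "stage", "Team A", "Team B", "100", "99")
rows.append(row)
dates.append(row[0])  # one date per game, later used as the tab title
```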
Since this logic is now wrapped in a function, we can create a new file and simply import it.
NBAReporter.py
# Program name: NBAReporter
# Written: June 13, 2021
# Environment: Windows 10
import requests
from bs4 import BeautifulSoup
def NBAReporter():
    # Basic definitions
    baidu_nba_url = "https://tiyu.baidu.com/match/NBA/"
    request_url = "https:"
    nba_list = []
    today_list = []
    # Fetch the page
    nba_res = requests.get(baidu_nba_url)
    # print(nba_res.text)
    # Parse the HTML
    nba_soup = BeautifulSoup(nba_res.text, "html.parser")
    nba_main = nba_soup.main
    # print(nba_main)
    nba_div = nba_main.find_all("div", class_ = "wa-match-schedule-list-wrapper")
    for i in nba_div:
        # Game date
        today = i.find("div", class_ = "date").string.strip()
        # Number of games on that date
        nba_times = i.find("div", class_ = "list-num c-color").string
        # Detail-page URL of each game
        nba_href = i.find_all("div", class_ = "wa-match-schedule-list-item c-line-bottom")
        for url_nba in nba_href:
            url_nba = url_nba.a
            url_href = url_nba["href"]
            real_url = request_url + url_href
            # print(real_url)
            # Detailed data: start time, stage, team names and scores
            vs_time = url_nba.find("div", class_ = "font-14 c-gap-bottom-small").string
            vs_finals = url_nba.find("div",class_ = "font-12 c-color-gray").string
            team_row_1 = url_nba.find("div", class_ = "team-row")
            team_row_2 = url_nba.find("div", class_ = "c-gap-top-small team-row")
            """team_row_1_jpg = team_row_1.find("div", class_ = "inline-block")["style"]
            team_row_2_jpg = team_row_2.find("div", class_ = "inline-block")["style"]
            print(team_row_1_jpg,team_row_2_jpg)"""
            team_row_1_name = team_row_1.find("span", class_ = "inline-block team-name team-name-360 team-name-320 c-line-clamp1").string
            team_row_2_name = team_row_2.find("span", class_ = "inline-block team-name team-name-360 team-name-320").string
            # print(team_row_1_name,team_row_2_name)
            team_row_1_score = team_row_1.find("span", class_ = "inline-block team-score-num c-line-clamp1").string
            team_row_2_score = team_row_2.find("span", class_ = "inline-block team-score-num c-line-clamp1").string
            # print(team_row_1_score,team_row_2_score)
            """import re  # better imported at the top of the file; placed here only for the demo
            # the parentheses are escaped so that group(1) holds only the image URL
            team_row_1_jpg_url = re.search(r'background:url\((.*?)\)', team_row_1_jpg)
            team_row_1_jpg_url = team_row_1_jpg_url.group(1)
            team_row_2_jpg_url = re.search(r'background:url\((.*?)\)', team_row_2_jpg)
            team_row_2_jpg_url = team_row_2_jpg_url.group(1)"""
            # 12 fields laid out for a 3 x 4 grid; "" marks an empty cell
            nba = [ today, nba_times,"","",
                    vs_time, vs_finals, team_row_1_name, team_row_2_name,
                    "","", team_row_1_score, team_row_2_score
                    ]
            nba_list.append(nba)
            today_list.append(today)
    return nba_list,today_list
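Every `.string` call above assumes the tag exists and has exactly one child. BeautifulSoup's `.string` is `None` when a tag has several children, and `find` returns `None` when nothing matches, so a layout change on Baidu's side surfaces as an `AttributeError`. A small helper such as `safe_text` (hypothetical, not part of the article or of bs4) keeps one missing field from aborting the whole scrape. It relies only on a `.string` attribute, so it is demonstrated with stand-in objects rather than real tags.

```python
from types import SimpleNamespace
from typing import Any

# Hypothetical helper; not part of the article or of BeautifulSoup.
def safe_text(node: Any, default: str = "") -> str:
    """Return node.string stripped, or default when node or node.string is None."""
    if node is None:
        return default
    text = getattr(node, "string", None)
    if text is None:
        return default
    return str(text).strip()

# Stand-ins for bs4 tags (made-up values)
print(safe_text(SimpleNamespace(string="  102 ")))   # 102
print(safe_text(SimpleNamespace(string=None), "-"))  # -
print(safe_text(None, "-"))                          # -
```

Inside the loop this would read, for example, `vs_time = safe_text(url_nba.find("div", class_="font-14 c-gap-bottom-small"))`.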
Next we write the GUI program.
First, import the libraries it needs.
Then set up the window: the title and a few other window properties.
# self.setWindowOpacity(0.5)
This line sets the window's transparency, but in my testing it made the user experience noticeably worse, so I commented it out.
The main logic works like this: for every game we create a tab with a grid layout, then place the values from the list returned by NBAReporter() on that tab as labels.
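The placement step (zip the 12 values with the 3 x 4 grid positions and skip the blanks) can be checked without opening a window. `layout_cells` is a hypothetical, Qt-free helper that returns the `(row, column, text)` triples the GUI code passes to `QGridLayout.addWidget`; the row values are made up.

```python
from typing import List, Tuple

# Hypothetical helper reproducing the zip-and-skip logic of make_Ui().
def layout_cells(values: List[str], rows: int = 3, cols: int = 4) -> List[Tuple[int, int, str]]:
    """Pair each value with its grid cell in row-major order, dropping empty placeholders."""
    positions = [(i, j) for i in range(rows) for j in range(cols)]
    return [(i, j, v) for (i, j), v in zip(positions, values) if v != ""]

# Made-up 12-field row in the NBAReporter() order
row = ["06-13", "1", "", "", "08:00", "stage", "Team A", "Team B", "", "", "100", "99"]
for r, c, text in layout_cells(row):
    print(r, c, text)
```

In the window, each triple becomes `grid.addWidget(QLabel(text), r, c)`.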
Finally, create the application, show the window and start the event loop, and the program is complete.
NBAWindow.py
# Program name: NBAWindow
# Written: June 14, 2021
# Environment: Windows 10
import sys
from PyQt5.QtCore import *
from PyQt5.QtGui import *
from PyQt5.QtWidgets import *
from NBAReporter import *
# Main window
class NBAWindow(QTabWidget):
    def __init__(self):
        super().__init__()
        self.make_Ui()
        """ # Auto-refresh every two minutes (disabled); setInterval() takes milliseconds
        self.timer = QTimer()
        self.timer.setInterval(2 * 60 * 1000)
        self.timer.timeout.connect(self.make_Ui)
        self.timer.start()"""
        self.setWindowTitle("NBA Data")
        self.setGeometry(1440,0,480,300)
        self.setFixedSize(480,300)
        self.setWindowIcon(QIcon('images/nba.jpg'))
        self.setStyleSheet("""background-color:red; """)
        # self.setWindowOpacity(0.5)
        self.setWindowFlags(Qt.WindowStaysOnTopHint|Qt.WindowMinimizeButtonHint|Qt.FramelessWindowHint)
    def make_Ui(self):
        # Drop existing tabs so that a refresh does not duplicate them
        self.clear()
        self.nba,today = NBAReporter()
        self.tab = 0
        for a in self.nba:
            # Create a page for this game, with its own grid layout
            tab = QWidget()
            grid = QGridLayout(tab)
            # Add the page as a tab titled with the game's date
            self.addTab(tab,today[self.tab])
            # Grid positions: 3 rows x 4 columns, one cell per field of the row
            positions = [(i, j) for i in range(3) for j in range(4)]
            nba_list = self.nba[self.tab]
            # Create the labels on this tab
            for position, nba in zip(positions, nba_list):
                #print(nba)
                # Skip empty placeholder fields
                if nba == "":
                    continue
                # Text style (Qt rich text)
                label = QLabel("<font color='black' size=5><b>%s</b></font>"%nba)
                grid.addWidget(label, *position)
            # grid.update()
            # Move on to the next game
            self.tab += 1
if __name__ == '__main__':
    app = QApplication(sys.argv)
    window = NBAWindow()
    window.show()
    sys.exit(app.exec_())
Demo
Reprint notice
This article was first published on CSDN on June 14, 2021. If you reprint it, please include a link to the original: "Using PyQt5 in Python to build an NBA data broadcast GUI that fetches live data from the web".