In this scenario, you are using Qt's QWebPage to render dynamically updated pages, but the program frequently crashes when it tries to render the second page.
Problem analysis
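The problem lies in the approach: a new QApplication and a new QWebPage are created for every URL that is fetched. Instead, keep a single QApplication and a single QWebPage, and use signals and custom handlers to process multiple URLs within that same instance.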
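The control flow behind this advice can be modelled without Qt at all. The sketch below is only an illustration: `FakePage`, `SequentialFetcher`, and the `load_finished` callback list are hypothetical stand-ins for a page object and its load-finished signal, and the "load" here completes immediately rather than asynchronously as a real page load would.

```python
class FakePage:
    """Hypothetical stand-in for a web page object (not a Qt class)."""

    def __init__(self):
        self.load_finished = []   # callbacks, playing the role of a signal
        self.url = None

    def load(self, url):
        # A real page would load asynchronously; here it finishes at once.
        self.url = url
        for callback in list(self.load_finished):
            callback()


class SequentialFetcher:
    """Reuses ONE page object for every URL instead of creating new ones."""

    def __init__(self, page):
        self.page = page
        self.visited = []
        self.done = False
        self.page.load_finished.append(self.handle_load_finished)

    def process(self, urls):
        self._urls = iter(urls)
        self.fetch_next()

    def fetch_next(self):
        try:
            url = next(self._urls)
        except StopIteration:
            return False          # nothing left to load
        self.page.load(url)
        return True

    def handle_load_finished(self):
        self.visited.append(self.page.url)
        if not self.fetch_next():
            self.done = True      # a Qt program would quit the event loop here


fetcher = SequentialFetcher(FakePage())
fetcher.process(['https://example.com', 'https://example2.com'])
print(fetcher.visited)
```

Each finished load triggers the next one, so only one page object and one iterator exist for the whole run.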
Suggested solution
Below are custom WebPage classes for PyQt5 and PyQt4.
PyQt5 WebPage
<code class="python">from PyQt5.QtCore import pyqtSignal, QUrl
from PyQt5.QtWidgets import QApplication
from PyQt5.QtWebEngineWidgets import QWebEnginePage


class WebPage(QWebEnginePage):
    # Emitted once per page with (html, url).
    htmlReady = pyqtSignal(str, str)

    def __init__(self, verbose=False):
        super().__init__()
        self._verbose = verbose
        self.loadFinished.connect(self.handleLoadFinished)

    def process(self, urls):
        # Keep one iterator over the URLs and start with the first one.
        self._urls = iter(urls)
        self.fetchNext()

    def fetchNext(self):
        # Load the next URL; return False when none are left.
        try:
            url = next(self._urls)
        except StopIteration:
            return False
        else:
            self.load(QUrl(url))
        return True

    def processCurrentPage(self, html):
        # Callback for toHtml(), which delivers the HTML asynchronously.
        self.htmlReady.emit(html, self.url().toString())
        if not self.fetchNext():
            QApplication.instance().quit()

    def handleLoadFinished(self):
        self.toHtml(self.processCurrentPage)

    def javaScriptConsoleMessage(self, *args, **kwargs):
        # Forward console messages only in verbose mode.
        if self._verbose:
            super().javaScriptConsoleMessage(*args, **kwargs)</code>
PyQt4 WebPage
<code class="python">from PyQt4.QtCore import pyqtSignal, QUrl
from PyQt4.QtGui import QApplication
from PyQt4.QtWebKit import QWebPage


class WebPage(QWebPage):
    # Emitted once per page with (html, url).
    htmlReady = pyqtSignal(str, str)

    def __init__(self, verbose=False):
        super(WebPage, self).__init__()
        self._verbose = verbose
        self.mainFrame().loadFinished.connect(self.handleLoadFinished)

    def process(self, urls):
        # Keep one iterator over the URLs and start with the first one.
        self._urls = iter(urls)
        self.fetchNext()

    def fetchNext(self):
        # Load the next URL; return False when none are left.
        try:
            url = next(self._urls)
        except StopIteration:
            return False
        else:
            self.mainFrame().load(QUrl(url))
        return True

    def processCurrentPage(self):
        self.htmlReady.emit(
            self.mainFrame().toHtml(), self.mainFrame().url().toString())
        if not self.fetchNext():
            QApplication.instance().quit()

    def handleLoadFinished(self):
        # This slot is connected in __init__ but was missing from the
        # listing; QWebFrame.toHtml() is synchronous, so the page can be
        # processed right away.
        self.processCurrentPage()

    def javaScriptConsoleMessage(self, *args, **kwargs):
        # Forward console messages only in verbose mode.
        if self._verbose:
            super(WebPage, self).javaScriptConsoleMessage(*args, **kwargs)</code>
Usage
<code class="python">import sys

from PyQt5.QtWidgets import QApplication
# For PyQt4, use this import instead:
# from PyQt4.QtGui import QApplication

# WebPage is the class defined above for the matching PyQt version.


def my_html_processor(html, url):
    # Placeholder: replace with your own handling of each page.
    print('loaded: [%d chars] %s' % (len(html), url))


url_list = ['https://example.com', 'https://example2.com']

app = QApplication(sys.argv)
webpage = WebPage(verbose=True)
webpage.htmlReady.connect(my_html_processor)
webpage.process(url_list)
sys.exit(app.exec_())</code>
In this code, my_html_processor is a function you can customize to handle the HTML and the URL of each loaded page. With this approach you should avoid the crashes and erratic behavior encountered earlier, giving a more stable and efficient web-scraping workflow.
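As one possible shape for my_html_processor, the sketch below pulls the `<title>` out of each page with the standard library's `html.parser` module and stores it by URL. The `titles` dictionary and the `TitleParser` helper are hypothetical names chosen for illustration; only the `(html, url)` argument order comes from the `htmlReady` signal.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._parts = []
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == 'title' and self.title is None:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == 'title' and self._in_title:
            self._in_title = False
            self.title = ''.join(self._parts).strip()


titles = {}  # hypothetical store: URL -> page title (None if absent)


def my_html_processor(html, url):
    """Slot for htmlReady: receives the page HTML and its URL as strings."""
    parser = TitleParser()
    parser.feed(html)
    titles[url] = parser.title
    print('loaded: [%d chars] %s' % (len(html), url))
```

Because the function takes only plain strings, it can be exercised with a literal HTML snippet before it is connected to the signal.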