How to Efficiently Retrieve Multiple URLs Using QWebPage in Qt?

DDD · Original · 2024-10-27 11:42:30 · 803 views

Retrieving Multiple URLs with QWebPage

In this scenario, the goal was to render dynamically updated pages with Qt's QWebPage. However, the program frequently crashed when it tried to render the second page.

Problem Analysis

The problem lies in the approach: a new QApplication and a new QWebPage are created for every URL fetch. Instead, keep a single QApplication and a single QWebPage alive, and use signals with custom handling to process all of the URLs within that same instance.
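The control flow behind this advice can be sketched without Qt. The following is only an illustration under stated assumptions: `FakePage` and its synchronous `load` are hypothetical stand-ins for a reused page object, and none of it is part of the PyQt API. It shows one page object pulling URLs from a single iterator, with each "load finished" event triggering the next fetch.

```python
class FakePage:
    """Hypothetical stand-in for one reused page object (no Qt involved)."""

    def __init__(self, on_html):
        self._on_html = on_html  # callback invoked once per loaded URL
        self._urls = iter(())
        self.finished = False

    def process(self, urls):
        # Keep a single iterator and start the chain with the first URL.
        self._urls = iter(urls)
        if not self.fetch_next():
            self.finished = True

    def fetch_next(self):
        # Return False once the iterator is exhausted.
        try:
            url = next(self._urls)
        except StopIteration:
            return False
        self.load(url)
        return True

    def load(self, url):
        # A real page loads asynchronously; here "loading" completes at once.
        self.handle_load_finished(url, "<html>%s</html>" % url)

    def handle_load_finished(self, url, html):
        # Hand the result to the caller, then move on to the next URL.
        self._on_html(html, url)
        if not self.fetch_next():
            self.finished = True  # a Qt version would quit the event loop here


results = []
page = FakePage(lambda html, url: results.append(url))
page.process(["https://example.com", "https://example2.com"])
```

Because the same object handles every URL, no page or application object is created or torn down between fetches, which is the property the Qt classes in the next section rely on.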

Proposed Solution

The WebPage Class

Below are custom WebPage classes for PyQt5 (built on QWebEnginePage) and PyQt4 (built on QWebPage).

PyQt5 WebPage

<code class="python">from PyQt5.QtCore import pyqtSignal, QUrl
from PyQt5.QtWidgets import QApplication
from PyQt5.QtWebEngineWidgets import QWebEnginePage

class WebPage(QWebEnginePage):
    htmlReady = pyqtSignal(str, str)

    def __init__(self, verbose=False):
        super().__init__()
        self._verbose = verbose
        self.loadFinished.connect(self.handleLoadFinished)

    def process(self, urls):
        self._urls = iter(urls)
        self.fetchNext()

    def fetchNext(self):
        try:
            url = next(self._urls)
        except StopIteration:
            return False
        else:
            self.load(QUrl(url))
        return True

    def processCurrentPage(self, html):
        self.htmlReady.emit(html, self.url().toString())
        if not self.fetchNext():
            QApplication.instance().quit()

    def handleLoadFinished(self):
        self.toHtml(self.processCurrentPage)

    def javaScriptConsoleMessage(self, *args, **kwargs):
        if self._verbose:
            super().javaScriptConsoleMessage(*args, **kwargs)</code>

PyQt4 WebPage

<code class="python">from PyQt4.QtCore import pyqtSignal, QUrl
from PyQt4.QtGui import QApplication
from PyQt4.QtWebKit import QWebPage

class WebPage(QWebPage):
    htmlReady = pyqtSignal(str, str)

    def __init__(self, verbose=False):
        super(WebPage, self).__init__()
        self._verbose = verbose
        # Connect to the handler this class actually defines
        self.mainFrame().loadFinished.connect(self.processCurrentPage)

    def process(self, urls):
        self._urls = iter(urls)
        self.fetchNext()

    def fetchNext(self):
        try:
            url = next(self._urls)
        except StopIteration:
            return False
        else:
            self.mainFrame().load(QUrl(url))
        return True

    def processCurrentPage(self):
        self.htmlReady.emit(self.mainFrame().toHtml(), self.mainFrame().url().toString())
        if not self.fetchNext():
            QApplication.instance().quit()

    def javaScriptConsoleMessage(self, *args, **kwargs):
        if self._verbose:
            super(WebPage, self).javaScriptConsoleMessage(*args, **kwargs)</code>

Usage

Here is an example of how to use these WebPage classes:

<code class="python">import sys

# PyQt5 (assumes the matching WebPage class above is defined in this module)
from PyQt5.QtWidgets import QApplication
# PyQt4: use this import instead, together with the PyQt4 WebPage class
# from PyQt4.QtGui import QApplication

def my_html_processor(html, url):
    # Placeholder handler (an assumption): replace with your own processing
    print('loaded: [%d chars] %s' % (len(html), url))

url_list = ['https://example.com', 'https://example2.com']
app = QApplication(sys.argv)          # one application for all URLs
webpage = WebPage(verbose=True)       # one page object, reused for each URL
webpage.htmlReady.connect(my_html_processor)
webpage.process(url_list)
sys.exit(app.exec_())</code>

In this code, my_html_processor is a function that you supply; it receives the rendered HTML and the URL of each page, in that order.
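One possible shape for such a handler is sketched below. It is only an example under stated assumptions: `TitleParser` and the `titles` dictionary are hypothetical names, and extracting the page title is just one thing a handler might do. It uses the standard library's `html.parser` and can be exercised without Qt.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is recorded.
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


titles = {}  # hypothetical result store: URL -> page title


def my_html_processor(html, url):
    # Same (html, url) argument order as the htmlReady signal.
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    titles[url] = (parser.title or "").strip()


# Direct call for illustration; normally the htmlReady signal invokes it.
my_html_processor(
    "<html><head><title> Example Domain </title></head></html>",
    "https://example.com",
)
```

Keeping the handler independent of the page class means it can be tested with plain strings before it is connected to the signal.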

With this approach, you avoid the crashes and erratic behavior described earlier and get a more stable and efficient web scraping workflow.
