Steps and Methods for Automated Data Collection with Python and pywinauto - Python Tutorial - PHP中文網 (PHP Chinese Net)


Steps and Methods for Automated Data Collection with Python and pywinauto

WBOY

Apr 26, 2023 11:13 PM

python, pywinauto

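Automated Data Collection with pywinauto

Implementation

This program uses pywinauto, a Python automation library. Because the project has not been officially updated in a long time, it only works with Python up to roughly 3.7; I used Python 3.7.1. With it I simulate typing a word, copying the example sentences, reading them, and clearing the clipboard, then repeat the cycle. Overall the implementation is fairly crude. Also, to keep things simple, I switched to the example-sentence page by hand, so the program does not need to navigate there itself.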

Code

requirements.txt

pyperclip==1.8.2
pywin32==304
pywinauto==0.6.8

Source code

import os
import random
import time
import re
from typing import Dict, List
from pywinauto.application import Application
from pywinauto import mouse
from pywinauto import keyboard
import pyperclip
import json


# Paths used by the program
dir_path = r"C:/Users/Dick/Desktop/work/DragonEnglish/tools"
input_path = os.path.join(dir_path, r"input.txt")
output_path = os.path.join(dir_path, r"output.json")
error_path = os.path.join(dir_path, r"error.txt")
# Words whose results came back in the wrong order
error_words = []
# Process ID of Youdao Dictionary (look it up in Task Manager)
processId = 13840


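def line_process(content: str) -> str:
    """
    Drop all blank lines and the unrelated lines copied before the first example.
    """
    # splitlines() handles both "\r\n" (Windows clipboard) and plain "\n"
    lines = content.splitlines()
    # Example sentences start with "<number>.", so use that to cut off the extra text copied before them
    count = 0
    for i in range(len(lines)):
        if re.match(r"\d+\.", lines[i]):
            count = i
            break

    lines = lines[count:]
    filter_lines = []
    for line in lines:
        if line.strip() != "":  # skip blank lines
            if not line.startswith("youdao") and not \
                    (line.startswith("《") and line.endswith("》")):  # skip source attributions
                filter_lines.append(line)

    # Each example is exactly three lines: number, original sentence, translation
    if len(filter_lines) % 3 != 0:
        raise Exception("Scraped data is malformed")

    content = "\n".join(filter_lines) + "\n"  # trailing \n, otherwise the regex misses the last match
    return content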


def to_list(line: str) -> List[Dict[str, str]]:
    """
    Build the list of example dicts directly, e.g.
    [{
        "no": "1",
        "original": "...",
        "translate": "..."
    }]
    """
    sentences = []
    # Regular expression: number line, original sentence, translation
    REGEXP = r'(?P<no>\d+?)\.\n(?P<original>.+?)\n(?P<translate>.+?)\n'
    # Compile
    pattern = re.compile(REGEXP)
    # Match
    rs = pattern.finditer(line)
    # Collect the results
    for r in rs:
        print(r.groupdict())
        sentences.append(r.groupdict())
    return sentences


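if __name__ == "__main__":
    # Connect to NetEase Youdao Dictionary
    app = Application(backend="uia").connect(process=processId)
    # Get the window we need
    win = app.window(class_name="RICHEDIT50W")

    # List of input words
    input_words = []
    # List of output word objects
    output_words = []
    # Read the input file, skipping blank lines (e.g. the one left by a trailing newline)
    with open(input_path, "r", encoding="utf-8") as input_file:
        input_words = [w.strip() for w in input_file.read().splitlines() if w.strip()]

    for word in input_words:
        print("Fetching word: %s" % word)
        # Clear the clipboard; this matters, it prevents reading a stale copy
        pyperclip.copy("")
        # Put the input word on the clipboard
        pyperclip.copy(word)
        # Click the input box (coordinate-based; roughly the right spot is enough)
        mouse.click(coords=(2400, 80))
        # Simulate keys: select all, delete, paste, Enter (triggers the lookup)
        keyboard.send_keys("^a{DELETE}^v{ENTER}")
        # Clear the clipboard again so the word itself is not read back as a result
        pyperclip.copy("")
        # Left-click here; this only serves to move the mouse into the example pane
        mouse.click(button="left", coords=(2200, 330))
        # Ctrl+A Ctrl+C: select and copy all examples (some extra text comes along; handled later)
        keyboard.send_keys("^a^c")
        # Pause briefly so we don't act too fast
        time.sleep(random.random() * 2 + 1)
        # pywinauto copies into the system clipboard, so another library is needed to read it
        content = pyperclip.paste()
        # Light preprocessing, then add the result to output_words
        try:
            lines = line_process(content)
        except Exception as exp:
            print(exp)
            # If scraping goes wrong, NetEase has most likely caught on; just stop.
            break

        sentences = to_list(lines)
        if not sentences:
            print("No example sentences found; the data format may be wrong.")
            break
        output_words.append({
            "word": word,
            "example": sentences,
        })
        # Pause for a longer random interval; no need for speed, just run steadily
        time.sleep(random.random() * 3 + 3)
        # Clear the clipboard to prevent stale copies
        pyperclip.copy("")

    # Once a whole input file is done, write everything in one go.
    # The earlier version wrote once per word, which caused far too many I/O operations.
    # "w" overwrites: appending a second JSON array with "a+" would leave invalid JSON
    with open(output_path, "w", encoding="utf-8") as output_file:
        output_file.write(json.dumps(
            output_words, ensure_ascii=False, indent=4))

    # Record the problem words
    with open(error_path, "w", encoding="utf-8") as err_file:
        err_file.write("\n".join(error_words))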
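Because line_process and to_list only operate on text, the parsing half of the script can be checked without Youdao running. Below is a minimal self-contained sketch that folds both steps into one hypothetical parse_examples function. The sample clipboard text is made up, modeled on what line_process filters for (a heading before the first numbered entry, blank lines, source lines wrapped in 《》 or starting with youdao); it was not captured from the real application.

```python
import re
from typing import Dict, List

# Same pattern as to_list: number line, original sentence, translation
EXAMPLE_RE = re.compile(r"(?P<no>\d+)\.\n(?P<original>.+?)\n(?P<translate>.+?)\n")


def parse_examples(content: str) -> List[Dict[str, str]]:
    """Hypothetical one-step version of line_process + to_list."""
    lines = content.splitlines()
    # Drop everything before the first "<number>." line
    start = next((i for i, line in enumerate(lines) if re.match(r"\d+\.", line)), 0)
    kept = [
        line for line in lines[start:]
        if line.strip()
        and not line.startswith("youdao")
        and not (line.startswith("《") and line.endswith("》"))
    ]
    # Every example should contribute exactly three lines
    if len(kept) % 3 != 0:
        raise ValueError("unexpected clipboard layout")
    text = "\n".join(kept) + "\n"  # trailing newline so the last entry matches
    return [m.groupdict() for m in EXAMPLE_RE.finditer(text)]


if __name__ == "__main__":
    # Made-up clipboard text in the shape line_process expects
    sample = ("例句\r\n\r\n1.\r\nHe is a sophisticated man.\r\n他是个老练的人。\r\n"
              "《柯林斯英汉双解大词典》\r\n\r\n2.\r\nThe system is sophisticated.\r\n"
              "这个系统很精密。\r\nyoudao\r\n")
    for item in parse_examples(sample):
        print(item)
```

If the real clipboard layout differs (for example, no number line per example), the three-lines-per-example check is the first thing that would need adjusting.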

Demo

Getting this code running is still fairly involved, so I'll simply list the required steps here; I hope it helps anyone interested.

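Change dir_path and prepare an input.txt file in that directory.
Get the process ID of Youdao Dictionary.
Get the coordinates of the word input box and of the area where the copy happens.
Switch the Youdao Dictionary interface to the example-sentence view.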
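The first three steps all come down to editing values hard-coded in the script: dir_path, processId, and the two click coordinates. A minimal sketch, assuming you would rather keep these in a small JSON file than edit the source each time; CollectorConfig and from_json are hypothetical names, and the defaults simply copy the values from the script above.

```python
import json
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class CollectorConfig:
    """Hypothetical container for the values the setup steps ask you to edit."""
    # Defaults copy the values hard-coded in the original script
    dir_path: str = r"C:/Users/Dick/Desktop/work/DragonEnglish/tools"
    process_id: int = 13840
    input_box: Tuple[int, int] = (2400, 80)   # first click: word input box
    copy_area: Tuple[int, int] = (2200, 330)  # second click: example-sentence pane

    @classmethod
    def from_json(cls, text: str) -> "CollectorConfig":
        """Override any subset of the defaults from a JSON object."""
        data = json.loads(text)
        # JSON has no tuples, so turn coordinate lists back into (x, y) pairs
        for key in ("input_box", "copy_area"):
            if key in data:
                data[key] = tuple(data[key])
        # dataclasses.replace raises TypeError for unknown keys, which catches typos early
        return replace(cls(), **data)


if __name__ == "__main__":
    cfg = CollectorConfig.from_json('{"process_id": 4242, "input_box": [1200, 60]}')
    print(cfg.process_id, cfg.input_box, cfg.copy_area)
```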

To run the project you need an input.txt file; this is the one I tested with:

sophisticated
centralization
phenomenon
internationalization
radioactive

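I got the process PID from Task Manager, and you can do the same. Alternatively, the simplest way is to use Inspect and Spy++. I was lazy here and just did whatever was quickest.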
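Looking the PID up by hand has to be repeated every time Youdao restarts. A minimal sketch of an alternative, assuming the dictionary runs as YoudaoDict.exe (an assumption; check the actual name in Task Manager): parse the CSV output of Windows' built-in `tasklist /FO CSV /NH` command. find_pid is a hypothetical helper, and the parsing is kept separate from the subprocess call so it can be tried on any OS.

```python
import csv
import io
import subprocess
import sys
from typing import Optional


def find_pid(tasklist_csv: str, image_name: str) -> Optional[int]:
    """Return the PID of the first process named image_name, or None."""
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # Each row of `tasklist /FO CSV /NH` is:
        # image name, PID, session name, session number, memory usage
        if len(row) >= 2 and row[0].lower() == image_name.lower():
            return int(row[1])
    return None


if __name__ == "__main__" and sys.platform == "win32":
    out = subprocess.run(["tasklist", "/FO", "CSV", "/NH"],
                         capture_output=True, text=True).stdout
    # "YoudaoDict.exe" is an assumed executable name
    print(find_pid(out, "YoudaoDict.exe"))
```

pywinauto's Application.connect() also accepts a path= argument pointing at the executable, which avoids handling the PID at all.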


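The coordinates of the word input box and of the copy area: the first coordinate locates the input box; the program pastes the word into it and presses Enter, which runs the lookup. The mouse then moves to the second coordinate (anywhere in the blank area below is fine), and a select-all (Ctrl+A) plus copy is performed. That captures all the content for one word.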
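The hard-coded (2400, 80) and (2200, 330) are absolute screen coordinates, so they break as soon as the window moves or the display resolution changes. One option, sketched under the assumption that you measure the two click points as fractions of the dictionary window's size: pywinauto wrappers expose rectangle(), whose result has left/top/right/bottom attributes, and a small pure function can turn a fraction into screen pixels. Rect and to_screen are hypothetical names; Rect is only a stand-in so the function can be tried without Windows.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    """Stand-in for the left/top/right/bottom rectangle pywinauto reports."""
    left: int
    top: int
    right: int
    bottom: int


def to_screen(rect: Rect, rel_x: float, rel_y: float) -> Tuple[int, int]:
    """Map a point given as fractions (0.0 to 1.0) of the window size to screen pixels."""
    width = rect.right - rect.left
    height = rect.bottom - rect.top
    return rect.left + round(width * rel_x), rect.top + round(height * rel_y)
```

On Windows, something like `r = app.top_window().rectangle()` followed by `mouse.click(coords=to_screen(Rect(r.left, r.top, r.right, r.bottom), 0.5, 0.05))` could replace the first click; the 0.5 and 0.05 fractions are placeholders you would measure yourself.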


Set Youdao to this view: first look up any word, select the example sentences, then leave the interface untouched.


Finally, the program runs. The recorded GIF has been sped up; in actual runs, delays are deliberately added so the automation is not detected too early.
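The script inlines its delays as `random.random() * 2 + 1` (1 to 3 seconds) and `random.random() * 3 + 3` (3 to 6 seconds). A minimal sketch of pulling that into one helper, so the ranges are named in one place and the sleep can be swapped out when testing; random_pause is a hypothetical name, and random.uniform is used in place of the original arithmetic (same range, clearer intent).

```python
import random
import time
from typing import Callable, Optional


def random_pause(low: float, high: float,
                 sleep: Callable[[float], None] = time.sleep,
                 rng: Optional[random.Random] = None) -> float:
    """Sleep for a random duration in [low, high] seconds and return it."""
    if low < 0 or high < low:
        raise ValueError("need 0 <= low <= high")
    duration = (rng or random).uniform(low, high)
    sleep(duration)
    return duration
```

In the main loop, `random_pause(1, 3)` would take the place of the short pause after copying and `random_pause(3, 6)` the longer pause between words.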


Console output


output.json file
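The script writes output.json as a single JSON array of {"word": ..., "example": [...]} objects. Appending a fresh array to the same file on each run (as "a+" mode would) leaves a file that json.load can no longer read. If results from several input files should accumulate, one option is to load the existing list, extend it, and rewrite the file. A minimal sketch; merge_results is a hypothetical helper.

```python
import json
import os
from typing import Any, Dict, List


def merge_results(path: str, new_items: List[Dict[str, Any]]) -> int:
    """Extend the JSON list stored at path with new_items; return the new total."""
    existing: List[Dict[str, Any]] = []
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            text = f.read()
        # Treat an empty file as an empty list instead of failing in json.loads
        if text.strip():
            existing = json.loads(text)
    existing.extend(new_items)
    # Rewrite the whole file so it always holds exactly one valid JSON array
    with open(path, "w", encoding="utf-8") as f:
        json.dump(existing, f, ensure_ascii=False, indent=4)
    return len(existing)
```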



Statement: This article is reprinted from Yisu Cloud (亿速云). In case of infringement, please contact admin@php.cn for removal.
