


A batch document translation tool written in Python: better than paid software?
This article uses Python to batch-translate English Word documents while preserving their formatting, and the final result is even better than some paid software! Let's look at the task itself first.
I have a large number of foreign-language documents on hand. This example uses five of them, named test1.docx, test2.docx, and so on. One of them looks like this:
Basic requirement: "Translate the entire content of each document into Chinese in batch and write it to a new file." The result is as follows:
Advanced requirement: on top of the basic requirement, "keep the formatting of the original document." The result is as follows:
1. Translation API
The core of this requirement is translation. The strategy is to use an online translation API, and the Baidu Translation Open Platform is recommended here. If you don't need high concurrency, its Standard edition is free to use with no character limit!
Baidu Translation Open Platform: http://api.fanyi.baidu.com/api/trans/product/index
Before using Baidu's general translation API, complete the following steps:
1. Log in to the Baidu Translation Open Platform (http://api.fanyi.baidu.com) with a Baidu account;
2. Register as a developer and obtain an APPID;
3. Complete developer verification (you can skip this if you only need the Standard edition);
4. Activate the general translation API service;
5. Read the technical documentation and write your code based on the demo.

When you are done, your APPID and key appear on your personal page. They are very important! The demo for the general translation API is given below. Its output handling has been slightly modified, and the code works as-is!
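Here is a minimal sketch of such a request. It assumes the endpoint and parameters used by Baidu's public general translation demo: `appid`, `q`, `from`, `to`, `salt` and `sign`, where `sign` is the MD5 of appid + q + salt + key. Check these against the current Baidu documentation. The sketch uses the standard library's `urllib` instead of `requests`. `APPID` and `APPKEY` are placeholders, and the three function names are illustrative.

```python
import json
import random
import urllib.parse
import urllib.request
from hashlib import md5

# Placeholders: replace with the APPID and key shown on your personal page
APPID = 'your_appid'
APPKEY = 'your_appkey'
# Assumption: endpoint as used in Baidu's general translation demo; check current docs
URL = 'http://api.fanyi.baidu.com/api/trans/vip/translate'


def make_sign(appid, query, salt, appkey):
    """Request signature: MD5 hex digest of appid + query + salt + key (UTF-8)."""
    return md5((appid + query + str(salt) + appkey).encode('utf-8')).hexdigest()


def parse_result(result):
    """Join the 'dst' fields of a JSON response; raise if the API reports an error."""
    if 'error_code' in result:
        raise RuntimeError(f"Baidu API error {result['error_code']}: {result.get('error_msg')}")
    return '\n'.join(item['dst'] for item in result['trans_result'])


def translate(query, from_lang='en', to_lang='zh'):
    """Send one query to the API and return the translated text."""
    salt = random.randint(32768, 65536)
    payload = {
        'appid': APPID, 'q': query, 'from': from_lang, 'to': to_lang,
        'salt': salt, 'sign': make_sign(APPID, query, salt, APPKEY),
    }
    data = urllib.parse.urlencode(payload).encode('utf-8')
    req = urllib.request.Request(
        URL, data=data,
        headers={'Content-Type': 'application/x-www-form-urlencoded'})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return parse_result(json.load(resp))

# Usage (needs real credentials and network access):
# print(translate('Hello World! This is 1st paragraph.'))
```

Signing and response parsing are kept separate from the network call, so you can check both offline before spending API quota.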
You can see that the test content is translated accurately. Note that the free edition limits concurrency and request frequency. If you need to call the API many times, use the time module to sleep for one second between calls.
2. Preserving the format
The difficulty of the advanced requirement is preserving the formatting. Simply put, the corresponding parts of the translated document should have the same page format and paragraph format as the original document.
Given this correspondence, we only need to read each setting from the original document and assign it to the newly translated document. For now, this only keeps page settings and paragraph settings consistent. Accurately carrying over formatting applied to specific words inside a paragraph would require natural language processing (NLP), which this article does not cover.
2.1 Page style
The page style covers the margins, orientation, page height, page width, and so on. The original document uses Narrow margins. We don't need to know the four margin values that make up "Narrow": the code only has to pass each value from the old document to the new one, as follows:
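Here is a minimal sketch of this value passing. It assumes python-docx `Section` objects, whose `orientation`, `page_width`, `page_height` and margin properties are both readable and writable. `PAGE_ATTRS` and `copy_page_layout` are illustrative names.

```python
# Page settings to carry over; each is a read/write property of python-docx's Section
PAGE_ATTRS = (
    'orientation', 'page_width', 'page_height',
    'left_margin', 'right_margin', 'top_margin', 'bottom_margin',
)


def copy_page_layout(src_section, dst_section, attrs=PAGE_ATTRS):
    """Copy each page setting from one section to another, whatever its value."""
    for name in attrs:
        setattr(dst_section, name, getattr(src_section, name))

# Usage with python-docx (not executed here):
# from docx import Document
# src, dst = Document('test1.docx'), Document()
# copy_page_layout(src.sections[0], dst.sections[0])
```

Because the values are copied rather than hard-coded, "Narrow" or any other margin preset carries over without the code knowing its actual numbers.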
2.2 Paragraph style
Paragraph styles include alignment, indentation, spacing, and so on. The original document uses spacing after paragraphs, and the title is centered. All of these settings carry over cleanly by passing values. Any setting not defined in the original document simply comes through as None.
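Here is a minimal sketch of the paragraph-level transfer. It assumes python-docx paragraphs, whose `paragraph_format` exposes these properties. `PARA_ATTRS` and `copy_paragraph_format` are illustrative names. `line_spacing_rule` is left out on purpose, because python-docx derives it from `line_spacing`.

```python
# ParagraphFormat properties to carry over. Values never set in the original
# come back as None and are passed on unchanged, as described above.
PARA_ATTRS = (
    'alignment', 'left_indent', 'right_indent', 'first_line_indent',
    'space_before', 'space_after', 'line_spacing',
)


def copy_paragraph_format(src_paragraph, dst_paragraph, attrs=PARA_ATTRS):
    """Copy alignment, indentation and spacing from one paragraph to another."""
    src_fmt = src_paragraph.paragraph_format
    dst_fmt = dst_paragraph.paragraph_format
    for name in attrs:
        setattr(dst_fmt, name, getattr(src_fmt, name))

# Usage, pairing paragraphs by position (not executed here):
# for src_p, dst_p in zip(src_doc.paragraphs, dst_doc.paragraphs):
#     copy_paragraph_format(src_p, dst_p)
```

Pairing by position only works if the translated document has exactly one paragraph per original paragraph, including empty ones.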
2.3 Text block (run) styles
To carry over font size, bold, italic, color, and similar styles, the strategy is as follows. Create an empty list for each attribute. Then traverse every text block (run) in each paragraph of the original document and append its attributes to the respective lists. Finally, within each paragraph, apply the most frequent value of each attribute to the corresponding paragraph of the translated document. For example, if all or most of the text in a paragraph is bold, every text block in the corresponding translated paragraph is set to bold.
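Here is a minimal sketch of this majority vote. It assumes python-docx `Run` objects (`run.bold`, `run.italic`, `run.font.size`, `run.font.color.rgb`), and the function names are illustrative. Each attribute gets its own list, and `collections.Counter` picks the most frequent value. Note that it counts runs, not characters, so a long plain run and a short bold run carry equal weight.

```python
from collections import Counter


def majority_run_style(runs):
    """Return the most common size/bold/italic/color among a paragraph's runs."""
    sizes, bolds, italics, colors = [], [], [], []  # one list per attribute
    for run in runs:
        sizes.append(run.font.size)
        bolds.append(run.bold)
        italics.append(run.italic)
        colors.append(run.font.color.rgb)

    def most_common(values):
        # On ties, Counter.most_common returns the value encountered first
        return Counter(values).most_common(1)[0][0] if values else None

    return {
        'size': most_common(sizes),
        'bold': most_common(bolds),
        'italic': most_common(italics),
        'color': most_common(colors),
    }


def apply_run_style(runs, style):
    """Apply the majority style to every run of the translated paragraph."""
    for run in runs:
        run.font.size = style['size']
        run.bold = style['bold']
        run.italic = style['italic']
        if style['color'] is not None:  # leave the default color alone otherwise
            run.font.color.rgb = style['color']

# Usage (not executed here):
# apply_run_style(dst_paragraph.runs, majority_run_style(src_paragraph.runs))
```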
Readers interested in NLP can try to faithfully reproduce, in the translated document, the styling of specific words in the English document.
The code above does not include font settings, because there is no need to carry English fonts over to the Chinese document. Setting a Chinese font was covered in a previous article and is relatively involved, so here is the code directly:
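```python
from docx.oxml.ns import qn

# `run` is a text block (Run) in the translated document
run.font.name = '微软雅黑'           # Microsoft YaHei for Western characters
r = run._element.rPr.rFonts          # font element created by the line above
r.set(qn('w:eastAsia'), '微软雅黑')  # Microsoft YaHei for East Asian (Chinese) characters
```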
3. Overall implementation steps
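Every part of the operation is now covered. Since this example has several documents to translate, the overall logic is:
1. Use the glob module as a batch-processing framework to get the absolute path of each file;
2. Open each Word file with python-docx and parse its paragraphs;
3. Pass each paragraph's text to the Baidu general translation API, parse the JSON result it returns (the modified demo above already does this), and write the translation into a new file;
4. Once a file has been fully parsed, translated, and written, save the new file.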
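For step 1, here is a minimal sketch of glob-based file discovery. `find_source_docs` is an illustrative name, and the folder is up to you. The original steps do not filter anything. This sketch also skips `*_translated.docx` outputs and Word's `~$` lock files, so re-running the script does not translate its own results.

```python
import glob
import os


def find_source_docs(folder):
    """Return sorted absolute paths of the .docx files to translate in `folder`.

    Skips earlier outputs ('*_translated.docx') and Word's temporary
    lock files ('~$*.docx') so they are not translated again.
    """
    pattern = os.path.join(os.path.abspath(folder), '*.docx')
    return sorted(
        path for path in glob.glob(pattern)
        if not os.path.basename(path).startswith('~$')
        and not path.endswith('_translated.docx')
    )

# Usage (folder name is hypothetical):
# for file in find_source_docs('docs'):
#     print(file)
```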
4. Code implementation
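Import the required modules. Besides the libraries the translation demo needs, we need glob to collect the files in batch, python-docx to read them, and the time module to limit the request rate. Why os is needed is explained below: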
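```python
import requests            # HTTP requests to the translation API
import random              # salt for the request signature
import json                # parse the JSON response
from hashlib import md5    # request signature
import time                # throttle calls to the API
from docx import Document  # python-docx: read and write Word files
import glob                # batch processing: collect the .docx files
import os                  # build the output file names
```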
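Part of the original demo is kept as-is. The code that involves the query parameter has to move into the loop that follows. The retained part:
The result is as follows: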
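Once we have a paragraph's text, we assign it to the query parameter and run the rest of the API demo code. While printing the result, we also write it into the new document with add_paragraph: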
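Here is a minimal sketch of this loop. It assumes python-docx documents (`doc.paragraphs`, `doc.add_paragraph`) and a `translate(text)` function, for example one wrapping the API demo. `translate_paragraphs` and its `delay` parameter are illustrative names. The one-second `time.sleep` follows the rate-limit advice from the API section.

```python
import time


def translate_paragraphs(src_doc, dst_doc, translate, delay=1.0):
    """Translate every paragraph of src_doc and append the result to dst_doc.

    Empty paragraphs are not sent to the API, but they still get an empty
    paragraph in dst_doc. This keeps a one-to-one paragraph correspondence,
    which position-based format copying relies on.
    """
    for paragraph in src_doc.paragraphs:
        query = paragraph.text
        if not query.strip():
            dst_doc.add_paragraph('')
            continue
        result = translate(query)
        print(result)                  # show progress, as in the article
        dst_doc.add_paragraph(result)  # write the translation to the new document
        time.sleep(delay)              # stay under the free edition's rate limit

# Usage (not executed here):
# translate_paragraphs(Document(file), wordfile_new, translate)
```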
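Finally, save the result as a new file. We want the name to follow the pattern originalname_translated, which we can build with os.path.basename plus string concatenation: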
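```python
# e.g. test1.docx -> <path>/test1_translated.docx. os.path.join inserts the right
# separator (the raw string r'\\' would insert two backslashes, not one).
new_name = os.path.splitext(os.path.basename(file))[0] + '_translated.docx'
wordfile_new.save(os.path.join(path, new_name))
```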

Once a single file works, put the code that reads and creates files inside the batch-processing framework:
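Here is a minimal sketch of such a batch framework. It is written against a hypothetical `translate_file(src, dst)` callable, which would open `src` with python-docx, translate it, and save `dst`. `translated_path` and `translate_batch` are also illustrative names. As a design choice that goes beyond the article, a failure in one file is recorded and the batch moves on instead of stopping.

```python
import os


def translated_path(src_path, out_dir):
    """Build <out_dir>/<original name>_translated.docx."""
    stem = os.path.splitext(os.path.basename(src_path))[0]
    return os.path.join(out_dir, stem + '_translated.docx')


def translate_batch(paths, out_dir, translate_file):
    """Call translate_file(src, dst) for each path and return {path: error} for failures."""
    failures = {}
    for src in paths:
        dst = translated_path(src, out_dir)
        try:
            translate_file(src, dst)
        except Exception as exc:  # e.g. corrupt file or API error: record and continue
            failures[src] = exc
    return failures

# Usage (folder and translate_one_file are hypothetical):
# import glob
# failures = translate_batch(glob.glob(os.path.join(folder, '*.docx')), folder, translate_one_file)
```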
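With that, the basic requirement is met. Now we apply what we worked out about style modification and add the style-adjustment code. The final complete code is as follows:
After the code finishes, we have five new translated files.
The translation result is shown below. The English has been translated into Chinese, and most of the styling is preserved!
All the documents have now been translated. Of course, this is machine translation, so in real use the key parts still need manual adjustment. Overall, though, it is a successful attempt at office automation with Python!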
The above is the detailed content of Document batch translation tool written in Python, the effect is better than paid software?. For more information, please follow other related articles on the PHP Chinese website!
