
Example code: a Python script for processing web page files

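Author: 高洛峰, 2017-03-20 13:14:02

An embedded web server differs from a traditional one. Its web pages have to be converted into arrays and stored in flash so that lwIP's network interface can serve them. Recently, business requirements meant I had to change the web pages frequently, and compressing and converting them by hand each time was tedious. So I used what I know to write a Python script that batch-processes web page files, compressing them and converting them into arrays.

Environment the script was run in (later versions should also be compatible):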

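Python 3.5.1 (see online tutorials for how to download, install and configure it)

node.js v4.4.7 with the uglifyjs package installed, used for real minification of JS files rather than plain-text whitespace stripping

uglifyjs: the engine used to minify the JS files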
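The script below calls uglifyjs by concatenating a command string for os.system. Another option is to build the call as an argument list and run it with subprocess. That way, paths containing spaces need no quoting, and failures are reported. This is a minimal sketch that assumes uglifyjs is installed globally and on PATH. `build_uglify_command` and `minify_js` are hypothetical helpers, and the sketch passes only the `-m` and `-o` flags that the script itself uses:

```python
import shutil
import subprocess


def build_uglify_command(inpath, outpath, exe="uglifyjs"):
    """Return the uglifyjs call as an argument list.

    -m mangles local names and -o names the output file. Because each path
    is its own list element, paths containing spaces need no quoting.
    """
    return [exe, inpath, "-m", "-o", outpath]


def minify_js(inpath, outpath):
    # shutil.which honours PATHEXT on Windows, so it also finds the
    # uglifyjs.cmd wrapper that npm installs there
    exe = shutil.which("uglifyjs")
    if exe is None:
        raise FileNotFoundError("uglifyjs was not found on PATH")
    # check=True raises CalledProcessError if uglifyjs exits with an error
    subprocess.run(build_uglify_command(inpath, outpath, exe), check=True)
```

For example, minify_js("basic/app.js", "reduce/app.js") minifies one file (the paths are hypothetical). If the tool is missing or uglifyjs reports an error, the function raises an exception. By contrast, os.system only returns an exit status, which the caller may ignore.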

The full implementation is as follows:

#!/usr/bin/env python3
import os
import shutil
from functools import partial
import re
import gzip

# Create a folder (including missing parent folders) if it does not exist yet
def mkdir(path):
    path = path.strip()
    # Check whether the folder exists; create it only if it does not
    if not os.path.exists(path):
        os.makedirs(path)
        print(path + ' created successfully')
    return path

# Delete a folder together with everything inside it
def deldir(path):
    path = path.strip()
    # Check whether the folder exists; delete it only if it does
    if os.path.exists(path):
        shutil.rmtree(path)
        print(path + " deleted successfully")

# First-pass compression of an HTML file
def FileReduce(inpath, outpath):
    with open(inpath, "r", encoding="utf-8") as infp, \
         open(outpath, "w", encoding="utf-8") as outfp:
        for li in infp:
            # Skip blank lines
            if li.split():
                # Collapse each run of whitespace (spaces, tabs, the trailing
                # newline) into one space; lines are written back to back
                outfp.write(' '.join(li.split()))
    print(outpath + " compressed successfully")

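# Call uglifyjs from the command line to minify a JS file
def ShellReduce(inpath, outpath):
    # Quote the paths so that folders containing spaces still work
    Command = 'uglifyjs "' + inpath + '" -m -o "' + outpath + '"'
    print(Command)
    os.system(Command)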

# gzip compression step
def FileGzip(inpath, outpath):
    with open(inpath, 'rb') as plain_file:
        with gzip.open(outpath, 'wb') as zip_file:
            shutil.copyfileobj(plain_file, zip_file)
    print(outpath + " gzip-compressed successfully")

# Read a file as binary and save it as a C array
def FileHex(inpath, outpath):
    i = 0
    count = 0
    a = ''
    # Name the array after the output file, e.g. index_html.c -> index_html
    name = os.path.splitext(os.path.basename(outpath))[0]
    with open(inpath, 'rb') as inf:
        # Read one byte at a time until read() returns b''
        records = iter(partial(inf.read, 1), b'')
        for r in records:
            r_int = int.from_bytes(r, byteorder='big')
            # hex(5) is '0x5'; pad it to '0x05'
            a += strzfill(hex(r_int), 2, 2) + ', '
            i += 1
            count += 1
            # 16 bytes per line
            if i == 16:
                a += '\n'
                i = 0
    # unsigned char holds bytes above 0x7f; the closing "};" needs its semicolon
    a = "static const unsigned char " + name + "[" + str(count) + "]={\n" + a + "\n};\n\n"
    with open(outpath, 'w') as outf:
        outf.write(a)
    print(outpath + " converted to an array successfully")

# Zero-pad the part of istr after position index to width n
def strzfill(istr, index, n):
    return istr[:index] + istr[index:].zfill(n)

# Remove CSS comments /* ... */ and compress the file
def unCommentReduce(inpath, outpath):
    with open(inpath, "r", encoding="utf-8") as infp:
        fileByte = infp.read()

    # Raw string for the regex; non-greedy so each comment is removed separately
    replace_reg = re.compile(r'/\*[\s\S]*?\*/')
    fileByte = replace_reg.sub('', fileByte)
    # Collapse all newlines, tabs and spaces into single spaces
    fileByte = ' '.join(fileByte.split())
    with open(outpath, "w", encoding="utf-8") as outfp:
        outfp.write(fileByte)
    print(outpath + " comments removed and compressed successfully!")

# Main processing function
def WebProcess(path):
    # Original pages:      ..\basic\
    # Compressed pages:    ..\reduce\
    # gzip second pass:    ..\gzip\
    # Generated .c files:  ..\program\
    BasicPath = os.path.join(path, "basic")
    ReducePath = os.path.join(path, "reduce")
    GzipPath = os.path.join(path, "gzip")
    ProgramPath = os.path.join(path, "program")
    # Delete the old output folders, then recreate the program folder
    deldir(ProgramPath)
    deldir(ReducePath)
    deldir(GzipPath)
    mkdir(ProgramPath)

    for root, dirs, files in os.walk(BasicPath):
        # Rebuild the sub-folder layout of basic\ under reduce\ and gzip\
        # (relpath avoids rewriting other "basic" parts of the full path)
        rel = os.path.relpath(root, BasicPath)
        ReduceDir = mkdir(os.path.normpath(os.path.join(ReducePath, rel)))
        GzipDir = mkdir(os.path.normpath(os.path.join(GzipPath, rel)))
        for item in files:
            ext = item.split('.')[-1].lower()
            InFilePath = os.path.join(root, item)
            OutReducePath = os.path.join(ReduceDir, item)
            OutGzipPath = os.path.join(GzipDir, item + '.gz')
            OutProgramPath = os.path.join(ProgramPath, item.replace('.', '_') + '.c')

            # Process each file according to its extension:
            # html/htm: remove '\n' and '\t', keep one space between words
            # css: remove /*...*/ comments, '\n' and '\t', keep one space
            # js: minify with uglifyjs2
            # gif, jpg, ico: copy unchanged
            # anything else: copy unchanged and stop there
            # Every file except "anything else" is then gzipped and
            # converted to a hex array saved as a .c file
            if ext in ('html', 'htm'):
                FileReduce(InFilePath, OutReducePath)
            elif ext == 'css':
                unCommentReduce(InFilePath, OutReducePath)
            elif ext == 'js':
                ShellReduce(InFilePath, OutReducePath)
            elif ext in ('gif', 'jpg', 'ico'):
                shutil.copy(InFilePath, OutReducePath)
            else:
                shutil.copy(InFilePath, OutReducePath)
                continue
            FileGzip(OutReducePath, OutGzipPath)
            FileHex(OutGzipPath, OutProgramPath)


# Use the folder that contains this script as the working path
if __name__ == '__main__':
    path = os.path.split(os.path.realpath(__file__))[0]
    WebProcess(path)

The implementation works as follows:

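1. Walk the folder to be processed (..\basic). You have to create this folder yourself, copy the files to be processed into it, and put the script in the folder directly above it. This is done by WebProcess.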

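2. Create the folder for compressed pages (..\reduce, which stores the compressed files). The script does this and handles each file type as follows:

html: remove redundant spaces and line breaks from the text

css: remove redundant spaces, line breaks and /*......*/ comments from the text

js: minify with uglifyjs

gif, jpg, ico and other files: copied unchanged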

3. Create the gzip folder (..\gzip, which stores the files after the second compression pass). The script does this with Python's gzip module.

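4. Create the output folder (..\program, which stores the generated .c files). The script does this and handles each file as follows:

Read the gzip file in binary mode, convert it into a hexadecimal C array, and write it to a .c file.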
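The four steps above can also be written as small in-memory functions, so each one can be tested without creating the folders. This is a minimal sketch, not the author's code, and it makes the following assumptions:

- `collapse_whitespace`, `reduce_css`, `to_c_array` and `page_to_c` are hypothetical names.
- JS minification is left out because it needs the external uglifyjs tool.
- The element type `static const unsigned char` is an assumption. It is chosen because `char` may be signed, while gzip output contains bytes above 0x7f.

```python
import gzip
import re

# Non-greedy, so two separate comments are removed one at a time
CSS_COMMENT = re.compile(r'/\*[\s\S]*?\*/')


def collapse_whitespace(text):
    """Step 2 (html): turn each run of spaces, tabs and newlines into one space."""
    return ' '.join(text.split())


def reduce_css(text):
    """Step 2 (css): strip /* ... */ comments, then collapse whitespace."""
    return collapse_whitespace(CSS_COMMENT.sub('', text))


def to_c_array(name, data, per_line=16):
    """Step 4: format bytes as a C array definition, per_line bytes per row."""
    rows = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        # Iterating over bytes yields ints, formatted here as 0x00..0xff
        rows.append(', '.join('0x%02x' % b for b in chunk))
    body = ',\n'.join(rows)
    return 'static const unsigned char %s[%d] = {\n%s\n};\n' % (name, len(data), body)


def page_to_c(name, text, ext):
    """Run steps 2 to 4 on one text file held in memory."""
    if ext == 'css':
        reduced = reduce_css(text)
    else:  # everything else is treated as html in this sketch
        reduced = collapse_whitespace(text)
    # Step 3: gzip at level 9, which is also the default of gzip.open
    packed = gzip.compress(reduced.encode('utf-8'), compresslevel=9)
    return to_c_array(name, packed)
```

The script's FileReduce joins lines with no separator. By contrast, `collapse_whitespace` turns a line break into a space, so words split across lines stay apart.

Like the script, the comment regex is naive: a `/*` inside a CSS string or `url(...)` would also be treated as the start of a comment.

A browser can only use the gzip bytes if the embedded server sends them with a `Content-Encoding: gzip` header. How to configure lwIP for that is outside the scope of this sketch.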

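Open a Windows command prompt in that folder (Shift + right-click) and run python web.py. The script loops over the files and repeats steps 2 to 4 for each one until every file has been processed.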
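The loop walks the basic folder and rebuilds the same sub-folder layout under each output folder. Below is a sketch of that path mapping. It uses `os.path.relpath` so that only the part of the path below basic is reused; a plain `str.replace("basic", ...)` would also rewrite an unrelated "basic" higher up in the path. `mirrored_paths` is a hypothetical helper, not part of the script:

```python
import os


def mirrored_paths(basic_path, out_root):
    """Yield (source, destination) pairs that mirror basic_path under out_root."""
    for root, dirs, files in os.walk(basic_path):
        # Path of the current folder relative to basic_path ("." at the top)
        rel = os.path.relpath(root, basic_path)
        for item in files:
            src = os.path.join(root, item)
            # normpath removes the "." component for top-level files
            dst = os.path.normpath(os.path.join(out_root, rel, item))
            yield src, dst
```

The destination folders still have to be created (for example with the script's mkdir) before any file is written to them.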

Important: all files to be processed must be saved in UTF-8. Otherwise, reading them fails with a decoding error such as a "gbk" codec error.
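The HTML and CSS readers open files as UTF-8, so it can help to find offending files before running the script. This is a minimal sketch; `is_utf8` and `find_non_utf8` are hypothetical helpers. A file that passes the check is valid UTF-8 byte-wise, which is what the readers require.

```python
def is_utf8(data):
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_non_utf8(paths):
    """Return the paths whose contents are not valid UTF-8."""
    bad = []
    for path in paths:
        # Read as bytes so that the check itself cannot raise a decode error
        with open(path, "rb") as f:
            if not is_utf8(f.read()):
                bad.append(path)
    return bad
```

Any file it reports can be re-saved as UTF-8 in an editor before the script is run again.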

Results:

HTML file:

[Figure: the processed HTML file]

Converted array:

[Figure: the generated C array]

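As a bonus, here is a small script that counts the total lines and blank lines of selected source file types in the current folder and its subfolders. It is a by-product of testing the script above: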

#!/usr/bin/env python3
import os

total_count = 0
empty_count = 0

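# Count all lines and blank lines in one file
def CountLine(path):
    global total_count
    global empty_count
    # utf-8 with errors="replace" so files in other encodings are still counted
    with open(path, encoding="utf-8", errors="replace") as tempfile:
        for lines in tempfile:
            total_count += 1
            if len(lines.strip()) == 0:
                empty_count += 1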
 
# Walk the folder tree and count lines in files with the listed extensions
def TotalLine(path):
    for root, dirs, files in os.walk(path):
        for item in files:
            ext = item.split('.')[-1]
            if ext in ["cpp", "c", "h", "java", "php"]:
                subpath = os.path.join(root, item)
                CountLine(subpath)

# Count lines under the folder that contains this script
path = os.path.split(os.path.realpath(__file__))[0]
TotalLine(path)
print("Input Path:", path)
print("total lines: ", total_count)
print("empty lines: ", empty_count)
print("code lines: ", total_count - empty_count)
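The counting logic can avoid global state by returning the counts instead. This is a sketch of that refactoring, and `count_lines` is a hypothetical name. It accepts any iterable of lines, so it works on an open file or on a list of strings:

```python
def count_lines(lines):
    """Return (total, empty) for an iterable of text lines."""
    total = 0
    empty = 0
    for line in lines:
        total += 1
        # Lines containing only whitespace count as empty, as in CountLine
        if not line.strip():
            empty += 1
    return total, empty
```

To use it on one file: with open(path, encoding="utf-8", errors="replace") as f: total, empty = count_lines(f). The per-file results can then be summed while walking the folder tree.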
