How to implement a code statistics tool in Python
This article shows how to implement a code statistics tool in Python and what to watch out for along the way, working through a practical case.
Question
Design a program that counts the lines of code in a project, reporting the number of files, code lines, comment lines, and blank lines. Keep the design flexible so that different parameters can be passed in to count projects written in different languages, for example:

    # --type specifies the file type
    python counter.py --type python

Output:

    files: 10
    code_lines: 200
    comments: 100
    blanks: 20
Analysis
This problem looks simple but is somewhat tricky to get right. We can shrink it first: once the lines of a single file are counted correctly, counting a directory is no problem. The trickiest part is multi-line comments. Taking Python as an example, comment lines come in the following forms:

1. Single-line comments starting with the pound sign

    # Single-line comment

2. Multi-line comment markers that open and close on the same line

    """This is a multi-line comment"""
    '''This is also a multi-line comment'''

3. Multi-line comment markers spread over several lines

    """
    These 3 lines are all comment lines
    """
Knowledge points
How to read files correctly, so that what is read comes back as strings
Common string-processing methods
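As a small illustration of these two points, the sketch below applies strip(), startswith() and endswith() to a few sample lines. The sample lines and the `describe` helper are made up for this example and are not part of the tool itself.

```python
# Sample lines as they might come back from reading a source file in text mode.
raw_lines = ["    # a comment\n", "\n", '"""docstring"""\n', "x = 1\n"]


def describe(raw_line):
    """Return a rough label for one raw line, using only basic string methods."""
    line = raw_line.strip()  # drop surrounding whitespace and the trailing newline
    if line == "":
        return "blank"
    if line.startswith("#"):
        return "single-line comment"
    # At least 6 characters are needed for a separate opening and closing marker.
    if line.startswith('"""') and line.endswith('"""') and len(line) >= 6:
        return "one-line triple-quoted string"
    return "code"


labels = [describe(raw) for raw in raw_lines]
print(labels)
```

Because the file is opened in text mode, every line is already a str, so these methods can be used directly without decoding.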
Simplified version
We iterate step by step. First implement a simplified version that counts only a single Python file and ignores multi-line comments, something anyone new to Python can write. The key point is that after reading each line, strip() is called first to remove the whitespace and line breaks on both sides of the string.

    # -*- coding: utf-8 -*-
    """
    Counts only single-line comments in a .py file
    """

    def parse(path):
        comments = 0
        blanks = 0
        codes = 0
        with open(path, encoding='utf-8') as f:
            for line in f.readlines():
                line = line.strip()
                if line == "":
                    blanks += 1
                elif line.startswith("#"):
                    comments += 1
                else:
                    codes += 1
        return {"comments": comments, "blanks": blanks, "codes": codes}

    if __name__ == '__main__':
        print(parse("xxx.py"))
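To see what this version gets wrong, the following sketch runs the same logic on a throwaway file. The `parse_simple` name, the sample text and the temporary file are assumptions made for this demonstration only.

```python
import os
import tempfile


def parse_simple(path):
    """Same logic as the simplified version: only '#' lines count as comments."""
    comments = blanks = codes = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line == "":
                blanks += 1
            elif line.startswith("#"):
                comments += 1
            else:
                codes += 1
    return {"comments": comments, "blanks": blanks, "codes": codes}


# Hypothetical sample: one '#' comment, one blank line, a 3-line docstring
# and one real code line.
SAMPLE = '# header\n\n"""\ndocstring text\n"""\nx = 1\n'

with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.py")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write(SAMPLE)
    result = parse_simple(sample_path)

print(result)
```

The three docstring lines end up in codes, so the result reports 4 code lines and only 1 comment line for a file with a single statement.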
Multi-line comment version
A tool that handles only single-line comments is of little use. Only once multi-line comments are counted correctly can it be called a real code counter.

    # -*- coding: utf-8 -*-
    """
    Can count .py files that contain multi-line comments
    """

    def parse(path):
        in_multi_comment = False  # flag: currently inside a multi-line comment
        comments = 0
        blanks = 0
        codes = 0
        with open(path, encoding="utf-8") as f:
            for line in f.readlines():
                line = line.strip()
                # Blank lines inside a multi-line comment are treated as comments
                if line == "" and not in_multi_comment:
                    blanks += 1
                # There are 4 kinds of comment lines
                # 1. Single-line comments starting with the pound sign
                # 2. Multi-line comment markers on the same line
                # 3. Lines between multi-line comment markers
                elif line.startswith("#") or \
                        (line.startswith('"""') and line.endswith('"""') and len(line) > 3) or \
                        (line.startswith("'''") and line.endswith("'''") and len(line) > 3) or \
                        (in_multi_comment and not (line.startswith('"""') or line.startswith("'''"))):
                    comments += 1
                # 4. The opening and closing lines of a multi-line comment
                elif line.startswith('"""') or line.startswith("'''"):
                    in_multi_comment = not in_multi_comment
                    comments += 1
                else:
                    codes += 1
        return {"comments": comments, "blanks": blanks, "codes": codes}

    if __name__ == '__main__':
        print(parse("xxx.py"))

In the fourth case, the key operation on meeting a multi-line comment marker is to invert the in_multi_comment flag rather than simply set it to True or False. The first """ sets it to True; the second """ ends the comment and inverts it back to False; a third would start a new comment and invert it to True again, and so on.

So does every other language need its own parsing function? Looking closely, the four kinds of comment lines can be abstracted into four conditions, because most languages have both single-line and multi-line comments and differ only in the symbols used.
The fragment below replaces the body of the loop in parse. Here extension stands for the extension of the file being parsed (for example py) and line for the stripped current line.

    CONF = {"py": {"start_comment": ['"""', "'''"], "end_comment": ['"""', "'''"], "single": "#"},
            "java": {"start_comment": ["/*"], "end_comment": ["*/"], "single": "//"}}

    start_comment = CONF.get(extension).get("start_comment")
    end_comment = CONF.get(extension).get("end_comment")

    cond2 = False
    cond3 = False
    cond4 = False
    for index, item in enumerate(start_comment):
        cond2 = line.startswith(item) and line.endswith(end_comment[index]) and len(line) > len(item)
        if cond2:
            break
    for item in end_comment:
        if line.startswith(item):
            cond3 = True
            break
    for item in start_comment + end_comment:
        if line.startswith(item):
            cond4 = True
            break

    if line == "" and not in_multi_comment:
        blanks += 1
    # There are 4 kinds of comment lines
    # 1. Single-line comments starting with the single-line marker
    # 2. Multi-line comment markers on the same line
    # 3. Lines between multi-line comment markers
    elif line.startswith(CONF.get(extension).get("single")) or cond2 or \
            (in_multi_comment and not cond3):
        comments += 1
    # 4. The opening and closing lines when the markers are spread over several lines
    elif cond4:
        in_multi_comment = not in_multi_comment
        comments += 1
    else:
        codes += 1

A single configuration constant is enough to record the single-line and multi-line comment symbols of every language, corresponding to cond1 through cond4. The remaining task is to parse multiple files, which can be done with os.walk.
    import os

    def counter(path):
        """
        Counts either a directory or a single file
        :param path:
        :return:
        """
        if os.path.isdir(path):
            comments, blanks, codes = 0, 0, 0
            list_dirs = os.walk(path)
            for root, dirs, files in list_dirs:
                for f in files:
                    file_path = os.path.join(root, f)
                    stats = parse(file_path)
                    comments += stats.get("comments")
                    blanks += stats.get("blanks")
                    codes += stats.get("codes")
            return {"comments": comments, "blanks": blanks, "codes": codes}
        else:
            return parse(path)

Of course, there is still plenty of work left to round out this program, including command-line parsing so that only the language specified by the parameters is counted.
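One possible way to put the pieces together is sketched below: a configuration-driven line classifier, a directory walk restricted to one extension, and argparse handling for the --type parameter. This is only a sketch under assumptions. The names `parse_lines`, `count_path`, `main` and the `TYPE_TO_EXT` mapping are invented here, the positional path argument is an addition, and the demo at the end uses a temporary directory with made-up files.

```python
import argparse
import os
import tempfile

# Comment markers per file extension, following the CONF idea above.
CONF = {
    "py": {"start_comment": ['"""', "'''"], "end_comment": ['"""', "'''"], "single": "#"},
    "java": {"start_comment": ["/*"], "end_comment": ["*/"], "single": "//"},
}
# Assumed mapping from the --type value to a CONF key.
TYPE_TO_EXT = {"python": "py", "java": "java"}


def parse_lines(lines, extension):
    """Classify an iterable of text lines using the markers for `extension`."""
    conf = CONF[extension]
    starts, ends = conf["start_comment"], conf["end_comment"]
    in_multi_comment = False
    comments = blanks = codes = 0
    for line in lines:
        line = line.strip()
        # cond2: a multi-line comment opened and closed on the same line
        cond2 = any(
            line.startswith(s) and line.endswith(e) and len(line) >= len(s) + len(e)
            for s, e in zip(starts, ends)
        )
        # cond3: the line starts with a closing marker
        cond3 = any(line.startswith(e) for e in ends)
        # cond4: the line starts with an opening or closing marker
        cond4 = any(line.startswith(m) for m in starts + ends)
        if line == "" and not in_multi_comment:
            blanks += 1
        elif line.startswith(conf["single"]) or cond2 or (in_multi_comment and not cond3):
            comments += 1
        elif cond4:
            in_multi_comment = not in_multi_comment
            comments += 1
        else:
            codes += 1
    return {"comments": comments, "blanks": blanks, "codes": codes}


def count_path(path, extension):
    """Total the counts for one extension over a file or a directory tree."""
    totals = {"files": 0, "comments": 0, "blanks": 0, "codes": 0}
    if os.path.isdir(path):
        candidates = [os.path.join(root, name)
                      for root, _dirs, names in os.walk(path) for name in names]
    else:
        candidates = [path]
    for file_path in candidates:
        if not file_path.endswith("." + extension):
            continue  # skip files of other languages
        with open(file_path, encoding="utf-8") as f:
            stats = parse_lines(f, extension)
        totals["files"] += 1
        for key in ("comments", "blanks", "codes"):
            totals[key] += stats[key]
    return totals


def main(argv=None):
    parser = argparse.ArgumentParser(description="Count code, comment and blank lines")
    parser.add_argument("path", nargs="?", default=".", help="file or directory to count")
    parser.add_argument("--type", choices=sorted(TYPE_TO_EXT), default="python",
                        help="language to count")
    args = parser.parse_args(argv)
    totals = count_path(args.path, TYPE_TO_EXT[args.type])
    print("files: {files}\ncode_lines: {codes}\ncomments: {comments}\n"
          "blanks: {blanks}".format(**totals))
    return totals


# Demo on a throwaway directory so the sketch runs without touching real files.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.py"), "w", encoding="utf-8") as f:
        f.write('# note\n\n"""\ndoc\n"""\nx = 1\n')
    with open(os.path.join(tmp, "B.java"), "w", encoding="utf-8") as f:
        f.write("/*\n * doc\n */\nint x = 1;\n")
    py_totals = main(["--type", "python", tmp])
    java_totals = main(["--type", "java", tmp])
```

In a real counter.py the demo would be replaced by a call to main() under the usual `if __name__ == '__main__':` guard. Like the approach above, this sketch classifies a line only by how it starts, so a line that mixes code with a trailing comment, or a closing marker that does not begin the line, is not recognized.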
Supplement:
Python implementation of a code line counting tool
We often want to count the lines of code in a project, but a more complete statistics feature is not that simple. Here is how to implement a line-counting tool in Python.
Idea:
First collect all the files, then count the lines of code in each file, and finally add the counts together.
Features implemented:
Count the lines in each file;
Count the total number of lines;
Measure the running time;
Support specifying the file types to count and excluding unwanted file types;
Recursively count the files in a folder, including those in subfolders;
Exclude blank lines;
    # coding=utf-8
    import os
    import time

    basedir = '/root/script'
    filelists = []
    # File types to count
    whitelist = ['php', 'py']

    # Walk the directory tree and collect all matching files
    def getFile(basedir):
        global filelists
        for parent, dirnames, filenames in os.walk(basedir):
            # os.walk already descends into subfolders, so no explicit recursion is needed
            for filename in filenames:
                ext = filename.split('.')[-1]
                # Count only the specified file types, skipping things like log and cache files
                if ext in whitelist:
                    filelists.append(os.path.join(parent, filename))

    # Count the lines in one file
    def countLine(fname):
        count = 0
        with open(fname) as f:
            for file_line in f:
                if file_line != '' and file_line != '\n':  # filter out blank lines
                    count += 1
        print(fname + '----', count)
        return count

    if __name__ == '__main__':
        startTime = time.perf_counter()
        getFile(basedir)
        totalline = 0
        for filelist in filelists:
            totalline = totalline + countLine(filelist)
        print('total lines:', totalline)
        print('Done! Cost Time: %0.2f second' % (time.perf_counter() - startTime))
Result:
[root@pythontab script]# python countCodeLine.py
/root/script/test/gametest.php---- 16
/root/script/smtp.php---- 284
/root/script/gametest.php---- 16
/root/script/countCodeLine.py---- 33
/root/script/sendmail.php---- 17
/root/script/test/gametest.php---- 16
total lines: 382
Done! Cost Time: 0.00 second
[root@pythontab script]#
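The feature list mentions excluding unwanted file types, while the script relies on a whitelist only. A possible way to support both is sketched below. The `collect_files` and `count_nonblank` helpers, the `blacklist` parameter and the sample files are assumptions made for this illustration, not part of the original script.

```python
import os
import tempfile


def collect_files(basedir, whitelist=None, blacklist=()):
    """Return sorted paths under basedir, filtered by file extension.

    whitelist=None means every type is accepted; blacklisted types are
    always skipped.
    """
    selected = []
    for parent, _dirnames, filenames in os.walk(basedir):
        for filename in filenames:
            ext = os.path.splitext(filename)[1].lstrip(".")
            if whitelist is not None and ext not in whitelist:
                continue
            if ext in blacklist:
                continue
            selected.append(os.path.join(parent, filename))
    return sorted(selected)


def count_nonblank(path):
    """Count lines that contain something other than whitespace."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


# Demo on a temporary directory with made-up files.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    samples = {
        "a.py": "x = 1\n\ny = 2\n",
        "b.log": "noise\n",
        os.path.join("sub", "c.php"): "<?php echo 1;\n",
    }
    for rel, text in samples.items():
        with open(os.path.join(tmp, rel), "w", encoding="utf-8") as f:
            f.write(text)
    kept = collect_files(tmp, blacklist={"log"})
    kept_names = [os.path.relpath(p, tmp) for p in kept]
    total = sum(count_nonblank(p) for p in kept)
    only_py = [os.path.basename(p) for p in collect_files(tmp, whitelist={"py"})]

print(kept_names, total, only_py)
```

Using line.strip() for the blank-line test also skips lines that contain only spaces or tabs, which a comparison against '\n' alone would count.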