How Should Python's AST (Abstract Syntax Tree) Be Used?

WBOY

May 09, 2023, 12:49 PM

python, ast

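Introduction

AST stands for Abstract Syntax Tree. The AST is an intermediate product on the way from Python source code to bytecode. With the help of the ast module, the structure of source code can be analyzed from the perspective of the syntax tree.

In addition, a syntax tree can be modified and executed, and a syntax tree generated from source can be unparsed back into Python source code. The ast module therefore leaves plenty of room for checking Python source code, analyzing syntax, modifying code, and debugging.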

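1. Introduction to the AST

The official CPython interpreter processes Python source code as follows:

Parse the source code into a parse tree (Parser/pgen.c)

Transform the parse tree into an abstract syntax tree (Python/ast.c)

Transform the AST into a control flow graph (Python/compile.c)

Emit bytecode based on the control flow graph (Python/compile.c)

In other words, Python code is actually processed like this:

source code parsing --> parse tree --> abstract syntax tree (AST) --> bytecode

The process above applies from Python 2.5 onward. Python source code is first parsed into a parse tree and then converted into an abstract syntax tree. In the abstract syntax tree you can see the Python syntactic structure of the source file.

In most cases programming does not require the abstract syntax tree, but under certain conditions and requirements the AST offers its own particular conveniences.

The following is a simple example of an abstract syntax tree, the one for the Python 2 statement print 1 + 2: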

Module(body=[
    Print(
        dest=None,
        values=[BinOp(left=Num(n=1), op=Add(), right=Num(n=2))],
        nl=True,
    )])
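The dump above comes from Python 2, where print is a statement with its own Print node. As a rough sketch for readers on Python 3.8 or later (an assumption; the article's own examples target Python 2), the same expression can be parsed and inspected as follows. The variable names are illustrative only.

```python
import ast

# Python 3 spelling of the statement behind the dump above.
tree = ast.parse("print(1 + 2)")

# In Python 3, print is an ordinary function call wrapped in an Expr statement,
# so there is no Print node any more.
stmt = tree.body[0]      # ast.Expr
call = stmt.value        # ast.Call to the name 'print'
binop = call.args[0]     # ast.BinOp for 1 + 2

print(type(stmt).__name__, type(call).__name__, type(binop).__name__)
print(ast.dump(tree))    # exact text of the dump varies between Python versions
```

The BinOp part of the tree keeps the same shape as in the Python 2 dump: a left operand, an Add operator, and a right operand.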

2. Creating an AST

2.1 The compile function

First, a brief look at the compile function:

compile(source, filename, mode[, flags[, dont_inherit]])

source - a string or an AST (abstract syntax tree) object. Typically you can pass in the entire contents of a .py file as returned by file.read().
filename - the name of the code file; if the code was not read from a file, pass some recognizable value.
mode - specifies the kind of code to compile; one of 'exec', 'eval', or 'single'.
flags and dont_inherit - flags that control which future statements and compiler options affect the compilation of source.
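To make the mode parameter concrete, here is a small sketch written for Python 3 (an assumption; the article's own code is Python 2). The variable names are made up for illustration.

```python
import types

# 'eval' compiles a single expression; eval() returns its value.
expr_code = compile("3 + 5", "<string>", "eval")
expr_value = eval(expr_code)

# 'exec' compiles a sequence of statements; exec() runs them for their effects.
namespace = {}
exec(compile("total = 3 + 5", "<string>", "exec"), namespace)

# 'single' compiles one interactive statement, the way the REPL does.
single_code = compile("3 + 5", "<string>", "single")

# A statement is not an expression, so 'eval' mode rejects it.
try:
    compile("total = 3 + 5", "<string>", "eval")
    eval_accepts_statements = True
except SyntaxError:
    eval_accepts_statements = False

print(expr_value, namespace["total"], eval_accepts_statements)
```

In all three modes compile returns a code object; the mode only decides what kind of source is accepted and how the result behaves when executed.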

func_def = \
"""
def add(x, y):
    return x + y
print add(3, 5)
"""

Compile it with compile and execute it:

>>> cm = compile(func_def, '<string>', 'exec')
>>> exec cm
8

The func_def above is compiled by compile into bytecode; cm is a code object:

True == isinstance(cm, types.CodeType)

compile(source, filename, mode, ast.PyCF_ONLY_AST) is equivalent to ast.parse(source, filename='<unknown>', mode='exec')
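The two statements above can be checked with a short script. This is a sketch for Python 3 (an assumption; in Python 3 both print and exec are functions), and the last line of func_def is changed to an assignment so the result can be inspected.

```python
import ast
import types

func_def = """
def add(x, y):
    return x + y
result = add(3, 5)
"""

# Compile the source text to a code object and run it in a fresh namespace.
cm = compile(func_def, "<string>", "exec")
namespace = {}
exec(cm, namespace)

# With ast.PyCF_ONLY_AST, compile() returns a tree instead of a code object,
# which is what ast.parse() does.
tree_from_compile = compile(func_def, "<unknown>", "exec", ast.PyCF_ONLY_AST)
tree_from_parse = ast.parse(func_def)
same_tree = ast.dump(tree_from_compile) == ast.dump(tree_from_parse)

print(isinstance(cm, types.CodeType), namespace["result"], same_tree)
```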

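2.2 Generating the AST

Generate the AST from the func_def above:

import ast
import astunparse    # third-party package

r_node = ast.parse(func_def)
print astunparse.dump(r_node)    # print ast.dump(r_node)

Below is the AST structure corresponding to func_def: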

Module(body=[
    FunctionDef(
        name='add',
        args=arguments(
            args=[Name(id='x', ctx=Param()), Name(id='y', ctx=Param())],
            vararg=None,
            kwarg=None,
            defaults=[]),
        body=[Return(value=BinOp(
            left=Name(id='x', ctx=Load()),
            op=Add(),
            right=Name(id='y', ctx=Load())))],
        decorator_list=[]),
    Print(
        dest=None,
        values=[Call(
                func=Name(id='add', ctx=Load()),
                args=[Num(n=3), Num(n=5)],
                keywords=[],
                starargs=None,
                kwargs=None)],
        nl=True)
  ])

Besides ast.dump, there are many third-party libraries for dumping an AST, such as astunparse, codegen, and unparse. These libraries can not only display the AST structure more readably but also convert an AST back into Python source code.
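On Python 3.9 or later (an assumption; the article itself uses Python 2 with astunparse), the standard library covers both tasks: ast.dump accepts an indent argument and ast.unparse converts a tree back to source. A minimal sketch:

```python
import ast

tree = ast.parse("def add(x, y):\n    return x + y\n")

# indent= produces a readable, multi-line dump (Python 3.9+).
pretty = ast.dump(tree, indent=4)

# ast.unparse turns the tree back into source code (Python 3.9+).
source = ast.unparse(tree)

# The regenerated source parses to an equivalent tree.
round_trip_ok = ast.dump(ast.parse(source)) == ast.dump(tree)

print(pretty)
print(source)
```

The regenerated source is equivalent to the original but not necessarily identical in formatting, since comments and layout are not stored in the AST.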

module Python version "$Revision$"
{
  mod = Module(stmt* body)
      | Expression(expr body)

  stmt = FunctionDef(identifier name, arguments args, stmt* body, expr* decorator_list)
       | ClassDef(identifier name, expr* bases, stmt* body, expr* decorator_list)
       | Return(expr? value)
       | Print(expr? dest, expr* values, bool nl)
       | For(expr target, expr iter, stmt* body, stmt* orelse)

  expr = BoolOp(boolop op, expr* values)
       | BinOp(expr left, operator op, expr right)
       | Lambda(arguments args, expr body)
       | Dict(expr* keys, expr* values)
       | Num(object n) -- a number as a PyObject.
       | Str(string s) -- need to specify raw, unicode, etc?
       | Name(identifier id, expr_context ctx)
       | List(expr* elts, expr_context ctx)

        -- col_offset is the byte offset in the utf8 string the parser uses
        attributes (int lineno, int col_offset)

  expr_context = Load | Store | Del | AugLoad | AugStore | Param
  boolop = And | Or
  operator = Add | Sub | Mult | Div | Mod | Pow | LShift | RShift | BitOr | BitXor | BitAnd | FloorDiv
  arguments = (expr* args, identifier? vararg, identifier? kwarg, expr* defaults)
}

The above is part of the abstract grammar, excerpted from the official documentation. When actually traversing AST nodes, you access the attributes that correspond to each node's type.
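As an illustration of accessing attributes according to node type, the following Python 3 sketch (an assumption; note that in Python 3 function parameters are arg nodes with an arg attribute rather than Name nodes) walks a tree and reads different fields from different node classes. The summary list is a made-up name for this example.

```python
import ast

tree = ast.parse("def add(x, y):\n    return x + y\n\ntotal = add(3, 5)\n")

# Each node class lists its grammar fields in _fields.
print(ast.BinOp._fields)

summary = []
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        # FunctionDef(identifier name, arguments args, ...)
        summary.append(("FunctionDef", node.name, [a.arg for a in node.args.args]))
    elif isinstance(node, ast.BinOp):
        # BinOp(expr left, operator op, expr right) plus the lineno attribute
        summary.append(("BinOp", type(node.op).__name__, node.lineno))
    elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        summary.append(("Call", node.func.id, len(node.args)))

print(summary)
```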

3. Traversing the AST

Python provides two ways to traverse the entire abstract syntax tree.

3.1 ast.NodeVisitor

Change the addition in func_def's add function to a subtraction, and add a call log to the function's implementation.

  class CodeVisitor(ast.NodeVisitor):
      def visit_BinOp(self, node):
          # turn every addition into a subtraction, in place
          if isinstance(node.op, ast.Add):
              node.op = ast.Sub()
          self.generic_visit(node)
      def visit_FunctionDef(self, node):
          print 'Function Name:%s' % node.name
          self.generic_visit(node)
          # build a print statement and put it at the top of the function body
          func_log_stmt = ast.Print(
              dest = None,
              values = [ast.Str(s = 'calling func: %s' % node.name, lineno = 0, col_offset = 0)],
              nl = True,
              lineno = 0,
              col_offset = 0,
          )
          node.body.insert(0, func_log_stmt)
  r_node = ast.parse(func_def)
  visitor = CodeVisitor()
  visitor.visit(r_node)
  # print astunparse.dump(r_node)
  print astunparse.unparse(r_node)
  exec compile(r_node, '<string>', 'exec')

Output:

Function Name:add
def add(x, y):
    print 'calling func: add'
    return (x - y)
print add(3, 5)
calling func: add
-2
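The example above uses NodeVisitor to modify nodes. Its more typical role is read-only inspection, which can be sketched in Python 3 as follows (an assumption; FunctionCollector is a hypothetical class written for this illustration, not part of the original code).

```python
import ast

class FunctionCollector(ast.NodeVisitor):
    """Hypothetical read-only visitor: records function names and counts additions."""

    def __init__(self):
        self.function_names = []
        self.add_count = 0

    def visit_FunctionDef(self, node):
        self.function_names.append(node.name)
        self.generic_visit(node)   # keep descending into the function body

    def visit_BinOp(self, node):
        if isinstance(node.op, ast.Add):
            self.add_count += 1
        self.generic_visit(node)   # operands may contain further BinOp nodes

source = "def add(x, y):\n    return x + y\n\nprint(add(3, 5) + 1)\n"
collector = FunctionCollector()
collector.visit(ast.parse(source))

print(collector.function_names, collector.add_count)
```

Calling generic_visit inside each visit method matters: without it, the traversal stops at that node and its children are never visited.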

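3.2 ast.NodeTransformer

With NodeVisitor, the AST is changed mainly by modifying the attributes of nodes in place, whereas NodeTransformer is mainly used to replace nodes: the value returned by each visit method takes the place of the visited node.

Now that the add defined in func_def has become a subtraction function, let's be more thorough and change the function name, the parameters, and the function call in the AST as well, and make the added call log a little more complex, until the code is beyond recognition :-)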

  class CodeTransformer(ast.NodeTransformer):
      def visit_BinOp(self, node):
          if isinstance(node.op, ast.Add):
              node.op = ast.Sub()
          self.generic_visit(node)
          return node
      def visit_FunctionDef(self, node):
          # visit children first, so the parameters are already renamed below
          self.generic_visit(node)
          if node.name == 'add':
              node.name = 'sub'
          args_num = len(node.args.args)
          args = tuple([arg.id for arg in node.args.args])
          # build the source of a print statement that logs the name and the arguments
          func_log_stmt = ''.join(["print 'calling func: %s', " % node.name, "'args:'", ", %s" * args_num % args])
          node.body.insert(0, ast.parse(func_log_stmt))
          return node
      def visit_Name(self, node):
          replace = {'add': 'sub', 'x': 'a', 'y': 'b'}
          re_id = replace.get(node.id, None)
          node.id = re_id or node.id
          self.generic_visit(node)
          return node
  r_node = ast.parse(func_def)
  transformer = CodeTransformer()
  r_node = transformer.visit(r_node)
  # print astunparse.dump(r_node)
  source = astunparse.unparse(r_node)
  print source
  # exec compile(r_node, '<string>', 'exec')        # the newly added node func_log_stmt lacks the lineno and col_offset attributes
  exec compile(source, '<string>', 'exec')
  exec compile(ast.parse(source), '<string>', 'exec')

Output:

def sub(a, b):
    print 'calling func: sub', 'args:', a, b
    return (a - b)
print sub(3, 5)
calling func: sub args: 3 5
-2
calling func: sub args: 3 5
-2

The difference between the two can be seen clearly in the code, so it is not covered in detail here.
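For comparison, here is a reduced Python 3 sketch of the same idea (an assumption; AddToSub is a hypothetical class, and parameter renaming is left out). It compiles the modified tree directly instead of going through regenerated source, using ast.fix_missing_locations from the standard library to fill in missing lineno and col_offset values.

```python
import ast

SOURCE = """
def add(x, y):
    return x + y
"""

class AddToSub(ast.NodeTransformer):
    """Hypothetical transformer: add -> sub, plus a call log in the function."""

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node                      # the returned node replaces the old one

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        if node.name == "add":
            node.name = "sub"
        # ast.parse returns a Module; take its single statement.
        log_stmt = ast.parse("print('calling func: %s')" % node.name).body[0]
        node.body.insert(0, log_stmt)
        return node

tree = AddToSub().visit(ast.parse(SOURCE))
ast.fix_missing_locations(tree)          # fill in any missing line/column info

namespace = {}
exec(compile(tree, "<string>", "exec"), namespace)
result = namespace["sub"](3, 5)          # prints the call log, returns -2
print(result)
```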

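4. Applications of the AST

The ast module is rarely used in everyday programming, but it is very valuable as an auxiliary way of inspecting source code: syntax checking, debugging errors, detecting special fields, and so on.

Adding call-log information to a function, as above, is one way to debug Python source code; in practice, though, the source is traversed and modified by parsing entire Python files.

4.1 Detecting Chinese characters

Below is the Unicode range for Chinese, Japanese, and Korean characters:

CJK Unified Ideographs

Range: 4E00-9FFF

Number of characters: 20992

Languages: Chinese, Japanese, Korean, Vietnamese

Use the Unicode range \u4e00 - \u9fff to identify Chinese characters. Note that this range does not include Chinese punctuation (for example, u'；' == u'\uff1b').

Below is a CNCheckHelper class that checks whether a string contains Chinese characters: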

  class CNCheckHelper(object):
      # candidate encodings of the text to be checked
      VALID_ENCODING = ('utf-8', 'gbk')
      def _get_unicode_imp(self, value, idx = 0):
          # try each encoding in turn; returns None if none of them fits
          if idx < len(self.VALID_ENCODING):
              try:
                  return value.decode(self.VALID_ENCODING[idx])
              except UnicodeError:
                  return self._get_unicode_imp(value, idx + 1)
      def _get_unicode(self, from_str):
          # a unicode object needs no decoding
          if isinstance(from_str, unicode):
              return from_str
          return self._get_unicode_imp(from_str)
      def is_any_chinese(self, check_str, is_strict = True):
          unicode_str = self._get_unicode(check_str)
          if unicode_str:
              # strict: any Chinese character matches; non-strict: all must be Chinese
              c_func = any if is_strict else all
              return c_func(u'\u4e00' <= char <= u'\u9fff' for char in unicode_str)
          return False

In is_any_chinese, strict checking reports a match as soon as the string contains any Chinese character, while non-strict checking requires every character in the string to be Chinese.
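In Python 3 every str is already Unicode, so the decoding step is unnecessary and the check shrinks to a few lines. The sketch below assumes Python 3; contains_chinese is a hypothetical helper name, not part of the original class.

```python
def contains_chinese(text, strict=True):
    """Hypothetical helper mirroring is_any_chinese for Python 3 str objects.

    strict=True:  True if any character is in U+4E00..U+9FFF.
    strict=False: True only if every character is in that range.
    """
    if not text:
        return False            # all() on an empty string would be True
    check = any if strict else all
    return check("\u4e00" <= ch <= "\u9fff" for ch in text)

print(contains_chinese("abc 中文"))                 # True
print(contains_chinese("abc 中文", strict=False))   # False
print(contains_chinese("；"))                       # False: full-width punctuation is outside the range
```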

Next, we use ast to traverse the abstract syntax tree of each source file and check whether the strings in it contain Chinese characters.

  import ast
  import os

  class CodeCheck(ast.NodeVisitor):
      def __init__(self):
          self.cn_checker = CNCheckHelper()
      def visit_Str(self, node):
          self.generic_visit(node)
          # if node.s and any(u'\u4e00' <= char <= u'\u9fff' for char in node.s.decode('utf-8')):
          if self.cn_checker.is_any_chinese(node.s, True):
              print 'line no: %d, column offset: %d, CN_Str: %s' % (node.lineno, node.col_offset, node.s)
  project_dir = './your_project/script'
  # walk the project tree and check every .py file
  for root, dirs, files in os.walk(project_dir):
      print root, dirs, files
      py_files = filter(lambda file: file.endswith('.py'), files)
      checker = CodeCheck()
      for file in py_files:
          file_path = os.path.join(root, file)
          print 'Checking: %s' % file_path
          with open(file_path, 'r') as f:
              root_node = ast.parse(f.read())
              checker.visit(root_node)

The example above is fairly simple, but it conveys the general idea.
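In Python 3.8 and later, string literals appear in the tree as Constant nodes rather than Str nodes, so the same check would hook visit_Constant. A minimal sketch under that assumption follows; ChineseStringFinder and the sample source are hypothetical, and hits are collected in a list rather than printed.

```python
import ast

class ChineseStringFinder(ast.NodeVisitor):
    """Hypothetical Python 3 counterpart of CodeCheck."""

    def __init__(self):
        self.hits = []

    def visit_Constant(self, node):
        # Constant also covers numbers, None, bytes, etc.; only look at str values.
        if isinstance(node.value, str) and any(
            "\u4e00" <= ch <= "\u9fff" for ch in node.value
        ):
            self.hits.append((node.lineno, node.col_offset, node.value))

sample = 'greeting = "你好"\nname = "world"\n'
finder = ChineseStringFinder()
finder.visit(ast.parse(sample))

for lineno, col, value in finder.hits:
    print("line no: %d, column offset: %d, CN_Str: %s" % (lineno, col, value))
```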

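For the process by which the CPython interpreter executes source code, see the official description: PEP 339

4.2 Closure checking

A closure arises when a function or lambda defined inside another function references a local variable of the enclosing function and is returned as a value. Closures are very useful in certain scenarios, but they are also easy to misuse.

For the concept of closures in Python, see my other article: Understanding the Concept of Python Closures

Here is a brief look at how to use ast to detect closure references inside a lambda. The code is as follows: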

  class LambdaCheck(ast.NodeVisitor):
      def __init__(self):
          self.illegal_args_list = []
          self._cur_file = None
          self._cur_lambda_args = []
      def set_cur_file(self, cur_file):
          assert os.path.isfile(cur_file), cur_file
          self._cur_file = os.path.realpath(cur_file)
      def visit_Lambda(self, node):
          """
          Lambda closure check principle:
          only check whether the names used in the lambda expr body refer to
          anything outside the lambda's own args list.
          """
          self._cur_lambda_args = [a.id for a in node.args.args]
          print astunparse.unparse(node)
          # print astunparse.dump(node)
          self.get_lambda_body_args(node.body)
          self.generic_visit(node)
      def record_args(self, name_node):
          # record a name that is not one of the current lambda's parameters
          if isinstance(name_node, ast.Name) and name_node.id not in self._cur_lambda_args:
              self.illegal_args_list.append((self._cur_file, 'line no:%s' % name_node.lineno, 'var:%s' % name_node.id))
      def _is_args(self, node):
          if isinstance(node, ast.Name):
              self.record_args(node)
              return True
          if isinstance(node, ast.Call):
              map(self.record_args, node.args)
              return True
          return False
      def get_lambda_body_args(self, node):
          if self._is_args(node): return
          # for cnode in ast.walk(node):
          for cnode in ast.iter_child_nodes(node):
              if not self._is_args(cnode):
                  self.get_lambda_body_args(cnode)

Traverse the project files:

  project_dir = './your project/script'
  for root, dirs, files in os.walk(project_dir):
      py_files = filter(lambda file: file.endswith('.py'), files)
      checker = LambdaCheck()
      for file in py_files:
          file_path = os.path.join(root, file)
          checker.set_cur_file(file_path)
          with open(file_path, 'r') as f:
              root_node = ast.parse(f.read())
              checker.visit(root_node)
      res = '\n'.join([' ## '.join(info) for info in checker.illegal_args_list])
      print res

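Because the body expression in Lambda(arguments args, expr body) can be very complex, the example above only handles relatively simple body expressions. The checking rules can be modified and extended to suit your own project. To make the check more general, you could write a separate visitor class to traverse the lambda nodes.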
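One possible generalization, sketched for Python 3.8 or later (an assumption), is to walk the whole lambda body with ast.walk and report every name that is read but is neither a parameter of that lambda nor a builtin. The function lambda_free_names is hypothetical, and it deliberately ignores subtleties such as nested lambdas or names bound by comprehensions inside the body.

```python
import ast
import builtins

def lambda_free_names(source):
    """Return (lineno, name) pairs for names a lambda body reads
    without taking them as a parameter (builtins are skipped)."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Lambda):
            continue
        a = node.args
        params = {p.arg for p in a.posonlyargs + a.args + a.kwonlyargs}
        if a.vararg:
            params.add(a.vararg.arg)
        if a.kwarg:
            params.add(a.kwarg.arg)
        # Walk the complete body, however deeply the expression is nested.
        for inner in ast.walk(node.body):
            if (isinstance(inner, ast.Name)
                    and isinstance(inner.ctx, ast.Load)
                    and inner.id not in params
                    and not hasattr(builtins, inner.id)):
                found.append((inner.lineno, inner.id))
    return found

sample = """
def make_adders():
    return [lambda x: x + i for i in range(3)]
"""
report = lambda_free_names(sample)
print(report)   # [(3, 'i')]
```

The sample is the classic late-binding case: each lambda reads i from the enclosing scope when it is called, not when it is created, which is exactly the kind of reference such a check is meant to surface.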

Statement

This article is reproduced from 亿速云. In case of infringement, please contact admin@php.cn for removal.