
java - How to split a string of the form (operator arg1 arg2 ... argn)?

A function call has the form
(operator arg1 arg2 ... argn)
that is, an operator symbol followed by argument 1, argument 2, up to argument n. Each argument can itself be a function call in the same form.
For example, given a string like this
String s = "(add (add 1 2) (mul 2 1) 2 )";
we need to separate the operator and its arguments, that is, split it into

["add","(add 1 2)","(mul 2 1)","2"]

How should such a string be split?

My current approach is to strip the outermost parentheses and then split the string on spaces, but the spaces inside nested arguments get split as well. With regular expressions, each argument may itself contain nested parentheses, so how can this case be matched?

世界只因有你 2674 days ago 917


  • 仅有的幸福 2017-06-23 09:15:59

    This is prefix notation, also known as an S-expression or Lisp expression.

    A Lisp S-expression is a multi-level nested tree structure, which makes it close to an abstract syntax tree (AST).

    Regular expressions without recursive patterns have a hard time parsing S-expressions, because the nesting depth is unbounded.

    The following is a simple example in Python. I have commented it, so it should be easy to follow.

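    def parse_sexp(string):
        sexp = [[]]  # stack of lists; sexp[-1] is the list currently being filled
        word = ''
        in_str = False  # whether we are inside a double-quoted string
        for char in string:  # walk through every character
            if char == '(' and not in_str:  # opening parenthesis: start a nested list
                sexp.append([])
            elif char == ')' and not in_str:  # closing parenthesis
                if word:
                    sexp[-1].append(word)
                    word = ''
                temp = sexp.pop()
                sexp[-1].append(tuple(temp))  # attach the finished list to its parent
            elif char in ' \n\t' and not in_str:  # whitespace ends the current token
                if word:
                    sexp[-1].append(word)
                    word = ''
            elif char == '"':  # double quote: marks the start or end of a string
                in_str = not in_str
            else:
                word += char  # not a delimiter, so it belongs to the current token
        if word:  # flush a trailing token that is not followed by a delimiter
            sexp[-1].append(word)
        return sexp[0]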
    
    >>> parse_sexp("(+ 5 (+ 3 5))")
    [('+', '5', ('+', '3', '5'))]
    >>> parse_sexp("(add (add 1 2) (mul 2 1) 2 )")
    [('add', ('add', '1', '2'), ('mul', '2', '1'), '2')]
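If only the flat split from the question is needed, with nested arguments kept as strings, a depth counter may be enough. The following is a minimal sketch, not part of the parser above: `split_top_level` is a hypothetical name, and it assumes balanced parentheses, no quoted strings, and whitespace between arguments.

```python
def split_top_level(expr):
    """Split '(op arg1 ... argn)' into top-level tokens, keeping nested groups as strings."""
    expr = expr.strip()
    if not (expr.startswith('(') and expr.endswith(')')):
        raise ValueError('expected a parenthesized expression')
    parts = []
    token = ''
    depth = 0  # nesting depth inside the outermost parentheses
    for char in expr[1:-1]:  # drop the outermost parentheses
        if char == '(':
            depth += 1
        elif char == ')':
            depth -= 1
            if depth < 0:
                raise ValueError('unbalanced parentheses')
        if char.isspace() and depth == 0:
            # whitespace outside any nested group ends the current token
            if token:
                parts.append(token)
                token = ''
        else:
            token += char
    if depth != 0:
        raise ValueError('unbalanced parentheses')
    if token:
        parts.append(token)
    return parts


print(split_top_level("(add (add 1 2) (mul 2 1) 2 )"))
# ['add', '(add 1 2)', '(mul 2 1)', '2']
```

Calling it again on each element that starts with `(` would walk the nesting one level at a time, which is what the recursive parser does in a single pass.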

    S-expression

  • 阿神 2017-06-23 09:15:59

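    Regular expression:

    \(\s*\w+(\s+\d+)+\s*\)|\w+|\d+

    Note that this regex needs the global flag so that it returns every match, not just the first.

    This method works only if the arguments arg1, arg2, arg3, ... argn contain at most one level of nested (op arg...), and the arguments of each nested call are plain numbers.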
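A short sketch of how this regex behaves, written in Python to match the other answer. Python has no global flag, so `re.finditer` is used here to collect every match; `regex_split` is a hypothetical helper name. `re.findall` is avoided because the pattern contains a capture group, and `findall` would return the group contents instead of the whole match.

```python
import re

# The pattern from the answer above, unchanged.
PATTERN = re.compile(r'\(\s*\w+(\s+\d+)+\s*\)|\w+|\d+')


def regex_split(expr):
    """Return every full match of PATTERN in expr, left to right."""
    return [m.group(0) for m in PATTERN.finditer(expr)]


# One level of nesting with numeric arguments: the split comes out as wanted.
print(regex_split("(add (add 1 2) (mul 2 1) 2 )"))
# ['add', '(add 1 2)', '(mul 2 1)', '2']

# Two levels of nesting: the middle argument is broken apart.
print(regex_split("(add (add (mul 1 2) 3) 4)"))
# ['add', 'add', '(mul 1 2)', '3', '4']
```

The second call shows the limitation: once an argument contains a nested call of its own, the pattern can no longer match it as a single unit.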
