Build a regular expression to add double quotes to parts of a JSON value that do not have double quotes

I have a lot of malformed JSON strings, like this:

    {
        "id":23424938,
        "name":aN,
        "ref":aN,
        "jul":aN,
        "cat":{},
        "src":[],
        "Code":"SA",
        "type":d,
        "spec":[i,j],
        "child":a
    }

I'm trying to build a regex that wraps the unquoted JSON values in double quotes, so far without success.

I ended up using /":([^"d{[] ?[^,}]?)/ which fixed everything except the values inside arrays. For example, [i,j] is not converted to ["i","j"].

Can you help me with the values in brackets?

https://regex101.com/r/CGskmy/1
P粉231112437, 468 days ago

  • P粉755863750, 2023-08-18 11:23:05

    This task is somewhat difficult because the input is ambiguous. For example, does { "x": [y] } become { "x": "[y]" } or { "x": ["y"] }? I will assume that an unquoted string never contains JSON control characters, that is, none of '[', ']', '{', '}', '"', ':', ','.

    I think you can accomplish this with a named capture group, a PCRE feature that PHP's preg_* functions support. It takes a little programming to perform the replacement: a plain preg_replace call is not enough, because we do not want to replace every match, only some of them.

    This is the method I came up with. First, I match quoted strings and ignore them. Second, I match numbers and ignore them. Finally, I match an unquoted string and store it in a named capturing group called "unquoted". Note that PCRE tries the alternatives in the order in which they are written, so an unquoted string is matched only where neither a quoted string nor a number can be matched. This is the key to the approach.
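
    To see the ordering effect in isolation, here is a minimal sketch. The sample string and the simplified pattern are made up for illustration and are not the full pattern used further down; PREG_UNMATCHED_AS_NULL needs PHP 7.2 or later.

```php
<?php
// Hypothetical miniature input: one quoted value, one number, one bare word.
$sample = '{"a":"x,y","b":12,"c":foo}';

// Simplified three-way alternation: quoted string | number | bare word.
// Only the last alternative is captured in the named group "unquoted".
$pattern = '/"[^"]*"|\d+|(?P<unquoted>[^{}\[\]":,\s]+)/';
preg_match_all($pattern, $sample, $m, PREG_UNMATCHED_AS_NULL);

// Matches that came from the first two alternatives have null in the
// "unquoted" slot, so filtering out nulls leaves only the bare words.
$bare = array_values(array_filter(
    $m['unquoted'],
    static function ($v) { return $v !== null; }
));

print_r($bare);
```

    The quoted value "x,y" is consumed whole by the first alternative, so its contents are never offered to the "unquoted" group.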

    Once all unquoted strings have been matched, I build the output string by iterating over the matches: copy the input fragment that precedes each match unchanged, then append the match wrapped in double quotes.

    <?php
    
    $in = <<<'IN'
    {
        "id":23424938,
        "name":aN,
        "ref":aN,
        "jul":aN,
        "cat":{},
        "src":[],
        "Code":"SA",
        "type":d,
        "spec":[i,j],
        "child":a
    }
    IN;
    
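    // Match against the input string. We only care about the "unquoted" group.
    // Alternatives, in order: a quoted string (with \" escapes), a number,
    // and finally an unquoted string with trailing whitespace trimmed off.
    $pattern = '/(?:"(?:\\\\"|[^"])*")|(?:[\d.]+)|(?P<unquoted>[^{}\[\]":,\s][^{}\[\]":,]*(?<!\s))/';
    preg_match_all($pattern, $in, $matches, PREG_UNMATCHED_AS_NULL | PREG_OFFSET_CAPTURE);
    
    // The output string
    $out = '';
    
    // Current index into the input string
    $ix = 0;
    
    // Loop over all matches; entries where "unquoted" did not take part are null
    foreach ($matches['unquoted'] as $match) {
        $str = $match[0];
        $pos = $match[1];
        if ($str !== null) {
            // Copy the input up to this match into the output unchanged
            $out .= substr($in, $ix, $pos - $ix);
            // Copy the matched string into the output, wrapped in quotes
            $out .= '"' . $str . '"';
            // Advance the input index past the match
            $ix = $pos + strlen($str);
        }
    }
    
    // Copy the tail of the input string into the output
    $out .= substr($in, $ix);
    
    // Print the result
    echo $out;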
    

    This does not handle the full JSON number syntax, nor the JSON literals true, false, and null, which would be quoted like any other bare word. Hopefully this answer is a starting point that you can tweak to suit your needs.
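
    One possible tweak along those lines, as a rough sketch rather than a finished solution: it swaps the manual loop for preg_replace_callback, uses the JSON number grammar, and leaves true, false, and null alone. The function name quoteBareWords and the sample input are hypothetical, and the assumption that unquoted strings contain no JSON control characters still applies.

```php
<?php
// Hypothetical helper; the name is illustrative only.
function quoteBareWords(string $json): string
{
    // Alternatives, in order (x flag: whitespace in the pattern is ignored):
    //   1. quoted string with backslash escapes
    //   2. JSON number that is not immediately followed by a bare-word character
    //   3. true / false / null, same lookahead
    //   4. anything else that looks like a bare word, captured as "unquoted"
    $pattern = '/
        "(?:\\\\.|[^"\\\\])*"
      | -?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?(?![^{}\[\]":,\s])
      | (?:true|false|null)(?![^{}\[\]":,\s])
      | (?P<unquoted>[^{}\[\]":,\s][^{}\[\]":,]*(?<!\s))
    /x';

    $result = preg_replace_callback(
        $pattern,
        static function (array $m): string {
            // The named group is present only when the last alternative matched.
            return isset($m['unquoted']) && $m['unquoted'] !== ''
                ? '"' . $m['unquoted'] . '"'
                : $m[0];
        },
        $json
    );

    // preg_replace_callback returns null on error; fall back to the input.
    return $result ?? $json;
}

$fixed = quoteBareWords('{"id":-1.5e3,"ok":true,"name":aN,"spec":[i,j],"note":"a \"b\" c"}');
echo $fixed, PHP_EOL;
```

    The lookahead after the number and literal alternatives is a design choice: a value such as 12abc or trueish falls through to the "unquoted" alternative as a whole instead of being split in the middle.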

    InSync suggested a neat regular expression that needs no named capture group: it uses the (*SKIP)(*FAIL) verbs to make PCRE discard the matches we do not want, so that only unquoted strings are left to replace.

    (?:
      "(?:[^\\"\x00-\x1F\x7F]|\\["\\\/bfnrt]|\\u[\dA-Fa-f]{4})*"
    |
      -?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?
    )
    (*SKIP)(*FAIL)
    |
    [^{}[\]:,\s]+