Understanding String.replaceAll() Double Replacement Anomaly
The question arises as to why the following code:
<code class="java">System.out.println("test".replaceAll(".*", "a"));</code>
results in "aa" instead of the expected "a". This anomaly also occurs when using ".*$".
Cause:
The anomaly arises from the nature of the ".*" regex. It matches any character sequence, including an empty string. Therefore:
- First Match: ".*" matches the entire string "test" and replaces it with "a".
- Second Match: ".*" can also match an empty string, which it finds at the end of the input, right after the first match. This second match replaces the empty string with another "a", resulting in the double replacement.
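The two matches can be observed directly with java.util.regex. The following is a minimal sketch; the class name MatchTrace and the helper spans are made up for this illustration and are not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchTrace {
    // Hypothetical helper: returns each match as "start-end",
    // in the order Matcher.find() reports them.
    static List<String> spans(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        // One match covering "test", then one empty match at the end.
        System.out.println(spans(".*", "test"));           // [0-4, 4-4]
        System.out.println("test".replaceAll(".*", "a"));  // aa
    }
}
```

Each reported span is replaced once, so two spans yield two copies of "a".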
Solution:
To avoid this issue, consider using:
- .replaceFirst(): Replaces only the first match, preventing the double replacement.
- .+ : Matches a sequence of one or more characters, so it never matches the empty string.
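A short sketch of both options follows; the class and method names are illustrative only. The two options agree on "test" but differ on an empty input, which may matter when choosing between them.

```java
class ReplaceFixes {
    // Option 1: replace only the first match of ".*".
    static String withReplaceFirst(String s) {
        return s.replaceFirst(".*", "a");
    }

    // Option 2: require at least one character, so empty matches are impossible.
    static String withPlus(String s) {
        return s.replaceAll(".+", "a");
    }

    public static void main(String[] args) {
        System.out.println(withReplaceFirst("test")); // a
        System.out.println(withPlus("test"));         // a

        // On an empty input the two options differ:
        System.out.println("[" + withReplaceFirst("") + "]"); // [a]  (".*" matches the empty string once)
        System.out.println("[" + withPlus("") + "]");         // []   (".+" finds nothing to replace)
    }
}
```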
Regex Behavior:
While .* can match an empty string, it cannot match more than twice on this input. This is because:
- The first match consumes the whole input, so the next search resumes at the end of that match (index 4 in "test").
- At that position, .* matches the empty string at the end of the input, which produces the second replacement.
- Because that match was empty, the regex engine steps one character forward before searching again, so that it does not match the same empty string forever.
- That position lies past the end of the input, so the search stops and there is no third replacement.
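The stopping rule can be imitated by hand with Matcher.find(int): resume at the end of the previous match, and step one extra character only when that match was empty. This is a simplified emulation written for illustration, not the JDK's actual implementation; the class name FindLoopSketch and the method trace are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindLoopSketch {
    // Hypothetical helper: walks the input the way repeated find() calls do,
    // recording each match as "start-end".
    static List<String> trace(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int from = 0;
        // find(int) throws if the index is past the end, so guard it explicitly.
        while (from <= input.length() && m.find(from)) {
            out.add(m.start() + "-" + m.end());
            boolean emptyMatch = m.start() == m.end();
            // After an empty match, step one character ahead to avoid looping forever.
            from = emptyMatch ? m.end() + 1 : m.end();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(trace(".*", "test")); // [0-4, 4-4]
        System.out.println(trace(".*", ""));     // [0-0]
    }
}
```

With "test", the second search starts at index 4 and finds the empty match; the next start index would be 5, which is past the end, so the loop ends after two matches.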