How to remove HTML tags using regular expression in PHP
When writing web applications, we often need to remove HTML tags from user input and convert it to plain text. This helps reduce the risk of cross-site scripting (XSS) attacks and improves the readability of the text. In PHP, you can use regular expressions to achieve this goal.
A common method is to use PHP's strip_tags() function, which by default removes all HTML tags from a string. However, there are cases where you may want to retain some tags, such as link and image tags. strip_tags() accepts an optional list of allowed tags for this; a regular expression is another option that gives you control over the matching rule itself.
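For comparison with the regular-expression approach, here is a minimal sketch of strip_tags() with and without its optional second argument. The input string and variable names are assumptions made up for this illustration.

```php
<?php
// Illustrative input; the string and variable names are assumptions for this sketch.
$html = "<p>Read the <a href='http://example.com'>docs</a> <strong>now</strong>.</p>";

// Default behaviour: every tag is removed.
$plain = strip_tags($html);

// Second argument: tags listed here are left in place.
$partial = strip_tags($html, '<a><img>');

echo $plain, PHP_EOL;   // Read the docs now.
echo $partial, PHP_EOL; // Read the <a href='http://example.com'>docs</a> now.
```

Note that allowed tags keep all of their attributes, so this is a formatting tool rather than a complete sanitizer.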
First, let’s take a look at how to use a regular expression to remove HTML tags and reduce a string to plain text. The following is a simple PHP code example:
$string = "<p>This is text with <strong>HTML markup</strong>.</p>";
$text = preg_replace("/<[^>]+>/", '', $string);
echo $text; // Output: This is text with HTML markup.
This regular expression means: find every piece of text in the string that starts with "<" and ends with ">", and replace it with an empty string (i.e. delete it).
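The pattern is purely textual, so it helps to see where it behaves as expected and where it does not. The sketch below wraps it in a helper named stripAllTags(), a hypothetical name introduced only for this illustration, and runs it on three inputs.

```php
<?php
// Hypothetical helper, introduced only for this sketch.
function stripAllTags(string $html): string
{
    // "<", then one or more characters that are not ">", then ">".
    // preg_replace() returns null on an engine error, so fall back to ''.
    return preg_replace('/<[^>]+>/', '', $html) ?? '';
}

// Ordinary markup is handled as expected.
echo stripAllTags('<p>Hello <em>world</em>!</p>'), PHP_EOL; // Hello world!

// Plain text that happens to contain "<" ... ">" is treated as a tag.
echo stripAllTags('1 < 2 and 3 > 2'), PHP_EOL;              // 1  2

// A ">" inside an attribute value ends the match too early.
echo stripAllTags('<a title="a > b">link</a>'), PHP_EOL;    //  b">link
```

The last two cases show why this pattern suits simple, well-formed input better than arbitrary HTML.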
Now, let’s see how to keep only certain HTML tags. Suppose we want to keep the <a> and <img> tags. The following is a sample code:
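$string = "<p>This is text with <strong>HTML markup</strong>, including a <a href='http://example.com'>link</a> and an <img src='image.jpg'>.</p>";
// Skip tags named "a" or "img" (opening or closing); \b stops names such as "abbr" from being spared.
$text = preg_replace('/<(?!\/?(?:a|img)\b)[^>]*>/i', '', $string);
echo $text; // Output: This is text with HTML markup, including a <a href='http://example.com'>link</a> and an <img src='image.jpg'>.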
The meaning of this regular expression is: find every piece of text in the string that starts with "<" and ends with ">", unless it is an <a> or <img> tag, and delete it.
The exception is expressed with a negative lookahead, (?!...). Placed immediately after "<", it tells the regular expression engine: "Match a tag that starts with "<" only if what follows is not the tag name a or img."
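Note that the lookahead needs a word boundary, \b, after the tag names and an optional "/" before them: (?!\/?(?:a|img)\b). Without \b, the lookahead matches any tag whose name merely begins with "a" or "img", so tags such as <abbr> or <article> would be kept as well. Without the optional "/", closing tags such as </a> would be removed while their opening tags stay.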
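The effect of the word boundary and the optional slash can be checked side by side. This is a small sketch with an invented input string; $loose and $strict are names chosen for the illustration.

```php
<?php
// Illustrative input: <abbr> starts with "a" but should not be kept.
$html = "<abbr>HTML</abbr> <a href='#'>link</a>";

// No word boundary, no optional slash: <abbr> survives, </a> is lost.
$loose = preg_replace('/<(?!a|img)[^>]*>/', '', $html);

// Word boundary plus optional slash: only <a>...</a> survives.
$strict = preg_replace('/<(?!\/?(?:a|img)\b)[^>]*>/i', '', $html);

echo $loose, PHP_EOL;  // <abbr>HTML <a href='#'>link
echo $strict, PHP_EOL; // HTML <a href='#'>link</a>
```

The i flag is an addition in this sketch so that uppercase tags such as <A> are treated the same way.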
By using the above method, you can use regular expressions in PHP to remove HTML tags and reduce input to plain text. Please note that this is only one step in preventing XSS attacks, because tags that you keep can still carry dangerous attributes. It is usually necessary to combine it with other techniques such as input validation, output escaping, and sound session management to enhance the security of web applications.
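To make the security caveat concrete, the sketch below shows an event-handler attribute surviving the allow-list pattern, followed by output escaping with htmlspecialchars(). The input string and the steal() JavaScript function in it are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical user input; steal() is an invented JavaScript function name.
$comment = "<a href='#' onclick='steal()'>click</a><script>alert(1)</script>";

// The allow-list pattern removes <script> but keeps <a> with all its attributes.
$kept = preg_replace('/<(?!\/?(?:a|img)\b)[^>]*>/i', '', $comment);
echo $kept, PHP_EOL; // <a href='#' onclick='steal()'>click</a>alert(1)

// Escaping at output time turns markup characters into entities,
// so the browser displays the text instead of interpreting it.
$escaped = htmlspecialchars($comment, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
echo $escaped, PHP_EOL;
```

If some markup really must be allowed through, a dedicated HTML sanitizing library is generally a safer choice than a hand-written pattern.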