P粉323050780 2023-09-03 16:42:37
There is no sane way to salvage a document as broken as the one you posted. However, assuming you first replace > and similar characters in the text with their corresponding entities (e.g. &gt;), you can pass the document to a proper library such as DomDocument, which will take care of the rest.
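$input = <<<_E_
< div class='test' >1 &gt; 0 is < b >true</ b> and apples &gt;&gt;&gt; bananas< / div >
_E_;

// Strip the whitespace that follows "<" and "</" so the tags become parseable
$input = preg_replace([ '#<\s+#', '#</\s+#' ], [ '<', '</' ], $input);

// Let libxml parse the fragment without adding <html>/<body> wrappers or a doctype
$d = new DomDocument();
$d->loadHTML($input, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
var_dump($d->saveHTML());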
Output:
string(80) "<div class="test">1 &gt; 0 is <b>true</b> and apples &gt;&gt;&gt; bananas</div>
"
P粉064448449 2023-09-03 11:17:47
This regular expression also works:
/(<)\s*(\/?)\s*([^<>]*\S)\s*(>)/g
It splits the meaningful parts of an HTML tag into four capture groups and replaces the whole match with those groups, which drops the whitespace between them.
(<) - captures the opening angle bracket (group 1)
\s* - matches any whitespace
(\/?) - captures an optional forward slash (group 2)
\s* - matches any whitespace after the slash
([^<>]*\S) - captures the content within the tag, without trailing whitespace (group 3)
\s* - matches whitespace after the content and before the closing angle bracket
(>) - captures the closing angle bracket (group 4)

const reg = /(<)\s*(\/?)\s*([^<>]*\S)\s*(>)/g;
const str = "< div class='test' >1 > 0 is < b >true< / b > and apples >>> bananas< / div >";
// Replace each match with its four capture groups, dropping the whitespace between them
const newStr = str.replace(reg, "$1$2$3$4");
console.log(newStr);
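To make the behaviour easier to check, the pattern can be wrapped in a small helper and run on a few inputs. This is only a sketch: the function name normalizeTagWhitespace and the sample strings are made up for illustration, and the replacement string "$1$2$3$4" is assumed to be what the answer intends by replacing the whitespace with the four captured parts.

```javascript
// Hypothetical helper (name is an assumption); the pattern is the one from the answer.
function normalizeTagWhitespace(html) {
  const reg = /(<)\s*(\/?)\s*([^<>]*\S)\s*(>)/g;
  // Rebuild each tag from its four capture groups, dropping the whitespace between them.
  return html.replace(reg, "$1$2$3$4");
}

// Illustrative inputs, not taken from the original thread.
console.log(normalizeTagWhitespace("< div class='test' >text< / div >"));
// prints: <div class='test'>text</div>

console.log(normalizeTagWhitespace("< br / >"));
// prints: <br />

console.log(normalizeTagWhitespace("<p>already clean</p>"));
// prints: <p>already clean</p>

// Caveat: an unescaped "<" in text that is later followed by ">" is treated as a tag.
console.log(normalizeTagWhitespace("1 < 2 and 3 > 2"));
// prints: 1 <2 and 3> 2
```

The last call shows why escaping stray angle brackets as entities first, as the other answer suggests, still matters: the pattern cannot tell a comparison in the text apart from a tag.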