
ASP.NET: deleting specified HTML tags with a regular expression

高洛峰 (original article), 2017-02-03 15:14:11

Stripping every HTML tag from a piece of content can make it hard to read, because useful tags such as a and img disappear along with the rest. It is better to delete only some tags and keep the others.

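In a regular expression, testing whether text contains certain strings is easy to understand. Testing whether it does NOT contain certain strings (whole strings rather than single characters, and any of several strings rather than one particular string) is genuinely puzzling, because a negated character class such as [^abc] can only exclude single characters.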

<(?!((/?\s?li)|(/?\s?ul)|(/?\s?a)|(/?\s?img)|(/?\s?br)|(/?\s?span)|(/?\s?b)))[^>]+>

This pattern matches any HTML tag whose name is not li, ul, a, img, br, span or b. For the requirement above, every tag except the ones listed must be deleted, so these are exactly the tags to match and replace with an empty string. It took me quite a while to work this out.
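(?!exp) is a zero-width negative lookahead: it matches a position that is not followed by exp.
/?\s? allows an optional slash and one optional whitespace character before the tag name, so closing tags such as </li> are kept as well. I first tried writing it directly after the leading <, outside the lookahead, but that failed in testing, most likely because of backtracking: when the lookahead rejects a kept closing tag, the engine lets /? match nothing, the lookahead then sees /li instead of li and passes, and the closing tag is deleted.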
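The effect of where /?\s? is placed can be checked with a small sketch. The class LookaheadDemo and its members are hypothetical, the kept list is cut down to li only so the difference is easy to see, and the Outside pattern is only an assumed shape of the failed attempt, since the original attempt is not shown.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class; keeps only <li> so the effect is easy to see.
public static class LookaheadDemo
{
    // Article's approach: /?\s? sits INSIDE the negative lookahead,
    // so both <li> and </li> are recognised as kept tags.
    public const string Inside = @"<(?!((/?\s?li)))[^>]+>";

    // Assumed shape of the failed attempt: /?\s? placed right after the leading <.
    // When the lookahead rejects "li", the engine backtracks so that /? matches
    // nothing; the lookahead then sees "/li", which is not "li", and passes.
    public const string Outside = @"</?\s?(?!(li))[^>]+>";

    // Deletes every tag matched by the given pattern (case-insensitive).
    public static string Strip(string pattern, string html) =>
        Regex.Replace(html, pattern, "", RegexOptions.IgnoreCase);
}
```

For the input "<div><li>item</li></div>", Strip(Inside, ...) returns "<li>item</li>", while Strip(Outside, ...) returns "<li>item": the closing </li> is lost.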

Below is a simple function that joins the tags to be kept into a regular expression and then deletes all other tags:

// Requires: using System.Text.RegularExpressions;
private static string RemoveSpecifyHtml(string ctx) {
    // Tags to keep
    string[] holdTags = { "a", "img", "br", "strong", "b", "span" };
    // Generated pattern: <(?!((/?\s?a)|(/?\s?img)|(/?\s?br)|(/?\s?strong)|(/?\s?b)|(/?\s?span)))[^>]+>
    string regStr = string.Format(@"<(?!((/?\s?{0})))[^>]+>", string.Join(@")|(/?\s?", holdTags));
    Regex reg = new Regex(regStr, RegexOptions.Compiled | RegexOptions.Multiline | RegexOptions.IgnoreCase);
    // Delete every tag that is not in holdTags
    return reg.Replace(ctx, "");
}

Correction:
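The regular expression above has a flaw: each tag name is only checked as a prefix. If you keep li, you will find in practice that <link> is kept as well; keeping a also keeps tags such as <abbr> and <address>. The fix is to add a \b (word boundary) assertion after each tag name, so the name has to end there.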
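The prefix problem and the \b fix can be compared side by side. This is a minimal sketch; WordBoundaryDemo and its members are hypothetical names, and the kept list is again reduced to li only.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class; keeps only <li>.
public static class WordBoundaryDemo
{
    // Without \b, "li" is only checked as a prefix, so <link ...> also counts as kept.
    public const string PrefixOnly = @"<(?!((/?\s?li)))[^>]+>";

    // \b demands a word boundary after "li": it holds before '>', '/' or a space,
    // but not before the 'n' of "link".
    public const string WithBoundary = @"<(?!((/?\s?li\b)))[^>]+>";

    // Deletes every tag matched by the given pattern (case-insensitive).
    public static string Strip(string pattern, string html) =>
        Regex.Replace(html, pattern, "", RegexOptions.IgnoreCase);
}
```

For the input <link rel="stylesheet" href="x.css"><li>item</li>, PrefixOnly leaves the text unchanged, while WithBoundary removes the <link> tag and returns <li>item</li>.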

<(?!((/?\s?li\b)|(/?\s?ul\b)|(/?\s?a\b)|(/?\s?img\b)|(/?\s?br\b)|(/?\s?span\b)|(/?\s?b\b)))[^>]+>

// Requires: using System.Text.RegularExpressions;
private static string RemoveSpecifyHtml(string ctx) {
    // Tags to keep
    string[] holdTags = { "a", "img", "br", "strong", "b", "span", "li" };
    // Generated pattern: <(?!((/?\s?a\b)|(/?\s?img\b)|(/?\s?br\b)|(/?\s?strong\b)|(/?\s?b\b)|(/?\s?span\b)|(/?\s?li\b)))[^>]+>
    // string.Join puts "\b" only between names, so "{0}\b" adds it after the last name too.
    string regStr = string.Format(@"<(?!((/?\s?{0}\b)))[^>]+>", string.Join(@"\b)|(/?\s?", holdTags));
    Regex reg = new Regex(regStr, RegexOptions.Compiled | RegexOptions.Multiline | RegexOptions.IgnoreCase);
    return reg.Replace(ctx, "");
}
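A variation worth considering, offered as a sketch rather than the author's method: string.Join only inserts its separator between elements, so if \b is supplied only through the separator, the last tag in holdTags gets no \b. Writing the kept names as one alternation group with a single \b after it avoids that. The sketch also builds the Regex once in a static field, since constructing a new instance with RegexOptions.Compiled on every call repeats the compilation cost, and it drops RegexOptions.Multiline, which only affects ^ and $. HtmlTagFilter is a hypothetical name, and like the original this is a regex filter, not an HTML parser (a > inside an attribute value, for example, would cut a tag short).

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class; same kept tags as the corrected function above.
public static class HtmlTagFilter
{
    // Tags to keep.
    private static readonly string[] HoldTags = { "a", "img", "br", "strong", "b", "span", "li" };

    // Built once and reused. Equivalent to the article's pattern, written as a
    // single alternation group: <(?!/?\s?(?:a|img|br|strong|b|span|li)\b)[^>]+>
    // Regex.Escape guards against tag names containing regex metacharacters.
    private static readonly Regex StripRegex = new Regex(
        @"<(?!/?\s?(?:" + string.Join("|", HoldTags.Select(t => Regex.Escape(t))) + @")\b)[^>]+>",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Deletes every tag whose name is not in HoldTags.
    public static string RemoveSpecifyHtml(string ctx) => StripRegex.Replace(ctx, "");
}
```

For example, <div><p>Hi <b>there</b> <a href="#">link</a></p><link rel="x"></div> becomes Hi <b>there</b> <a href="#">link</a>: div, p and link are removed, while b and a survive.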
