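When a website uses input or textarea fields to let users submit content, such as posts, articles, or comments, the back end has to filter that input for security, for example by stripping tags such as <script> that create security risks.
Java has an open-source library called Jsoup. It is designed for parsing HTML and XML documents, and its standout feature is a jQuery-like selector syntax.
While working on content security filtering recently, I found through Google that Jsoup provides exactly this capability through a customizable Whitelist (a whitelist of safe tags), and it works very well.
A quick demonstration: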
```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

// HTML clean: keep only the tags/attributes allowed by the relaxed whitelist
String unsafe = "<table><tr><td>1</td></tr></table>"
        + "<img src='' alt='' />"
        + "<p><a href='http://example.com/' onclick='stealCookies()'>Link</a>"
        + "<object></object>"
        + "<script>alert(1);</script>"
        + "</p>";
String safe = Jsoup.clean(unsafe, Whitelist.relaxed());
System.out.println("safe: " + safe);
```
Official API docs: http://jsoup.org/apidocs/org/jsoup/safety/Whitelist.html
Where I found it:
http://www.oschina.net/question/12_10232 . Based on it, I wrote my own custom helper class:
```java
package com.cssor.safety;

import org.jsoup.Jsoup;
import org.jsoup.helper.StringUtil;
import org.jsoup.safety.Whitelist;

public class ContentSafeFilter {
    private final static Whitelist user_content_filter = Whitelist.relaxed();

    static {
        // Add trusted tags to the whitelist
        user_content_filter.addTags("embed", "object", "param", "span", "div");
        // Add trusted attributes (":all" applies them to every allowed tag)
        user_content_filter.addAttributes(":all", "style", "class", "id", "name");
        user_content_filter.addAttributes("object", "width", "height", "classid", "codebase");
        user_content_filter.addAttributes("param", "name", "value");
        user_content_filter.addAttributes("embed", "src", "quality", "width", "height",
                "allowFullScreen", "allowScriptAccess", "flashvars", "name", "type", "pluginspage");
    }

    /**
     * Filters user-submitted content with the custom whitelist above.
     * @param html raw user input
     * @return sanitized HTML, or "" for blank input
     */
    public static String filter(String html) {
        if (StringUtil.isBlank(html)) return "";
        return Jsoup.clean(html, user_content_filter);
        //return filterScriptAndStyle(html);
    }

    /**
     * Fairly permissive filtering, but it strips object, script, span, div and similar tags.
     * Suitable for rich-text editor content or other HTML.
     * @param html raw HTML
     * @return sanitized HTML
     */
    public static String relaxed(String html) {
        return Jsoup.clean(html, Whitelist.relaxed());
    }

    /**
     * Removes all tags and returns plain text. Suitable for textarea and input.
     * @param html raw input
     * @return text with every tag removed
     */
    public static String pureText(String html) {
        return Jsoup.clean(html, Whitelist.none());
    }

    public static void main(String[] args) {
        String unsafe = "<table><tr><td>1</td></tr></table>"
                + "<img src='' alt='' />"
                + "<p><a href='http://example.com/' onclick='stealCookies()'>Link</a>"
                + "<object></object>"
                + "<script>alert(1);</script>"
                + "</p>";
        String safe = ContentSafeFilter.filter(unsafe);
        System.out.println("safe: " + safe);
    }
}
```
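To make the `addTags` / `addAttributes(":all", ...)` configuration above more concrete, here is a minimal, hypothetical model of the lookup a tag/attribute whitelist performs: a tag survives only if it is listed, and an attribute survives only if its tag is allowed and the attribute is listed either for that tag or under `:all`. This is an illustrative sketch, not jsoup's implementation: the class `ToyWhitelist` and its methods are invented for this example, an attribute rule here does not by itself allow its tag, and real jsoup also checks URL protocols and rebuilds the document. (Newer jsoup releases, starting with 1.14.1, rename `Whitelist` to `Safelist`.)

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of a tag/attribute whitelist; NOT jsoup's code.
class ToyWhitelist {
    static final String ALL = ":all"; // pseudo-tag whose attributes apply to every allowed tag
    private final Set<String> tags = new HashSet<>();
    private final Map<String, Set<String>> attributes = new HashMap<>();

    ToyWhitelist addTags(String... names) {
        for (String n : names) tags.add(n.toLowerCase(Locale.ROOT));
        return this;
    }

    // Simplification: registering attributes does not add the tag itself.
    ToyWhitelist addAttributes(String tag, String... names) {
        Set<String> set = attributes.computeIfAbsent(tag.toLowerCase(Locale.ROOT), k -> new HashSet<>());
        for (String n : names) set.add(n.toLowerCase(Locale.ROOT));
        return this;
    }

    // Deny by default: only listed tags are kept (case-insensitive).
    boolean isSafeTag(String tag) {
        return tags.contains(tag.toLowerCase(Locale.ROOT));
    }

    // Kept only if the tag is allowed and the attribute is listed for that tag or for ":all".
    boolean isSafeAttribute(String tag, String attr) {
        String t = tag.toLowerCase(Locale.ROOT);
        String a = attr.toLowerCase(Locale.ROOT);
        if (!isSafeTag(t)) return false;
        Set<String> forTag = attributes.get(t);
        Set<String> forAll = attributes.get(ALL);
        return (forTag != null && forTag.contains(a)) || (forAll != null && forAll.contains(a));
    }

    public static void main(String[] args) {
        ToyWhitelist wl = new ToyWhitelist()
                .addTags("a", "p", "object", "param", "span", "div")
                .addAttributes(ALL, "style", "class", "id", "name")
                .addAttributes("a", "href")
                .addAttributes("param", "name", "value");

        System.out.println(wl.isSafeTag("script"));             // false
        System.out.println(wl.isSafeAttribute("a", "href"));    // true
        System.out.println(wl.isSafeAttribute("a", "onclick")); // false
        System.out.println(wl.isSafeAttribute("div", "class")); // true
    }
}
```

Because the decision is "deny unless listed", an attribute such as `onclick` is dropped without anyone having to enumerate dangerous attributes, which is the main reason whitelisting is preferred over blacklisting for user content.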
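Jsoup's filtering does not support images with relative paths; for example, their src attribute is removed. Here is a simple workaround I came up with: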
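```java
/**
 * Custom filter for user input that keeps relative image paths.
 * @param html raw user input
 * @return sanitized HTML, or "" for blank input
 */
public static String filter(String html) {
    if (StringUtil.isBlank(html)) return "";
    // Resolve relative URLs against a placeholder base so img src passes the protocol check,
    // then strip the placeholder back out (replace() matches literally, no regex needed).
    String baseUri = "http://baseuri";
    return Jsoup.clean(html, baseUri, user_content_filter).replace("src=\"http://baseuri", "src=\"");
}
```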
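The reason this workaround helps can be sketched with `java.net.URI`: jsoup's relaxed whitelist only keeps an `img` `src` whose protocol is allowed (http/https), and a relative path like `images/logo.png` has no protocol to check until it is resolved against a base URI. The sketch below is hypothetical (the class `RelativeSrcDemo`, its helpers, and the allowed-protocol set are assumptions, not jsoup code); it reproduces the three steps: the protocol check, resolution against the placeholder `http://baseuri`, and stripping the placeholder back out.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the placeholder-base-URI trick; NOT jsoup's code.
class RelativeSrcDemo {
    // Assumed protocols allowed for img src.
    static final Set<String> ALLOWED = Set.of("http", "https");
    static final String PLACEHOLDER_BASE = "http://baseuri";

    // A URL passes only if it has an allowed scheme; a relative path has no scheme at all.
    static boolean hasAllowedProtocol(String url) {
        String scheme = URI.create(url).getScheme();
        return scheme != null && ALLOWED.contains(scheme.toLowerCase(Locale.ROOT));
    }

    // Resolve against the placeholder; the trailing "/" gives the base a non-empty path.
    static String resolve(String src) {
        return URI.create(PLACEHOLDER_BASE + "/").resolve(src).toString();
    }

    // Undo the placeholder afterwards, like the replace() step in filter().
    static String stripPlaceholder(String absolute) {
        return absolute.startsWith(PLACEHOLDER_BASE)
                ? absolute.substring(PLACEHOLDER_BASE.length())
                : absolute;
    }

    public static void main(String[] args) {
        String src = "images/logo.png";
        System.out.println(hasAllowedProtocol(src)); // false: no scheme, so src would be dropped
        String abs = resolve(src);
        System.out.println(abs);                     // http://baseuri/images/logo.png
        System.out.println(hasAllowedProtocol(abs)); // true
        System.out.println(stripPlaceholder(abs));   // /images/logo.png
    }
}
```

Note that the round trip turns a document-relative path (`images/logo.png`) into a root-relative one (`/images/logo.png`), both in this sketch and in the `replace` step of the workaround, so the original meaning is preserved only when pages live at the site root. jsoup's `Whitelist` also offers `preserveRelativeLinks(true)`, which keeps relative URLs as written provided a base URI is passed to `clean`; check that your jsoup version has it.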
http://cssor.com/jsoup-whitelist-clean-html-for-user-content.html



