Browser Lexer and XSS: HTML Encoding


Jun 21, 2016 am 08:52 AM

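0x00 Introduction

0x01 Overview of the Decoding Process

0x02 Lexical Analysis in the Browser

0x03 HTML Encoding and HTML Parsing

0x04 Common Misconceptions

0x05 Interesting Fault-Tolerant Behavior in Browsers

0x06 Conclusion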

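*Original author: VillanCh

0x00 Introduction

Encoding has always been a pain point. There is an article on wooyun about XSS encoding that covers some of these pain points. Since I am writing about encoding in XSS once more, and to live up to this article's title, this article gives a fairly systematic account of how the browser lexer handles HTML encoding and of the principles behind HTML encoding in XSS.

0x01 Overview of the Decoding Process

Before getting into XSS: if we do not understand the encoding and decoding process, XSS becomes very difficult. Inserting payloads at random without understanding encoding may be fine for an automated tool, but if you are testing for XSS by hand you are in trouble. With luck it works; without luck you will never get past the encoding problem.

To understand the encoding process, let's start with how the browser parses a page.

Readers who have looked into how browsers parse HTML will know how these components work. Generally speaking, the browser uses a lexer and a parser to build the DOM tree, then renders the elements with CSS, and finally executes JavaScript (the browser's scripting language). Why cover this part? Because it determines the order in which decoding happens.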

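A simple example: in an HTML (non-XHTML) document, if your XSS output point is inside a <script> tag and you use HTML entity encoding, how could that possibly trigger an XSS vulnerability? If you do not understand this question, you may end up doing a lot of wasted work.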
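The point about <script> can be checked with a small sketch. It uses Python's standard `html.parser` module as a stand-in for a browser tokenizer, which is an assumption: it is not a browser engine, but it also treats script content as raw text. The helper names `TextByTag` and `text_by_tag` are made up for this illustration.

```python
from html.parser import HTMLParser


class TextByTag(HTMLParser):
    """Collects the text seen inside each element, keyed by tag name."""

    def __init__(self):
        # convert_charrefs=True asks the parser to decode entities in normal text.
        super().__init__(convert_charrefs=True)
        self.stack = []
        self.text = {}

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack:
            tag = self.stack[-1]
            self.text[tag] = self.text.get(tag, "") + data


def text_by_tag(html):
    parser = TextByTag()
    parser.feed(html)
    parser.close()
    return parser.text


sample = '<p>&lt;b&gt;</p><script>var s = "&lt;b&gt;";</script>'
result = text_by_tag(sample)
print(result["p"])       # entity decoded in ordinary element content
print(result["script"])  # entity left untouched inside <script>
```

Inside the `p` element the entities are decoded to `<b>`, while inside `script` they are passed through as the literal characters `&lt;b&gt;`, so the script engine never sees an angle bracket that came from an entity.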

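0x02 Lexical Analysis in the Browser

Readers familiar with compiler theory may skim the first two paragraphs or use them as a brief refresher.

Whether someone working in computing needs to study compiler theory is a topic on which everyone has their own opinion. I believe that to be an excellent programmer or IT professional you need not master compiler theory, but you should at least have some understanding of it. For reasons of space I do not intend to cover much compiler theory here; I will only touch on it briefly so that readers know what it is for and how it is applied in browsers.

Parser-Lexer Combination

This structure is responsible for parsing the HTML document. Parsing is divided into two phases: lexical analysis and syntax analysis.

This part mainly covers lexical analysis.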

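Lexical analysis breaks the input (a statement or other content) into an ordered sequence of words and symbols. For example, given the input 1+2-3, lexical analysis should yield five tokens in order: 1 (int), + (operator), 2 (int), - (operator), 3 (int). The result is then handed to syntax analysis, which checks it against a context-free grammar.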
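The 1+2-3 example can be reproduced with a minimal lexer sketch. The names `TOKEN_RE` and `lex` are hypothetical, and the grammar (integers plus the four basic operators) is an assumption chosen only to cover the example.

```python
import re

# One token per match: optional leading whitespace, then an integer or an operator.
TOKEN_RE = re.compile(r"\s*(?:(?P<int>\d+)|(?P<operator>[+\-*/]))")


def lex(source):
    """Split an arithmetic expression into (kind, text) tokens, in order."""
    source = source.rstrip()
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        kind = match.lastgroup  # name of the group that matched
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens


for token in lex("1+2-3"):
    print(token)
```

The lexer only classifies characters into tokens; deciding whether the token sequence forms a valid expression is left to the syntax analysis phase.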

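If you are interested in how lexical analysis is implemented, see the book 编译原理及实践 (Compiler Construction: Principles and Practice).

In the browser, some properties of lexical analysis are worth noting. For example, it automatically skips spaces, line breaks, and tabs between the tokens of a tag, which is why, under some conditions, a few extra spaces, line breaks, or tabs were enough to get past a WAF (although this bypass technique is very dated by now). Beyond that, lexical analysis may also ignore comments, which may give you some further ideas. Drawing on earlier XSS experience, I will briefly describe the tokenization algorithm so that you can check whether your guesses are correct.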

As is well known, when the browser parses HTML, it takes the tag

<img  src = 1/>

and parses it into

<img src=1/>

that is, six symbols (tokens).
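A rough sketch of this tokenization follows. It assumes that the six tokens are `<`, `img`, `src`, `=`, `1`, and `/>`; the text does not list them, so that split is a guess, and the regex-based `lex_tag` helper is hypothetical rather than how any real browser works.

```python
import re

# Tokens: the self-closing marker, single punctuation characters, or a run of
# other non-whitespace characters. Whitespace between tokens is never matched,
# so it is simply skipped. A lone "/" not followed by ">" is also skipped here.
TAG_TOKEN_RE = re.compile(r"/>|[<>=]|[^\s<>=/]+")


def lex_tag(markup):
    """Return the tokens of a single tag, ignoring whitespace between them."""
    return TAG_TOKEN_RE.findall(markup)


spaced = lex_tag("<img  src = 1/>")
compact = lex_tag("<img src=1/>")
multiline = lex_tag("<img\n\tsrc\n=\t1/>")

print(spaced)
print(spaced == compact == multiline)
```

All three spellings produce the same token list, which is the property that whitespace-based filter bypasses relied on: a filter matching the literal string `src=` sees different input, while the lexer sees the same tokens.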

Is it really that simple? Of course not.

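A simple example of the parsing process:

1. While parsing ...

2. Then, on parsing up to ...

3. Once the tag name is found, the state changes to the Tag name state, which indicates that the tag name has been recognized.

4. Then, on reading the nearest >, the Tag name state ends and the tokenizer re-enters the Data state.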
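The state transitions in this list can be sketched as a small state machine. The state names Data, Tag open, End tag open, and Tag name come from the HTML tokenization rules, but everything else here is a heavy simplification under stated assumptions: attributes, comments, and entities are not modeled, everything up to `>` is treated as the tag name, and the `tokenize` function is a hypothetical helper, not a browser implementation.

```python
from enum import Enum


class State(Enum):
    DATA = "Data state"
    TAG_OPEN = "Tag open state"
    END_TAG_OPEN = "End tag open state"
    TAG_NAME = "Tag name state"


def tokenize(html):
    """Emit (kind, value) tokens for plain tags and text; unfinished tags are dropped."""
    state = State.DATA
    tokens, text, name = [], [], []
    kind = "start_tag"

    def flush_text():
        if text:
            tokens.append(("text", "".join(text)))
            text.clear()

    for ch in html:
        if state is State.DATA:
            if ch == "<":
                state = State.TAG_OPEN
            else:
                text.append(ch)
        elif state is State.TAG_OPEN:
            if ch == "/":
                state = State.END_TAG_OPEN
            elif ch.isascii() and ch.isalpha():
                flush_text()
                kind, name = "start_tag", [ch]
                state = State.TAG_NAME
            else:
                # Not a tag after all: the "<" was ordinary text.
                text.append("<")
                if ch != "<":
                    text.append(ch)
                    state = State.DATA
        elif state is State.END_TAG_OPEN:
            if ch.isascii() and ch.isalpha():
                flush_text()
                kind, name = "end_tag", [ch]
                state = State.TAG_NAME
            else:
                # Simplification: treat a malformed end tag as text.
                text.append("</" + ch)
                state = State.DATA
        else:  # State.TAG_NAME
            if ch == ">":
                tokens.append((kind, "".join(name).lower()))
                state = State.DATA
            else:
                name.append(ch)
    flush_text()
    return tokens


print(tokenize("hi<b>x</b>"))
print(tokenize("1 < 2"))
```

Note that a `<` followed by a space never leaves text mode, which is one reason a payload such as `< script>` is not treated as a tag.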

If tags are nested, the parsing steps above are repeated. Regarding ...
