Let's Build a Template Engine (Part 4)

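In the previous article our template engine gained support for include and extends, which completes most of the features a template engine needs. In this article we address some of the security issues that a template engine used to generate HTML has to face.

Escaping

The first issue to solve is escaping. So far our template engine has not escaped variables or expression results, so using it to generate HTML source leads to problems like the following (template3c.py):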

    >>> from template3c import Template
    >>> t = Template('<h1 id="title">{{ title }}</h1>')
    >>> t.render({'title': 'hello<br />world'})
    '<h1 id="title">hello<br />world</h1>'

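Clearly the tags contained in title need to be escaped, otherwise the output is not what we intended. Here we escape only the five characters & " ' > <.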

    html_escape_table = {
        '&': '&amp;',
        '"': '&quot;',
        '\'': '&apos;',
        '>': '&gt;',
        '<': '&lt;',
    }


    def html_escape(text):
        # Replace each special character with its entity; leave the rest untouched.
        return ''.join(html_escape_table.get(c, c) for c in text)

Escaped result:

    >>> html_escape('hello<br />world')
    'hello&lt;br /&gt;world'
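As a cross-check, the standard library's `html.escape` handles the same five characters (it emits `&#x27;` instead of `&apos;` for the single quote). The sketch below rebuilds the article's table for `str.translate`; the names `ESCAPE_TABLE` and `html_escape_translate` are illustrative assumptions, not part of the article's code.

```python
import html

# Hypothetical variant of the article's table, built for str.translate.
ESCAPE_TABLE = str.maketrans({
    '&': '&amp;',
    '"': '&quot;',
    "'": '&apos;',
    '>': '&gt;',
    '<': '&lt;',
})


def html_escape_translate(text):
    """Escape the five HTML-sensitive characters in a single pass."""
    return text.translate(ESCAPE_TABLE)


sample = 'hello<br />world'
escaped = html_escape_translate(sample)
stdlib_escaped = html.escape(sample, quote=True)
print(escaped)
print(stdlib_escaped)
```

For input without single quotes the two functions should agree, so `html.escape` may be a reasonable drop-in if `&apos;` is not specifically required.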

Where there is escaping there should also be a way to turn it off; a one-size-fits-all rule would cost us flexibility.

    class NoEscape:
        """Marker wrapper for text that must not be escaped."""

        def __init__(self, raw_text):
            self.raw_text = raw_text


    def escape(text):
        if isinstance(text, NoEscape):
            return str(text.raw_text)
        else:
            text = str(text)
            return html_escape(text)


    def noescape(text):
        return NoEscape(text)
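To see how the two helpers interact, here is a small self-contained sketch. It repeats the definitions above but, as an assumption made only to keep the example short, uses the standard library's `html.escape` as a stand-in for `html_escape`.

```python
import html


class NoEscape:
    """Marker wrapper for text that must not be escaped."""

    def __init__(self, raw_text):
        self.raw_text = raw_text


def escape(text):
    # Wrapped values pass through unchanged; everything else is escaped.
    if isinstance(text, NoEscape):
        return str(text.raw_text)
    return html.escape(str(text))  # stand-in for the article's html_escape


def noescape(text):
    return NoEscape(text)


title = 'hello<br />world'
escaped = escape(title)        # markup is neutralised
raw = escape(noescape(title))  # markup is kept as written
number = escape(42)            # non-string values are converted first
print(escaped, raw, number)
```

Wrapping the value in a marker class, rather than passing a flag around, lets the template author opt out per expression while `auto_escape` stays on for everything else.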

In the end, the changes our template engine needs for escaping are as follows (template4a.py is available for download):

    class Template:

        def __init__(self, ..., auto_escape=True):
            ...
            self.auto_escape = auto_escape
            self.default_context.setdefault('escape', escape)
            self.default_context.setdefault('noescape', noescape)
            ...

        def _handle_variable(self, token):
            ...  # `variable` is parsed from `token` as in the earlier parts (elided in this excerpt)
            if self.auto_escape:
                self.buffered.append('escape({})'.format(variable))
            else:
                self.buffered.append('str({})'.format(variable))

        def _parse_another_template_file(self, filename):
            ...
            # Included/extended templates inherit the escaping setting.
            template = self.__class__(
                    ...,
                    auto_escape=self.auto_escape
            )
            ...


    class NoEscape:

        def __init__(self, raw_text):
            self.raw_text = raw_text


    html_escape_table = {
        '&': '&amp;',
        '"': '&quot;',
        '\'': '&apos;',
        '>': '&gt;',
        '<': '&lt;',
    }


    def html_escape(text):
        return ''.join(html_escape_table.get(c, c) for c in text)


    def escape(text):
        if isinstance(text, NoEscape):
            return str(text.raw_text)
        else:
            text = str(text)
            return html_escape(text)


    def noescape(text):
        return NoEscape(text)

Result:

    >>> from template4a import Template
    >>> t = Template('<h1 id="title">{{ title }}</h1>')
    >>> t.render({'title': 'hello<br />world'})
    '<h1 id="title">hello&lt;br /&gt;world</h1>'
    >>> t = Template('<h1 id="noescape-title">{{ noescape(title) }}</h1>')
    >>> t.render({'title': 'hello<br />world'})
    '<h1 id="noescape-title">hello<br />world</h1>'
    >>>

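Security issues with exec

Because our template engine runs the generated code with the exec function, we need to pay attention to the security issues of exec and guard against possible server-side template injection attacks (see "Security issues to be aware of when using the exec function" for details).

The first thing to restrict is the use of built-in functions and execution-context variables inside templates (template4b.py):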

    class Template:
        ...

        def render(self, context=None):
            """Render the template."""
            namespace = {}
            namespace.update(self.default_context)
            namespace.setdefault('__builtins__', {})   # <---
            if context:
                namespace.update(context)
            exec(str(self.code_builder), namespace)
            result = namespace[self.func_name]()
            return result

Result:

    >>> from template4b import Template
    >>> t = Template('{{ open("/etc/passwd").read() }}')
    >>> t.render()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/mg/develop/lsbate/part4/template4b.py", line 245, in render
        result = namespace[self.func_name]()
      File "<string>", line 3, in __func_name
    NameError: name 'open' is not defined
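The effect of the empty `__builtins__` entry can be reproduced without the engine. In the sketch below the `generated` string is a hand-written stand-in for the code the engine would produce, and exposing `str` through the namespace is an assumption about what the default context provides.

```python
# Stand-in for the code generated from '{{ open("/etc/passwd").read() }}'.
generated = (
    'def __func_name():\n'
    '    return str(open("/etc/passwd").read())\n'
)

namespace = {'str': str}                  # names the template is allowed to see
namespace.setdefault('__builtins__', {})  # hide every built-in from the generated code
exec(generated, namespace)

try:
    namespace['__func_name']()
    outcome = 'rendered'
except NameError as exc:
    # open() is looked up in the empty __builtins__ mapping and is not found.
    outcome = str(exc)

print(outcome)
```

Because `exec` only inserts the real built-ins when the globals dict has no `__builtins__` key, supplying an empty mapping is enough to make direct calls such as `open(...)` fail by name lookup.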

Next we need to block other ways of reaching the built-in functions:

    >>> from template4b import Template
    >>> t = Template('{{ escape.__globals__["__builtins__"]["open"]("/etc/passwd").read()[0] }}')
    >>> t.render()
    '#'
    >>>
    >>> t = Template("{{ [x for x in [].__class__.__base__.__subclasses__() if x.__name__ == '_wrap_close'][0].__init__.__globals__['path'].os.system('date') }}")
    >>> t.render()
    Mon May 30 22:10:46 CST 2016
    '0'
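The first bypass works because every Python function carries a reference to the globals of the module that defined it. A harmless sketch of the same leak follows; it uses a hypothetical `helper` function in place of `escape` and looks up `len` rather than `open`.

```python
def helper(text):
    """An ordinary function exposed to templates, playing the role of escape()."""
    return str(text)


restricted = {'__builtins__': {}, 'helper': helper}

# Direct use of a built-in is blocked by the empty __builtins__ mapping.
try:
    eval('len("abc")', restricted)
    direct = 'allowed'
except NameError:
    direct = 'blocked'

# The function's own globals still hold the real built-ins.
leaked = eval("helper.__globals__['__builtins__']", restricted)
# __builtins__ is a module in __main__ and a plain dict in imported modules.
leaked_len = leaked['len'] if isinstance(leaked, dict) else getattr(leaked, 'len')

print(direct, leaked_len('abc'))
```

Any object placed in the namespace is therefore a potential path back to the full interpreter, which is why restricting names alone is not sufficient.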

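One fix is to forbid templates from accessing any attribute whose name starts with an underscore _. Single-underscore names are included because, by convention, a leading underscore marks an attribute as private, so it should not be accessed from outside.

Here we use the dis module to disassemble the generated code and then look for such attributes in the output.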

    import dis
    import io
    import re


    class Template:

        def __init__(self, ..., safe_attribute=True):
            ...
            self.safe_attribute = safe_attribute

        def render(self, ...):
            ...
            func = namespace[self.func_name]
            if self.safe_attribute:
                check_unsafe_attributes(func)
            result = func()


    def check_unsafe_attributes(code):
        # Disassemble into a string instead of printing to stdout.
        writer = io.StringIO()
        dis.dis(code, file=writer)
        output = writer.getvalue()

        # Look for a LOAD_ATTR instruction whose attribute name starts with "_".
        match = re.search(r'\d+\s+LOAD_ATTR\s+\d+\s+\((?P<attr>_[^\)]+)\)',
                          output)
        if match is not None:
            attr = match.group('attr')
            msg = "access to attribute '{0}' is unsafe.".format(attr)
            raise AttributeError(msg)

Result:

    >>> from template4c import Template
    >>> t = Template("{{ [x for x in [].__class__.__base__.__subclasses__() if x.__name__ == '_wrap_close'][0].__init__.__globals__['path'].os.system('date') }}")
    >>> t.render()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/xxx/lsbate/part4/template4c.py", line 250, in render
        check_unsafe_attributes(func)
      File "/xxx/lsbate/part4/template4c.py", line 296, in check_unsafe_attributes
        raise AttributeError(msg)
    AttributeError: access to attribute '__class__' is unsafe.
    >>>
    >>> t = Template('<h1 id="title">{{ title }}</h1>')
    >>> t.render({'title': 'hello<br />world'})
    '<h1 id="title">hello&lt;br /&gt;world</h1>'

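This concludes the series.

If you are interested, you can try parsing the template content in a different way, namely with lexical analysis and syntax analysis (you are welcome to share your implementation).

P.S. All articles in this series:

  • Let's Build a Template Engine (Part 3)
  • Let's Build a Template Engine (Part 2)
  • Let's Build a Template Engine (Part 1)

P.S. The code used in this article is available on GitHub: https://github.com/mozillazg/lsbate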
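As a starting point for the lexical-analysis approach, a tokenizer might look like the sketch below. The `Token` type, the token kinds, and the `{% ... %}` tag syntax are assumptions made for illustration; only the `{{ ... }}` form appears in this article.

```python
import re
from collections import namedtuple

# Hypothetical token type: kind is 'TEXT', 'VARIABLE' or 'TAG'.
Token = namedtuple('Token', 'kind value')

# The capture group makes re.split keep the delimiters in its result.
TOKEN_RE = re.compile(r'(\{\{.*?\}\}|\{%.*?%\})', re.DOTALL)


def tokenize(text):
    """Split template source into a flat stream of tokens."""
    for part in TOKEN_RE.split(text):
        if not part:
            continue  # re.split yields empty strings between adjacent matches
        if part.startswith('{{'):
            yield Token('VARIABLE', part[2:-2].strip())
        elif part.startswith('{%'):
            yield Token('TAG', part[2:-2].strip())
        else:
            yield Token('TEXT', part)


tokens = list(tokenize('<h1>{{ title }}</h1>'))
for token in tokens:
    print(token)
```

A parser would then consume this stream and build a tree of nodes, which separates recognising syntax from generating code.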

About the author: mozillazg
