Let's Build a Template Engine Together (Part 4)

Jun 24, 2016, 11:15 AM

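In the previous article, our template engine gained support for include and extends, and at this point we have implemented most of the features a template engine needs. In this article we address some of the security issues that a template engine used to generate HTML has to deal with.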

Escaping

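The first problem to solve is escaping. So far our template engine has not escaped variables or expression results, so using it to generate HTML source leads to problems like the following (template3c.py):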

>>> from template3c import Template
>>> t = Template('<h1>{{ title }}</h1>')
>>> t.render({'title': 'hello<br />world'})
'<h1>hello<br />world</h1>'

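Clearly, the tags contained in title need to be escaped, otherwise the output will not be what we expect. Here we escape only these five characters: & " ' > <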

html_escape_table = {
    '&': '&amp;',
    '"': '&quot;',
    '\'': '&apos;',
    '>': '&gt;',
    '<': '&lt;',
}


def html_escape(text):
    # Replace each special character with its entity; all others pass through.
    return ''.join(html_escape_table.get(c, c) for c in text)

The result of escaping:

>>> html_escape('hello<br />world')
'hello&lt;br /&gt;world'
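As a cross-check, the table-based function can be compared with `html.escape` from the Python standard library. The sketch below repeats the table so that it runs on its own; the `sample` string is made up for illustration. The only difference between the two outputs is the entity chosen for the single quote (`&apos;` in the table, `&#x27;` in the standard library).

```python
import html

# Same table as in the article, repeated so the example is self-contained.
html_escape_table = {
    '&': '&amp;',
    '"': '&quot;',
    '\'': '&apos;',
    '>': '&gt;',
    '<': '&lt;',
}


def html_escape(text):
    # Each character is looked up once, so an '&' produced by an earlier
    # replacement can never be escaped a second time.
    return ''.join(html_escape_table.get(c, c) for c in text)


sample = '<a href="x">Tom & Jerry\'s</a>'  # hypothetical input
print(html_escape(sample))
print(html.escape(sample, quote=True))
```

Because the lookup is done character by character, the order of the entries in the table does not matter, which is not true of a chain of `str.replace` calls.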

Since we have escaping, we naturally also need a way to turn it off; a one-size-fits-all rule would cost us flexibility.

class NoEscape:

    def __init__(self, raw_text):
        self.raw_text = raw_text


def escape(text):
    # Values wrapped in NoEscape are emitted as-is; everything else is escaped.
    if isinstance(text, NoEscape):
        return str(text.raw_text)
    else:
        text = str(text)
        return html_escape(text)


def noescape(text):
    return NoEscape(text)

The final changes our template engine makes for escaping are as follows (see template4a.py):

class Template:

    def __init__(self, ..., auto_escape=True):
        ...
        self.auto_escape = auto_escape
        self.default_context.setdefault('escape', escape)
        self.default_context.setdefault('noescape', noescape)
        ...

    def _handle_variable(self, token):
        # variable: the expression extracted from token (extraction omitted here)
        if self.auto_escape:
            self.buffered.append('escape({})'.format(variable))
        else:
            self.buffered.append('str({})'.format(variable))

    def _parse_another_template_file(self, filename):
        ...
        template = self.__class__(
                ...,
                auto_escape=self.auto_escape
        )
        ...


class NoEscape:

    def __init__(self, raw_text):
        self.raw_text = raw_text


html_escape_table = {
    '&': '&amp;',
    '"': '&quot;',
    '\'': '&apos;',
    '>': '&gt;',
    '<': '&lt;',
}


def html_escape(text):
    return ''.join(html_escape_table.get(c, c) for c in text)


def escape(text):
    if isinstance(text, NoEscape):
        return str(text.raw_text)
    else:
        text = str(text)
        return html_escape(text)


def noescape(text):
    return NoEscape(text)

Result:

>>> from template4a import Template
>>> t = Template('<h1>{{ title }}</h1>')
>>> t.render({'title': 'hello<br />world'})
'<h1>hello&lt;br /&gt;world</h1>'
>>> t = Template('<h1>{{ noescape(title) }}</h1>')
>>> t.render({'title': 'hello<br />world'})
'<h1>hello<br />world</h1>'
>>>
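To make the mechanism easier to see in isolation, here is a minimal sketch of the same idea outside the Template class. It is not the engine's code: the `render` function is a hypothetical toy that evaluates each `{{ expr }}` placeholder separately, and `html.escape` from the standard library stands in for the article's `html_escape`. What it shares with the real engine is the key step: each expression is wrapped in `escape(...)` or `str(...)` before it is run.

```python
import html
import re


class NoEscape:
    """Marks a value that must be emitted without escaping."""

    def __init__(self, raw_text):
        self.raw_text = raw_text


def escape(value):
    # Assumption: html.escape is used here in place of the article's html_escape.
    if isinstance(value, NoEscape):
        return str(value.raw_text)
    return html.escape(str(value), quote=True)


def noescape(value):
    return NoEscape(value)


def render(template, context, auto_escape=True):
    """Toy renderer: wraps every {{ expr }} in escape(...) or str(...)."""
    wrapper = 'escape' if auto_escape else 'str'
    namespace = {'escape': escape, 'noescape': noescape}
    namespace.update(context)

    def substitute(match):
        expression = match.group(1).strip()
        # Same idea as _handle_variable: build 'escape(expr)' and evaluate it.
        return eval('{}({})'.format(wrapper, expression), namespace)

    return re.sub(r'\{\{(.*?)\}\}', substitute, template)


context = {'title': 'hello<br />world'}
print(render('<h1>{{ title }}</h1>', context))
print(render('<h1>{{ noescape(title) }}</h1>', context))
```

Making `noescape` return a marker object, rather than a plain string, is what lets `escape` tell trusted values apart from ordinary ones at the moment of output.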

Security issues with exec

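Because our template engine uses the exec function to run the generated code, we need to pay attention to the security issues of exec and guard against possible server-side template injection attacks (for details, see the article "Some security issues to be aware of when using the exec function").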

The first thing to restrict is the use of built-in functions and execution-time context variables inside templates (template4b.py):

class Template:
    ...

    def render(self, context=None):
        """Render the template."""
        namespace = {}
        namespace.update(self.default_context)
        namespace.setdefault('__builtins__', {})   # <---
        if context:
            namespace.update(context)
        exec(str(self.code_builder), namespace)
        result = namespace[self.func_name]()
        return result

Result:

>>> from template4b import Template
>>> t = Template('{{ open("/etc/passwd").read() }}')
>>> t.render()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mg/develop/lsbate/part4/template4b.py", line 245, in render
    result = namespace[self.func_name]()
  File "<string>", line 3, in __func_name
NameError: name 'open' is not defined
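The effect of an empty `__builtins__` entry can be reproduced without the template engine. The sketch below is a simplified illustration, not the engine's `render` method; `run_restricted` is a hypothetical helper name.

```python
def run_restricted(source, context=None):
    """Execute source with an empty __builtins__ and return the namespace."""
    namespace = {}
    if context:
        namespace.update(context)
    # Set after merging the context, so a caller-supplied '__builtins__'
    # entry cannot replace the empty dict (a choice made in this sketch).
    namespace['__builtins__'] = {}
    exec(source, namespace)
    return namespace


# Names supplied through the context still work.
print(run_restricted('total = a + b', {'a': 1, 'b': 2})['total'])

# Built-in names are no longer resolvable inside the executed code.
try:
    run_restricted('data = open("/etc/passwd").read()')
except NameError as error:
    print('blocked:', error)
```

When exec receives a globals dict that already contains a `__builtins__` key, it uses that value for name lookup instead of inserting the real built-ins, which is why `open` becomes an undefined name.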

Next, we need to prevent built-in functions from being reached in other ways:

>>> from template4b import Template
>>> t = Template('{{ escape.__globals__["__builtins__"]["open"]("/etc/passwd").read()[0] }}')
>>> t.render()
'#'
>>>
>>> t = Template("{{ [x for x in [].__class__.__base__.__subclasses__() if x.__name__ == '_wrap_close'][0].__init__.__globals__['path'].os.system('date') }}")
>>> t.render()
Mon May 30 22:10:46 CST 2016
'0'
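The first bypass works because emptying `__builtins__` only changes name lookup inside the executed namespace. Any function object passed into that namespace still carries a reference to the globals of the module where it was defined. The sketch below shows this with a hypothetical `helper` function standing in for `escape`.

```python
def helper():
    return 'ok'


# The executed code has no built-ins of its own, but it can see `helper`.
namespace = {'__builtins__': {}, 'helper': helper}
exec('recovered = helper.__globals__["__builtins__"]', namespace)

recovered = namespace['recovered']
# Depending on where `helper` was defined, __builtins__ is either the
# builtins module or its dictionary, so both cases are handled here.
if isinstance(recovered, dict):
    recovered_open = recovered['open']
else:
    recovered_open = recovered.open

print(recovered_open is open)
```

Attribute access needs no built-in names at all, so every object handed to a template is a potential path back to the full interpreter.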

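One solution is to forbid templates from accessing attributes whose names start with an underscore (_). Why include a single leading underscore as well? Because by convention such attributes are private and should not be accessed from outside.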

Here we use the dis module to disassemble the generated code and then look for these special attributes in the output.

import dis
import io
import re


class Template:

    def __init__(self, ..., safe_attribute=True):
        ...
        self.safe_attribute = safe_attribute

    def render(self, ...):
        ...
        func = namespace[self.func_name]
        if self.safe_attribute:
            check_unsafe_attributes(func)
        result = func()
        return result


def check_unsafe_attributes(code):
    # Disassemble the generated function into text.
    writer = io.StringIO()
    dis.dis(code, file=writer)
    output = writer.getvalue()

    # Look for a LOAD_ATTR instruction whose attribute name starts with '_'.
    match = re.search(r'\d+\s+LOAD_ATTR\s+\d+\s+\((?P<attr>_[^\)]+)\)',
                      output)
    if match is not None:
        attr = match.group('attr')
        msg = "access to attribute '{0}' is unsafe.".format(attr)
        raise AttributeError(msg)

Result:

>>> from template4c import Template
>>> t = Template("{{ [x for x in [].__class__.__base__.__subclasses__() if x.__name__ == '_wrap_close'][0].__init__.__globals__['path'].os.system('date') }}")
>>> t.render()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/xxx/lsbate/part4/template4c.py", line 250, in render
    check_unsafe_attributes(func)
  File "/xxx/lsbate/part4/template4c.py", line 296, in check_unsafe_attributes
    raise AttributeError(msg)
AttributeError: access to attribute '__class__' is unsafe.
>>>
>>> t = Template('<h1>{{ title }}</h1>')
>>> t.render({'title': 'hello<br />world'})
'<h1>hello&lt;br /&gt;world</h1>'
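The same check can also be written against the structured output of `dis.get_instructions` instead of matching a regular expression against the printed disassembly. The sketch below is one possible variant, not the code in template4c.py; the names `find_unsafe_attributes` and `iter_code_objects` are hypothetical. It walks nested code objects explicitly and also looks at `LOAD_METHOD`, since some Python versions compile method calls to that instruction rather than to `LOAD_ATTR`.

```python
import dis
import types

# Instructions that load an attribute by name; the exact set depends on the
# Python version, so treat this as an assumption to verify for your runtime.
ATTRIBUTE_OPNAMES = {'LOAD_ATTR', 'LOAD_METHOD'}


def iter_code_objects(code):
    """Yield a code object and every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


def find_unsafe_attributes(func):
    """Return the underscore-prefixed attribute names that func loads."""
    unsafe = []
    for code in iter_code_objects(func.__code__):
        for instruction in dis.get_instructions(code):
            if instruction.opname not in ATTRIBUTE_OPNAMES:
                continue
            name = instruction.argval
            if isinstance(name, str) and name.startswith('_'):
                unsafe.append(name)
    return unsafe


print(find_unsafe_attributes(lambda: [].__class__.__base__))
print(find_unsafe_attributes(lambda text: text.upper()))
```

Reading `argval` avoids depending on how the disassembly happens to be formatted, which has changed between Python releases. It is still a bytecode-level check, so it cannot see attribute names built at run time and passed to something like getattr; that only matters if such a function is reachable from the template namespace.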

This brings the series to a close.

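If you are interested, you can try parsing the template content in a different way, namely with lexical analysis and syntax analysis (you are welcome to share how you implemented it).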
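As a possible starting point for that exercise, the sketch below shows only the lexical-analysis half: it turns a template string into a flat list of tokens that a parser could then consume. The token kinds, the `Token` type and the `tokenize` function are all hypothetical names chosen for this illustration, and the delimiters `{{ }}`, `{% %}` and `{# #}` are assumed to be the ones used in this series.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str   # 'TEXT', 'VARIABLE', 'TAG' or 'COMMENT'
    value: str


# One capturing group, so re.split keeps the delimited pieces in the result.
TOKEN_PATTERN = re.compile(r'(\{\{.*?\}\}|\{%.*?%\}|\{#.*?#\})', re.DOTALL)

KIND_BY_PREFIX = {'{{': 'VARIABLE', '{%': 'TAG', '{#': 'COMMENT'}


def tokenize(source):
    """Split template source into TEXT, VARIABLE, TAG and COMMENT tokens."""
    tokens = []
    for index, piece in enumerate(TOKEN_PATTERN.split(source)):
        if index % 2 == 0:
            # Even positions hold the literal text between delimiters.
            if piece:
                tokens.append(Token('TEXT', piece))
        else:
            # Odd positions hold the matched '{{ ... }}' style pieces.
            kind = KIND_BY_PREFIX[piece[:2]]
            tokens.append(Token(kind, piece[2:-2].strip()))
    return tokens


for token in tokenize('<h1>{{ title }}</h1>{# note #}'):
    print(token)
```

A parser built on top of this would turn the token list into a tree (for example, matching each opening tag with its closing tag) before any code is generated.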

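P.S. All articles in this series:

Let's Build a Template Engine Together (Part 3)
Let's Build a Template Engine Together (Part 2)
Let's Build a Template Engine Together (Part 1)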

P.S. The code from these articles is available on GitHub: https://github.com/mozillazg/lsbate

About the author: mozillazg