Analysis of Python's new string format vulnerabilities and solutions
Recently a Python string formatting vulnerability caught my attention. In this article I discuss the security risk posed by the new string formatting syntax introduced in Python, analyze it in depth, and provide a corresponding solution.
Using str.format on untrusted user input introduces a security risk. I have actually known about this problem for a long time, but I did not realize its severity until today. Because attackers can use it to bypass the Jinja2 sandbox, it can cause serious information leaks. I provide a safer version of str.format at the end of this article.
Be aware that this is a quite serious security risk. I am writing about it because most people probably do not know how easy it is to exploit.
Starting with Python 2.6, Python introduced a new syntax for formatting strings, inspired by .NET. Rust and some other programming languages also support this syntax. Through the .format() method, the syntax can be applied to both byte and unicode strings (in Python 3, only unicode strings), and it also maps to the more customizable string.Formatter API.
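To illustrate what "more customizable" means here, the following is a minimal sketch of a string.Formatter subclass. The class name DefaultingFormatter and the '<missing>' placeholder are hypothetical and chosen only for illustration; get_value is one of the documented hooks of string.Formatter.

```python
from string import Formatter


class DefaultingFormatter(Formatter):
    """Hypothetical subclass: missing keyword fields render as a placeholder."""

    def get_value(self, key, args, kwargs):
        # Keyword lookups fall back to a placeholder instead of raising KeyError.
        if isinstance(key, str):
            return kwargs.get(key, '<missing>')
        # Positional lookups keep the default behavior.
        return super().get_value(key, args, kwargs)


fmt = DefaultingFormatter()
greeting = fmt.format('Hello {name}, you have {count} new messages', name='Ada')
print(greeting)
```

The same kind of hook is what makes it possible to restrict attribute access later on.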
A feature of this syntax is that it lets you refer to the positional and keyword arguments of the format call and explicitly reorder them. Furthermore, it can access attributes and items of the objects passed in, which is the root cause of the security issue here.
Overall, one can use this to do the following things:
>>> 'class of {0} is {0.__class__}'.format(42)
"class of 42 is <class 'int'>"
Essentially, anyone with control over the format string has the potential to access various internal properties of the object.
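The features described above (positional and keyword arguments, reordering, attribute access, and item access) can be sketched as follows. The Point class and the settings dictionary are made up for this example.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


point = Point(3, 4)
settings = {'unit': 'cm'}

# Positional arguments can be reordered by index.
reordered = '{1} before {0}'.format('first', 'second')

# Keyword arguments are referenced by name.
by_name = '{greeting}, {name}!'.format(name='Ada', greeting='Hello')

# Attribute access with a dot, item access with brackets (keys are unquoted).
attribute = 'x = {p.x}'.format(p=point)
item = 'unit = {cfg[unit]}'.format(cfg=settings)

# The same dotted syntax also reaches internal attributes.
internal = '{p.__class__.__name__}'.format(p=point)

print(reordered, by_name, attribute, item, internal, sep='\n')
```

The last line is the interesting one from a security point of view: nothing in the syntax distinguishes an ordinary attribute from an internal one.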
The first question is how an attacker gains control of the format string. There are a few places to look:
1. Untrusted translators of string files. This is a likely avenue, because many applications that are translated into multiple languages use the new Python string formatting, and not everyone thoroughly reviews all the strings that come in.
2. User-exposed configuration. Because some system users can configure certain behaviors, these settings may be exposed as format strings. In particular, I have seen users configure notification emails, log message formats, or other basic templates through a web application.
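As a rough sketch of the second case, assume a hypothetical application where the notification template is a setting that users can edit. The names User, render_notification, and api_token are invented for this example.

```python
class User:
    def __init__(self, name, email, api_token):
        self.name = name
        self.email = email
        self.api_token = api_token


def render_notification(template, user):
    # The template comes from configuration that an end user can edit.
    return template.format(user=user)


user = User('Ada', 'ada@example.com', 'tok-123')

# What the developer had in mind.
intended = render_notification('Hello {user.name}, you have mail.', user)

# What whoever controls the template can write instead.
abused = render_notification('Hello {user.api_token}', user)

print(intended)
print(abused)
```

Whoever writes the template decides which attributes are read, not the developer who called format.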
If you only pass C interpreter objects to the format string, there is not much danger, because at most you will expose things like the integer class.
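A quick check of this on CPython 3 might look like the sketch below. The function name python_function is a placeholder, and the behavior of int.__init__ shown here is a CPython implementation detail.

```python
def python_function():
    pass


# An integer only exposes interpreter-level objects such as its class.
exposed_class = '{0.__class__}'.format(42)

# int.__init__ is implemented in C and has no __globals__ to walk into.
try:
    '{0.__init__.__globals__}'.format(42)
    builtin_has_globals = True
except AttributeError:
    builtin_has_globals = False

# A function written in Python, by contrast, carries its module globals.
python_has_globals = hasattr(python_function, '__globals__')

print(exposed_class, builtin_has_globals, python_has_globals)
```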
However, once a Python object is passed to the format string, things get troublesome, because the amount of information exposed through Python functions is staggering. Here is a scenario of a hypothetical web application that could leak its secret key:
CONFIG = {
    'SECRET_KEY': 'super secret key'
}

class Event(object):
    def __init__(self, id, level, message):
        self.id = id
        self.level = level
        self.message = message

def format_event(format_string, event):
    return format_string.format(event=event)
If the user can inject format_string here, they can discover the secret string like this:
{event.__init__.__globals__[CONFIG][SECRET_KEY]}
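Put together, the leak can be reproduced with a short script. This is a sketch that assumes CONFIG lives in the same module as the Event class, as in the scenario above; the event values are made up.

```python
CONFIG = {'SECRET_KEY': 'super secret key'}


class Event:
    def __init__(self, id, level, message):
        self.id = id
        self.level = level
        self.message = message


def format_event(format_string, event):
    return format_string.format(event=event)


event = Event(1, 'info', 'user logged in')

# Attacker-controlled format string: walk from the bound method to the
# globals of the module that defines Event, then index into CONFIG.
malicious = '{event.__init__.__globals__[CONFIG][SECRET_KEY]}'
leaked = format_event(malicious, event)
print(leaked)
```

Note that the attacker never needs to execute code. Attribute and item lookups alone are enough to reach the module globals.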
So, what should you do if you need to let others provide the formatting string? In fact, some undocumented internal mechanisms can be used to change the string formatting behavior.
from string import Formatter

try:
    from collections.abc import Mapping  # Python 3.3 and later
except ImportError:
    from collections import Mapping  # Python 2


class MagicFormatMapping(Mapping):
    """This class implements a dummy wrapper to fix a bug in the Python
    standard library for string formatting.

    See http://bugs.python.org/issue13598 for information about why
    this is necessary.
    """

    def __init__(self, args, kwargs):
        self._args = args
        self._kwargs = kwargs
        self._last_index = 0

    def __getitem__(self, key):
        # An empty key means an auto-numbered field such as '{}'.
        if key == '':
            idx = self._last_index
            self._last_index += 1
            try:
                return self._args[idx]
            except LookupError:
                pass
            key = str(idx)
        return self._kwargs[key]

    def __iter__(self):
        return iter(self._kwargs)

    def __len__(self):
        return len(self._kwargs)


# This is a necessary API but it's undocumented and moved around
# between Python releases
try:
    from _string import formatter_field_name_split
except ImportError:
    formatter_field_name_split = lambda x: x._formatter_field_name_split()


class SafeFormatter(Formatter):

    def get_field(self, field_name, args, kwargs):
        first, rest = formatter_field_name_split(field_name)
        obj = self.get_value(first, args, kwargs)
        for is_attr, i in rest:
            if is_attr:
                obj = safe_getattr(obj, i)
            else:
                obj = obj[i]
        return obj, first


def safe_getattr(obj, attr):
    # Expand the logic here.  For instance on 2.x you will also need
    # to disallow func_globals, on 3.x you will also need to hide
    # things like cr_frame and others.  So ideally have a list of
    # objects that are entirely unsafe to access.
    if attr[:1] == '_':
        raise AttributeError(attr)
    return getattr(obj, attr)


def safe_format(_string, *args, **kwargs):
    formatter = SafeFormatter()
    kwargs = MagicFormatMapping(args, kwargs)
    return formatter.vformat(_string, args, kwargs)
Now we can use the safe_format function in place of str.format:
>>> '{0.__class__}'.format(42)
"<class 'int'>"
>>> safe_format('{0.__class__}', 42)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: __class__
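To see the effect on the earlier Event scenario, here is a compact, self-contained sketch for Python 3. It inlines the underscore check instead of using safe_getattr, and the helper name safe_format_event is invented for this example. It relies on _string.formatter_field_name_split, which is an undocumented CPython helper and may change between releases.

```python
from string import Formatter
from _string import formatter_field_name_split  # undocumented CPython helper


class SafeFormatter(Formatter):
    def get_field(self, field_name, args, kwargs):
        first, rest = formatter_field_name_split(field_name)
        obj = self.get_value(first, args, kwargs)
        for is_attr, name in rest:
            if is_attr:
                # Refuse any attribute that starts with an underscore.
                if name.startswith('_'):
                    raise AttributeError(name)
                obj = getattr(obj, name)
            else:
                obj = obj[name]
        return obj, first


CONFIG = {'SECRET_KEY': 'super secret key'}


class Event:
    def __init__(self, id, level, message):
        self.id = id
        self.level = level
        self.message = message


def safe_format_event(format_string, event):
    return SafeFormatter().vformat(format_string, (), {'event': event})


event = Event(1, 'warning', 'disk almost full')

# Ordinary attribute access still works.
rendered = safe_format_event('[{event.level}] {event.message}', event)
print(rendered)

# The exploit string is rejected at the first underscore attribute.
try:
    safe_format_event('{event.__init__.__globals__[CONFIG][SECRET_KEY]}', event)
    blocked = None
except AttributeError as exc:
    blocked = str(exc)
print('blocked:', blocked)
```

The underscore rule only hides internal attributes. Public attributes of any object passed in remain readable by whoever controls the format string.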
There is a saying in software development: never trust user input. This case shows how much sense that makes, so please keep it in mind.