Selective Removal of Non-ASCII Characters
Cleaning text often means removing non-ASCII characters while keeping ordinary symbols such as spaces and periods. A naive filter based on character codes can strip those symbols along with the unwanted characters, which is usually not the intended result.
Let's consider the following code:
<code class="python">def onlyascii(char):
    if ord(char) < 48 or ord(char) > 127:
        return ''
    else:
        return char</code>
This code removes every character whose code point is less than 48 or greater than 127, which does strip non-ASCII characters. However, the lower bound of 48 (the digit '0') also discards every ASCII character below it, including spaces (ASCII 32) and periods (ASCII 46).
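To make the problem concrete, here is a small sketch that applies the filter above to a sample string. The sample text and the variable names `sample` and `result` are illustrative assumptions, not part of the original question.

```python
def onlyascii(char):
    # Same logic as the filter above: drop anything outside 48..127.
    if ord(char) < 48 or ord(char) > 127:
        return ''
    return char

# Hypothetical sample input containing spaces, a period and one non-ASCII letter.
sample = "Hello world. Caf\u00e9 42"
result = ''.join(onlyascii(c) for c in sample)
print(result)  # HelloworldCaf42
```

The accented letter is gone as intended, but so are all the spaces and the period.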
To remove non-ASCII characters while preserving spaces and periods, we can use the string.printable constant from Python's string module:
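<code class="python">import string

printable = set(string.printable)
# data is the input string; in Python 3, filter() returns an iterator,
# so join the surviving characters back into a str
filtered_data = ''.join(filter(lambda x: x in printable, data))</code>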
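string.printable is a string of the ASCII characters that Python considers printable: digits, letters, punctuation (including the period), and whitespace (including the space). Converting it to a set gives fast membership tests, and filtering against that set removes both non-ASCII characters and non-printable ASCII control characters.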
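A quick way to check what the constant covers is to probe it directly. The sketch below uses only the standard library; the variable names are illustrative.

```python
import string

# string.printable = digits + ascii_letters + punctuation + whitespace
size = len(string.printable)
keeps_space = ' ' in string.printable
keeps_period = '.' in string.printable
keeps_nul = '\x00' in string.printable      # ASCII control character
keeps_accent = '\u00e9' in string.printable  # non-ASCII letter

print(size)                        # 100
print(keeps_space, keeps_period)   # True True
print(keeps_nul, keeps_accent)     # False False
```

Note that the whitespace portion also includes tab, newline, carriage return, vertical tab and form feed, so those characters survive the filter as well.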
For example, if we have the string "some\x00string. with\x15 funny characters":
<code class="python">s = "some\x00string. with\x15 funny characters"
''.join(filter(lambda x: x in printable, s))</code>
The result will be:
'somestring. with funny characters'
This method removes non-ASCII characters and ASCII control characters such as \x00 and \x15 while preserving spaces and periods, leaving a clean string for further processing.
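If string.printable keeps more than you want (for example tabs, newlines or other punctuation), the same idea works with a narrower allow-list. The following is a minimal sketch under that assumption; `ALLOWED` and `keep_allowed` are hypothetical names introduced here for illustration.

```python
import string

# Assumed allow-list: ASCII letters, digits, the space and the period.
ALLOWED = set(string.ascii_letters + string.digits + " .")

def keep_allowed(text, allowed=ALLOWED):
    """Return text with every character outside the allow-list removed."""
    return ''.join(ch for ch in text if ch in allowed)

cleaned = keep_allowed("some\x00string. with\x15 funny\tcharacters, caf\u00e9!")
print(cleaned)  # somestring. with funnycharacters caf
```

Building the allow-list explicitly makes it clear which characters are kept, at the cost of having to list them yourself.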