Detailed explanation of memory leaks and garbled Chinese characters in Python's mysql module
When connecting through mysql-python (the MySQLdb module), most people write something like this by default:
con=MySQLdb.connect(user='xxx',passwd='xxx',host='xxx',port=6600,charset='gbk')
Once charset='gbk' is specified, mysql-python sets use_unicode=True by default, so it decodes every string with Python's own codec module. In practice this causes two problems. First, MySQL's gbk character set is larger than the set covered by Python's gbk codec, so some characters that MySQL stores without complaint raise errors when Python's codec tries to decode them. Second, and more seriously, versions of mysql-python before 1.2.3 have a memory leak in the decoding path when use_unicode=True. On top of that, every string read from the database comes back as a unicode object, so it has to be encoded again before it can be written to a file.
The fix is to force use_unicode=False explicitly:
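The decoding failure can be reproduced without a database. The sketch below is only an illustration: the byte 0xFF is a stand-in for "bytes Python's gbk codec rejects", not a claim about which characters MySQL actually accepts, and decode_gbk is a hypothetical helper name, not part of mysql-python.

```python
# -*- coding: utf-8 -*-

def decode_gbk(raw, errors="replace"):
    """Decode GBK bytes; fall back to a lossy decode if strict decoding fails.

    Hypothetical helper for illustration only.
    """
    try:
        return raw.decode("gbk")
    except UnicodeDecodeError:
        # Undecodable bytes are handled according to `errors`
        # (with "replace" they become U+FFFD).
        return raw.decode("gbk", errors)


good = u"\u4e2d\u6587".encode("gbk")  # two valid GBK characters
bad = good + b"\xff"                  # assumed stand-in for an undecodable byte

try:
    bad.decode("gbk")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True              # this is the error use_unicode=True would surface

text = decode_gbk(bad)
```

A fallback like this loses information, which is one reason to prefer keeping the raw bytes and skipping the decode step altogether.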
con=MySQLdb.connect(user='xxx',passwd='xxx',host='xxx',port=6600,charset='gbk',use_unicode=False)
This way there is no memory leak, and nothing needs to be encoded before writing to a file, because strings come back as raw byte strings rather than unicode objects. It also sidesteps the cases where Python's codec cannot decode strings stored in MySQL's gbk. Finally, for MySQL 4, the charset parameter can be left out altogether:
con=MySQLdb.connect(user='xxx',passwd='xxx',host='xxx',port=6600,use_unicode=False)
That takes care of the problem.
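To show why no re-encoding is needed with use_unicode=False, here is a minimal sketch of writing rows straight to a file. It assumes every selected column is a string column, so each value arrives as a byte string. The rows list is a stand-in for what cursor.fetchall() would return, and dump_rows is a hypothetical helper name.

```python
# -*- coding: utf-8 -*-
import os
import tempfile


def dump_rows(rows, path):
    """Write rows of raw byte strings to `path`, tab-separated, one row per line."""
    # Binary mode: the bytes go out exactly as the database sent them.
    with open(path, "wb") as out:
        for row in rows:
            out.write(b"\t".join(row) + b"\n")


# Stand-in for cursor.fetchall() on a connection opened with use_unicode=False.
rows = [(u"\u4e2d\u6587".encode("gbk"), b"42")]

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    dump_rows(rows, path)
    with open(path, "rb") as f:
        written = f.read()
finally:
    os.remove(path)
```

Because the file is opened in binary mode, the GBK bytes are never decoded, so characters that Python's gbk codec does not know pass through unchanged.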