Python: solving the UnicodeDecodeError when querying with cx_Oracle

In a recent project, I had to query a table with more than 1 million records and then compute some statistics over the data. During this process, however, a UnicodeDecodeError occurred after only a few rows had been fetched.
Here the query goes through the sqlalchemy library, which uses cx_Oracle internally to perform the actual operations. The Python version is 3.5.0, the host system is Windows 2008 Server, and the code is similar to the following:

from sqlalchemy import create_engine

engine = create_engine('oracle://demo:123456@192.168.1.202/TEST')
conn = engine.connect()
sql = 'select t.type from TS t'
result = conn.execute(sql)
for row in result:
    print(row['type'])

Here, we first create a connection to the database and then run the query. Unfortunately, a UnicodeDecodeError occurred after fewer than 10 records had been fetched.
At first I thought it was a server-side encoding problem in the database, so I added the encoding parameter to the create_engine call:

engine = create_engine('oracle://demo:123456@192.168.1.202/TEST',encoding="UTF-8")

Another available method is to specify the encoding directly in the connection URL, as follows:

engine = create_engine('oracle://demo:123456@192.168.1.202/TEST?charset=utf-8')

But the problem was still not solved. I searched the Internet but could not find a suitable solution. Then I remembered that with MySQL (personally I prefer PostgreSQL), when garbled characters appeared, we often ran the following:

set names gbk;

This sets the client's encoding, not the server's, and that is what fixes garbled characters on the terminal (since PostgreSQL databases default to UTF-8, garbled characters are less likely there). Similarly, when installing the Oracle client under Linux, an NLS_LANG environment variable is set. For details, see the article Installing Oracle Instant Client in Ubuntu 14.04, though some details are not covered there.
On Windows, we generally make the following setting in cmd:

set NLS_LANG=SIMPLIFIED CHINESE_CHINA.ZHS16GBK

This specifies that Oracle messages are in Simplified Chinese and that the client's character set is GBK.
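The same setting can also be applied from inside the Python process instead of in cmd. The following is a minimal sketch, assuming the variable is set before the Oracle client libraries are initialized (that is, before cx_Oracle opens its first connection). The helper name `set_oracle_client_charset` is hypothetical and is not part of cx_Oracle or sqlalchemy; it also splits the value to show the LANGUAGE_TERRITORY.CHARSET layout of NLS_LANG.

```python
import os


def set_oracle_client_charset(nls_lang="SIMPLIFIED CHINESE_CHINA.ZHS16GBK"):
    """Set NLS_LANG for the current process and return its three parts.

    Hypothetical helper for illustration only.
    """
    os.environ["NLS_LANG"] = nls_lang
    # NLS_LANG has the form LANGUAGE_TERRITORY.CHARSET.
    lang_territory, _, charset = nls_lang.rpartition(".")
    language, _, territory = lang_territory.partition("_")
    return language, territory, charset


# Call this before creating the engine or connection
# (assumption: the Oracle client reads NLS_LANG when it is first initialized).
parts = set_oracle_client_charset()
print(parts)
```

With the default argument, the three parts are the message language, the territory, and the client character set described above.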
In addition, we can run the following statement to check the NLS parameters and confirm that the setting above is correct:

SELECT * FROM v$nls_parameters;

Because the database server above is deployed on Windows, the result is, as expected, GBK; so if the client decodes with the UTF-8 character set, decoding errors naturally occur.
Note that non-ASCII characters display correctly only when the encodings of the database server and the client are consistent, and that sqlalchemy by default forces query strings to be converted to Unicode. The data on the server side is therefore produced by a process similar to the following in Python 3:

>>> a='中国'.encode('gbk')
>>> a
b'\xd6\xd0\xb9\xfa'

In sqlalchemy, because of the forced encoding conversion, what then happens is similar to the following:

>>> a.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd6 in position 0: invalid continuation byte

Therefore, the above problem occurs. Under normal circumstances, the encoding should be specified as GBK:

>>> a.decode('gbk')
'中国'

Setting NLS_LANG is equivalent to changing the encoding above to GBK.
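When it is unclear which character set the returned bytes are actually in, the encode/decode experiment above can be turned into a small diagnostic. This is only a sketch: the function name `find_codec` and the candidate list are assumptions made for illustration, and it works on raw bytes rather than on a live cx_Oracle connection.

```python
def find_codec(raw, candidates=("utf-8", "gbk")):
    """Return the first candidate codec that decodes raw, plus the failures seen.

    Hypothetical helper for illustration only.
    """
    failures = []
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first byte that could not be decoded.
            failures.append((name, exc.start, exc.reason))
            continue
        return name, failures
    return None, failures


raw = "中国".encode("gbk")  # the same bytes as in the example above
codec, failures = find_codec(raw)
print(codec)     # gbk
print(failures)  # one entry, for the failed utf-8 attempt at offset 0
```

A successful decode does not prove that the codec is the right one (some codecs accept almost any byte sequence), so the result is only a hint about which client character set to configure.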

