Encoding problems when running Python on Linux

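The program extracts data from local Excel spreadsheets. The problem: with the default Linux locale (LANG=en_US), an error is raised while reading the names of the sheets in an Excel file (the sheet names contain digits and Chinese characters).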

The file already begins with # -*- coding: utf-8 -*-

import re
import xlrd

def get_standard_template_infos():
    # get_excel_files returns a list of Excel file paths
    excel_files = get_excel_files(config.get_standard_template_files())
    standard_template_infos = {}
    for file in excel_files:
        wb = xlrd.open_workbook(file)
        sheet_names = wb.sheet_names()
        for sheet_name in sheet_names:
            # calls the function below
            standard_template_id = get_standard_template_id(sheet_name)

def get_standard_template_id(sheet_name):
    pattern = u'^(\d{5})'
    match = re.match(pattern, sheet_name)
    if match is not None:
        code = sheet_name[0:5]
        return code
    else:
        print sheet_name  # the error is raised here
    return None

The error reported is:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 4-6: ordinal not in range(256)

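However, after changing the Linux console locale to LANG=zh_CN.UTF-8, collecting the Excel files with os.walk raises an error instead (the directory names are in English; the Excel file names contain Chinese characters and hyphens).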

Code:

import os

def get_excel_files(dir):
    files = []
    if not os.path.exists(dir):
        return files
    for item in os.walk(dir):
        file_names = item[2]
        if file_names is None or len(file_names) == 0:
            continue

        dir_path = item[0]
        for file_name in file_names:
            # skip hidden files and files whose names start with '~'
            if file_name[0] == '.' or file_name[0] == '~':
                continue
            if file_name[-5:] == '.xlsx' or file_name[-4:] == '.xls':
                files.append(os.path.join(dir_path, file_name))

    return files

Error:

  File "/home/users/zhangzhida/o_platform/import_to_hdp/check_data/share_function.py", line 27, in get_excel_files
    for item in os.walk(dir):
  File "/home/users/zhangzhida/.jumbo/lib/python2.7/os.py", line 284, in walk
    if isdir(join(top, name)):
  File "/home/users/zhangzhida/.jumbo/lib/python2.7/posixpath.py", line 71, in join
    path += '/' + b
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb0 in position 3: invalid start byte

On Windows the program runs normally, with no errors. Why is that? Is it a problem with the encoding of my Excel file names? How can I fix it?

Note: the Python version on Linux is 2.7.3; on Windows it is 2.7.13.

Asked by PHP中文网

Replies (3)

  • 天蓬老师 (2017-04-18 10:07:49)

    Change to print sheet_name.encode('utf-8') and try it.
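
A minimal sketch of why the explicit encode helps, assuming that under LANG=en_US Python 2 picks Latin-1 for standard output, which is what the 'latin-1' in the error message suggests. The sheet name below is hypothetical, chosen so that its Chinese characters sit at positions 4-6, and try_encode is a made-up helper. Encoding the name as Latin-1 fails the same way print did, while encoding it as UTF-8 yields plain bytes that can be written whatever the console setting is.

```python
# -*- coding: utf-8 -*-
# Hypothetical sheet name: four digits followed by three Chinese characters.
sheet_name = u'2017\u5e74\u62a5\u8868'


def try_encode(text, encoding):
    """Return (encoded bytes, None) on success or (None, error) on failure."""
    try:
        return text.encode(encoding), None
    except UnicodeEncodeError as err:
        return None, err


# Latin-1 only covers code points below 256, so the Chinese characters fail.
latin1_bytes, latin1_error = try_encode(sheet_name, 'latin-1')

# UTF-8 can represent every Unicode character, so this succeeds.
utf8_bytes, utf8_error = try_encode(sheet_name, 'utf-8')
```

In Python 2, printing the UTF-8 bytes writes them to the terminal unchanged. If the terminal does not actually display UTF-8, the output may look garbled, but no exception is raised.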

  • ringa_lee (2017-04-18 10:07:49)

    Try setting LANG to a GBK locale (for example zh_CN.GBK).
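
Before changing LANG it may help to check which encodings the interpreter currently reports. This is a small sketch using only standard-library calls; describe_encodings is a hypothetical helper name, and the values it returns depend entirely on the machine and locale it runs on.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python derived from the current environment."""
    return {
        # Used when printing unicode text; may be None if output is redirected.
        'stdout': getattr(sys.stdout, 'encoding', None),
        # Used to convert between unicode and byte-string file names.
        'filesystem': sys.getfilesystemencoding(),
        # What the locale settings (LANG, LC_ALL, ...) ask for.
        'preferred': locale.getpreferredencoding(),
    }


encodings = describe_encodings()
```

Running it once under LANG=en_US and once under the new locale shows whether the change took effect.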

  • 高洛峰 (2017-04-18 10:07:49)

    I found the cause: the file names were stored in the wrong encoding. Thanks, everyone.
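
For anyone hitting the same traceback: byte 0xb0 cannot start a UTF-8 sequence, but it is a common lead byte in GBK, so the file names were plausibly created on a Chinese-locale Windows machine. That is an assumption; the thread does not confirm which encoding was involved. The sketch below decodes raw byte-string names by trying a list of candidate encodings in order. decode_file_name and the sample file name are hypothetical.

```python
# -*- coding: utf-8 -*-
def decode_file_name(raw_name, candidates=('utf-8', 'gbk')):
    """Decode a byte-string file name, trying each candidate encoding in order."""
    for encoding in candidates:
        try:
            return raw_name.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nothing matched: substitute U+FFFD for bytes that cannot be decoded.
    return raw_name.decode('utf-8', 'replace')


# Hypothetical name as a GBK system would store it: b'abc\xb0\xa1.xlsx'
gbk_name = u'abc\u554a.xlsx'.encode('gbk')
decoded = decode_file_name(gbk_name)
```

Passing a byte-string path to os.walk makes it return byte-string names, which can then go through a helper like this one. Some GBK byte sequences also happen to be valid UTF-8, so the guess can be wrong; renaming the files so that their names are stored as UTF-8 is the more robust fix.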
