Python encoding problems


Garbled output in the Windows DOS console

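A file saved as UTF-8 is interpreted as GBK on Windows, so any text it prints comes out garbled. The same thing happens when you fetch a web page and print it in a DOS window. The DOS console uses GBK, so UTF-8 bytes are shown incorrectly or trigger errors.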

Solution: decode the UTF-8 bytes, then re-encode them as GBK before printing:

print "大家好".decode('utf-8').encode('GBK')
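The same round trip can be sketched in Python 3. Here bytes play the role of Python 2's str. The helper name utf8_to_gbk is hypothetical. The sketch shows that the UTF-8 and GBK byte sequences for the same text differ, which is why a GBK console garbles UTF-8 output.

```python
# Python 3 sketch: bytes stand in for Python 2's str type.
def utf8_to_gbk(raw: bytes) -> bytes:
    """Hypothetical helper: same as Python 2's s.decode('utf-8').encode('GBK')."""
    return raw.decode("utf-8").encode("gbk")

utf8_bytes = "大家好".encode("utf-8")

# What a GBK console effectively does with UTF-8 bytes: misread them (mojibake).
garbled = utf8_bytes.decode("gbk", errors="replace")

gbk_bytes = utf8_to_gbk(utf8_bytes)
print(len(utf8_bytes), len(gbk_bytes))       # 9 6  (3 bytes per char vs 2)
print(gbk_bytes.decode("gbk") == "大家好")    # True
print(garbled == "大家好")                    # False
```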

Another case: some editors (e.g. Notepad) insert an invisible BOM (0xEF 0xBB 0xBF) at the start of a file when saving it as UTF-8.

This can be handled with the codecs module:

import codecs

filehandle = open("test.txt", 'rb')
content = filehandle.read()
filehandle.close()

# Notepad may prepend the UTF-8 BOM (0xEF 0xBB 0xBF); strip it before decoding
if content[:3] == codecs.BOM_UTF8:
    content = content[3:]

print content.decode("utf-8")
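Below is a minimal Python 3 sketch of the same BOM handling. It works on in-memory bytes rather than a real test.txt, and strip_utf8_bom is a hypothetical helper name. The sketch also shows that a plain UTF-8 decode keeps the BOM as the character U+FEFF, while the built-in utf-8-sig codec drops it.

```python
import codecs

def strip_utf8_bom(data: bytes) -> str:
    """Hypothetical helper: drop a leading UTF-8 BOM, then decode as UTF-8."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8")

# Simulate what Notepad writes: BOM followed by UTF-8 text.
raw = codecs.BOM_UTF8 + "hello".encode("utf-8")

print(raw[:3])                      # b'\xef\xbb\xbf'
print(ascii(raw.decode("utf-8")))   # '\ufeffhello'  (BOM survives as U+FEFF)
print(strip_utf8_bom(raw))          # hello
print(raw.decode("utf-8-sig"))      # hello  (the codec strips the BOM itself)
```

When reading a file in Python 3, open("test.txt", encoding="utf-8-sig") performs the same stripping.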

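PS: a BOM can also be used to bypass some file-content checks (xdcms 2015 code audit, question 4). The check below rejects an upload only when "<?" is at offset 0. If the file starts with a BOM, "<?" sits at offset 3, so strpos returns 3 and the file passes.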

    private function check_content($name)
    {
        if (isset($_FILES[$name]["tmp_name"])) {
            $content = file_get_contents($_FILES[$name]["tmp_name"]);
            if (strpos($content, "<?") === 0) {
                return false;
            }
        }
        return true;
    }
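The bypass can be reproduced with a hypothetical Python port of the PHP check. It assumes strpos($content, "<?") === 0 behaves like bytes.find(b"<?") == 0, and the payload is an illustrative example. The point is that prepending the three BOM bytes moves "<?" off offset 0.

```python
import codecs

def check_content(content: bytes) -> bool:
    """Hypothetical Python port of the PHP check: reject only if b"<?" is at offset 0."""
    return content.find(b"<?") != 0

plain = b"<?php phpinfo(); ?>"          # illustrative payload
with_bom = codecs.BOM_UTF8 + plain

print(check_content(plain))     # False: rejected
print(check_content(with_bom))  # True: accepted
print(with_bom.find(b"<?"))     # 3
```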

No encoding declared at the top of the .py file

s = "测试"

print s

  File "/Users/l3m0n/study/program/python/code_study/test3.py", line 1

SyntaxError: Non-ASCII character '\xe6' in file /Users/l3m0n/study/program/python/code_study/test3.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

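Python 2 assumes source files are ASCII by default (PEP 263). The parser therefore rejects the non-ASCII bytes of the Chinese literal at compile time, before print is ever reached.

Solution: declare the source encoding on the first or second line of the file: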

# coding=utf-8

or

#!/usr/bin/python

# -*- coding: utf-8 -*-
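Python 3 assumes UTF-8 source instead of ASCII. The sketch below therefore uses GBK-encoded source bytes to reproduce the same kind of failure. It compiles the same line once with a PEP 263 declaration and once without. The variable names are illustrative.

```python
# Python 3 sketch: GBK-encoded source, with and without a PEP 263 declaration.
body = 's = "测试"\n'
declared = ("# -*- coding: gbk -*-\n" + body).encode("gbk")
undeclared = body.encode("gbk")

namespace = {}
exec(compile(declared, "<declared>", "exec"), namespace)
print(namespace["s"] == "测试")  # True: the declaration tells the parser how to decode

try:
    compile(undeclared, "<undeclared>", "exec")
    failed = False
except (SyntaxError, UnicodeDecodeError):
    # No declaration: the bytes are decoded as UTF-8, and GBK bytes are not valid UTF-8.
    failed = True
print(failed)  # True
```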

Errors when concatenating strings

# coding=utf-8

s = "测试" + u"一下"

print s

Traceback (most recent call last):

File "/Users/l3m0n/study/program/python/code_study/test3.py", line 2, in <module>

s = "测试" + u"一下"

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

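The left operand is a byte string (str) containing Chinese, and the right operand is unicode. To concatenate them, Python 2 implicitly decodes the str into unicode with the default ascii codec. ASCII only covers bytes 0 to 127, so the first byte above 127 (here 0xe6) cannot be decoded and the exception is raised.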
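The implicit step can be made explicit in Python 3. UTF-8 bytes stand in for the Python 2 str literal, and decoding them with ascii raises the same error as the traceback above. Python 3 also never performs this conversion implicitly; it raises TypeError when bytes and str are mixed.

```python
raw = "测试".encode("utf-8")   # stands in for the Python 2 str literal "测试"

try:
    raw.decode("ascii")       # the implicit step behind "测试" + u"一下" in Python 2
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)
print(message)  # 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

try:
    raw + "一下"               # Python 3 refuses to mix bytes and str
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)  # False
```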

There are two ways to fix it:

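1. Convert the str to unicode explicitly. The file is declared as utf-8, so decode the bytes with utf-8 (decoding them as gbk would give the wrong characters or fail):

s = "测试".decode("utf-8") + u"一下"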

2. Encode the unicode as UTF-8, so both sides are byte strings (str):

s = "测试" + u"一下".encode("utf-8")
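Both fixes follow one rule: convert one side so both operands have the same type before concatenating. The Python 3 sketch below applies that rule, with bytes in place of Python 2's str and str in place of unicode. Fix 1 yields text and fix 2 yields UTF-8 bytes.

```python
left = "测试".encode("utf-8")   # byte string, like "测试" in a utf-8 Python 2 file
right = "一下"                   # text, like u"一下" in Python 2

as_text = left.decode("utf-8") + right     # fix 1: everything becomes text
as_bytes = left + right.encode("utf-8")    # fix 2: everything becomes UTF-8 bytes

print(as_text == "测试一下")                  # True
print(as_bytes.decode("utf-8") == as_text)   # True
print(len(as_text), len(as_bytes))           # 4 12
```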

Problems with the default encoding

Traceback (most recent call last):

File "/Users/l3m0n/study/program/python/code_study/mangzhu.py", line 14, in <module>

print sqli(1);

UnicodeEncodeError: 'ascii' codec can't encode characters in position 275-281: ordinal not in range(128)

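Solution (Python 2 only). sys.setdefaultencoding is deleted from sys during interpreter startup, so sys must be reloaded before it can be called: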

import sys

reload(sys)

sys.setdefaultencoding('utf8')
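Python 3 has no setdefaultencoding, and its default encoding is already utf-8. The sketch below is an illustration under that assumption, not the original author's method. It reproduces the ascii encode failure from the traceback, then shows the explicit alternative: give the output stream an encoding instead of changing a process-wide default. An in-memory BytesIO stands in for the real output.

```python
import io
import sys

print(sys.getdefaultencoding())  # utf-8 on Python 3

text = "测试"
try:
    text.encode("ascii")          # the implicit conversion that fails in the traceback
    reason = None
except UnicodeEncodeError as exc:
    reason = exc.reason
print(reason)  # ordinal not in range(128)

# Explicit encoding on the stream instead of a global default.
buf = io.BytesIO()
out = io.TextIOWrapper(buf, encoding="utf-8")
out.write(text)
out.flush()
print(buf.getvalue() == text.encode("utf-8"))  # True
```

On Python 3.7+, sys.stdout.reconfigure(encoding="utf-8") applies the same idea to the console stream.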

That is the complete content of "python编码问题" (Python encoding problems). Source: utcz.com/z/388836.html
