
Why does printing to a utf-8 file fail?

Stack Overflow user

Asked 2011-06-29 21:22:42

3 answers, 3.2K views, 0 followers, 2 votes

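So this afternoon I ran into a problem. I was able to solve it, but I don't quite understand why the fix works.

This relates to a question I asked a few weeks ago: python check if utf-8 string is uppercase

Basically, the following will not work: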

#!/usr/bin/python

import codecs
from lxml import etree

outFile = codecs.open('test.xml', 'w', 'utf-8') #cannot use codecs.open()

root = etree.Element('root')
sect = etree.SubElement(root,'sect')


words = (   u'\u041c\u041e\u0421\u041a\u0412\u0410', # capital of Russia, all uppercase
            u'R\xc9SUM\xc9',    # RESUME with accents
            u'R\xe9sum\xe9',    # Resume with accents
            u'R\xe9SUM\xe9', )  # ReSUMe with accents

for word in words:
    print word
    if word.encode('utf8').decode('utf8').isupper(): #.isupper won't function on utf8 
        title = etree.SubElement(sect,'title')
        title.text = word
    else:
        item = etree.SubElement(sect,'item')
        item.text = word

print>>outFile,etree.tostring(root,pretty_print=True,xml_declaration=True,encoding='utf-8')

It fails with the following:

Traceback (most recent call last):
  File "/tem.py", line 25, in <module>
    print>>outFile,etree.tostring(root,pretty_print=True,xml_declaration=True,encoding='utf-8')
  File "/usr/lib/python2.7/codecs.py", line 691, in write
    return self.writer.write(data)
  File "/usr/lib/python2.7/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 66: ordinal not in range(128)

However, if I open the new file without codecs.open('test.xml', 'w', 'utf-8') and instead use outFile = open('test.xml', 'w'), it works fine.

So what is going on??

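- Since `encoding='utf-8'` is specified in `etree.tostring()`, is it encoding the file again?
- If I keep `codecs.open()` and remove `encoding='utf-8'`, the file becomes an ASCII file. Why? Because the default encoding of `etree.tostring()` is ASCII, I presume?
- But `etree.tostring()` is simply being written to stdout, and then redirected to a file that was created as a UTF-8 file??
- Is `print>>` not working as I expect? `outFile.write(etree.tostring())` behaves the same way.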
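On the ASCII question above: when no `encoding` is given, `tostring()` serializes to ASCII and writes every non-ASCII character as a numeric character reference, so the output is plain ASCII bytes. The sketch below illustrates this with Python 3's standard-library `xml.etree.ElementTree`, used here as an assumed stand-in for lxml (lxml documents the same ASCII default for its `tostring()`).

```python
import xml.etree.ElementTree as ET

item = ET.Element('item')
item.text = 'R\xe9sum\xe9'  # "Résumé"

# No encoding argument: the default is US-ASCII, returned as bytes.
# Non-ASCII characters become character references such as &#233;.
ascii_bytes = ET.tostring(item)
print(ascii_bytes)  # b'<item>R&#233;sum&#233;</item>'

# Asking for UTF-8 keeps each character as a multi-byte UTF-8 sequence.
utf8_bytes = ET.tostring(item, encoding='utf-8')
print(utf8_bytes)  # b'<item>R\xc3\xa9sum\xc3\xa9</item>'
```

Because the default output contains only ASCII bytes, the implicit ASCII decode inside a `codecs` writer succeeds, which matches the observation that dropping `encoding='utf-8'` makes the error go away.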

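Basically, why doesn't this work? What is going on here? It's probably trivial, but I'm clearly a bit confused, and I want to understand why my solution works.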
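The variant that works, a plain `open('test.xml', 'w')`, writes the byte string returned by `tostring()` straight through in Python 2, so the data is encoded exactly once. A minimal Python 3 sketch of the same idea follows. Assumptions: the standard-library `xml.etree.ElementTree` in place of lxml, Python 3.8 or later (for `tostring(..., xml_declaration=True)`), and a hypothetical helper named `write_xml_bytes`. Python 3 makes the byte-oriented file explicit with binary mode (`'wb'`).

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def write_xml_bytes(root, path):
    """Hypothetical helper: serialize once to UTF-8 bytes, then write them as-is."""
    # With encoding='utf-8', tostring() returns bytes that are already UTF-8 encoded.
    data = ET.tostring(root, encoding='utf-8', xml_declaration=True)
    # A binary-mode file performs no further encoding, so the bytes land unchanged.
    with open(path, 'wb') as f:
        f.write(data)

root = ET.Element('root')
ET.SubElement(root, 'title').text = 'МОСКВА'
path = os.path.join(tempfile.mkdtemp(), 'test.xml')
write_xml_bytes(root, path)
```

Under Python 3, handing the same bytes to a text-mode file or a `codecs.open()` file raises a `TypeError` instead of attempting an implicit ASCII decode.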

Tags: python, unicode, encoding, utf-8, lxml

3 Answers

Stack Overflow user

Accepted answer

Answered 2011-06-29 22:21:08

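You've opened the file with UTF-8 encoding, which means it expects Unicode strings.

`tostring` is encoding to UTF-8 (as a byte string, `str`), and you're writing that to the file.

Because the file expects Unicode, it decodes the byte string to Unicode using the default ASCII encoding, so that it can then encode the Unicode to UTF-8.

Unfortunately, the byte string is not ASCII.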
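The hidden decode step can be reproduced explicitly. This Python 3 sketch uses a hypothetical helper, `reencode_like_py2_codecs`, that performs by hand what the answer describes Python 2's `codecs` writer doing implicitly: decode the bytes as ASCII, then encode them as UTF-8. It fails on the first non-ASCII byte, the same value `0xd0` seen in the question's traceback, which is the first byte of the UTF-8 encoding of the Cyrillic `М`.

```python
def reencode_like_py2_codecs(data):
    """Hypothetical helper mimicking Python 2's implicit coercion: a byte string
    handed to a UTF-8 writer is first decoded with the default ASCII codec,
    then re-encoded as UTF-8."""
    text = data.decode('ascii')  # the hidden step that fails on non-ASCII bytes
    return text.encode('utf-8')

# What tostring(..., encoding='utf-8') hands over: UTF-8 *bytes*, not text.
payload = '<root>МОСКВА</root>'.encode('utf-8')

failure = None
try:
    reencode_like_py2_codecs(payload)
except UnicodeDecodeError as err:
    failure = err

print(failure)
# → 'ascii' codec can't decode byte 0xd0 in position 6: ordinal not in range(128)
```

The position differs from the traceback's 66 only because this payload has no XML declaration or indentation in front of the first Cyrillic letter.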

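EDIT: The best advice for avoiding this kind of problem is to use Unicode internally: decode input as it comes in, and encode output as it goes out.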
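A minimal Python 3 sketch of that advice: decode when reading, keep `str` in the middle where `isupper()` works on real characters, and encode when writing. The function names `load_words` and `save_uppercase` are hypothetical, and the input file is created only for the demonstration.

```python
import io
import os
import tempfile

def load_words(path):
    """Hypothetical helper: decode on input (UTF-8 bytes on disk -> str)."""
    with io.open(path, 'r', encoding='utf-8') as f:
        return [line.strip() for line in f if line.strip()]

def save_uppercase(path, words):
    """Hypothetical helper: filter in Unicode, encode on output (str -> UTF-8)."""
    # isupper() runs on decoded text, so accented and Cyrillic capitals count.
    upper = [w for w in words if w.isupper()]
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write('\n'.join(upper) + '\n')
    return upper

# Demonstration: create a UTF-8 input file, then read, filter and write.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'words.txt')
dst = os.path.join(tmp, 'upper.txt')
with io.open(src, 'w', encoding='utf-8') as f:
    f.write('МОСКВА\nRÉSUMÉ\nRésumé\nRéSUMé\n')

upper = save_uppercase(dst, load_words(src))
print(upper)  # ['МОСКВА', 'RÉSUMÉ']
```

Because encoding happens in exactly one place (the output file object), there is no second, implicit conversion that could fall back to ASCII.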

Votes: 3

Stack Overflow user

Answered 2011-06-30 00:28:01

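Using `print>>outFile` is a bit unusual. I don't have lxml installed, but the built-in `xml.etree` library is similar (although it doesn't support `pretty_print`). Wrap the root element in an `ElementTree` and use its `write` method.

Also, if you declare the source file's encoding with a `# coding` line, you can use readable Unicode strings instead of escape codes: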

#!/usr/bin/python
# coding: utf8

from xml.etree import ElementTree as etree

root = etree.Element(u'root')
sect = etree.SubElement(root,u'sect')


words = [u'МОСКВА',u'RÉSUMÉ',u'Résumé',u'RéSUMé']

for word in words:
    print word
    if word.isupper():
        title = etree.SubElement(sect,u'title')
        title.text = word
    else:
        item = etree.SubElement(sect,u'item')
        item.text = word

tree = etree.ElementTree(root)
tree.write('text.xml',xml_declaration=True,encoding='utf-8')
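For Python 3 readers, a sketch of the same `ElementTree.write()` approach. It assumes Python 3.9 or later for `ET.indent()`, which gives the standard-library output a pretty-printed layout similar to lxml's `pretty_print`; the helper name `build_tree` is hypothetical, and the file goes to a temporary directory.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def build_tree(words):
    """Hypothetical helper: uppercase words become <title>, others <item>."""
    root = ET.Element('root')
    sect = ET.SubElement(root, 'sect')
    for word in words:
        # str.isupper() works directly on Unicode text in Python 3.
        tag = 'title' if word.isupper() else 'item'
        ET.SubElement(sect, tag).text = word
    return ET.ElementTree(root)

tree = build_tree(['МОСКВА', 'RÉSUMÉ', 'Résumé', 'RéSUMé'])
ET.indent(tree, space='  ')  # Python 3.9+: indentation in place of pretty_print
path = os.path.join(tempfile.mkdtemp(), 'test.xml')
# write() opens the file itself and encodes the text to UTF-8 exactly once.
tree.write(path, xml_declaration=True, encoding='utf-8')
```

Letting `write()` own both the serialization and the file avoids the double-encoding trap entirely.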

Votes: 1

Stack Overflow user

Answered 2012-05-04 18:32:34

In addition to MRAB's answer, a few lines of code:

import codecs
from lxml import etree

root = etree.Element('root')
sect = etree.SubElement(root,'sect')

# do some other xml building here

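with codecs.open('test.xml', 'w', encoding='utf-8') as f:
    # encoding=unicode makes tostring() return a unicode object (not bytes),
    # so the codecs writer only has to encode it to UTF-8 once
    f.write(etree.tostring(root, encoding=unicode))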
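A Python 3 sketch of the same pattern, assuming the standard-library `xml.etree.ElementTree` instead of lxml (lxml's `tostring()` documents the same `encoding='unicode'` option). The built-in `open()` with an `encoding` argument takes the place of `codecs.open()`: it receives text and encodes it once.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

root = ET.Element('root')
sect = ET.SubElement(root, 'sect')
ET.SubElement(sect, 'title').text = 'МОСКВА'

# encoding='unicode' makes tostring() return str rather than bytes.
xml_text = ET.tostring(root, encoding='unicode')

path = os.path.join(tempfile.mkdtemp(), 'test.xml')
# A text-mode file encodes the str to UTF-8 exactly once, on write.
with open(path, 'w', encoding='utf-8') as f:
    f.write(xml_text)
```

The resulting string carries no XML declaration, since none was requested.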

Votes: 0

Original page content provided by Stack Overflow; translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/6527420
