Python encoding ISO to UTF8

I am trying to read my emails using a Python script (Python 2.5 and PyPy). Some of the results are not ASCII, and I get strings like this:

=?ISO-8859-7?B?0OXm7/Dv8d/hIPP07+0gyuno4enx/u3h?=

Is there any way to decode it and convert it to UTF-8 so that I can process it? I tried .decode('ISO-8859-7'), but I got the same string back.

Best answer

import email.header as eh

unicode_data = u''.join(
    str_data.decode(codec or 'ascii')
    for str_data, codec in eh.decode_header('=?ISO-8859-7?B?0OXm7/Dv8d/hIPP07+0gyuno4enx/u3h?=')
)
# unicode_data is now u'Πεζοπορία στον Κιθαιρώνα'

You should work with unicode_data here. However, if you (think you) need a UTF-8 encoded string, you can do:

utf8data = unicode_data.encode('utf-8')
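To make the difference between the decoded text and its UTF-8 bytes concrete, here is a small sketch that unpacks the encoded word by hand. It is written in Python 3 syntax, which is an assumption (the question targets Python 2.5), and the hand-rolled split is for illustration only; it is not a robust header parser.

```python
import base64

raw = "=?ISO-8859-7?B?0OXm7/Dv8d/hIPP07+0gyuno4enx/u3h?="

# An encoded word has the shape =?charset?encoding?payload?=
charset, encoding, payload = raw[2:-2].split("?")

# The header itself is pure ASCII, so decoding it as ISO-8859-7
# returns the same characters. This is why .decode() seemed to do nothing.
unchanged = raw.encode("ascii").decode("iso-8859-7")

# "B" means the payload is base64; it holds the ISO-8859-7 bytes.
legacy_bytes = base64.b64decode(payload)

# Decode the legacy bytes to text, then encode the text as UTF-8.
text = legacy_bytes.decode(charset)
utf8_bytes = text.encode("utf-8")
```

The text has 24 characters, stored as 24 bytes in ISO-8859-7 and 46 bytes in UTF-8, since each Greek letter takes two bytes there.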

Update: I changed the .decode call to handle cases where the codec is None (e.g. eh.decode_header('plain text')).
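For readers on Python 3, the same idea might look like the sketch below. The helper name decode_mime_header and its fallback parameter are hypothetical, not part of the standard library. It assumes Python 3 behaviour, where decode_header returns bytes for encoded words and a plain str when the header contains no encoded words at all.

```python
from email.header import decode_header


def decode_mime_header(raw, fallback="ascii"):
    """Hypothetical helper: decode an RFC 2047 header into text."""
    parts = []
    for data, codec in decode_header(raw):
        if isinstance(data, bytes):
            # Encoded words arrive as bytes; codec is None for
            # unencoded segments, so fall back to ASCII there.
            parts.append(data.decode(codec or fallback, errors="replace"))
        else:
            # A header without any encoded words is returned as str.
            parts.append(data)
    return "".join(parts)


subject = decode_mime_header("=?ISO-8859-7?B?0OXm7/Dv8d/hIPP07+0gyuno4enx/u3h?=")
plain = decode_mime_header("plain text")
```

Using errors="replace" is a design choice for this sketch: a mislabelled header then yields replacement characters instead of raising an exception.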
