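Characters are decomposed by compatibility …
Both U+212B (the angstrom sign) and U+00C5 (the Swedish letter A with ring above) are expanded by NFD (or NFKD) into the sequence U+0041 U+030A (the Latin letter "A" followed by the combining ring above, "°"), which is then reduced by NFC (or NFKC) …
>>> from unicodedata import normalize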
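The round trip described in this snippet can be checked directly. The following is a minimal sketch using only the standard `unicodedata` module; the variable names are illustrative and do not come from the original snippet.

```python
import unicodedata

angstrom_sign = "\u212b"  # ANGSTROM SIGN
a_with_ring = "\u00c5"    # LATIN CAPITAL LETTER A WITH RING ABOVE

# The two code points differ, so a plain comparison is False.
raw_equal = angstrom_sign == a_with_ring

# NFD expands both to "A" (U+0041) + COMBINING RING ABOVE (U+030A).
nfd_sign = unicodedata.normalize("NFD", angstrom_sign)
nfd_letter = unicodedata.normalize("NFD", a_with_ring)

# NFC recomposes the sequence into a single precomposed code point.
nfc_sign = unicodedata.normalize("NFC", angstrom_sign)

print(raw_equal)               # False
print(ascii(nfd_sign))         # 'A\u030a'
print(nfd_sign == nfd_letter)  # True
print(ascii(nfc_sign))         # '\xc5'
```

Comparing the normalized forms rather than the raw strings is what makes the two characters compare equal.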
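I have two Unicode characters that both have the same meaning. The compat character is a reference to the origin character, which means the two should have the same value, but when I test them for equality in a condition, the result is False.
compat = 'ㅐ'  # Korean letter for: AE
print('compat', ascii(compat), '\n')
decompose_origin = un…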
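A likely explanation is that the two characters are related only by a compatibility mapping, so they compare equal only after NFKC or NFKD normalization. The sketch below assumes that `compat` is U+3150 (HANGUL LETTER AE) and that the origin character, which the truncated question does not show, is U+1162 (HANGUL JUNGSEONG AE). The helper `same_after_nfkd` is hypothetical and not part of the original code.

```python
import unicodedata

# Assumption: the question's 'compat' character is U+3150 (HANGUL LETTER AE).
compat = "\u3150"
# Assumption: the "origin" character is U+1162 (HANGUL JUNGSEONG AE).
origin = "\u1162"


def same_after_nfkd(left: str, right: str) -> bool:
    """Compare two strings after compatibility decomposition (NFKD)."""
    return unicodedata.normalize("NFKD", left) == unicodedata.normalize("NFKD", right)


# Different code points, so the raw comparison fails.
print(compat == origin)                                 # False
# Canonical normalization (NFD) leaves the compatibility jamo unchanged.
print(unicodedata.normalize("NFD", compat) == compat)   # True
# Compatibility normalization maps it onto the conjoining jamo.
print(ascii(unicodedata.normalize("NFKD", compat)))     # '\u1162'
print(same_after_nfkd(compat, origin))                  # True
```

If the characters in the real data differ from the two assumed here, `unicodedata.name()` and `unicodedata.decomposition()` show which mapping, if any, connects them.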
When a string that is assumed to be UTF-8 is encoded as ISO-8859-1, the Python script fails:
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0300' in position 1: ordinal not in range(256)
>>> 'à'.encode('utf-8' …
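The error message names U+0300 (COMBINING GRAVE ACCENT) at position 1, which suggests the string holds a decomposed "a" plus a combining accent rather than the precomposed U+00E0. A minimal sketch of that situation, assuming the input is exactly that two-code-point sequence, composes the string with NFC before encoding:

```python
import unicodedata

# Assumption: the failing string is 'a' followed by COMBINING GRAVE ACCENT.
decomposed = "a\u0300"

# Latin-1 has no combining accents, so encoding the decomposed form fails.
try:
    decomposed.encode("latin-1")
    direct_encode_failed = False
except UnicodeEncodeError:
    direct_encode_failed = True

# NFC composes the pair into the single code point U+00E0, which Latin-1 has.
composed = unicodedata.normalize("NFC", decomposed)
latin1_bytes = composed.encode("latin-1")

print(direct_encode_failed)  # True
print(ascii(composed))       # '\xe0'
print(latin1_bytes)          # b'\xe0'
```

NFC only helps when a precomposed form exists within the target encoding; characters that ISO-8859-1 lacks still need an `errors=` strategy or a different encoding.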