

A Summary of charset/encoding problems for Chinese

There are a bunch of charset/encoding issues related to Chinese, and they mostly result in Chinese text being displayed as garbage. The major root cause, as most of us know, is the overlap between non-Unicode charset encodings. Because we strongly prefer UTF-8 over other local encodings, this overlap makes encoding conversions ambiguous. Moreover, and needless to say, Windows plays a very bad part in this, because its localized versions still largely use non-Unicode encodings. I spent a few days trying to put together a summary of the related problems. Note that this doesn't happen only with Chinese but with other non-English languages as well. However, I only reviewed the Chinese-specific ones.

A bit of background on simplified Chinese charset encodings: GB2312 has been the standard since the 1980s. It was later superseded by GBK, which has a larger number of characters. Windows calls it CP936, which is equivalent to GBK. Most recently there is GB18030, which is the superset of all. GB18030 has such a large character set that it has to use 4 bytes in some cases, but it remains backward compatible with GBK and GB2312. So we need to focus on GB18030 _only_. In the text below, "locally encoded" Chinese text means text encoded in any of GB2312, GBK or GB18030.

Gedit actually supports charset conversion very well in several ways:
- It supports explicitly selecting the charset in the "Open File" dialog window.
- The charset defaults to "Automatically Detected".

Note that "Automatically Detected" isn't pure charset auto-detection. Gedit tries a list of candidate charset encodings and keeps the first candidate from which the text can be successfully converted to UTF-8, the encoding gedit uses internally. The list is under the key named "auto-detected" in /usr/share/glib-2.0/schemas/.xml, and defaults to:

The trial conversions follow the order of that list. CURRENT is the charset of the current locale, so it is most likely UTF-8. ISO-8859-15 is a single-byte charset in which every byte value maps to some character, so a conversion from it never fails. By default, therefore, locally encoded Chinese text is taken to be ISO-8859-15 and converted to UTF-8 from that, which produces garbage.

To edit Chinese text correctly, a user has two options:
1. Add GB18030 explicitly in the "Open File" dialog window.
2. Add GB18030 to the "auto-detected" list before ISO-8859-15.

The system could apply the 2nd option automatically as a workaround, so an average Chinese user doesn't have to know all about these encoding details. There is, of course, a third option: gedit could "really" detect the encodings automatically by using another library (libuchardet being one).
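The candidate-list behaviour described above can be sketched in Python. This is a minimal illustration, not gedit's actual code. The function name `try_candidates`, the assumption that CURRENT resolves to UTF-8, and both example candidate lists are hypothetical, and the real default list is whatever the "auto-detected" key holds. The sketch uses Python's built-in codecs to show why GB18030 bytes end up as ISO-8859-15 garbage unless GB18030 appears earlier in the list.

```python
# Minimal sketch of "auto-detected"-style trial decoding (not gedit's code).
# Assumption: the locale charset ("CURRENT") is UTF-8.
CURRENT_LOCALE_CHARSET = "UTF-8"


def try_candidates(data: bytes, candidates: list[str]) -> tuple[str, str]:
    """Return (candidate, text) for the first candidate that decodes data cleanly."""
    for name in candidates:
        # "CURRENT" is a placeholder for the locale's charset.
        encoding = CURRENT_LOCALE_CHARSET if name == "CURRENT" else name
        try:
            return name, data.decode(encoding)
        except UnicodeDecodeError:
            continue  # conversion failed, so try the next candidate
    raise ValueError("no candidate encoding could decode the data")


if __name__ == "__main__":
    data = "中文".encode("gb18030")  # locally encoded Chinese text

    # Hypothetical list without GB18030: ISO-8859-15 accepts any byte.
    print(try_candidates(data, ["UTF-8", "CURRENT", "ISO-8859-15"]))
    # ('ISO-8859-15', 'ÖÐÎÄ')

    # Hypothetical list with GB18030 placed before ISO-8859-15.
    print(try_candidates(data, ["UTF-8", "CURRENT", "GB18030", "ISO-8859-15"]))
    # ('GB18030', '中文')
```

Because the first successful conversion wins, the only thing that matters is where GB18030 sits relative to ISO-8859-15. Any candidate placed after ISO-8859-15 is never tried, because the ISO-8859-15 conversion always succeeds first.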

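The background claim that GB18030 is a backward-compatible superset of GB2312 and GBK, using 4 bytes for some characters, can be checked against Python's built-in `gb2312`, `gbk` and `gb18030` codecs. The sample characters and the helper names `encode_all` and `representable_in` are illustrative assumptions. Python's codec tables may also differ from those of iconv or Windows CP936 in some corner cases.

```python
# Sketch: checking GB2312 / GBK / GB18030 compatibility with Python's codecs.
GB_CHARSETS = ("gb2312", "gbk", "gb18030")


def encode_all(text: str) -> dict[str, bytes]:
    """Encode text with each GB charset; raises UnicodeEncodeError if one can't."""
    return {enc: text.encode(enc) for enc in GB_CHARSETS}


def representable_in(text: str, encoding: str) -> bool:
    """Return True if every character of text can be encoded in the charset."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # Characters from GB2312 get the same bytes in all three charsets,
    # so decoding GB2312/GBK text as GB18030 leaves it unchanged.
    print(encode_all("中文"))

    # U+20000 (CJK Extension B) is outside GBK; GB18030 needs 4 bytes for it.
    ext_b = "\U00020000"
    print(representable_in(ext_b, "gbk"))  # False
    print(len(ext_b.encode("gb18030")))    # 4
```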