character encoding - Terminology and concepts surrounding the use of code pages


I am in the process of researching code pages, and across the various Wikipedia entries and other sources I have read there are many conflicting uses of terminology. I cannot find a single source that describes the process of handling a character from beginning to end. Could someone well-versed in this area confirm or correct the following understanding?

The process of character representation, as far as I understand it:

  • We start with a set of symbols (I am not sure of the correct terminology here, possibly 'script') which is not tied to a particular platform. For example, 'Cyrillic alphabet' is understood to refer to the same entity in the context of both Windows and Linux.

  • Members of these sets are selected to create a platform-specific character set. Vendors identify these sets with different codes, such as the GDI charset values on Windows (e.g. 0 for ANSI_CHARSET, alongside the other values defined there). I cannot find much more information on these sets, such as whether they are concrete character sets or whether they remain vague and abstract.

  • From these sets, individual code pages were developed, each with a one-to-one mapping to a GDI charset value. Since these GDI values represent sets that depend on the platform, does this mean that a Windows code page is essentially a coded version of each individual set?

I am having trouble reconciling this idea with a link I once found (which I have since lost) that showed a many-to-many mapping between these GDI charset values and code pages. Is that correct? Are these GDI values mapped onto different code pages on different platforms?
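
On Windows specifically, the charset-to-code-page direction of that mapping can be inspected at runtime. The following is a minimal sketch, assuming a Win32/GDI build environment; it uses the standard GDI call TranslateCharsetInfo to look up the ANSI code page associated with a few of the wingdi.h charset constants:

    /* build: link with gdi32 (e.g. -lgdi32) */
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /* A few of the GDI charset constants from wingdi.h. */
        const DWORD charsets[] = { ANSI_CHARSET, RUSSIAN_CHARSET, GREEK_CHARSET };
        const char *names[] = { "ANSI_CHARSET", "RUSSIAN_CHARSET", "GREEK_CHARSET" };

        for (int i = 0; i < 3; i++) {
            CHARSETINFO ci;
            /* TCI_SRCCHARSET: interpret the first argument as a charset value. */
            if (TranslateCharsetInfo((DWORD *)(ULONG_PTR)charsets[i], &ci, TCI_SRCCHARSET))
                printf("%s (%lu) -> ANSI code page %u\n",
                       names[i], (unsigned long)charsets[i], ci.ciACP);
        }
        return 0;
    }

On a typical installation this prints 1252, 1251, and 1253 respectively, which suggests the charset values and code pages are related but separate namespaces rather than one being a simple encoding of the other.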

  • Each code page maps a member of an abstract character set onto an integer representing its position within the set. In the case of the 'simple' code pages mentioned on the webpage above, they can be more accurately referred to as 'character maps'. Is this term worth observing, or is the distinction too subtle to matter?

  • A font resolves a code point to a glyph if it contains a glyph for that code point; otherwise it reports a failure. I have also read that a font may return its blank ('missing') glyph for code points it does not support. Can an application differentiate between this blank glyph and a successful resolution, i.e. does the font return some kind of failure code along with the blank glyph?
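
On the last point, Windows at least does expose the distinction. The sketch below, assuming a Win32/GDI environment, uses the standard call GetGlyphIndicesW with the GGI_MARK_NONEXISTING_GLYPHS flag, which reports 0xFFFF for any code point the currently selected font cannot resolve, so an application can tell a real glyph from the missing-glyph case (U+FFFE is used here only as a value no font should cover):

    /* build: link with gdi32 (e.g. -lgdi32) */
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        HDC hdc = GetDC(NULL);  /* screen DC, with whatever font is selected by default */
        WCHAR text[3] = { L'A', 0x0416, 0xFFFE };  /* 'A', Cyrillic Zhe, a noncharacter */
        WORD glyphs[3];

        if (GetGlyphIndicesW(hdc, text, 3, glyphs,
                             GGI_MARK_NONEXISTING_GLYPHS) != GDI_ERROR) {
            for (int i = 0; i < 3; i++) {
                if (glyphs[i] == 0xFFFF)
                    printf("U+%04X: no glyph in this font\n", (unsigned)text[i]);
                else
                    printf("U+%04X: glyph index %u\n", (unsigned)text[i], (unsigned)glyphs[i]);
            }
        }
        ReleaseDC(NULL, hdc);
        return 0;
    }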

I believe that is the extent of my confusion. Any clarification in this regard would be invaluable. Thank you in advance.

You are essentially right:

  • Start with a known set of characters.
  • Select a subset of these characters (a character set).
  • Map these to bit patterns (code pages and encodings).
  • Render a character's glyph to an output device (i.e. using a font, a bit pattern, and a code page/encoding that maps the bit pattern to the character).
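
As a minimal sketch of the "map to bit patterns" step, here is the UTF-8 encoding of a single code point done by hand, with Cyrillic Zhe (U+0416) chosen purely as an example; real code would of course use a library:

    #include <stdio.h>

    /* Encode a code point below U+0800 as UTF-8 (one- and two-byte forms
       only, to keep the sketch short). Returns the number of bytes written. */
    static int utf8_encode(unsigned int cp, unsigned char out[2])
    {
        if (cp < 0x80) {                   /* ASCII range: one byte, unchanged */
            out[0] = (unsigned char)cp;
            return 1;
        }
        out[0] = (unsigned char)(0xC0 | (cp >> 6));    /* 110xxxxx */
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));  /* 10xxxxxx */
        return 2;
    }

    int main(void)
    {
        unsigned char bytes[2];
        int n = utf8_encode(0x0416, bytes);  /* character -> code point -> bit pattern */
        printf("U+0416 ->");
        for (int i = 0; i < n; i++)
            printf(" 0x%02X", bytes[i]);
        printf("\n");                        /* prints: U+0416 -> 0xD0 0x96 */
        return 0;
    }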

Across platforms there are similar code pages, and many code pages even share the same character mappings. For example, Windows Latin, Mac Roman, and Unicode share the same characters for the first 128 values (0-127).
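
That claim is easy to check on Windows, where Mac Roman is available as code page 10000. The sketch below (assuming a Win32 environment with both code pages installed) decodes each of the first 128 byte values through code pages 1252 and 10000 using the standard MultiByteToWideChar call and reports any disagreement:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        for (unsigned int b = 0; b < 128; b++) {
            char in = (char)b;
            WCHAR w1252, w10000;
            MultiByteToWideChar(1252,  0, &in, 1, &w1252,  1);  /* Windows Latin */
            MultiByteToWideChar(10000, 0, &in, 1, &w10000, 1);  /* Mac Roman     */
            if (w1252 != w10000)
                printf("byte 0x%02X decodes differently\n", b);
        }
        printf("done\n");  /* no lines above it: the first 128 values agree */
        return 0;
    }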

Generally, for new development you should use a Unicode code page with a popular encoding. UTF-8 is popular on most modern systems; UTF-16LE is used for Windows system calls ending in W.
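
For example, a program that keeps its text in UTF-8 converts to UTF-16LE just before calling a W-suffixed API. A minimal sketch, assuming a Win32 environment, with MessageBoxW used purely as an example of a W call:

    #include <windows.h>

    int main(void)
    {
        const char *utf8 = "\xD0\x96";  /* UTF-8 bytes for Cyrillic Zhe, U+0416 */
        WCHAR wide[16];

        /* CP_UTF8 tells the converter the source bytes are UTF-8; the -1
           length means "the input is NUL-terminated, convert the NUL too". */
        int n = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, wide, 16);
        if (n > 0)
            MessageBoxW(NULL, wide, L"UTF-16LE text", MB_OK);
        return 0;
    }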

