Change history
Correction
gh0stwizard, (current version):
What do you mean, "no Unicode support"?
It means exactly that. UCS-2 != Unicode. Quoting:
As of Unicode 8.0 there are 120,520 graphic characters.
https://en.wikipedia.org/wiki/Unicode
And here is what things looked like before 2000 (in case anyone forgot, it is 2016 now):
The UCS has over 1.1 million code points available for use, but only the first 65,536 (the Basic Multilingual Plane, or BMP) had entered into common use before 2000.
https://en.wikipedia.org/wiki/Universal_Coded_Character_Set
Those two facts are exactly why char16_t and char32_t were introduced.
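For the record, a minimal C11 sketch of what those types actually encode (U+1F600 is just an arbitrary code point above the BMP, picked for the example):

/* Minimal C11 sketch: a code point above the BMP needs two 16-bit units
 * (a surrogate pair) but only a single 32-bit unit. */
#include <stdio.h>
#include <uchar.h>   /* char16_t, char32_t */

int main(void) {
    char16_t u16[] = u"\U0001F600";   /* stored as { 0xD83D, 0xDE00, 0 } */
    char32_t u32[] = U"\U0001F600";   /* stored as { 0x0001F600, 0 } */
    printf("char16_t units: %zu\n", sizeof u16 / sizeof u16[0] - 1);  /* 2 */
    printf("char32_t units: %zu\n", sizeof u32 / sizeof u32[0] - 1);  /* 1 */
    return 0;
}

Which is exactly why a fixed 16-bit unit (UCS-2) stopped being enough once characters outside the BMP came into use.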
Here is what unicode/umachine.h (libicu) says:
/* UChar and UChar32 definitions -------------------------------------------- */
/** Number of bytes in a UChar. @stable ICU 2.0 */
#define U_SIZEOF_UCHAR 2
/**
* \var UChar
* Define UChar to be UCHAR_TYPE, if that is #defined (for example, to char16_t),
* or wchar_t if that is 16 bits wide; always assumed to be unsigned.
* If neither is available, then define UChar to be uint16_t.
*
* This makes the definition of UChar platform-dependent
* but allows direct string type compatibility with platforms with
* 16-bit wchar_t types.
*
* @stable ICU 4.4
*/
#if defined(UCHAR_TYPE)
typedef UCHAR_TYPE UChar;
/* Not #elif U_HAVE_CHAR16_T -- because that is type-incompatible with pre-C++11 callers
typedef char16_t UChar; */
#elif U_SIZEOF_WCHAR_T==2
typedef wchar_t UChar;
#elif defined(__CHAR16_TYPE__)
typedef __CHAR16_TYPE__ UChar;
#else
typedef uint16_t UChar;
#endif
/**
* Define UChar32 as a type for single Unicode code points.
* UChar32 is a signed 32-bit integer (same as int32_t).
*
* The Unicode code point range is 0..0x10ffff.
* All other values (negative or >=0x110000) are illegal as Unicode code points.
* They may be used as sentinel values to indicate "done", "error"
* or similar non-code point conditions.
*
* Before ICU 2.4 (Jitterbug 2146), UChar32 was defined
* to be wchar_t if that is 32 bits wide (wchar_t may be signed or unsigned)
* or else to be uint32_t.
* That is, the definition of UChar32 was platform-dependent.
*
* @see U_SENTINEL
* @stable ICU 2.4
*/
typedef int32_t UChar32;
And here is how uchar.h on my system defines these types (grep output):
/usr/include/uchar.h:7:typedef unsigned short char16_t;
/usr/include/uchar.h:8:typedef unsigned char32_t;
Maybe I'm missing something, but using char16_t and char32_t on their own is just as inadvisable. Realistically, you either write your own library or use an existing one; the types by themselves don't buy you anything.
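To make that last point concrete, a minimal sketch of the kind of thing a library provides that the bare types don't: ICU's header-only macros from unicode/utf16.h walk a UTF-16 string code point by code point, taking care of surrogate pairs. (The sample string is made up for the example; it assumes the libicu headers quoted above are installed.)

/* Minimal sketch: iterating code points in a UTF-16 string with ICU macros. */
#include <stdio.h>
#include <unicode/umachine.h>  /* UChar, UChar32 */
#include <unicode/utf16.h>     /* U16_NEXT */

int main(void) {
    /* "A" followed by U+1F600 (outside the BMP, hence a surrogate pair) */
    static const UChar s[] = { 0x0041, 0xD83D, 0xDE00 };
    const int32_t len = 3;
    int32_t i = 0;
    while (i < len) {
        UChar32 c;
        U16_NEXT(s, i, len, c);          /* reads one code point, advances i by 1 or 2 */
        printf("U+%04X\n", (unsigned)c); /* prints U+0041, then U+1F600 */
    }
    return 0;
}

A loop over raw char16_t values would hand you the two surrogate halves separately, which is precisely the "no real Unicode support" problem.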