Edit history
Edit by Xenius (current version):
Here you go:
4.4.4 general purpose bit flag: (2 bytes) Bit 11: Language encoding flag (EFS). If this bit is set, the filename and comment fields for this file MUST be encoded using UTF-8. (see APPENDIX D)
And here is Appendix D itself:
D.1 The ZIP format has historically supported only the original IBM PC character encoding set, commonly referred to as IBM Code Page 437. This limits storing file name characters to only those within the original MS-DOS range of values and does not properly support file names in other character encodings, or languages. To address this limitation, this specification will support the following change.
D.2 If general purpose bit 11 is unset, the file name and comment should conform to the original ZIP character encoding. If general purpose bit 11 is set, the filename and comment must support The Unicode Standard, Version 4.1.0 or greater using the character encoding form defined by the UTF-8 storage specification. The Unicode Standard is published by the The Unicode Consortium (http://www.unicode.org). UTF-8 encoded data stored within ZIP files is expected to not include a byte order mark (BOM).
So in a valid zip file the encoding must be either CP437 or UTF-8, depending on the flag in bit 11. Storing file names in cp1251 (though I suspect it is actually cp866 there) is incorrect. Moreover, CP437 support is kept only for compatibility with ancient software.
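For illustration, here is a minimal Python sketch of the rule quoted above. The names `EFS_FLAG`, `decode_zip_name` and `recover_cp866_name` are made up for this example and are not part of any library. The second helper assumes the archiver actually wrote cp866 bytes with bit 11 unset, which is only my guess about the archive in question.

```python
EFS_FLAG = 0x0800  # general purpose bit 11: language encoding flag (EFS)


def decode_zip_name(raw: bytes, flag_bits: int) -> str:
    """Decode a raw file name the way APPNOTE 4.4.4 / Appendix D prescribes."""
    # Bit 11 set -> UTF-8; bit 11 unset -> the original ZIP encoding, CP437.
    encoding = "utf-8" if flag_bits & EFS_FLAG else "cp437"
    return raw.decode(encoding)


def recover_cp866_name(name: str) -> str:
    """Hypothetical repair: undo a CP437 decode of bytes that were really cp866."""
    # CP437 maps every byte value to a character, so re-encoding restores the
    # original bytes, which can then be decoded with the assumed real encoding.
    return name.encode("cp437").decode("cp866")
```

If a name comes out as garbage after a spec-conformant decode, passing it through `recover_cp866_name` (or the same trick with another code page) is one way to check which encoding the archiver really used.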