public inbox for glibc-bugs@sourceware.org
* [Bug libc/10093] New: iconv accepts UTF-8-encoded UTF-16 surrogates
From: aurelien at aurel32 dot net @ 2009-04-23 21:40 UTC
To: glibc-bugs
According to 'man utf-8':
| The UCS code values 0xd800–0xdfff (UTF-16 surrogates) as well as 0xfffe
| and 0xffff (UCS non-characters) should not appear in conforming UTF-8
| streams.
This is confirmed by RFC2279:
| The algorithm for encoding UCS-2 (or Unicode) to UTF-8 can be
| obtained from the above, in principle, by simply extending each
| UCS-2 character with two zero-valued octets. However, pairs of
| UCS-2 values between D800 and DFFF (surrogate pairs in Unicode
| parlance), being actually UCS-4 characters transformed through
| UTF-16, need special treatment: the UTF-16 transformation must be
| undone, yielding a UCS-4 character that is then transformed as
| above.
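To illustrate the "undo the UTF-16 transformation" step quoted above, here is a minimal sketch in POSIX shell arithmetic. The helper name combine_surrogates is hypothetical and is not part of glibc or iconv; it only applies the standard UTF-16 formula to a high and a low surrogate.

```shell
# Hypothetical helper: combine a UTF-16 surrogate pair into the code point
# it stands for, using 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00).
combine_surrogates() {
  hi=$(($1))
  lo=$(($2))
  printf '%X\n' $(( 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00) ))
}

# The pair 0xD808 0xDF45 stands for U+12345.
combine_surrogates 0xD808 0xDF45
```

A conforming encoder would therefore emit U+12345 as the four-byte sequence f0 92 8d 85 rather than as two three-byte surrogate encodings.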
However, the following code shows that iconv accepts such invalid
characters:
$ s='\xed\xa0\x88\xed\xbd\x85' # 0xd808 + 0xdf45
$ for e in UTF-8 UTF-16 UTF-32 UCS-4 ; do printf "$e\t" ; printf $s | iconv -f UTF-8 -t $e > /dev/null && printf 'OK\n' ; done
UTF-8 OK
UTF-16 iconv: illegal input sequence at position 0
UTF-32 iconv: illegal input sequence at position 0
UCS-4 OK
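The offending byte pattern can also be detected independently of iconv. The following is a rough sketch, assuming a POSIX shell with od, tr and grep available; has_surrogate is a hypothetical helper, not an existing tool. It relies on the fact that in UTF-8 the lead byte 0xed followed by a byte in the range 0xa0-0xbf always encodes a code point in 0xd800-0xdfff.

```shell
# Hypothetical helper: succeed if stdin contains a UTF-8-encoded surrogate,
# i.e. the byte 0xed followed by a byte in 0xa0..0xbf.
has_surrogate() {
  # Dump stdin as one long hex string, then look for "ed" followed by
  # "a" or "b" starting on a byte boundary (an even hex-digit offset).
  od -An -v -tx1 | tr -d ' \n' | grep -Eq '^([0-9a-f]{2})*ed[ab]'
}

# Same bytes as $s above, written with octal escapes for portability.
printf '\355\240\210\355\275\205' | has_surrogate && echo "surrogate found"
```

This is only a pattern check for the surrogate range; it does not validate the rest of the stream as UTF-8.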
--
Summary: iconv accepts UTF-8-encoded UTF-16 surrogates
Product: glibc
Version: unspecified
Status: NEW
Severity: normal
Priority: P2
Component: libc
AssignedTo: drepper at redhat dot com
ReportedBy: aurelien at aurel32 dot net
CC: glibc-bugs at sources dot redhat dot com
GCC build triplet: x86_64-unknown-linux-gnu
GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu
http://sourceware.org/bugzilla/show_bug.cgi?id=10093
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
* [Bug libc/10093] iconv accepts UTF-8-encoded UTF-16 surrogates
From: drepper at redhat dot com @ 2009-04-24 19:01 UTC
To: glibc-bugs
------- Additional Comments From drepper at redhat dot com 2009-04-24 19:01 -------
I'm changing this only because we already have tests for security reasons which
prevent non-canonical representations of code points.
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED
http://sourceware.org/bugzilla/show_bug.cgi?id=10093
* [Bug libc/10093] iconv accepts UTF-8-encoded UTF-16 surrogates
From: fweimer at redhat dot com @ 2014-07-01 20:38 UTC
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=10093
Florian Weimer <fweimer at redhat dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |security-
--