public inbox for glibc-bugs@sourceware.org
* [Bug libc/10093] New: iconv accepts UTF-8-encoded UTF-16 surrogates
@ 2009-04-23 21:40 aurelien at aurel32 dot net
From: aurelien at aurel32 dot net @ 2009-04-23 21:40 UTC
  To: glibc-bugs


According to 'man utf-8':

| The UCS code values 0xd800–0xdfff (UTF-16 surrogates) as well as 0xfffe
| and 0xffff (UCS non-characters) should not appear in conforming UTF-8
| streams.

This is confirmed by RFC2279:
| The algorithm for encoding UCS-2 (or Unicode) to UTF-8 can be
| obtained from the above, in principle, by simply extending each
| UCS-2 character with two zero-valued octets.  However, pairs of
| UCS-2 values between D800 and DFFF (surrogate pairs in Unicode
| parlance), being actually UCS-4 characters transformed through
| UTF-16, need special treatment: the UTF-16 transformation must be
| undone, yielding a UCS-4 character that is then transformed as
| above.
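The RFC's rule can be sketched in C: combine the surrogate pair into a single UCS-4 value first, then encode that value. This is a minimal illustration, not glibc's code, and the function names (`combine_surrogates`, `utf8_encode4`, `surrogate_pair_selftest`) are hypothetical. For the pair used in the reproduction below (0xd808 + 0xdf45) it yields U+12345 and the single 4-byte sequence F0 92 8D 85, whereas the six bytes in the report encode each surrogate on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Undo the UTF-16 transformation: combine a high surrogate (D800-DBFF)
 * and a low surrogate (DC00-DFFF) into one UCS-4 code value. */
uint32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
    assert(hi >= 0xD800 && hi <= 0xDBFF);
    assert(lo >= 0xDC00 && lo <= 0xDFFF);
    return 0x10000u + (((uint32_t)(hi - 0xD800) << 10)
                       | (uint32_t)(lo - 0xDC00));
}

/* Encode a code value in 0x10000-0x1FFFFF as a 4-byte UTF-8 sequence
 * (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx). Returns the number of bytes. */
size_t utf8_encode4(uint32_t cp, unsigned char out[4])
{
    assert(cp >= 0x10000 && cp <= 0x1FFFFF);
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Usage: the pair from the report becomes U+12345 = F0 92 8D 85. */
void surrogate_pair_selftest(void)
{
    unsigned char buf[4];
    uint32_t cp = combine_surrogates(0xD808, 0xDF45);
    assert(cp == 0x12345);
    assert(utf8_encode4(cp, buf) == 4);
    assert(buf[0] == 0xF0 && buf[1] == 0x92 && buf[2] == 0x8D && buf[3] == 0x85);
}
```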

However, the following commands show that iconv accepts such invalid
characters when converting to UTF-8 and UCS-4:

$ s='\xed\xa0\x88\xed\xbd\x85' # U+D808 + U+DF45, the surrogate pair for U+12345
$ for e in UTF-8 UTF-16 UTF-32 UCS-4 ; do printf '%s\t' "$e" ; printf "$s" | iconv -f UTF-8 -t "$e" > /dev/null && printf 'OK\n' ; done
UTF-8   OK
UTF-16  iconv: illegal input sequence at position 0
UTF-32  iconv: illegal input sequence at position 0
UCS-4   OK
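For comparison, a decoder that enforces the rule quoted above stops at position 0 for this input, as the UTF-16 and UTF-32 conversions already do. The sketch below is hypothetical (it is not glibc's gconv code, and all names are made up for illustration). It assumes the stricter RFC 3629 limits (no 5- or 6-byte forms, nothing above U+10FFFF) and, like RFC 3629, does not reject the non-characters 0xfffe and 0xffff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s[0..len). On success store the code
 * point in *cp and return the sequence length; return 0 if the sequence
 * is illegal or truncated. */
size_t utf8_decode_strict(const unsigned char *s, size_t len, uint32_t *cp)
{
    static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t n;
    uint32_t c;

    if (len == 0)
        return 0;
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        n = 2; c = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {
        n = 3; c = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {
        n = 4; c = s[0] & 0x07;
    } else {
        return 0;                 /* stray continuation or 5/6-byte lead */
    }
    if (len < n)
        return 0;                 /* truncated sequence */
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;             /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_for_len[n])
        return 0;                 /* overlong (non-canonical) form */
    if (c >= 0xD800 && c <= 0xDFFF)
        return 0;                 /* UTF-8-encoded UTF-16 surrogate */
    if (c > 0x10FFFF)
        return 0;                 /* beyond the Unicode range */
    *cp = c;
    return n;
}

/* Return the offset of the first illegal sequence, or len if the whole
 * buffer is valid (comparable to iconv's "position N" message). */
size_t utf8_first_illegal(const unsigned char *s, size_t len)
{
    size_t pos = 0;
    uint32_t cp;
    while (pos < len) {
        size_t n = utf8_decode_strict(s + pos, len - pos, &cp);
        if (n == 0)
            return pos;
        pos += n;
    }
    return len;
}

void utf8_strict_selftest(void)
{
    /* The bytes from the report: U+D808 and U+DF45 encoded separately. */
    const unsigned char bad[] = { 0xED, 0xA0, 0x88, 0xED, 0xBD, 0x85 };
    /* U+12345 encoded correctly as one 4-byte sequence. */
    const unsigned char good[] = { 0xF0, 0x92, 0x8D, 0x85 };
    assert(utf8_first_illegal(bad, sizeof bad) == 0);
    assert(utf8_first_illegal(good, sizeof good) == sizeof good);
}
```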

-- 
           Summary: iconv accepts UTF-8-encoded UTF-16 surrogates
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: libc
        AssignedTo: drepper at redhat dot com
        ReportedBy: aurelien at aurel32 dot net
                CC: glibc-bugs at sources dot redhat dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu


http://sourceware.org/bugzilla/show_bug.cgi?id=10093

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.



* [Bug libc/10093] iconv accepts UTF-8-encoded UTF-16 surrogates
@ 2009-04-24 19:01 drepper at redhat dot com
From: drepper at redhat dot com @ 2009-04-24 19:01 UTC
  To: glibc-bugs


------- Additional Comments From drepper at redhat dot com  2009-04-24 19:01 -------
I'm changing this only because we already have tests for security reasons which
prevent non-canonical representations of code points.
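A canonical-form (overlong) check on its own would not catch the reported input: ED A0 88 is the shortest encoding of 0xd808, so it is not overlong, and a separate range check for surrogates is needed. A minimal sketch using the RFC 2279 length ranges; the helper names are hypothetical and this is not the actual glibc test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Shortest UTF-8 length for a code value, using the RFC 2279 ranges. */
static unsigned utf8_min_len(uint32_t cp)
{
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    if (cp < 0x200000) return 4;
    if (cp < 0x4000000) return 5;
    return 6;
}

/* A decoded sequence is canonical if it used the shortest possible form. */
bool utf8_is_canonical(uint32_t cp, unsigned seq_len)
{
    return utf8_min_len(cp) == seq_len;
}

/* Separate check for UTF-16 surrogate code values. */
bool utf8_is_surrogate(uint32_t cp)
{
    return cp >= 0xD800 && cp <= 0xDFFF;
}

void canonical_vs_surrogate_selftest(void)
{
    /* C0 80 decodes to U+0000 in 2 bytes: non-canonical, caught already. */
    assert(!utf8_is_canonical(0x0000, 2));
    /* ED A0 88 decodes to 0xD808 in 3 bytes: canonical, so only the
     * surrogate check rejects it. */
    assert(utf8_is_canonical(0xD808, 3));
    assert(utf8_is_surrogate(0xD808));
}
```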

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED





* [Bug libc/10093] iconv accepts UTF-8-encoded UTF-16 surrogates
@ 2014-07-01 20:38 fweimer at redhat dot com
From: fweimer at redhat dot com @ 2014-07-01 20:38 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=10093

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |security-



