public inbox for libc-locales@sourceware.org
* [Bug localedata/19932] New: mbrtowc returns (size_t) -1 in C locale
@ 2016-04-09  8:15 eggert at gnu dot org
  2016-04-09 20:09 ` [Bug localedata/19932] " jim at meyering dot net
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: eggert at gnu dot org @ 2016-04-09  8:15 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=19932

            Bug ID: 19932
           Summary: mbrtowc returns (size_t) -1 in C locale
           Product: glibc
           Version: 2.22
            Status: NEW
          Severity: normal
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: eggert at gnu dot org
                CC: libc-locales at sourceware dot org
  Target Milestone: ---

Created attachment 9173
  --> https://sourceware.org/bugzilla/attachment.cgi?id=9173&action=edit
test mbrtowc in the C locale

This follows up on a bug reported by Björn Jacke against GNU grep 2.23; see
<http://bugs.gnu.org/23234>. The bug occurs because GNU grep uses mbrtowc to
detect encoding errors, and because glibc mbrtowc reports an encoding error in
the C locale when given a byte in the range 128-255 decimal.

It was always the intent of POSIX that all 256 bytes be valid characters in the
C locale, as that was the traditional behavior. The standard did not state this
clearly, and that omission is a defect planned to be fixed in a future version
of POSIX; see <http://austingroupbugs.net/view.php?id=663#c2738> (2015-07-02).
Glibc should be fixed to conform to this, i.e., mbrtowc should never return
(size_t) -1 in the C locale.

I plan to work around this bug in the gnulib mbrtowc module, which should fix
the grep bug; but this is a hack and will slow grep down a bit. The bug should
be fixed in glibc.
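
For illustration, here is a minimal sketch of what such a workaround could look
like. It is not the gnulib code: the names ctype_is_c_locale and c_safe_mbrtowc
are hypothetical, and it assumes the caller always passes a non-null mbstate_t.
When mbrtowc reports an encoding error and LC_CTYPE is "C" or "POSIX", the
wrapper treats the offending byte as a single-byte character.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: nonzero if LC_CTYPE is currently "C" or "POSIX". */
static int ctype_is_c_locale(void)
{
    const char *name = setlocale(LC_CTYPE, NULL);  /* query only, no change */
    return name != NULL
           && (strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0);
}

/* Hypothetical wrapper around mbrtowc.  Assumes ps is non-null.
   In the C locale an encoding error is converted into a one-byte
   character whose wide value is the byte value itself (an assumption
   of this sketch; the standard does not mandate that mapping). */
static size_t c_safe_mbrtowc(wchar_t *pwc, const char *s, size_t n,
                             mbstate_t *ps)
{
    size_t ret = mbrtowc(pwc, s, n, ps);
    if (ret == (size_t) -1 && s != NULL && n > 0 && ctype_is_c_locale()) {
        if (pwc != NULL)
            *pwc = (unsigned char) *s;
        /* The state is unspecified after an error; reset it. */
        memset(ps, 0, sizeof *ps);
        return 1;
    }
    return ret;
}
```

The locale check runs only on the error path, but it still adds work for every
high byte on an affected glibc, which is consistent with the expected slowdown.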

Please see the attached program for an illustration of the bug. The program
should output nothing and exit with status 0, but on glibc it outputs lines
like the following:

byte 0x80 (0200) encoding error
byte 0x81 (0201) encoding error
...
byte 0xff (0377) encoding error

and exits with status 1.
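
The attachment itself is not reproduced in this message. As a rough sketch of a
check along the lines described above (an assumption, not the attached
program; the function names are made up for this example), one can feed each
of the 256 byte values to mbrtowc in the C locale and report those for which
it returns (size_t) -1:

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Return 1 if mbrtowc reports an encoding error for the single byte b
   in the current locale, 0 otherwise. */
static int byte_is_encoding_error(unsigned char b)
{
    char c = (char) b;
    wchar_t wc;
    mbstate_t state;
    memset(&state, 0, sizeof state);   /* start from the initial state */
    return mbrtowc(&wc, &c, 1, &state) == (size_t) -1;
}

/* Switch to the C locale, test every byte value, print one line per
   failing byte to out, and return the number of failing bytes. */
static int count_c_locale_encoding_errors(FILE *out)
{
    int failures = 0;
    setlocale(LC_ALL, "C");
    for (int b = 0; b < 256; b++) {
        if (byte_is_encoding_error((unsigned char) b)) {
            fprintf(out, "byte 0x%02x (0%03o) encoding error\n", b, b);
            failures++;
        }
    }
    return failures;
}
```

A main function that calls count_c_locale_encoding_errors (stdout) and returns
1 when the count is nonzero would print nothing and exit with status 0 on an
implementation that accepts all 256 bytes in the C locale.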


end of thread, other threads:[~2023-06-28 20:12 UTC | newest]

Thread overview: 9+ messages
2016-04-09  8:15 [Bug localedata/19932] New: mbrtowc returns (size_t) -1 in C locale eggert at gnu dot org
2016-04-09 20:09 ` [Bug localedata/19932] " jim at meyering dot net
2016-04-09 20:10 ` eggert at gnu dot org
2016-04-09 20:10 ` bruno at clisp dot org
2016-04-11 11:48 ` fweimer at redhat dot com
2016-04-22  4:46 ` [Bug localedata/19932] C locale: mbrtowc returns (size_t) -1 vapier at gentoo dot org
2023-03-29  9:42 ` bruno at clisp dot org
2023-05-23 10:08 ` carenas at gmail dot com
2023-06-28 20:12 ` sam at gentoo dot org