From: "eggert at gnu dot org"
To: libc-locales@sourceware.org
Subject: [Bug localedata/19932] New: mbrtowc returns (size_t) -1 in C locale
Date: Sat, 09 Apr 2016 08:15:00 -0000

https://sourceware.org/bugzilla/show_bug.cgi?id=19932

            Bug ID: 19932
           Summary: mbrtowc returns (size_t) -1 in C locale
           Product: glibc
           Version: 2.22
            Status: NEW
          Severity: normal
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: eggert at gnu dot org
                CC: libc-locales at sourceware dot org
  Target Milestone: ---

Created attachment 9173
  --> https://sourceware.org/bugzilla/attachment.cgi?id=9173&action=edit
test mbrtowc in the C locale

This follows up on a bug reported by Björn Jacke against GNU grep 2.23; see .
The bug occurs because GNU grep uses mbrtowc to detect encoding errors, and
because glibc mbrtowc reports an encoding error in the C locale when given a
byte in the range 128-255 decimal.

It was always the intent of POSIX that all 256 bytes be valid characters in
the C locale, as that was the traditional behavior. This wasn't clearly stated
in the standard, but this is a bug that is planned to be fixed in a future
version of POSIX; see (2015-07-02).

Glibc should be fixed to conform to this, i.e., mbrtowc should never return
(size_t) -1 in the C locale.

I plan to work around this bug in the gnulib mbrtowc module, which should fix
the grep bug; but this is a hack and will slow grep down a bit. The bug should
be fixed in glibc.

Please see the attached program for an illustration of the bug. The program
should output nothing and exit with status 0, but on glibc it outputs lines
like the following:

  byte 0x80 (0200) encoding error
  byte 0x81 (0201) encoding error
  ...
  byte 0xff (0377) encoding error

and exits with status 1.