From: "egmont at gmail dot com"
To: glibc-bugs@sourceware.org
Subject: [Bug locale/18927] New: Different strings should never collate as equal
Date: Sun, 06 Sep 2015 22:21:00 -0000

https://sourceware.org/bugzilla/show_bug.cgi?id=18927

            Bug ID: 18927
           Summary: Different strings should never collate as equal
           Product: glibc
           Version: 2.21
            Status: NEW
          Severity: normal
          Priority: P2
         Component: locale
          Assignee: unassigned at sourceware dot org
          Reporter: egmont at gmail dot com
  Target Milestone: ---

Bug 13547 manually fixed a case where two distinct strings collated as equal.
Bug 16527 is another, currently unresolved case. Probably there are other, yet
undiscovered cases as well, and new ones might appear in the future.

This causes confusion with programs such as sort (the order of such lines is
undefined and might vary from run to run) and uniq (different lines are
reported as equal).

I think there should be safeguard code so that no locale definition can ever
result in this happening.

One possible approach I can imagine: change the current strxfrm() magic to
produce output that is restricted to bytes in the 2-255 range, then append a
0x01 byte followed by a literal copy of the original string.
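
The proposed key layout can be sketched in user space as follows. This is only an illustration, not glibc code: make_total_key() and total_collate() are hypothetical names, and the sketch assumes, as the proposal does, that the strxfrm() output never contains a 0x01 byte. Today's strxfrm() does not promise that. Under that assumption the 0x01 separator sorts below every byte of the transformed part, so the existing collation order is kept and the appended literal copy only breaks ties between strings that would otherwise compare equal.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of glibc.
   Builds the key  strxfrm(s) + 0x01 + s  as a malloc'd NUL-terminated
   string.  Returns NULL if memory cannot be allocated. */
char *make_total_key(const char *s)
{
    /* With n == 0 strxfrm only reports the length it would need. */
    size_t xlen = strxfrm(NULL, s, 0);
    size_t slen = strlen(s);
    char *key = malloc(xlen + 1 + slen + 1);

    if (key == NULL)
        return NULL;

    strxfrm(key, s, xlen + 1);            /* transformed part          */
    key[xlen] = '\x01';                   /* separator (assumed unused) */
    memcpy(key + xlen + 1, s, slen + 1);  /* literal copy, with NUL    */
    return key;
}

/* Hypothetical comparison: orders by the combined keys, so it returns 0
   only when the keys are byte-for-byte identical.  If allocation fails
   it falls back to a plain byte comparison of the inputs. */
int total_collate(const char *a, const char *b)
{
    char *ka = make_total_key(a);
    char *kb = make_total_key(b);
    int result = (ka != NULL && kb != NULL) ? strcmp(ka, kb)
                                            : strcmp(a, b);

    free(ka);
    free(kb);
    return result;
}
```

A caller would select a locale with setlocale(LC_COLLATE, "") and then use total_collate() wherever it currently uses strcoll(), for example inside a qsort() comparison callback.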