From: "simon at pushface dot org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug ada/95959] New: Error in conversion from UTF16 to UTF8
Date: Mon, 29 Jun 2020 11:21:40 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95959

            Bug ID: 95959
           Summary: Error in conversion from UTF16 to UTF8
           Product: gcc
           Version: 10.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ada
          Assignee: unassigned at gcc dot gnu.org
          Reporter: simon at pushface dot org
  Target Milestone: ---

Created attachment 48799
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48799&action=edit
Demonstration

There's an error in converting from UTF16 to UTF8 for code points in the
range U+10000 to U+10FFFF (which require 4 UTF8 bytes).

The attached demonstration shows this by taking a UTF8 character (Clef,
U+1D11E), converting it to UTF16, and converting back to UTF8. This should
round-trip back to the same character, but doesn't: the third byte of the
final UTF8 is wrong (2#10010000# instead of 2#10000100#).

$ ./utftest
Codepoint: 16#1D11E#
UTF-8: 4: 2#11110000# 2#10011101# 2#10000100# 2#10011110#
UTF-16: 2: 2#1101100000110100# 2#1101110100011110#
UTF-8: 4: 2#11110000# 2#10011101# 2#10010000# 2#10011110#
Bug

The attached patch corrects the problem.
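
For reference, the expected round trip for U+1D11E can be checked
independently of GNAT. The sketch below is in Python rather than Ada and is
neither the attached demonstration nor the attached patch. It only applies
the standard surrogate-pair and 4-byte UTF-8 arithmetic, and the helper
names (utf16_pair_to_utf8, ada_binary) are made up for this illustration.

```python
def utf16_pair_to_utf8(high: int, low: int) -> bytes:
    """Convert one UTF-16 surrogate pair to its 4-byte UTF-8 encoding."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid UTF-16 surrogate pair")
    # Recombine the two 10-bit halves and restore the 0x10000 offset.
    cp = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    # 4-byte UTF-8 layout: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


def ada_binary(value: int, width: int) -> str:
    """Format a value like the report's output, e.g. 2#10000100#."""
    return f"2#{value:0{width}b}#"


clef = "\U0001D11E"  # Clef, U+1D11E, as in the report

# Use Python's own codecs for the starting UTF-8 and UTF-16 forms.
utf8 = clef.encode("utf-8")
utf16 = clef.encode("utf-16-be")
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]

# Convert the surrogate pair back to UTF-8 by hand.
round_trip = utf16_pair_to_utf8(units[0], units[1])

print("UTF-8: ", " ".join(ada_binary(b, 8) for b in utf8))
print("UTF-16:", " ".join(ada_binary(u, 16) for u in units))
print("UTF-8: ", " ".join(ada_binary(b, 8) for b in round_trip))
print("OK" if round_trip == utf8 else "Bug")
```

With this arithmetic the third byte comes out as 2#10000100#, which matches
the first UTF-8 line of the ./utftest output and not the 2#10010000# in the
second one.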