From: Thomas Schwinge
To: Raiki Tamura, Jakub Jelinek, Philip Herron
CC: David Edelsohn, Arthur Cohen, Arsen Arsenović, Mark Wielaard
Subject: Re: [GSoC] gccrs Unicode support
Date: Thu, 16 Mar 2023 10:28:39 +0100
Message-ID: <87lejxujso.fsf@euler.schwinge.homeip.net>

Hi!

(By the way, this GSoC project is being discussed in GCC/Rust Zulip: .)

I'm now also putting Mark Wielaard in CC; he once also started
discussing this topic, "thinking of importing a couple of gnulib modules
to help with UTF-8 processing [unless] other gcc frontends handle [these
things] already in a way that might be reusable".  See the thread
starting at "rust frontend and UTF-8/unicode processing/properties".

On 2023-03-15T16:18:18+0100, Jakub Jelinek via Gcc wrote:
> On Wed, Mar 15, 2023 at 11:00:19AM +0000, Philip Herron via Gcc wrote:
>> Excellent work on getting up to speed on the rust front-end. From my
>> perspective I am interested to see what the wider GCC community thinks
>> about using https://www.gnu.org/software/libunistring/ library within GCC
>> instead of rolling our own, this means it will be another dependency on GCC.
>>
>> The other option is there is already code in the other front-ends to do
>> this so in the worst case it should be possible to extract something out of
>> them and possibly make this a shared piece of functionality which we can
>> mentor you through.
>
> I don't know what exactly Rust FE needs in this area, but e.g. libcpp
> already handles whatever C/C++ need from Unicode support POV and can handle
> it without any extra libraries.
> So, if we could avoid the extra dependency, it would be certainly better,
> unless you really need massive amounts of code from those libraries.
> libcpp already e.g. provides mapping of unicode character names to code
> points, determining which unicode characters can appear at the start or
> in the middle of identifiers, etc.

So that's exactly the answer that I supposed you or someone else would
give.  ;-)

That means GCC/Rust has some investigation to do: (a) whether what
libcpp contains is sufficient for its needs, and (b) whether that code
can be reused/extracted/refactored in a sensible way into a GCC-level
shared source code file, to be used by several front ends (possibly via
libcpp).
(I suppose GCC/Rust shouldn't link in libcpp directly.)

Thanks for the input, all!


Regards,
 Thomas
-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201,
80634 München; limited liability company (Gesellschaft mit beschränkter
Haftung); Managing Directors: Thomas Heurung, Frank Thürauf; Registered
office: München; Registration court: München, HRB 106955