Date: Mon, 10 Jan 2022 22:13:17 +0000
From: Joseph Myers
To: Paul Koning
CC: binutils
Subject: Re: Unicode security
In-Reply-To: <515C0921-3E2A-4E7A-96EC-BF0B4719543D@comcast.net>

On Mon, 10 Jan 2022, Paul Koning via Binutils wrote:

> A standard that needs to handle Unicode and have a definition of "equal
> strings" will want to refer to a particular normalization.

For the purposes of ELF, equal strings are equal octet sequences, with no
further interpretation.

The ELF bindings to C do not need a concept of "equal", they just need to
say that UTF-8 is used to encode the sequence of Unicode code points in
the C symbol.

Those bindings need to handle multiple C versions with different sets of
allowed characters in identifiers, some of which allow identifiers that
are different as sequences of Unicode code points, and thus different in C
and in UTF-8, although the same in NFC. In those cases, the bindings need
to result in different octet sequences in ELF symbols for those different
(but normalized the same) C identifiers.

When a C identifier is written in NFC, so must the ELF symbol be; when a C
identifier is written in NFD, so must the ELF symbol be; when a C
identifier is in neither normalization form, so must the ELF symbol be.

-- 
Joseph S. Myers
joseph@codesourcery.com
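
As a rough illustration of the point about identifiers that differ as
sequences of code points but share an NFC form, the Python sketch below
compares two spellings of the same visible name. The identifier "café"
and the helper elf_symbol_bytes are made up for this example; the helper
assumes only what the message states, namely that the symbol is the UTF-8
encoding of the identifier's code points with no normalization applied.

```python
import unicodedata

# Two spellings of one visible identifier (hypothetical example name).
precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE (already NFC)
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT (NFD)


def elf_symbol_bytes(identifier: str) -> bytes:
    """Hypothetical helper: UTF-8 encode the code points, no normalization."""
    return identifier.encode("utf-8")


sym_a = elf_symbol_bytes(precomposed)
sym_b = elf_symbol_bytes(decomposed)

# The two identifiers normalize to the same string under NFC...
same_after_nfc = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)

# ...but as octet sequences, which is all ELF compares, they differ.
same_octets = sym_a == sym_b

print(sym_a.hex(), sym_b.hex())
print("same after NFC:", same_after_nfc, "| same octets:", same_octets)
```

Under this assumption the two symbols stay distinct in the symbol table
even though a normalizing comparison would treat them as one name.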