From mboxrd@z Thu Jan 1 00:00:00 1970
From: Thomas Schwinge
To: Lewis Hyatt ,
CC: Richard Sandiford , Jakub Jelinek , David Malcolm
Subject: GTY: Enhance 'string_length' option documentation (was:
 'unsigned int len' field in 'libcpp/include/symtab.h:struct ht_identifier'
 (was: [PATCH] pch: Fix streaming of strings with embedded null bytes))
In-Reply-To:
References: <87h6qjvfp1.fsf@euler.schwinge.homeip.net>
User-Agent: Notmuch/0.29.3+94~g74c3f1b (https://notmuchmail.org) Emacs/28.2
 (x86_64-pc-linux-gnu)
Date: Wed, 5 Jul 2023 09:56:23 +0200
Message-ID: <878rbuvljs.fsf@euler.schwinge.homeip.net>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id:

--=-=-=
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit

Hi!

On 2023-07-04T15:56:23-0400, Lewis Hyatt via Gcc-patches wrote:
> On Tue, Jul 4, 2023 at 11:50 AM Thomas Schwinge wrote:
>> I came across this one here on my way working through another (somewhat
>> related) GTY issue. I generally do understand the issue here, but do
>> have a question about 'unsigned int len' field in
>> 'libcpp/include/symtab.h:struct ht_identifier':

[...]

> I don't think there is currently any possibility for a null byte to
> end up in an ht_identifier's string. I assumed that ht_identifier
> stores the length as an optimization (especially since it doesn't take
> up any extra space on 64-bit platforms, given the 32-bit hash code is
> stored as well there.) I created the string_length GTY markup mainly
> to support another patch that I have still pending review, which I
> thought would increase the likelihood of PCH needing to handle null
> bytes in general. When I did that, I added the markup to ht_identifier
> simply because the length was already there, so there was no reason
> not to add it. It does save a few cycles when streaming out the PCH,
> but I doubt it is meaningful.

Thanks for confirming.

OK thus to push the attached
"GTY: Enhance 'string_length' option documentation"?

Regards
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; limited liability company (Gesellschaft mit beschränkter Haftung); Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München; Register court: München, HRB 106955

--=-=-=
Content-Type: text/x-diff
Content-Disposition: inline;
 filename="0001-GTY-Enhance-string_length-option-documentation.patch"

>From a31b6657c26ac70c6e03b8ad81cdcb873f905716 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge
Date: Wed, 5 Jul 2023 08:38:49 +0200
Subject: [PATCH] GTY: Enhance 'string_length' option documentation

We're (currently) not aware of any actual use of 'ht_identifier's with NUL
characters embedded; its 'len' field appears to exist for optimization
purposes, since "forever". Before 'struct ht_identifier' was added in
commit 2a967f3d3a45294640e155381ef549e0b8090ad4 (Subversion r42334), we had
in 'gcc/cpplib.h:struct cpp_hashnode': 'unsigned short len', or earlier
'length', earlier in 'gcc/cpphash.h:struct hashnode': 'unsigned short
length', earlier 'size_t length' with comment: "length of token, for quick
comparison", earlier 'int length', ever since the 'gcc/cpp*' files were
added in commit 7f2935c734c36f84ab62b20a04de465e19061333
(Subversion r9191).

This amends commit f3b957ea8b9dadfb1ed30f24f463529684b7a36a
"pch: Fix streaming of strings with embedded null bytes".

	gcc/
	* doc/gty.texi (GTY Options) : Enhance.
	libcpp/
	* include/symtab.h (struct ht_identifier): Document different
	rationale.
---
 gcc/doc/gty.texi        | 11 +++++++++++
 libcpp/include/symtab.h |  4 +---
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/gty.texi b/gcc/doc/gty.texi
index 7bd064b5781..15f9fa07405 100644
--- a/gcc/doc/gty.texi
+++ b/gcc/doc/gty.texi
@@ -217,6 +217,17 @@ struct GTY(()) non_terminated_string @{
 @};
 @end smallexample
 
+Similarly, this is useful for (regular NUL-terminated) strings with
+NUL characters embedded (that the default @code{strlen} use would run
+afoul of):
+
+@smallexample
+struct GTY(()) multi_string @{
+  const char * GTY((string_length ("%h.len + 1"))) str;
+  size_t len;
+@};
+@end smallexample
+
 The @code{string_length} option currently is not supported for (fields
 in) global variables.
 @c
diff --git a/libcpp/include/symtab.h b/libcpp/include/symtab.h
index c7ccc6db9f0..0c713f2ad30 100644
--- a/libcpp/include/symtab.h
+++ b/libcpp/include/symtab.h
@@ -29,9 +29,7 @@ along with this program; see the file COPYING3.  If not see
 typedef struct ht_identifier ht_identifier;
 typedef struct ht_identifier *ht_identifier_ptr;
 struct GTY(()) ht_identifier {
-  /* This GTY markup arranges that the null-terminated identifier would still
-     stream to PCH correctly, if a null byte were to make its way into an
-     identifier somehow.  */
+  /* We know the 'len'gth of the 'str'ing; use it in the GTY markup.  */
   const unsigned char * GTY((string_length ("1 + %h.len"))) str;
   unsigned int len;
   unsigned int hash_value;
-- 
2.34.1


--=-=-=--
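
Illustration (not part of the patch): a minimal C sketch of why the
'multi_string' example in the gty.texi hunk gives the length as
"%h.len + 1" instead of relying on the default 'strlen'. The struct below
is a plain, un-annotated stand-in for that documentation example, and
'demo_multi_string', 'bytes_via_strlen' and 'bytes_via_len' are
hypothetical helper names, not GCC functions. The sketch only compares
byte counts; it does not model how gengtype or the PCH streamer work.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Plain stand-in for the documentation's 'multi_string' example (no GTY
   markup here).  'len' counts the payload bytes, excluding the final NUL.  */
struct multi_string
{
  const char *str;
  size_t len;
};

/* Hypothetical demo value: "foo", an embedded NUL, "bar", final NUL.  */
struct multi_string
demo_multi_string (void)
{
  static const char payload[] = "foo\0bar";
  struct multi_string s = { payload, sizeof payload - 1 };
  return s;
}

/* Bytes covered when the size is derived from strlen: everything up to
   and including the first NUL only.  */
size_t
bytes_via_strlen (struct multi_string s)
{
  assert (s.str != NULL);
  return strlen (s.str) + 1;
}

/* Bytes covered when the size comes from the stored length, as in
   string_length ("%h.len + 1"): the whole payload plus the final NUL.  */
size_t
bytes_via_len (struct multi_string s)
{
  return s.len + 1;
}
```

For the demo payload ('len' is 7), the strlen-based count stops at the
embedded NUL and covers 4 bytes, whereas 'len + 1' covers all 8 bytes
including the final terminator.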