From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 33BC73858D39 for ; Wed, 19 Oct 2022 12:17:04 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 33BC73858D39 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 1BDD814BF; Wed, 19 Oct 2022 05:17:10 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.62]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 4B33A3F67D; Wed, 19 Oct 2022 05:17:03 -0700 (PDT) From: Richard Sandiford To: Jakub Jelinek Mail-Followup-To: Jakub Jelinek ,Lewis Hyatt via Gcc-patches , Lewis Hyatt , richard.sandiford@arm.com Cc: Lewis Hyatt via Gcc-patches , Lewis Hyatt Subject: Re: [PATCH] pch: Fix streaming of strings with embedded null bytes References: Date: Wed, 19 Oct 2022 13:17:02 +0100 In-Reply-To: (Jakub Jelinek's message of "Wed, 19 Oct 2022 14:08:47 +0200") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-38.0 required=5.0 tests=BAYES_00,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,RCVD_IN_DNSWL_LOW,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Jakub Jelinek writes: > On Wed, Oct 19, 2022 at 12:54:11PM +0100, Richard Sandiford via Gcc-patches wrote: >> Lewis Hyatt via Gcc-patches writes: >> > When a GTY'ed struct is streamed to PCH, any plain char* pointers it contains >> > (whether they live in GC-controlled memory or not) will be marked for PCH >> > output by the routine gt_pch_note_object in ggc-common.cc. This routine >> > special-cases plain char* strings, and in particular it uses strlen() to get >> > their length. Thus it does not handle strings with embedded null bytes, but it >> > is possible for something PCH cares about (such as a string literal token in a >> > macro definition) to contain such embedded nulls. To fix that up, add a new >> > GTY option "string_length" so that gt_pch_note_object can be informed the >> > actual length it ought to use, and use it in the relevant libcpp structs >> > (cpp_string and ht_identifier) accordingly. >> >> This isn't really my area, as I'm about to demonstrate with this >> question, but: regarding >> >> if (note_ptr_fn == gt_pch_p_S) >> (*slot)->size = strlen ((const char *)obj) + 1; >> else >> (*slot)->size = ggc_get_size (obj); >> >> do you know why the PCH code goes out of its way to handle the sizes of >> strings specially? Are there enough garbage strings in the string pool >> that it's worth optimising the size of the saved memory for strings but >> not for other types of object? Or is the gt_pch_p_S test needed for >> correctness, rather than just being an optimisation? > > Just guessing, not all GC strings live in the stringpool. > Isn't e.g. ggc_strdup just a GC allocation where the string length > isn't stored anywhere? Is that different from other GC VLA allocations though? I thought ultimately we just tried to save and restore the containing pages. > And sometimes it isn't even GC allocated, e.g. ggc_strdup ("") just > returns ""; I guess const char * pointers in GC memory can also point > to string literals in .rodata and for PCH we move them. Ah, OK, that would definitely explain it, thanks. In that case, are you OK with the patch, as a way of continuing to support rodata string pointers while also allowing embedded nuls? Richard