From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lj1-x232.google.com (mail-lj1-x232.google.com [IPv6:2a00:1450:4864:20::232]) by sourceware.org (Postfix) with ESMTPS id 5F0C33858D39 for ; Wed, 19 Oct 2022 12:47:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5F0C33858D39 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-lj1-x232.google.com with SMTP id m23so22007534lji.2 for ; Wed, 19 Oct 2022 05:47:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=NRlcvF/VZugb407EpzSe6GSrbFzoyw3ic0RphGhFtQo=; b=F0pXIVlCGLDbjNwi+hXcPK8vEaRolomNR064Br4oefQrhBfUkok75s196w/XX8d22B d1okqe9juJsajm6SoXjhJvyhQ4S7AjtsxarOShPqI2Pt9wD9mGoERnI4agkJgq6HFnYV Yy675/f4wk7LMLI0MrjNX7JqOdJNHYcm1wiEET1Wkj5ahLeSfNz6JVW5/92n3tUZBpWE zPXwz1G/AWofNJAdkhQkkhq62xFTbGsA1VQtKlWJvssSLKzC1fY0uB5jzJHva8/FL8C3 D0M56BTLJlatfh3uTuGmhMGCkP6ZvucHblosp4dFstJ73hefYEySVskt678TowoeU8UR t6lA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=NRlcvF/VZugb407EpzSe6GSrbFzoyw3ic0RphGhFtQo=; b=FzJTVCWSdxIk4yDvkX4FEUhkosZDUyd9HQFQBnVq7T3eGXbiAzUYOy7uYNwlZ6qv5K SGBkKXtrzfs730eDNegxJZdIsZUw4PI4/Ryg7UTkq+hvqG3AUY+QNn+ryo5hblaFoGdS wk8akUxzzjB4yIQkryISP/jZQMR+LaLMPhnOxhw6h2R7XRV3QmEYvrR8KpWntvnkJMFx VsROPwISg5kvlHqa1xWY0g6DWSojo4iCfs7slUMxvjgV0ORUma+UpT9Z5Iver3oGPi8t lypr+uQIEArzEBZNvmfE7bJ2Y/46bN2wT9ryT/oNuAczczdL5+tLbvkhSjFF36QXii4H ziIw== X-Gm-Message-State: ACrzQf3XpG1YVcitAuy+kHaWKtyVMAVEHWpyn4Ng66sTY0t4235YhZuz vBrTbF+OCunSaSiyQkhsGw8Id1GQzvEvL7rlu0U= X-Google-Smtp-Source: AMsMyM5klB1gJbUihBJNiwYsVD9BnST+HXSTB18ov35k31eKWCyWv0Tsw88o43e+mzDJftkU0UKEvcJyVs2Th1hFzcw= X-Received: by 2002:a2e:bd0e:0:b0:261:e718:e902 with SMTP id n14-20020a2ebd0e000000b00261e718e902mr3017260ljq.435.1666183636576; Wed, 19 Oct 2022 05:47:16 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Lewis Hyatt Date: Wed, 19 Oct 2022 08:47:05 -0400 Message-ID: Subject: Re: [PATCH] pch: Fix streaming of strings with embedded null bytes To: Jakub Jelinek , richard.sandiford@arm.com Cc: Lewis Hyatt via Gcc-patches Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-3029.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Wed, Oct 19, 2022 at 8:23 AM Jakub Jelinek wrote: > > On Wed, Oct 19, 2022 at 01:17:02PM +0100, Richard Sandiford wrote: > > Jakub Jelinek writes: > > > On Wed, Oct 19, 2022 at 12:54:11PM +0100, Richard Sandiford via Gcc-patches wrote: > > >> Lewis Hyatt via Gcc-patches writes: > > >> > When a GTY'ed struct is streamed to PCH, any plain char* pointers it contains > > >> > (whether they live in GC-controlled memory or not) will be marked for PCH > > >> > output by the routine gt_pch_note_object in ggc-common.cc. This routine > > >> > special-cases plain char* strings, and in particular it uses strlen() to get > > >> > their length. Thus it does not handle strings with embedded null bytes, but it > > >> > is possible for something PCH cares about (such as a string literal token in a > > >> > macro definition) to contain such embedded nulls. To fix that up, add a new > > >> > GTY option "string_length" so that gt_pch_note_object can be informed the > > >> > actual length it ought to use, and use it in the relevant libcpp structs > > >> > (cpp_string and ht_identifier) accordingly. > > >> > > >> This isn't really my area, as I'm about to demonstrate with this > > >> question, but: regarding > > >> > > >> if (note_ptr_fn == gt_pch_p_S) > > >> (*slot)->size = strlen ((const char *)obj) + 1; > > >> else > > >> (*slot)->size = ggc_get_size (obj); > > >> > > >> do you know why the PCH code goes out of its way to handle the sizes of > > >> strings specially? Are there enough garbage strings in the string pool > > >> that it's worth optimising the size of the saved memory for strings but > > >> not for other types of object? Or is the gt_pch_p_S test needed for > > >> correctness, rather than just being an optimisation? > > > > > > Just guessing, not all GC strings live in the stringpool. > > > Isn't e.g. ggc_strdup just a GC allocation where the string length > > > isn't stored anywhere? > > > > Is that different from other GC VLA allocations though? I thought > > ultimately we just tried to save and restore the containing pages. > > I think just the objects in it, not entire pages (ggc_get_size (obj) > sized chunks for non-strings). > > > > And sometimes it isn't even GC allocated, e.g. ggc_strdup ("") just > > > returns ""; I guess const char * pointers in GC memory can also point > > > to string literals in .rodata and for PCH we move them. > > > > Ah, OK, that would definitely explain it, thanks. In that case, > > are you OK with the patch, as a way of continuing to support rodata > > string pointers while also allowing embedded nuls? > > LGTM. > Thank you both very much, I will push that then. My understanding is that a GTY()ed struct can contain arbitrary char* pointers as a special case, they need not be in the string pool. They will be silently ignored by GC marking routines if they are not within ggc's pages (as opposed to any other pointer, which will abort if it wasn't under ggc's control), and they will make it into PCH by the gt_pch_note_object mechanism. For example, struct line_maps contains a char* to store the file name for each map, which is just an ordinary malloc()ed string owned by the cpp_reader object. -Lewis