public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Lewis Hyatt <lhyatt@gmail.com>
To: Jakub Jelinek <jakub@redhat.com>, richard.sandiford@arm.com
Cc: Lewis Hyatt via Gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] pch: Fix streaming of strings with embedded null bytes
Date: Wed, 19 Oct 2022 08:47:05 -0400	[thread overview]
Message-ID: <CAA_5UQ49zieU4B42mq438T0ZepzmPyFztwTdfiywVo6SgZSYvQ@mail.gmail.com> (raw)
In-Reply-To: <Y0/sURo2itRDkfrM@tucnak>

On Wed, Oct 19, 2022 at 8:23 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Wed, Oct 19, 2022 at 01:17:02PM +0100, Richard Sandiford wrote:
> > Jakub Jelinek <jakub@redhat.com> writes:
> > > On Wed, Oct 19, 2022 at 12:54:11PM +0100, Richard Sandiford via Gcc-patches wrote:
> > >> Lewis Hyatt via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > >> > When a GTY'ed struct is streamed to PCH, any plain char* pointers it contains
> > >> > (whether they live in GC-controlled memory or not) will be marked for PCH
> > >> > output by the routine gt_pch_note_object in ggc-common.cc. This routine
> > >> > special-cases plain char* strings, and in particular it uses strlen() to get
> > >> > their length. Thus it does not handle strings with embedded null bytes, but it
> > >> > is possible for something PCH cares about (such as a string literal token in a
> > >> > macro definition) to contain such embedded nulls. To fix that up, add a new
> > >> > GTY option "string_length" so that gt_pch_note_object can be informed the
> > >> > actual length it ought to use, and use it in the relevant libcpp structs
> > >> > (cpp_string and ht_identifier) accordingly.
> > >>
> > >> This isn't really my area, as I'm about to demonstrate with this
> > >> question, but: regarding
> > >>
> > >>   if (note_ptr_fn == gt_pch_p_S)
> > >>     (*slot)->size = strlen ((const char *)obj) + 1;
> > >>   else
> > >>     (*slot)->size = ggc_get_size (obj);
> > >>
> > >> do you know why the PCH code goes out of its way to handle the sizes of
> > >> strings specially?  Are there enough garbage strings in the string pool
> > >> that it's worth optimising the size of the saved memory for strings but
> > >> not for other types of object?  Or is the gt_pch_p_S test needed for
> > >> correctness, rather than just being an optimisation?
> > >
> > > Just guessing, not all GC strings live in the stringpool.
> > > Isn't e.g. ggc_strdup just a GC allocation where the string length
> > > isn't stored anywhere?
> >
> > Is that different from other GC VLA allocations though?  I thought
> > ultimately we just tried to save and restore the containing pages.
>
> I think just the objects in it, not entire pages (ggc_get_size (obj)
> sized chunks for non-strings).
>
> > > And sometimes it isn't even GC allocated, e.g. ggc_strdup ("") just
> > > returns ""; I guess const char * pointers in GC memory can also point
> > > to string literals in .rodata and for PCH we move them.
> >
> > Ah, OK, that would definitely explain it, thanks.  In that case,
> > are you OK with the patch, as a way of continuing to support rodata
> > string pointers while also allowing embedded nuls?
>
> LGTM.
>

Thank you both very much, I will push that then.
My understanding is that a GTY()ed struct can contain arbitrary char*
pointers as a special case, they need not be in the string pool. They
will be silently ignored by GC marking routines if they are not within
ggc's pages (as opposed to any other pointer, which will abort if it
wasn't under ggc's control), and they will make it into PCH by the
gt_pch_note_object mechanism. For example, struct line_maps contains a
char* to store the file name for each map, which is just an ordinary
malloc()ed string owned by the cpp_reader object.

-Lewis

  reply	other threads:[~2022-10-19 12:47 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-18 22:14 Lewis Hyatt
2022-10-19 11:54 ` Richard Sandiford
2022-10-19 12:08   ` Jakub Jelinek
2022-10-19 12:17     ` Richard Sandiford
2022-10-19 12:23       ` Jakub Jelinek
2022-10-19 12:47         ` Lewis Hyatt [this message]
2023-07-04 15:50 ` 'unsigned int len' field in 'libcpp/include/symtab.h:struct ht_identifier' (was: [PATCH] pch: Fix streaming of strings with embedded null bytes) Thomas Schwinge
2023-07-04 19:56   ` Lewis Hyatt
2023-07-05  7:56     ` GTY: Enhance 'string_length' option documentation (was: 'unsigned int len' field in 'libcpp/include/symtab.h:struct ht_identifier' (was: [PATCH] pch: Fix streaming of strings with embedded null bytes)) Thomas Schwinge
2023-07-05  8:15       ` Richard Biener
2023-07-05  7:50 ` GTY: Explicitly reject 'string_length' option for (fields in) global variables (was: [PATCH] pch: Fix streaming of strings with embedded null bytes) Thomas Schwinge
2023-07-05  8:13   ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAA_5UQ49zieU4B42mq438T0ZepzmPyFztwTdfiywVo6SgZSYvQ@mail.gmail.com \
    --to=lhyatt@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).