public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] pch: Fix streaming of strings with embedded null bytes
@ 2022-10-18 22:14 Lewis Hyatt
  2022-10-19 11:54 ` Richard Sandiford
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Lewis Hyatt @ 2022-10-18 22:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: Lewis Hyatt

When a GTY'ed struct is streamed to PCH, any plain char* pointers it contains
(whether they live in GC-controlled memory or not) will be marked for PCH
output by the routine gt_pch_note_object in ggc-common.cc. This routine
special-cases plain char* strings, and in particular it uses strlen() to get
their length. Thus it does not handle strings with embedded null bytes, but it
is possible for something PCH cares about (such as a string literal token in a
macro definition) to contain such embedded nulls. To fix that up, add a new
GTY option "string_length" so that gt_pch_note_object can be informed the
actual length it ought to use, and use it in the relevant libcpp structs
(cpp_string and ht_identifier) accordingly.

gcc/ChangeLog:

	* gengtype.cc (output_escaped_param): Add missing const.
	(get_string_option): Add missing check for option type.
	(walk_type): Support new "string_length" GTY option.
	(write_types_process_field): Likewise.
	* ggc-common.cc (gt_pch_note_object): Add optional length argument.
	* ggc.h (gt_pch_note_object): Adjust prototype for new argument.
	(gt_pch_n_S2): Declare...
	* stringpool.cc (gt_pch_n_S2): ...new function.
	* doc/gty.texi: Document new GTY((string_length)) option.

libcpp/ChangeLog:

	* include/cpplib.h (struct cpp_string): Use new "string_length" GTY.
	* include/symtab.h (struct ht_identifier): Likewise.

gcc/testsuite/ChangeLog:

	* g++.dg/pch/pch-string-nulls.C: New test.
	* g++.dg/pch/pch-string-nulls.Hs: New test.
---

Notes:
    Hello-
    
    This fixes a small glitch with PCH files that I doubt matters in
    practice. However, the new GTY((string_length)) option I think should be also
    useful for other things (including for another patch I am working on), and it
    seems worth fixing to me anyway.  Please let me know if it looks OK, or if you'd
    prefer another approach? I did consider reusing GTY((length)) for this purpose
    but it seemed much more straightforward to do it with a new option, and it's
    really about something different since it isn't related to marking of
    GC-controlled memory.
    
    BTW, the testcase (pch-string-nulls.Hs) needs to have a literal null byte in
    it. That wasn't emailing well so I temporarily have it as the string "^@" in
    this patch, for illustration.
    
    Bootstrap + regtest all languages looks good on x86-64 Linux. Thanks!
    
    -Lewis

 gcc/doc/gty.texi                             | 21 +++++++++++++++-
 gcc/gengtype.cc                              | 25 ++++++++++++++++----
 gcc/ggc-common.cc                            |  7 ++++--
 gcc/ggc.h                                    |  4 +++-
 gcc/stringpool.cc                            |  7 ++++++
 gcc/testsuite/g++.dg/pch/pch-string-nulls.C  |  3 +++
 gcc/testsuite/g++.dg/pch/pch-string-nulls.Hs |  2 ++
 libcpp/include/cpplib.h                      |  6 ++++-
 libcpp/include/symtab.h                      |  5 +++-
 9 files changed, 70 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pch/pch-string-nulls.C
 create mode 100644 gcc/testsuite/g++.dg/pch/pch-string-nulls.Hs

diff --git a/gcc/doc/gty.texi b/gcc/doc/gty.texi
index 81aafd11ce3..4f791b300ba 100644
--- a/gcc/doc/gty.texi
+++ b/gcc/doc/gty.texi
@@ -196,7 +196,26 @@ static GTY((length("reg_known_value_size"))) rtx *reg_known_value;
 Note that the @code{length} option is only meant for use with arrays of
 non-atomic objects, that is, objects that contain pointers pointing to
 other GTY-managed objects.  For other GC-allocated arrays and strings
-you should use @code{atomic}.
+you should use @code{atomic} or @code{string_length}.
+
+@findex string_length
+@item string_length ("@var{expression}")
+
+In order to simplify production of PCH, a structure member that is a plain
+array of bytes (an optionally @code{const} and/or @code{unsigned} @code{char
+*}) is treated specially by the infrastructure. Even if such an array has not
+been allocated in GC-controlled memory, it will still be written properly into
+a PCH.  The machinery responsible for this needs to know the length of the
+data; by default, the length is determined by calling @code{strlen} on the
+pointer.  The @code{string_length} option specifies an alternate way to
+determine the length, such as by inspecting another struct member:
+
+@smallexample
+struct GTY(()) non_terminated_string @{
+  size_t sz;
+  const char * GTY((string_length ("%h.sz"))) data;
+@};
+@end smallexample
 
 @findex skip
 @item skip
diff --git a/gcc/gengtype.cc b/gcc/gengtype.cc
index 42363439bd8..28bf05e9c57 100644
--- a/gcc/gengtype.cc
+++ b/gcc/gengtype.cc
@@ -2403,7 +2403,7 @@ struct write_types_data
   enum write_types_kinds kind;
 };
 
-static void output_escaped_param (struct walk_type_data *d,
+static void output_escaped_param (const struct walk_type_data *d,
 				  const char *, const char *);
 static void output_mangled_typename (outf_p, const_type_p);
 static void walk_type (type_p t, struct walk_type_data *d);
@@ -2537,7 +2537,7 @@ output_mangled_typename (outf_p of, const_type_p t)
    print error messages.  */
 
 static void
-output_escaped_param (struct walk_type_data *d, const char *param,
+output_escaped_param (const struct walk_type_data *d, const char *param,
 		      const char *oname)
 {
   const char *p;
@@ -2576,7 +2576,7 @@ const char *
 get_string_option (options_p opt, const char *key)
 {
   for (; opt; opt = opt->next)
-    if (strcmp (opt->name, key) == 0)
+    if (opt->kind == OPTION_STRING && strcmp (opt->name, key) == 0)
       return opt->info.string;
   return NULL;
 }
@@ -2700,6 +2700,8 @@ walk_type (type_p t, struct walk_type_data *d)
       ;
     else if (strcmp (oo->name, "callback") == 0)
       ;
+    else if (strcmp (oo->name, "string_length") == 0)
+      ;
     else
       error_at_line (d->line, "unknown option `%s'\n", oo->name);
 
@@ -3251,7 +3253,22 @@ write_types_process_field (type_p f, const struct walk_type_data *d)
 	{
 	  oprintf (d->of, "%*sgt_%s_", d->indent, "", wtd->prefix);
 	  output_mangled_typename (d->of, f);
-	  oprintf (d->of, " (%s%s);\n", cast, d->val);
+
+	  /* Check if we need to call the special pch note version
+	     for strings that takes an explicit length.  */
+	  const auto length_override
+	    = (f->kind == TYPE_STRING && !strcmp (wtd->prefix, "pch_n")
+	       ? get_string_option (d->opt, "string_length")
+	       : nullptr);
+	  if (length_override)
+	    {
+	      oprintf (d->of, "2 (%s%s, ", cast, d->val);
+	      output_escaped_param (d, length_override, "string_length");
+	    }
+	  else
+	    oprintf (d->of, " (%s%s", cast, d->val);
+
+	  oprintf (d->of, ");\n");
 	  if (d->reorder_fn && wtd->reorder_note_routine)
 	    oprintf (d->of, "%*s%s (%s%s, %s%s, %s);\n", d->indent, "",
 		     wtd->reorder_note_routine, cast, d->val, cast, d->val,
diff --git a/gcc/ggc-common.cc b/gcc/ggc-common.cc
index 8b3389e8760..62da09d66a7 100644
--- a/gcc/ggc-common.cc
+++ b/gcc/ggc-common.cc
@@ -253,7 +253,8 @@ static vec<void *> reloc_addrs_vec;
 
 int
 gt_pch_note_object (void *obj, void *note_ptr_cookie,
-		    gt_note_pointers note_ptr_fn)
+		    gt_note_pointers note_ptr_fn,
+		    size_t length_override)
 {
   struct ptr_data **slot;
 
@@ -273,7 +274,9 @@ gt_pch_note_object (void *obj, void *note_ptr_cookie,
   (*slot)->obj = obj;
   (*slot)->note_ptr_fn = note_ptr_fn;
   (*slot)->note_ptr_cookie = note_ptr_cookie;
-  if (note_ptr_fn == gt_pch_p_S)
+  if (length_override != (size_t)-1)
+    (*slot)->size = length_override;
+  else if (note_ptr_fn == gt_pch_p_S)
     (*slot)->size = strlen ((const char *)obj) + 1;
   else
     (*slot)->size = ggc_get_size (obj);
diff --git a/gcc/ggc.h b/gcc/ggc.h
index aeec1bafb9b..7bc74ec82b5 100644
--- a/gcc/ggc.h
+++ b/gcc/ggc.h
@@ -44,7 +44,8 @@ typedef void (*gt_handle_reorder) (void *, void *, gt_pointer_operator,
 				   void *);
 
 /* Used by the gt_pch_n_* routines.  Register an object in the hash table.  */
-extern int gt_pch_note_object (void *, void *, gt_note_pointers);
+extern int gt_pch_note_object (void *, void *, gt_note_pointers,
+			       size_t length_override = (size_t)-1);
 
 /* Used by the gt_pch_p_* routines.  Register address of a callback
    pointer.  */
@@ -101,6 +102,7 @@ extern int ggc_marked_p	(const void *);
 
 /* PCH and GGC handling for strings, mostly trivial.  */
 extern void gt_pch_n_S (const void *);
+extern void gt_pch_n_S2 (const void *, size_t);
 extern void gt_ggc_m_S (const void *);
 
 /* End of GTY machinery API.  */
diff --git a/gcc/stringpool.cc b/gcc/stringpool.cc
index 57509d58e15..20dbef5580c 100644
--- a/gcc/stringpool.cc
+++ b/gcc/stringpool.cc
@@ -196,6 +196,13 @@ gt_pch_n_S (const void *x)
 		      &gt_pch_p_S);
 }
 
+void
+gt_pch_n_S2 (const void *x, size_t string_len)
+{
+  gt_pch_note_object (CONST_CAST (void *, x), CONST_CAST (void *, x),
+		      &gt_pch_p_S, string_len);
+}
+
 
 /* User-callable entry point for marking string X.  */
 
diff --git a/gcc/testsuite/g++.dg/pch/pch-string-nulls.C b/gcc/testsuite/g++.dg/pch/pch-string-nulls.C
new file mode 100644
index 00000000000..dfeb21adf71
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pch/pch-string-nulls.C
@@ -0,0 +1,3 @@
+// { dg-do compile { target c++11 } }
+#include "pch-string-nulls.H"
+static_assert (X[4] == '[' && X[5] == '!' && X[6] == ']', "error");
diff --git a/gcc/testsuite/g++.dg/pch/pch-string-nulls.Hs b/gcc/testsuite/g++.dg/pch/pch-string-nulls.Hs
new file mode 100644
index 00000000000..8f8bc187f8c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pch/pch-string-nulls.Hs
@@ -0,0 +1,2 @@
+/* Note that there is a null byte following "ABC".  */
+#define X R"(ABC^@[!])"
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index d5ef12a30ea..1d34c00669f 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -179,7 +179,11 @@ enum c_lang {CLK_GNUC89 = 0, CLK_GNUC99, CLK_GNUC11, CLK_GNUC17, CLK_GNUC2X,
 /* Payload of a NUMBER, STRING, CHAR or COMMENT token.  */
 struct GTY(()) cpp_string {
   unsigned int len;
-  const unsigned char *text;
+
+  /* TEXT is always null terminated (terminator not included in len); but this
+     GTY markup arranges that PCH streaming works properly even if there is a
+     null byte in the middle of the string.  */
+  const unsigned char * GTY((string_length ("1 + %h.len"))) text;
 };
 
 /* Flags for the cpp_token structure.  */
diff --git a/libcpp/include/symtab.h b/libcpp/include/symtab.h
index 53efe6c3943..8b45fd5c2ce 100644
--- a/libcpp/include/symtab.h
+++ b/libcpp/include/symtab.h
@@ -29,7 +29,10 @@ along with this program; see the file COPYING3.  If not see
 typedef struct ht_identifier ht_identifier;
 typedef struct ht_identifier *ht_identifier_ptr;
 struct GTY(()) ht_identifier {
-  const unsigned char *str;
+  /* This GTY markup arranges that the null-terminated identifier would still
+     stream to PCH correctly, if a null byte were to make its way into an
+     identifier somehow.  */
+  const unsigned char * GTY((string_length ("1 + %h.len"))) str;
   unsigned int len;
   unsigned int hash_value;
 };

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] pch: Fix streaming of strings with embedded null bytes
  2022-10-18 22:14 [PATCH] pch: Fix streaming of strings with embedded null bytes Lewis Hyatt
@ 2022-10-19 11:54 ` Richard Sandiford
  2022-10-19 12:08   ` Jakub Jelinek
  2023-07-04 15:50 ` 'unsigned int len' field in 'libcpp/include/symtab.h:struct ht_identifier' (was: [PATCH] pch: Fix streaming of strings with embedded null bytes) Thomas Schwinge
  2023-07-05  7:50 ` GTY: Explicitly reject 'string_length' option for (fields in) global variables (was: [PATCH] pch: Fix streaming of strings with embedded null bytes) Thomas Schwinge
  2 siblings, 1 reply; 12+ messages in thread
From: Richard Sandiford @ 2022-10-19 11:54 UTC (permalink / raw)
  To: Lewis Hyatt via Gcc-patches; +Cc: Lewis Hyatt

Lewis Hyatt via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> When a GTY'ed struct is streamed to PCH, any plain char* pointers it contains
> (whether they live in GC-controlled memory or not) will be marked for PCH
> output by the routine gt_pch_note_object in ggc-common.cc. This routine
> special-cases plain char* strings, and in particular it uses strlen() to get
> their length. Thus it does not handle strings with embedded null bytes, but it
> is possible for something PCH cares about (such as a string literal token in a
> macro definition) to contain such embedded nulls. To fix that up, add a new
> GTY option "string_length" so that gt_pch_note_object can be informed the
> actual length it ought to use, and use it in the relevant libcpp structs
> (cpp_string and ht_identifier) accordingly.

This isn't really my area, as I'm about to demonstrate with this
question, but: regarding

  if (note_ptr_fn == gt_pch_p_S)
    (*slot)->size = strlen ((const char *)obj) + 1;
  else
    (*slot)->size = ggc_get_size (obj);

do you know why the PCH code goes out of its way to handle the sizes of
strings specially?  Are there enough garbage strings in the string pool
that it's worth optimising the size of the saved memory for strings but
not for other types of object?  Or is the gt_pch_p_S test needed for
correctness, rather than just being an optimisation?

The special case has been there since the initial version, so there's
not much history to go from.

If using specific sizes for strings is still needed or worthwhile, then
the patch looks good to me.  (And I agree using a new option is better
than overloading "length".)  But since the patch involves adding an
extra argument to gt_pch_note_object, I just wanted to check that we
still had a good reason for handling strings differently.

Thanks,
Richard

> gcc/ChangeLog:
>
> 	* gengtype.cc (output_escaped_param): Add missing const.
> 	(get_string_option): Add missing check for option type.
> 	(walk_type): Support new "string_length" GTY option.
> 	(write_types_process_field): Likewise.
> 	* ggc-common.cc (gt_pch_note_object): Add optional length argument.
> 	* ggc.h (gt_pch_note_object): Adjust prototype for new argument.
> 	(gt_pch_n_S2): Declare...
> 	* stringpool.cc (gt_pch_n_S2): ...new function.
> 	* doc/gty.texi: Document new GTY((string_length)) option.
>
> libcpp/ChangeLog:
>
> 	* include/cpplib.h (struct cpp_string): Use new "string_length" GTY.
> 	* include/symtab.h (struct ht_identifier): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 	* g++.dg/pch/pch-string-nulls.C: New test.
> 	* g++.dg/pch/pch-string-nulls.Hs: New test.
> ---
>
> Notes:
>     Hello-
>     
>     This fixes a small glitch with PCH files that I doubt matters in
>     practice. However, the new GTY((string_length)) option I think should be also
>     useful for other things (including for another patch I am working on), and it
>     seems worth fixing to me anyway.  Please let me know if it looks OK, or if you'd
>     prefer another approach? I did consider reusing GTY((length)) for this purpose
>     but it seemed much more straightforward to do it with a new option, and it's
>     really about something different since it isn't related to marking of
>     GC-controlled memory.
>     
>     BTW, the testcase (pch-string-nulls.Hs) needs to have a literal null byte in
>     it. That wasn't emailing well so I temporarily have it as the string "^@" in
>     this patch, for illustration.
>     
>     Bootstrap + regtest all languages looks good on x86-64 Linux. Thanks!
>     
>     -Lewis
>
>  gcc/doc/gty.texi                             | 21 +++++++++++++++-
>  gcc/gengtype.cc                              | 25 ++++++++++++++++----
>  gcc/ggc-common.cc                            |  7 ++++--
>  gcc/ggc.h                                    |  4 +++-
>  gcc/stringpool.cc                            |  7 ++++++
>  gcc/testsuite/g++.dg/pch/pch-string-nulls.C  |  3 +++
>  gcc/testsuite/g++.dg/pch/pch-string-nulls.Hs |  2 ++
>  libcpp/include/cpplib.h                      |  6 ++++-
>  libcpp/include/symtab.h                      |  5 +++-
>  9 files changed, 70 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/pch/pch-string-nulls.C
>  create mode 100644 gcc/testsuite/g++.dg/pch/pch-string-nulls.Hs
>
> diff --git a/gcc/doc/gty.texi b/gcc/doc/gty.texi
> index 81aafd11ce3..4f791b300ba 100644
> --- a/gcc/doc/gty.texi
> +++ b/gcc/doc/gty.texi
> @@ -196,7 +196,26 @@ static GTY((length("reg_known_value_size"))) rtx *reg_known_value;
>  Note that the @code{length} option is only meant for use with arrays of
>  non-atomic objects, that is, objects that contain pointers pointing to
>  other GTY-managed objects.  For other GC-allocated arrays and strings
> -you should use @code{atomic}.
> +you should use @code{atomic} or @code{string_length}.
> +
> +@findex string_length
> +@item string_length ("@var{expression}")
> +
> +In order to simplify production of PCH, a structure member that is a plain
> +array of bytes (an optionally @code{const} and/or @code{unsigned} @code{char
> +*}) is treated specially by the infrastructure. Even if such an array has not
> +been allocated in GC-controlled memory, it will still be written properly into
> +a PCH.  The machinery responsible for this needs to know the length of the
> +data; by default, the length is determined by calling @code{strlen} on the
> +pointer.  The @code{string_length} option specifies an alternate way to
> +determine the length, such as by inspecting another struct member:
> +
> +@smallexample
> +struct GTY(()) non_terminated_string @{
> +  size_t sz;
> +  const char * GTY((string_length ("%h.sz"))) data;
> +@};
> +@end smallexample
>  
>  @findex skip
>  @item skip
> diff --git a/gcc/gengtype.cc b/gcc/gengtype.cc
> index 42363439bd8..28bf05e9c57 100644
> --- a/gcc/gengtype.cc
> +++ b/gcc/gengtype.cc
> @@ -2403,7 +2403,7 @@ struct write_types_data
>    enum write_types_kinds kind;
>  };
>  
> -static void output_escaped_param (struct walk_type_data *d,
> +static void output_escaped_param (const struct walk_type_data *d,
>  				  const char *, const char *);
>  static void output_mangled_typename (outf_p, const_type_p);
>  static void walk_type (type_p t, struct walk_type_data *d);
> @@ -2537,7 +2537,7 @@ output_mangled_typename (outf_p of, const_type_p t)
>     print error messages.  */
>  
>  static void
> -output_escaped_param (struct walk_type_data *d, const char *param,
> +output_escaped_param (const struct walk_type_data *d, const char *param,
>  		      const char *oname)
>  {
>    const char *p;
> @@ -2576,7 +2576,7 @@ const char *
>  get_string_option (options_p opt, const char *key)
>  {
>    for (; opt; opt = opt->next)
> -    if (strcmp (opt->name, key) == 0)
> +    if (opt->kind == OPTION_STRING && strcmp (opt->name, key) == 0)
>        return opt->info.string;
>    return NULL;
>  }
> @@ -2700,6 +2700,8 @@ walk_type (type_p t, struct walk_type_data *d)
>        ;
>      else if (strcmp (oo->name, "callback") == 0)
>        ;
> +    else if (strcmp (oo->name, "string_length") == 0)
> +      ;
>      else
>        error_at_line (d->line, "unknown option `%s'\n", oo->name);
>  
> @@ -3251,7 +3253,22 @@ write_types_process_field (type_p f, const struct walk_type_data *d)
>  	{
>  	  oprintf (d->of, "%*sgt_%s_", d->indent, "", wtd->prefix);
>  	  output_mangled_typename (d->of, f);
> -	  oprintf (d->of, " (%s%s);\n", cast, d->val);
> +
> +	  /* Check if we need to call the special pch note version
> +	     for strings that takes an explicit length.  */
> +	  const auto length_override
> +	    = (f->kind == TYPE_STRING && !strcmp (wtd->prefix, "pch_n")
> +	       ? get_string_option (d->opt, "string_length")
> +	       : nullptr);
> +	  if (length_override)
> +	    {
> +	      oprintf (d->of, "2 (%s%s, ", cast, d->val);
> +	      output_escaped_param (d, length_override, "string_length");
> +	    }
> +	  else
> +	    oprintf (d->of, " (%s%s", cast, d->val);
> +
> +	  oprintf (d->of, ");\n");
>  	  if (d->reorder_fn && wtd->reorder_note_routine)
>  	    oprintf (d->of, "%*s%s (%s%s, %s%s, %s);\n", d->indent, "",
>  		     wtd->reorder_note_routine, cast, d->val, cast, d->val,
> diff --git a/gcc/ggc-common.cc b/gcc/ggc-common.cc
> index 8b3389e8760..62da09d66a7 100644
> --- a/gcc/ggc-common.cc
> +++ b/gcc/ggc-common.cc
> @@ -253,7 +253,8 @@ static vec<void *> reloc_addrs_vec;
>  
>  int
>  gt_pch_note_object (void *obj, void *note_ptr_cookie,
> -		    gt_note_pointers note_ptr_fn)
> +		    gt_note_pointers note_ptr_fn,
> +		    size_t length_override)
>  {
>    struct ptr_data **slot;
>  
> @@ -273,7 +274,9 @@ gt_pch_note_object (void *obj, void *note_ptr_cookie,
>    (*slot)->obj = obj;
>    (*slot)->note_ptr_fn = note_ptr_fn;
>    (*slot)->note_ptr_cookie = note_ptr_cookie;
> -  if (note_ptr_fn == gt_pch_p_S)
> +  if (length_override != (size_t)-1)
> +    (*slot)->size = length_override;
> +  else if (note_ptr_fn == gt_pch_p_S)
>      (*slot)->size = strlen ((const char *)obj) + 1;
>    else
>      (*slot)->size = ggc_get_size (obj);
> diff --git a/gcc/ggc.h b/gcc/ggc.h
> index aeec1bafb9b..7bc74ec82b5 100644
> --- a/gcc/ggc.h
> +++ b/gcc/ggc.h
> @@ -44,7 +44,8 @@ typedef void (*gt_handle_reorder) (void *, void *, gt_pointer_operator,
>  				   void *);
>  
>  /* Used by the gt_pch_n_* routines.  Register an object in the hash table.  */
> -extern int gt_pch_note_object (void *, void *, gt_note_pointers);
> +extern int gt_pch_note_object (void *, void *, gt_note_pointers,
> +			       size_t length_override = (size_t)-1);
>  
>  /* Used by the gt_pch_p_* routines.  Register address of a callback
>     pointer.  */
> @@ -101,6 +102,7 @@ extern int ggc_marked_p	(const void *);
>  
>  /* PCH and GGC handling for strings, mostly trivial.  */
>  extern void gt_pch_n_S (const void *);
> +extern void gt_pch_n_S2 (const void *, size_t);
>  extern void gt_ggc_m_S (const void *);
>  
>  /* End of GTY machinery API.  */
> diff --git a/gcc/stringpool.cc b/gcc/stringpool.cc
> index 57509d58e15..20dbef5580c 100644
> --- a/gcc/stringpool.cc
> +++ b/gcc/stringpool.cc
> @@ -196,6 +196,13 @@ gt_pch_n_S (const void *x)
>  		      &gt_pch_p_S);
>  }
>  
> +void
> +gt_pch_n_S2 (const void *x, size_t string_len)
> +{
> +  gt_pch_note_object (CONST_CAST (void *, x), CONST_CAST (void *, x),
> +		      &gt_pch_p_S, string_len);
> +}
> +
>  
>  /* User-callable entry point for marking string X.  */
>  
> diff --git a/gcc/testsuite/g++.dg/pch/pch-string-nulls.C b/gcc/testsuite/g++.dg/pch/pch-string-nulls.C
> new file mode 100644
> index 00000000000..dfeb21adf71
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/pch/pch-string-nulls.C
> @@ -0,0 +1,3 @@
> +// { dg-do compile { target c++11 } }
> +#include "pch-string-nulls.H"
> +static_assert (X[4] == '[' && X[5] == '!' && X[6] == ']', "error");
> diff --git a/gcc/testsuite/g++.dg/pch/pch-string-nulls.Hs b/gcc/testsuite/g++.dg/pch/pch-string-nulls.Hs
> new file mode 100644
> index 00000000000..8f8bc187f8c
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/pch/pch-string-nulls.Hs
> @@ -0,0 +1,2 @@
> +/* Note that there is a null byte following "ABC".  */
> +#define X R"(ABC^@[!])"
> diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
> index d5ef12a30ea..1d34c00669f 100644
> --- a/libcpp/include/cpplib.h
> +++ b/libcpp/include/cpplib.h
> @@ -179,7 +179,11 @@ enum c_lang {CLK_GNUC89 = 0, CLK_GNUC99, CLK_GNUC11, CLK_GNUC17, CLK_GNUC2X,
>  /* Payload of a NUMBER, STRING, CHAR or COMMENT token.  */
>  struct GTY(()) cpp_string {
>    unsigned int len;
> -  const unsigned char *text;
> +
> +  /* TEXT is always null terminated (terminator not included in len); but this
> +     GTY markup arranges that PCH streaming works properly even if there is a
> +     null byte in the middle of the string.  */
> +  const unsigned char * GTY((string_length ("1 + %h.len"))) text;
>  };
>  
>  /* Flags for the cpp_token structure.  */
> diff --git a/libcpp/include/symtab.h b/libcpp/include/symtab.h
> index 53efe6c3943..8b45fd5c2ce 100644
> --- a/libcpp/include/symtab.h
> +++ b/libcpp/include/symtab.h
> @@ -29,7 +29,10 @@ along with this program; see the file COPYING3.  If not see
>  typedef struct ht_identifier ht_identifier;
>  typedef struct ht_identifier *ht_identifier_ptr;
>  struct GTY(()) ht_identifier {
> -  const unsigned char *str;
> +  /* This GTY markup arranges that the null-terminated identifier would still
> +     stream to PCH correctly, if a null byte were to make its way into an
> +     identifier somehow.  */
> +  const unsigned char * GTY((string_length ("1 + %h.len"))) str;
>    unsigned int len;
>    unsigned int hash_value;
>  };

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] pch: Fix streaming of strings with embedded null bytes
  2022-10-19 11:54 ` Richard Sandiford
@ 2022-10-19 12:08   ` Jakub Jelinek
  2022-10-19 12:17     ` Richard Sandiford
  0 siblings, 1 reply; 12+ messages in thread
From: Jakub Jelinek @ 2022-10-19 12:08 UTC (permalink / raw)
  To: Lewis Hyatt via Gcc-patches, Lewis Hyatt, richard.sandiford

On Wed, Oct 19, 2022 at 12:54:11PM +0100, Richard Sandiford via Gcc-patches wrote:
> Lewis Hyatt via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > When a GTY'ed struct is streamed to PCH, any plain char* pointers it contains
> > (whether they live in GC-controlled memory or not) will be marked for PCH
> > output by the routine gt_pch_note_object in ggc-common.cc. This routine
> > special-cases plain char* strings, and in particular it uses strlen() to get
> > their length. Thus it does not handle strings with embedded null bytes, but it
> > is possible for something PCH cares about (such as a string literal token in a
> > macro definition) to contain such embedded nulls. To fix that up, add a new
> > GTY option "string_length" so that gt_pch_note_object can be informed the
> > actual length it ought to use, and use it in the relevant libcpp structs
> > (cpp_string and ht_identifier) accordingly.
> 
> This isn't really my area, as I'm about to demonstrate with this
> question, but: regarding
> 
>   if (note_ptr_fn == gt_pch_p_S)
>     (*slot)->size = strlen ((const char *)obj) + 1;
>   else
>     (*slot)->size = ggc_get_size (obj);
> 
> do you know why the PCH code goes out of its way to handle the sizes of
> strings specially?  Are there enough garbage strings in the string pool
> that it's worth optimising the size of the saved memory for strings but
> not for other types of object?  Or is the gt_pch_p_S test needed for
> correctness, rather than just being an optimisation?

Just guessing, not all GC strings live in the stringpool.
Isn't e.g. ggc_strdup just a GC allocation where the string length
isn't stored anywhere?  And sometimes it isn't even GC allocated,
e.g. ggc_strdup ("") just returns "";
I guess const char * pointers in GC memory can also point to string literals
in .rodata and for PCH we move them.

	Jakub


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] pch: Fix streaming of strings with embedded null bytes
  2022-10-19 12:08   ` Jakub Jelinek
@ 2022-10-19 12:17     ` Richard Sandiford
  2022-10-19 12:23       ` Jakub Jelinek
  0 siblings, 1 reply; 12+ messages in thread
From: Richard Sandiford @ 2022-10-19 12:17 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Lewis Hyatt via Gcc-patches, Lewis Hyatt

Jakub Jelinek <jakub@redhat.com> writes:
> On Wed, Oct 19, 2022 at 12:54:11PM +0100, Richard Sandiford via Gcc-patches wrote:
>> Lewis Hyatt via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> > When a GTY'ed struct is streamed to PCH, any plain char* pointers it contains
>> > (whether they live in GC-controlled memory or not) will be marked for PCH
>> > output by the routine gt_pch_note_object in ggc-common.cc. This routine
>> > special-cases plain char* strings, and in particular it uses strlen() to get
>> > their length. Thus it does not handle strings with embedded null bytes, but it
>> > is possible for something PCH cares about (such as a string literal token in a
>> > macro definition) to contain such embedded nulls. To fix that up, add a new
>> > GTY option "string_length" so that gt_pch_note_object can be informed the
>> > actual length it ought to use, and use it in the relevant libcpp structs
>> > (cpp_string and ht_identifier) accordingly.
>> 
>> This isn't really my area, as I'm about to demonstrate with this
>> question, but: regarding
>> 
>>   if (note_ptr_fn == gt_pch_p_S)
>>     (*slot)->size = strlen ((const char *)obj) + 1;
>>   else
>>     (*slot)->size = ggc_get_size (obj);
>> 
>> do you know why the PCH code goes out of its way to handle the sizes of
>> strings specially?  Are there enough garbage strings in the string pool
>> that it's worth optimising the size of the saved memory for strings but
>> not for other types of object?  Or is the gt_pch_p_S test needed for
>> correctness, rather than just being an optimisation?
>
> Just guessing, not all GC strings live in the stringpool.
> Isn't e.g. ggc_strdup just a GC allocation where the string length
> isn't stored anywhere?

Is that different from other GC VLA allocations though?  I thought
ultimately we just tried to save and restore the containing pages.

> And sometimes it isn't even GC allocated, e.g. ggc_strdup ("") just
> returns ""; I guess const char * pointers in GC memory can also point
> to string literals in .rodata and for PCH we move them.

Ah, OK, that would definitely explain it, thanks.  In that case,
are you OK with the patch, as a way of continuing to support rodata
string pointers while also allowing embedded nuls?

Richard

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] pch: Fix streaming of strings with embedded null bytes
  2022-10-19 12:17     ` Richard Sandiford
@ 2022-10-19 12:23       ` Jakub Jelinek
  2022-10-19 12:47         ` Lewis Hyatt
  0 siblings, 1 reply; 12+ messages in thread
From: Jakub Jelinek @ 2022-10-19 12:23 UTC (permalink / raw)
  To: Lewis Hyatt via Gcc-patches, Lewis Hyatt, richard.sandiford

On Wed, Oct 19, 2022 at 01:17:02PM +0100, Richard Sandiford wrote:
> Jakub Jelinek <jakub@redhat.com> writes:
> > On Wed, Oct 19, 2022 at 12:54:11PM +0100, Richard Sandiford via Gcc-patches wrote:
> >> Lewis Hyatt via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> >> > When a GTY'ed struct is streamed to PCH, any plain char* pointers it contains
> >> > (whether they live in GC-controlled memory or not) will be marked for PCH
> >> > output by the routine gt_pch_note_object in ggc-common.cc. This routine
> >> > special-cases plain char* strings, and in particular it uses strlen() to get
> >> > their length. Thus it does not handle strings with embedded null bytes, but it
> >> > is possible for something PCH cares about (such as a string literal token in a
> >> > macro definition) to contain such embedded nulls. To fix that up, add a new
> >> > GTY option "string_length" so that gt_pch_note_object can be informed the
> >> > actual length it ought to use, and use it in the relevant libcpp structs
> >> > (cpp_string and ht_identifier) accordingly.
> >> 
> >> This isn't really my area, as I'm about to demonstrate with this
> >> question, but: regarding
> >> 
> >>   if (note_ptr_fn == gt_pch_p_S)
> >>     (*slot)->size = strlen ((const char *)obj) + 1;
> >>   else
> >>     (*slot)->size = ggc_get_size (obj);
> >> 
> >> do you know why the PCH code goes out of its way to handle the sizes of
> >> strings specially?  Are there enough garbage strings in the string pool
> >> that it's worth optimising the size of the saved memory for strings but
> >> not for other types of object?  Or is the gt_pch_p_S test needed for
> >> correctness, rather than just being an optimisation?
> >
> > Just guessing, not all GC strings live in the stringpool.
> > Isn't e.g. ggc_strdup just a GC allocation where the string length
> > isn't stored anywhere?
> 
> Is that different from other GC VLA allocations though?  I thought
> ultimately we just tried to save and restore the containing pages.

I think just the objects in it, not entire pages (ggc_get_size (obj)
sized chunks for non-strings).

> > And sometimes it isn't even GC allocated, e.g. ggc_strdup ("") just
> > returns ""; I guess const char * pointers in GC memory can also point
> > to string literals in .rodata and for PCH we move them.
> 
> Ah, OK, that would definitely explain it, thanks.  In that case,
> are you OK with the patch, as a way of continuing to support rodata
> string pointers while also allowing embedded nuls?

LGTM.

	Jakub


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] pch: Fix streaming of strings with embedded null bytes
  2022-10-19 12:23       ` Jakub Jelinek
@ 2022-10-19 12:47         ` Lewis Hyatt
  0 siblings, 0 replies; 12+ messages in thread
From: Lewis Hyatt @ 2022-10-19 12:47 UTC (permalink / raw)
  To: Jakub Jelinek, richard.sandiford; +Cc: Lewis Hyatt via Gcc-patches

On Wed, Oct 19, 2022 at 8:23 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Wed, Oct 19, 2022 at 01:17:02PM +0100, Richard Sandiford wrote:
> > Jakub Jelinek <jakub@redhat.com> writes:
> > > On Wed, Oct 19, 2022 at 12:54:11PM +0100, Richard Sandiford via Gcc-patches wrote:
> > >> Lewis Hyatt via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > >> > When a GTY'ed struct is streamed to PCH, any plain char* pointers it contains
> > >> > (whether they live in GC-controlled memory or not) will be marked for PCH
> > >> > output by the routine gt_pch_note_object in ggc-common.cc. This routine
> > >> > special-cases plain char* strings, and in particular it uses strlen() to get
> > >> > their length. Thus it does not handle strings with embedded null bytes, but it
> > >> > is possible for something PCH cares about (such as a string literal token in a
> > >> > macro definition) to contain such embedded nulls. To fix that up, add a new
> > >> > GTY option "string_length" so that gt_pch_note_object can be informed the
> > >> > actual length it ought to use, and use it in the relevant libcpp structs
> > >> > (cpp_string and ht_identifier) accordingly.
> > >>
> > >> This isn't really my area, as I'm about to demonstrate with this
> > >> question, but: regarding
> > >>
> > >>   if (note_ptr_fn == gt_pch_p_S)
> > >>     (*slot)->size = strlen ((const char *)obj) + 1;
> > >>   else
> > >>     (*slot)->size = ggc_get_size (obj);
> > >>
> > >> do you know why the PCH code goes out of its way to handle the sizes of
> > >> strings specially?  Are there enough garbage strings in the string pool
> > >> that it's worth optimising the size of the saved memory for strings but
> > >> not for other types of object?  Or is the gt_pch_p_S test needed for
> > >> correctness, rather than just being an optimisation?
> > >
> > > Just guessing, not all GC strings live in the stringpool.
> > > Isn't e.g. ggc_strdup just a GC allocation where the string length
> > > isn't stored anywhere?
> >
> > Is that different from other GC VLA allocations though?  I thought
> > ultimately we just tried to save and restore the containing pages.
>
> I think just the objects in it, not entire pages (ggc_get_size (obj)
> sized chunks for non-strings).
>
> > > And sometimes it isn't even GC allocated, e.g. ggc_strdup ("") just
> > > returns ""; I guess const char * pointers in GC memory can also point
> > > to string literals in .rodata and for PCH we move them.
> >
> > Ah, OK, that would definitely explain it, thanks.  In that case,
> > are you OK with the patch, as a way of continuing to support rodata
> > string pointers while also allowing embedded nuls?
>
> LGTM.
>

Thank you both very much, I will push that then.
My understanding is that a GTY()ed struct can contain arbitrary char*
pointers as a special case, they need not be in the string pool. They
will be silently ignored by GC marking routines if they are not within
ggc's pages (as opposed to any other pointer, which will abort if it
wasn't under ggc's control), and they will make it into PCH by the
gt_pch_note_object mechanism. For example, struct line_maps contains a
char* to store the file name for each map, which is just an ordinary
malloc()ed string owned by the cpp_reader object.

-Lewis

^ permalink raw reply	[flat|nested] 12+ messages in thread

* 'unsigned int len' field in 'libcpp/include/symtab.h:struct ht_identifier' (was: [PATCH] pch: Fix streaming of strings with embedded null bytes)
  2022-10-18 22:14 [PATCH] pch: Fix streaming of strings with embedded null bytes Lewis Hyatt
  2022-10-19 11:54 ` Richard Sandiford
@ 2023-07-04 15:50 ` Thomas Schwinge
  2023-07-04 19:56   ` Lewis Hyatt
  2023-07-05  7:50 ` GTY: Explicitly reject 'string_length' option for (fields in) global variables (was: [PATCH] pch: Fix streaming of strings with embedded null bytes) Thomas Schwinge
  2 siblings, 1 reply; 12+ messages in thread
From: Thomas Schwinge @ 2023-07-04 15:50 UTC (permalink / raw)
  To: Lewis Hyatt, gcc-patches, Richard Sandiford, Jakub Jelinek,
	David Malcolm

[-- Attachment #1: Type: text/plain, Size: 4817 bytes --]

Hi!

I came across this one here on my way working through another (somewhat
related) GTY issue.  I generally do understand the issue here, but do
have a question about 'unsigned int len' field in
'libcpp/include/symtab.h:struct ht_identifier':

On 2022-10-18T18:14:54-0400, Lewis Hyatt via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> When a GTY'ed struct is streamed to PCH, any plain char* pointers it contains
> (whether they live in GC-controlled memory or not) will be marked for PCH
> output by the routine gt_pch_note_object in ggc-common.cc. This routine
> special-cases plain char* strings, and in particular it uses strlen() to get
> their length.

Oh, wow, this special casing for strings...  8-|

> Thus it does not handle strings with embedded null bytes, but it
> is possible for something PCH cares about (such as a string literal token in a
> macro definition) to contain such embedded nulls. To fix that up, add a new
> GTY option "string_length" so that gt_pch_note_object can be informed the
> actual length it ought to use, and use it in the relevant libcpp structs
> (cpp_string and ht_identifier) accordingly.

For your test case:

> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/pch/pch-string-nulls.C
> @@ -0,0 +1,3 @@
> +// { dg-do compile { target c++11 } }
> +#include "pch-string-nulls.H"
> +static_assert (X[4] == '[' && X[5] == '!' && X[6] == ']', "error");

> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/pch/pch-string-nulls.Hs
> @@ -0,0 +1,2 @@
> +/* Note that there is a null byte following "ABC".  */
> +#define X R"(ABC^@[!])"

..., I understand how the following is necessary:

> --- a/libcpp/include/cpplib.h
> +++ b/libcpp/include/cpplib.h
> @@ -179,7 +179,11 @@ enum c_lang {CLK_GNUC89 = 0, CLK_GNUC99, CLK_GNUC11, CLK_GNUC17, CLK_GNUC2X,
>  /* Payload of a NUMBER, STRING, CHAR or COMMENT token.  */
>  struct GTY(()) cpp_string {
>    unsigned int len;
> -  const unsigned char *text;
> +
> +  /* TEXT is always null terminated (terminator not included in len); but this
> +     GTY markup arranges that PCH streaming works properly even if there is a
> +     null byte in the middle of the string.  */
> +  const unsigned char * GTY((string_length ("1 + %h.len"))) text;
>  };

(That is, the test case FAILs with that one reverted.)

However, this one did confuse me:

> --- a/libcpp/include/symtab.h
> +++ b/libcpp/include/symtab.h
> @@ -29,7 +29,10 @@ along with this program; see the file COPYING3.  If not see
>  typedef struct ht_identifier ht_identifier;
>  typedef struct ht_identifier *ht_identifier_ptr;
>  struct GTY(()) ht_identifier {
> -  const unsigned char *str;
> +  /* This GTY markup arranges that the null-terminated identifier would still
> +     stream to PCH correctly, if a null byte were to make its way into an
> +     identifier somehow.  */
> +  const unsigned char * GTY((string_length ("1 + %h.len"))) str;
>    unsigned int len;
>    unsigned int hash_value;
>  };

I did wonder whether that's an actual or just a theoretical concern, to
have 'ht_identifier's with embedded NULs?  If an actual concern, can we
get a test case constructed?  Otherwise, should we revert this hunk,
given that we have this auto-'strlen' handling, ignorant of embedded
NULs, in a lot of other places?

But then I realized that possibly we do maintain 'len' here not for
correctness but as an optimization (trading an 'unsigned int' for
repeated 'strlen' calls)?  My quick testing with the attached
"[RFC] Verify no embedded NULs in 'struct ht_identifier'" might confirm
this theory: no regressions (..., but I didn't bootstrap, and ran only
parts of the testsuite).  (Not proposing that RFC for 'git push', of
course.)

If that's indeed the intention here, I shall change/add source code
commentary to describe this rationale for this dedicated 'len' field
(plus, that handling of embedded NULs falls out of that, automatically).

For reference, this 'len' field has existed "forever".  Before
'struct ht_identifier' was added in
commit 2a967f3d3a45294640e155381ef549e0b8090ad4 (Subversion r42334), we
had in 'gcc/cpplib.h:struct cpp_hashnode': 'unsigned short len', or
earlier 'length', earlier in 'gcc/cpphash.h:struct hashnode':
'unsigned short length', earlier 'size_t length' with comment:
"length of token, for quick comparison", erlier 'int length', ever since
the 'gcc/cpp*' files were added in
commit 7f2935c734c36f84ab62b20a04de465e19061333 (Subversion r9191).


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-RFC-Verify-no-embedded-NULs-in-struct-ht_identifier.patch --]
[-- Type: text/x-diff, Size: 3729 bytes --]

From 75b54da2f30b7ffa0c69c1659e3f3538bfdbd339 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Tue, 4 Jul 2023 14:07:47 +0200
Subject: [PATCH] [RFC] Verify no embedded NULs in 'struct ht_identifier'

---
 gcc/analyzer/pending-diagnostic.cc | 2 +-
 gcc/c-family/c-spellcheck.h        | 2 +-
 gcc/c/c-decl.cc                    | 2 +-
 gcc/tree.h                         | 2 +-
 libcpp/include/symtab.h            | 4 ++--
 libcpp/symtab.cc                   | 2 +-
 6 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/analyzer/pending-diagnostic.cc b/gcc/analyzer/pending-diagnostic.cc
index e36ed4fd9c1..9452719a908 100644
--- a/gcc/analyzer/pending-diagnostic.cc
+++ b/gcc/analyzer/pending-diagnostic.cc
@@ -128,7 +128,7 @@ pending_diagnostic::same_tree_p (tree t1, tree t2)
 static bool
 ht_ident_eq (ht_identifier ident, const char *str)
 {
-  return (strlen (str) == ident.len
+  return (strlen (str) == HT_LEN (&ident)
 	  && 0 == strcmp (str, (const char *)ident.str));
 }
 
diff --git a/gcc/c-family/c-spellcheck.h b/gcc/c-family/c-spellcheck.h
index e3748430b18..0f0c264ef01 100644
--- a/gcc/c-family/c-spellcheck.h
+++ b/gcc/c-family/c-spellcheck.h
@@ -31,7 +31,7 @@ struct edit_distance_traits<cpp_hashnode *>
 {
   static size_t get_length (cpp_hashnode *hashnode)
   {
-    return hashnode->ident.len;
+    return HT_LEN (&hashnode->ident);
   }
 
   static const char *get_string (cpp_hashnode *hashnode)
diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 1af51c4acfc..c67ae5be514 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -4499,7 +4499,7 @@ lookup_name_fuzzy (tree name, enum lookup_name_fuzzy_kind kind, location_t loc)
     {
       const char *id = (const char *)best_macro->ident.str;
       tree macro_as_identifier
-	= get_identifier_with_length (id, best_macro->ident.len);
+	= get_identifier_with_length (id, HT_LEN (&best_macro->ident));
       bm.set_best_so_far (macro_as_identifier,
 			  bmm.get_best_distance (),
 			  bmm.get_best_candidate_length ());
diff --git a/gcc/tree.h b/gcc/tree.h
index 1854fe4a7d4..20169e7a467 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1174,7 +1174,7 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
 
 /* Unlike STRING_CST, in C terms this is strlen, not sizeof.  */
 #define IDENTIFIER_LENGTH(NODE) \
-  (IDENTIFIER_NODE_CHECK (NODE)->identifier.id.len)
+  (HT_LEN (&IDENTIFIER_NODE_CHECK (NODE)->identifier.id))
 #define IDENTIFIER_POINTER(NODE) \
   ((const char *) IDENTIFIER_NODE_CHECK (NODE)->identifier.id.str)
 #define IDENTIFIER_HASH_VALUE(NODE) \
diff --git a/libcpp/include/symtab.h b/libcpp/include/symtab.h
index c7ccc6db9f0..ea23e3dbedb 100644
--- a/libcpp/include/symtab.h
+++ b/libcpp/include/symtab.h
@@ -33,11 +33,11 @@ struct GTY(()) ht_identifier {
      stream to PCH correctly, if a null byte were to make its way into an
      identifier somehow.  */
   const unsigned char * GTY((string_length ("1 + %h.len"))) str;
-  unsigned int len;
+  unsigned int len_;
   unsigned int hash_value;
 };
 
-#define HT_LEN(NODE) ((NODE)->len)
+#define HT_LEN(NODE) ({ gcc_assert ((NODE)->len_ == strlen ((const char *) HT_STR (NODE))); (NODE)->len_; })
 #define HT_STR(NODE) ((NODE)->str)
 
 typedef struct ht cpp_hash_table;
diff --git a/libcpp/symtab.cc b/libcpp/symtab.cc
index ea9f26905a9..be5c1da9d67 100644
--- a/libcpp/symtab.cc
+++ b/libcpp/symtab.cc
@@ -155,7 +155,7 @@ ht_lookup_with_hash (cpp_hash_table *table, const unsigned char *str,
   node = (*table->alloc_node) (table);
   table->entries[index] = node;
 
-  HT_LEN (node) = (unsigned int) len;
+  node->len_ = (unsigned int) len;
   node->hash_value = hash;
 
   if (table->alloc_subobject)
-- 
2.34.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 'unsigned int len' field in 'libcpp/include/symtab.h:struct ht_identifier' (was: [PATCH] pch: Fix streaming of strings with embedded null bytes)
  2023-07-04 15:50 ` 'unsigned int len' field in 'libcpp/include/symtab.h:struct ht_identifier' (was: [PATCH] pch: Fix streaming of strings with embedded null bytes) Thomas Schwinge
@ 2023-07-04 19:56   ` Lewis Hyatt
  2023-07-05  7:56     ` GTY: Enhance 'string_length' option documentation (was: 'unsigned int len' field in 'libcpp/include/symtab.h:struct ht_identifier' (was: [PATCH] pch: Fix streaming of strings with embedded null bytes)) Thomas Schwinge
  0 siblings, 1 reply; 12+ messages in thread
From: Lewis Hyatt @ 2023-07-04 19:56 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: gcc-patches, Richard Sandiford, Jakub Jelinek, David Malcolm

On Tue, Jul 4, 2023 at 11:50 AM Thomas Schwinge <thomas@codesourcery.com> wrote:
>
> Hi!
>
> I came across this one here on my way working through another (somewhat
> related) GTY issue.  I generally do understand the issue here, but do
> have a question about 'unsigned int len' field in
> 'libcpp/include/symtab.h:struct ht_identifier':
>
> On 2022-10-18T18:14:54-0400, Lewis Hyatt via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > When a GTY'ed struct is streamed to PCH, any plain char* pointers it contains
> > (whether they live in GC-controlled memory or not) will be marked for PCH
> > output by the routine gt_pch_note_object in ggc-common.cc. This routine
> > special-cases plain char* strings, and in particular it uses strlen() to get
> > their length.
>
> Oh, wow, this special casing for strings...  8-|
>
> > Thus it does not handle strings with embedded null bytes, but it
> > is possible for something PCH cares about (such as a string literal token in a
> > macro definition) to contain such embedded nulls. To fix that up, add a new
> > GTY option "string_length" so that gt_pch_note_object can be informed the
> > actual length it ought to use, and use it in the relevant libcpp structs
> > (cpp_string and ht_identifier) accordingly.
>
> For your test case:
>
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/pch/pch-string-nulls.C
> > @@ -0,0 +1,3 @@
> > +// { dg-do compile { target c++11 } }
> > +#include "pch-string-nulls.H"
> > +static_assert (X[4] == '[' && X[5] == '!' && X[6] == ']', "error");
>
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/pch/pch-string-nulls.Hs
> > @@ -0,0 +1,2 @@
> > +/* Note that there is a null byte following "ABC".  */
> > +#define X R"(ABC^@[!])"
>
> ..., I understand how the following is necessary:
>
> > --- a/libcpp/include/cpplib.h
> > +++ b/libcpp/include/cpplib.h
> > @@ -179,7 +179,11 @@ enum c_lang {CLK_GNUC89 = 0, CLK_GNUC99, CLK_GNUC11, CLK_GNUC17, CLK_GNUC2X,
> >  /* Payload of a NUMBER, STRING, CHAR or COMMENT token.  */
> >  struct GTY(()) cpp_string {
> >    unsigned int len;
> > -  const unsigned char *text;
> > +
> > +  /* TEXT is always null terminated (terminator not included in len); but this
> > +     GTY markup arranges that PCH streaming works properly even if there is a
> > +     null byte in the middle of the string.  */
> > +  const unsigned char * GTY((string_length ("1 + %h.len"))) text;
> >  };
>
> (That is, the test case FAILs with that one reverted.)
>
> However, this one did confuse me:
>
> > --- a/libcpp/include/symtab.h
> > +++ b/libcpp/include/symtab.h
> > @@ -29,7 +29,10 @@ along with this program; see the file COPYING3.  If not see
> >  typedef struct ht_identifier ht_identifier;
> >  typedef struct ht_identifier *ht_identifier_ptr;
> >  struct GTY(()) ht_identifier {
> > -  const unsigned char *str;
> > +  /* This GTY markup arranges that the null-terminated identifier would still
> > +     stream to PCH correctly, if a null byte were to make its way into an
> > +     identifier somehow.  */
> > +  const unsigned char * GTY((string_length ("1 + %h.len"))) str;
> >    unsigned int len;
> >    unsigned int hash_value;
> >  };
>
> I did wonder whether that's an actual or just a theoretical concern, to
> have 'ht_identifier's with embedded NULs?  If an actual concern, can we
> get a test case constructed?  Otherwise, should we revert this hunk,
> given that we have this auto-'strlen' handling, ignorant of embedded
> NULs, in a lot of other places?
>
> But then I realized that possibly we do maintain 'len' here not for
> correctness but as an optimization (trading an 'unsigned int' for
> repeated 'strlen' calls)?  My quick testing with the attached
> "[RFC] Verify no embedded NULs in 'struct ht_identifier'" might confirm
> this theory: no regressions (..., but I didn't bootstrap, and ran only
> parts of the testsuite).  (Not proposing that RFC for 'git push', of
> course.)
>
> If that's indeed the intention here, I shall change/add source code
> commentary to describe this rationale for this dedicated 'len' field
> (plus, that handling of embedded NULs falls out of that, automatically).
>
> For reference, this 'len' field has existed "forever".  Before
> 'struct ht_identifier' was added in
> commit 2a967f3d3a45294640e155381ef549e0b8090ad4 (Subversion r42334), we
> had in 'gcc/cpplib.h:struct cpp_hashnode': 'unsigned short len', or
> earlier 'length', earlier in 'gcc/cpphash.h:struct hashnode':
> 'unsigned short length', earlier 'size_t length' with comment:
> "length of token, for quick comparison", erlier 'int length', ever since
> the 'gcc/cpp*' files were added in
> commit 7f2935c734c36f84ab62b20a04de465e19061333 (Subversion r9191).
>

I don't think there is currently any possibility for a null byte to
end up in an ht_identifier's string. I assumed that ht_identifier
stores the length as an optimization (especially since it doesn't take
up any extra space on 64-bit platforms, given the 32-bit hash code is
stored as well there.) I created the string_length GTY markup mainly
to support another patch that I have still pending review, which I
thought would increase the likelihood of PCH needing to handle null
bytes in general. When I did that, I added the markup to ht_identifier
simply because the length was already there, so there was no reason
not to add it. It does save a few cycles when streaming out the PCH,
but I doubt it is meaningful.

-Lewis

^ permalink raw reply	[flat|nested] 12+ messages in thread

* GTY: Explicitly reject 'string_length' option for (fields in) global variables (was: [PATCH] pch: Fix streaming of strings with embedded null bytes)
  2022-10-18 22:14 [PATCH] pch: Fix streaming of strings with embedded null bytes Lewis Hyatt
  2022-10-19 11:54 ` Richard Sandiford
  2023-07-04 15:50 ` 'unsigned int len' field in 'libcpp/include/symtab.h:struct ht_identifier' (was: [PATCH] pch: Fix streaming of strings with embedded null bytes) Thomas Schwinge
@ 2023-07-05  7:50 ` Thomas Schwinge
  2023-07-05  8:13   ` Richard Biener
  2 siblings, 1 reply; 12+ messages in thread
From: Thomas Schwinge @ 2023-07-05  7:50 UTC (permalink / raw)
  To: Lewis Hyatt, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2103 bytes --]

Hi!

On 2022-10-18T18:14:54-0400, Lewis Hyatt via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> [...] add a new
> GTY option "string_length" so that gt_pch_note_object can be informed the
> actual length it ought to use, [...]

> --- a/gcc/doc/gty.texi
> +++ b/gcc/doc/gty.texi
> @@ -196,7 +196,26 @@ static GTY((length("reg_known_value_size"))) rtx *reg_known_value;
>  Note that the @code{length} option is only meant for use with arrays of
>  non-atomic objects, that is, objects that contain pointers pointing to
>  other GTY-managed objects.  For other GC-allocated arrays and strings
> -you should use @code{atomic}.
> +you should use @code{atomic} or @code{string_length}.
> +
> +@findex string_length
> +@item string_length ("@var{expression}")
> +
> +In order to simplify production of PCH, a structure member that is a plain
> +array of bytes (an optionally @code{const} and/or @code{unsigned} @code{char
> +*}) is treated specially by the infrastructure. Even if such an array has not
> +been allocated in GC-controlled memory, it will still be written properly into
> +a PCH.  The machinery responsible for this needs to know the length of the
> +data; by default, the length is determined by calling @code{strlen} on the
> +pointer.  The @code{string_length} option specifies an alternate way to
> +determine the length, such as by inspecting another struct member:
> +
> +@smallexample
> +struct GTY(()) non_terminated_string @{
> +  size_t sz;
> +  const char * GTY((string_length ("%h.sz"))) data;
> +@};
> +@end smallexample

In preparation for another thing I'm working on, OK to push the attached
"GTY: Explicitly reject 'string_length' option for (fields in) global variables"
(with <https://inbox.sourceware.org/[TODO]> pointing to this message)?


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-GTY-Explicitly-reject-string_length-option-for-field.patch --]
[-- Type: text/x-diff, Size: 4483 bytes --]

From 9130fe7873c2e1b44ab2449bfe022837e26f710c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Tue, 4 Jul 2023 11:46:50 +0200
Subject: [PATCH] GTY: Explicitly reject 'string_length' option for (fields in)
 global variables

This is preparational for another thing that I'm working on.  No change in
behavior -- other than a more explicit error message.

The 'string_length' option currently is not supported for (fields in) global
variables.  For example, if we apply the following (made-up) changes:

    --- gcc/c-family/c-cppbuiltin.cc
    +++ gcc/c-family/c-cppbuiltin.cc
    @@ -1777 +1777 @@ struct GTY(()) lazy_hex_fp_value_struct
    -  const char *hex_str;
    +  const char * GTY((string_length("strlen(%h.hex_str) + 1"))) hex_str;

    --- gcc/varasm.cc
    +++ gcc/varasm.cc
    @@ -66 +66 @@ along with GCC; see the file COPYING3.  If not see
    -extern GTY(()) const char *first_global_object_name;
    +extern GTY((string_length("strlen(%h.first_global_object_name) + 1"))) const char *first_global_object_name;

..., we get:

    [...]
    build/gengtype  \
                        -S [...]/source-gcc/gcc -I gtyp-input.list -w tmp-gtype.state
    /bin/sh [...]/source-gcc/gcc/../move-if-change tmp-gtype.state gtype.state
    build/gengtype  \
                        -r gtype.state
    [...]/source-gcc/gcc/varasm.cc:66: global `first_global_object_name' has unknown option `string_length'
    [...]/source-gcc/gcc/c-family/c-cppbuiltin.cc:1789: field `hex_str' of global `lazy_hex_fp_values[0]' has unknown option `string_length'
    make[2]: *** [Makefile:2890: s-gtype] Error 1
    [...]

These errors occur when writing "GC roots", where -- per my understanding --
'string_length' isn't relevant for actual GC purposes.  However, like
elsewhere, it is for PCH purposes, and simply accepting 'string_length' here
isn't sufficient: we'll still get '(gt_pointer_walker) &gt_pch_n_S' used in the
'struct ggc_root_tab' instances, and there's no easy way to change that to
instead use 'gt_pch_n_S2' with explicit 'size_t string_len' argument.  (At
least not sufficiently easy to justify spending any further time on, given that
I don't have an actual use for that feature.)

So, until an actual need arises, and/or to avoid the next person looking into
this having to figure out the same thing again, let's just document this
limitation:

    [...]/source-gcc/gcc/varasm.cc:66: option `string_length' not supported for global `first_global_object_name'
    [...]/source-gcc/gcc/c-family/c-cppbuiltin.cc:1789: option `string_length' not supported for field `hex_str' of global `lazy_hex_fp_values[0]'

This amends commit f3b957ea8b9dadfb1ed30f24f463529684b7a36a
"pch: Fix streaming of strings with embedded null bytes".

	gcc/
	* gengtype.cc (write_root, write_roots): Explicitly reject
	'string_length' option.
	* doc/gty.texi (GTY Options) <string_length>: Document.
---
 gcc/doc/gty.texi |  4 ++++
 gcc/gengtype.cc  | 10 ++++++++++
 2 files changed, 14 insertions(+)

diff --git a/gcc/doc/gty.texi b/gcc/doc/gty.texi
index 9fc509128fe..24799aa6781 100644
--- a/gcc/doc/gty.texi
+++ b/gcc/doc/gty.texi
@@ -217,6 +217,10 @@ struct GTY(()) non_terminated_string @{
 @};
 @end smallexample
 
+The @code{string_length} option currently is not supported for (fields
+in) global variables.
+@c <https://inbox.sourceware.org/[TODO]>
+
 @findex skip
 @item skip
 
diff --git a/gcc/gengtype.cc b/gcc/gengtype.cc
index 7763f40e9ab..04dbb0de8bd 100644
--- a/gcc/gengtype.cc
+++ b/gcc/gengtype.cc
@@ -4337,6 +4337,11 @@ write_root (outf_p f, pair_p v, type_p type, const char *name, int has_length,
 	      else if (strcmp (o->name, "desc") == 0
 		       && o->kind == OPTION_STRING)
 		desc = o->info.string;
+	      else if (strcmp (o->name, "string_length") == 0)
+		/* See 'doc/gty.texi'.  */
+		error_at_line (line,
+			       "option `%s' not supported for field `%s' of global `%s'",
+			       o->name, fld->name, name);
 	      else
 		error_at_line (line,
 			       "field `%s' of global `%s' has unknown option `%s'",
@@ -4535,6 +4540,11 @@ write_roots (pair_p variables, bool emit_pch)
 	  deletable_p = 1;
 	else if (strcmp (o->name, "cache") == 0)
 	  ;
+	else if (strcmp (o->name, "string_length") == 0)
+	  /* See 'doc/gty.texi'.  */
+	  error_at_line (&v->line,
+			 "option `%s' not supported for global `%s'",
+			 o->name, v->name);
 	else
 	  error_at_line (&v->line,
 			 "global `%s' has unknown option `%s'",
-- 
2.34.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* GTY: Enhance 'string_length' option documentation (was: 'unsigned int len' field in 'libcpp/include/symtab.h:struct ht_identifier' (was: [PATCH] pch: Fix streaming of strings with embedded null bytes))
  2023-07-04 19:56   ` Lewis Hyatt
@ 2023-07-05  7:56     ` Thomas Schwinge
  2023-07-05  8:15       ` Richard Biener
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Schwinge @ 2023-07-05  7:56 UTC (permalink / raw)
  To: Lewis Hyatt, gcc-patches; +Cc: Richard Sandiford, Jakub Jelinek, David Malcolm

[-- Attachment #1: Type: text/plain, Size: 1604 bytes --]

Hi!

On 2023-07-04T15:56:23-0400, Lewis Hyatt via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> On Tue, Jul 4, 2023 at 11:50 AM Thomas Schwinge <thomas@codesourcery.com> wrote:
>> I came across this one here on my way working through another (somewhat
>> related) GTY issue.  I generally do understand the issue here, but do
>> have a question about 'unsigned int len' field in
>> 'libcpp/include/symtab.h:struct ht_identifier': [...]

> I don't think there is currently any possibility for a null byte to
> end up in an ht_identifier's string. I assumed that ht_identifier
> stores the length as an optimization (especially since it doesn't take
> up any extra space on 64-bit platforms, given the 32-bit hash code is
> stored as well there.) I created the string_length GTY markup mainly
> to support another patch that I have still pending review, which I
> thought would increase the likelihood of PCH needing to handle null
> bytes in general. When I did that, I added the markup to ht_identifier
> simply because the length was already there, so there was no reason
> not to add it. It does save a few cycles when streaming out the PCH,
> but I doubt it is meaningful.

Thanks for confirming.  OK thus to push the attached
"GTY: Enhance 'string_length' option documentation"?


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-GTY-Enhance-string_length-option-documentation.patch --]
[-- Type: text/x-diff, Size: 2723 bytes --]

From a31b6657c26ac70c6e03b8ad81cdcb873f905716 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 5 Jul 2023 08:38:49 +0200
Subject: [PATCH] GTY: Enhance 'string_length' option documentation

We're (currently) not aware of any actual use of 'ht_identifier's with NUL
characters embedded; its 'len' field appears to exist for optimization
purposes, since "forever".  Before 'struct ht_identifier' was added in
commit 2a967f3d3a45294640e155381ef549e0b8090ad4 (Subversion r42334), we had in
'gcc/cpplib.h:struct cpp_hashnode': 'unsigned short len', or earlier 'length',
earlier in 'gcc/cpphash.h:struct hashnode': 'unsigned short length', earlier
'size_t length' with comment: "length of token, for quick comparison", earlier
'int length', ever since the 'gcc/cpp*' files were added in
commit 7f2935c734c36f84ab62b20a04de465e19061333 (Subversion r9191).

This amends commit f3b957ea8b9dadfb1ed30f24f463529684b7a36a
"pch: Fix streaming of strings with embedded null bytes".

	gcc/
	* doc/gty.texi (GTY Options) <string_length>: Enhance.
	libcpp/
	* include/symtab.h (struct ht_identifier): Document different
	rationale.
---
 gcc/doc/gty.texi        | 11 +++++++++++
 libcpp/include/symtab.h |  4 +---
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/gty.texi b/gcc/doc/gty.texi
index 7bd064b5781..15f9fa07405 100644
--- a/gcc/doc/gty.texi
+++ b/gcc/doc/gty.texi
@@ -217,6 +217,17 @@ struct GTY(()) non_terminated_string @{
 @};
 @end smallexample
 
+Similarly, this is useful for (regular NUL-terminated) strings with
+NUL characters embedded (that the default @code{strlen} use would run
+afoul of):
+
+@smallexample
+struct GTY(()) multi_string @{
+  const char * GTY((string_length ("%h.len + 1"))) str;
+  size_t len;
+@};
+@end smallexample
+
 The @code{string_length} option currently is not supported for (fields
 in) global variables.
 @c <https://inbox.sourceware.org/87bkgqvlst.fsf@euler.schwinge.homeip.net>
diff --git a/libcpp/include/symtab.h b/libcpp/include/symtab.h
index c7ccc6db9f0..0c713f2ad30 100644
--- a/libcpp/include/symtab.h
+++ b/libcpp/include/symtab.h
@@ -29,9 +29,7 @@ along with this program; see the file COPYING3.  If not see
 typedef struct ht_identifier ht_identifier;
 typedef struct ht_identifier *ht_identifier_ptr;
 struct GTY(()) ht_identifier {
-  /* This GTY markup arranges that the null-terminated identifier would still
-     stream to PCH correctly, if a null byte were to make its way into an
-     identifier somehow.  */
+  /* We know the 'len'gth of the 'str'ing; use it in the GTY markup.  */
   const unsigned char * GTY((string_length ("1 + %h.len"))) str;
   unsigned int len;
   unsigned int hash_value;
-- 
2.34.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: GTY: Explicitly reject 'string_length' option for (fields in) global variables (was: [PATCH] pch: Fix streaming of strings with embedded null bytes)
  2023-07-05  7:50 ` GTY: Explicitly reject 'string_length' option for (fields in) global variables (was: [PATCH] pch: Fix streaming of strings with embedded null bytes) Thomas Schwinge
@ 2023-07-05  8:13   ` Richard Biener
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Biener @ 2023-07-05  8:13 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: Lewis Hyatt, gcc-patches

On Wed, Jul 5, 2023 at 9:51 AM Thomas Schwinge <thomas@codesourcery.com> wrote:
>
> Hi!
>
> On 2022-10-18T18:14:54-0400, Lewis Hyatt via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > [...] add a new
> > GTY option "string_length" so that gt_pch_note_object can be informed the
> > actual length it ought to use, [...]
>
> > --- a/gcc/doc/gty.texi
> > +++ b/gcc/doc/gty.texi
> > @@ -196,7 +196,26 @@ static GTY((length("reg_known_value_size"))) rtx *reg_known_value;
> >  Note that the @code{length} option is only meant for use with arrays of
> >  non-atomic objects, that is, objects that contain pointers pointing to
> >  other GTY-managed objects.  For other GC-allocated arrays and strings
> > -you should use @code{atomic}.
> > +you should use @code{atomic} or @code{string_length}.
> > +
> > +@findex string_length
> > +@item string_length ("@var{expression}")
> > +
> > +In order to simplify production of PCH, a structure member that is a plain
> > +array of bytes (an optionally @code{const} and/or @code{unsigned} @code{char
> > +*}) is treated specially by the infrastructure. Even if such an array has not
> > +been allocated in GC-controlled memory, it will still be written properly into
> > +a PCH.  The machinery responsible for this needs to know the length of the
> > +data; by default, the length is determined by calling @code{strlen} on the
> > +pointer.  The @code{string_length} option specifies an alternate way to
> > +determine the length, such as by inspecting another struct member:
> > +
> > +@smallexample
> > +struct GTY(()) non_terminated_string @{
> > +  size_t sz;
> > +  const char * GTY((string_length ("%h.sz"))) data;
> > +@};
> > +@end smallexample
>
> In preparation for another thing I'm working on, OK to push the attached
> "GTY: Explicitly reject 'string_length' option for (fields in) global variables"
> (with <https://inbox.sourceware.org/[TODO]> pointing to this message)?

OK

>
> Grüße
>  Thomas
>
>
> -----------------
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: GTY: Enhance 'string_length' option documentation (was: 'unsigned int len' field in 'libcpp/include/symtab.h:struct ht_identifier' (was: [PATCH] pch: Fix streaming of strings with embedded null bytes))
  2023-07-05  7:56     ` GTY: Enhance 'string_length' option documentation (was: 'unsigned int len' field in 'libcpp/include/symtab.h:struct ht_identifier' (was: [PATCH] pch: Fix streaming of strings with embedded null bytes)) Thomas Schwinge
@ 2023-07-05  8:15       ` Richard Biener
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Biener @ 2023-07-05  8:15 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: Lewis Hyatt, gcc-patches, Richard Sandiford, Jakub Jelinek,
	David Malcolm

On Wed, Jul 5, 2023 at 9:57 AM Thomas Schwinge <thomas@codesourcery.com> wrote:
>
> Hi!
>
> On 2023-07-04T15:56:23-0400, Lewis Hyatt via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > On Tue, Jul 4, 2023 at 11:50 AM Thomas Schwinge <thomas@codesourcery.com> wrote:
> >> I came across this one here on my way working through another (somewhat
> >> related) GTY issue.  I generally do understand the issue here, but do
> >> have a question about 'unsigned int len' field in
> >> 'libcpp/include/symtab.h:struct ht_identifier': [...]
>
> > I don't think there is currently any possibility for a null byte to
> > end up in an ht_identifier's string. I assumed that ht_identifier
> > stores the length as an optimization (especially since it doesn't take
> > up any extra space on 64-bit platforms, given the 32-bit hash code is
> > stored as well there.) I created the string_length GTY markup mainly
> > to support another patch that I have still pending review, which I
> > thought would increase the likelihood of PCH needing to handle null
> > bytes in general. When I did that, I added the markup to ht_identifier
> > simply because the length was already there, so there was no reason
> > not to add it. It does save a few cycles when streaming out the PCH,
> > but I doubt it is meaningful.
>
> Thanks for confirming.  OK thus to push the attached
> "GTY: Enhance 'string_length' option documentation"?

OK.

> Grüße
>  Thomas
>
>
> -----------------
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-07-05  8:18 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-18 22:14 [PATCH] pch: Fix streaming of strings with embedded null bytes Lewis Hyatt
2022-10-19 11:54 ` Richard Sandiford
2022-10-19 12:08   ` Jakub Jelinek
2022-10-19 12:17     ` Richard Sandiford
2022-10-19 12:23       ` Jakub Jelinek
2022-10-19 12:47         ` Lewis Hyatt
2023-07-04 15:50 ` 'unsigned int len' field in 'libcpp/include/symtab.h:struct ht_identifier' (was: [PATCH] pch: Fix streaming of strings with embedded null bytes) Thomas Schwinge
2023-07-04 19:56   ` Lewis Hyatt
2023-07-05  7:56     ` GTY: Enhance 'string_length' option documentation (was: 'unsigned int len' field in 'libcpp/include/symtab.h:struct ht_identifier' (was: [PATCH] pch: Fix streaming of strings with embedded null bytes)) Thomas Schwinge
2023-07-05  8:15       ` Richard Biener
2023-07-05  7:50 ` GTY: Explicitly reject 'string_length' option for (fields in) global variables (was: [PATCH] pch: Fix streaming of strings with embedded null bytes) Thomas Schwinge
2023-07-05  8:13   ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).