public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] expr: build string_constant only for a char type
@ 2020-07-27 12:32 Martin Liška
  2020-07-27 13:16 ` Jakub Jelinek
  2020-07-27 15:53 ` Martin Sebor
  0 siblings, 2 replies; 10+ messages in thread
From: Martin Liška @ 2020-07-27 12:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: Martin Sebor, Jakub Jelinek

Hey.

As mentioned in the PR, we should not create a string constant for a type
that is different from char_type_node. Looking at expr.c, I was inspired
and used 'TYPE_MAIN_VARIANT (chartype) == char_type_node' to verify that underlying
type is a character type.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests. And it fixes chromium
build with gcc-10 branch with the patch applied.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

	PR tree-optimization/96058
	* expr.c (string_constant): Build string_constant only
	for a type that is main variant of char_type_node.
---
  gcc/expr.c | 22 +++++++++++++---------
  1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/gcc/expr.c b/gcc/expr.c
index 5db0a7a8565..c3fdd82b319 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11828,17 +11828,21 @@ string_constant (tree arg, tree *ptr_offset, tree *mem_size, tree *decl)
  	chartype = TREE_TYPE (chartype);
        while (TREE_CODE (chartype) == ARRAY_TYPE)
  	chartype = TREE_TYPE (chartype);
-      /* Convert a char array to an empty STRING_CST having an array
-	 of the expected type and size.  */
-      if (!initsize)
-	  initsize = integer_zero_node;
  
-      unsigned HOST_WIDE_INT size = tree_to_uhwi (initsize);
-      init = build_string_literal (size, NULL, chartype, size);
-      init = TREE_OPERAND (init, 0);
-      init = TREE_OPERAND (init, 0);
+      if (TYPE_MAIN_VARIANT (chartype) == char_type_node)
+	{
+	  /* Convert a char array to an empty STRING_CST having an array
+	     of the expected type and size.  */
+	  if (!initsize)
+	    initsize = integer_zero_node;
+
+	  unsigned HOST_WIDE_INT size = tree_to_uhwi (initsize);
+	  init = build_string_literal (size, NULL, chartype, size);
+	  init = TREE_OPERAND (init, 0);
+	  init = TREE_OPERAND (init, 0);
  
-      *ptr_offset = integer_zero_node;
+	  *ptr_offset = integer_zero_node;
+	}
      }
  
    if (decl)
-- 
2.27.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] expr: build string_constant only for a char type
  2020-07-27 12:32 [PATCH] expr: build string_constant only for a char type Martin Liška
@ 2020-07-27 13:16 ` Jakub Jelinek
  2020-07-27 14:12   ` Martin Liška
  2020-07-27 15:53 ` Martin Sebor
  1 sibling, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2020-07-27 13:16 UTC (permalink / raw)
  To: Martin Liška; +Cc: gcc-patches

On Mon, Jul 27, 2020 at 02:32:15PM +0200, Martin Liška wrote:
> As mentioned in the PR, we should not create a string constant for a type
> that is different from char_type_node. Looking at expr.c, I was inspired
> and used 'TYPE_MAIN_VARIANT (chartype) == char_type_node' to verify that underlying
> type is a character type.

That doesn't look correct, there is char, signed char, unsigned char,
or say std::byte, and all of them are perfectly fine.
So, rather than requiring it is char and nothing else, you should instead
check that it is an INTEGRAL_TYPE_P (maybe better other than BOOLEAN_TYPE?),
which is complete and has the same TYPE_PRECISION as char_type_node.

	Jakub


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] expr: build string_constant only for a char type
  2020-07-27 13:16 ` Jakub Jelinek
@ 2020-07-27 14:12   ` Martin Liška
  2020-07-27 14:14     ` Jakub Jelinek
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Liška @ 2020-07-27 14:12 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 845 bytes --]

On 7/27/20 3:16 PM, Jakub Jelinek wrote:
> On Mon, Jul 27, 2020 at 02:32:15PM +0200, Martin Liška wrote:
>> As mentioned in the PR, we should not create a string constant for a type
>> that is different from char_type_node. Looking at expr.c, I was inspired
>> and used 'TYPE_MAIN_VARIANT (chartype) == char_type_node' to verify that underlying
>> type is a character type.
> 
> That doesn't look correct, there is char, signed char, unsigned char,
> or say std::byte, and all of them are perfectly fine.
> So, rather than requiring it is char and nothing else, you should instead
> check that it is an INTEGRAL_TYPE_P (maybe better other than BOOLEAN_TYPE?),
> which is complete and has the same TYPE_PRECISION as char_type_node.

All right, the following survives tests and bootstraps.

Ready to be installed?
Thanks,
Martin

> 
> 	Jakub
> 


[-- Attachment #2: 0001-expr-build-string_constant-only-for-a-char-type.patch --]
[-- Type: text/x-patch, Size: 1770 bytes --]

From d40967eb2ef27b512ae177b1aee5e85ac2246acd Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Mon, 27 Jul 2020 12:30:24 +0200
Subject: [PATCH] expr: build string_constant only for a char type

gcc/ChangeLog:

	PR tree-optimization/96058
	* expr.c (string_constant): Build string_constant only
	for a type that has same precision as char_type_node
	and is an integral type.
---
 gcc/expr.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/gcc/expr.c b/gcc/expr.c
index 5db0a7a8565..a150fa0d3b5 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11828,17 +11828,22 @@ string_constant (tree arg, tree *ptr_offset, tree *mem_size, tree *decl)
 	chartype = TREE_TYPE (chartype);
       while (TREE_CODE (chartype) == ARRAY_TYPE)
 	chartype = TREE_TYPE (chartype);
-      /* Convert a char array to an empty STRING_CST having an array
-	 of the expected type and size.  */
-      if (!initsize)
-	  initsize = integer_zero_node;
 
-      unsigned HOST_WIDE_INT size = tree_to_uhwi (initsize);
-      init = build_string_literal (size, NULL, chartype, size);
-      init = TREE_OPERAND (init, 0);
-      init = TREE_OPERAND (init, 0);
+      if (INTEGRAL_TYPE_P (chartype)
+	  && TYPE_PRECISION (chartype) == TYPE_PRECISION (char_type_node))
+	{
+	  /* Convert a char array to an empty STRING_CST having an array
+	     of the expected type and size.  */
+	  if (!initsize)
+	    initsize = integer_zero_node;
+
+	  unsigned HOST_WIDE_INT size = tree_to_uhwi (initsize);
+	  init = build_string_literal (size, NULL, chartype, size);
+	  init = TREE_OPERAND (init, 0);
+	  init = TREE_OPERAND (init, 0);
 
-      *ptr_offset = integer_zero_node;
+	  *ptr_offset = integer_zero_node;
+	}
     }
 
   if (decl)
-- 
2.27.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] expr: build string_constant only for a char type
  2020-07-27 14:12   ` Martin Liška
@ 2020-07-27 14:14     ` Jakub Jelinek
  0 siblings, 0 replies; 10+ messages in thread
From: Jakub Jelinek @ 2020-07-27 14:14 UTC (permalink / raw)
  To: Martin Liška; +Cc: gcc-patches

On Mon, Jul 27, 2020 at 04:12:09PM +0200, Martin Liška wrote:
> On 7/27/20 3:16 PM, Jakub Jelinek wrote:
> > On Mon, Jul 27, 2020 at 02:32:15PM +0200, Martin Liška wrote:
> > > As mentioned in the PR, we should not create a string constant for a type
> > > that is different from char_type_node. Looking at expr.c, I was inspired
> > > and used 'TYPE_MAIN_VARIANT (chartype) == char_type_node' to verify that underlying
> > > type is a character type.
> > 
> > That doesn't look correct, there is char, signed char, unsigned char,
> > or say std::byte, and all of them are perfectly fine.
> > So, rather than requiring it is char and nothing else, you should instead
> > check that it is an INTEGRAL_TYPE_P (maybe better other than BOOLEAN_TYPE?),
> > which is complete and has the same TYPE_PRECISION as char_type_node.
> 
> All right, the following survives tests and bootstraps.

LGTM.

	Jakub


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] expr: build string_constant only for a char type
  2020-07-27 12:32 [PATCH] expr: build string_constant only for a char type Martin Liška
  2020-07-27 13:16 ` Jakub Jelinek
@ 2020-07-27 15:53 ` Martin Sebor
  2020-07-27 18:54   ` Martin Liška
  2020-07-27 20:49   ` Jakub Jelinek
  1 sibling, 2 replies; 10+ messages in thread
From: Martin Sebor @ 2020-07-27 15:53 UTC (permalink / raw)
  To: Martin Liška, gcc-patches; +Cc: Jakub Jelinek

On 7/27/20 6:32 AM, Martin Liška wrote:
> Hey.
> 
> As mentioned in the PR, we should not create a string constant for a type
> that is different from char_type_node. Looking at expr.c, I was inspired
> and used 'TYPE_MAIN_VARIANT (chartype) == char_type_node' to verify that 
> underlying
> type is a character type.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. 
> And it fixes chromium
> build with gcc-10 branch with the patch applied.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
>      PR tree-optimization/96058
>      * expr.c (string_constant): Build string_constant only
>      for a type that is main variant of char_type_node.
> ---
>   gcc/expr.c | 22 +++++++++++++---------
>   1 file changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/gcc/expr.c b/gcc/expr.c
> index 5db0a7a8565..c3fdd82b319 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -11828,17 +11828,21 @@ string_constant (tree arg, tree *ptr_offset, 
> tree *mem_size, tree *decl)
>       chartype = TREE_TYPE (chartype);
>         while (TREE_CODE (chartype) == ARRAY_TYPE)
>       chartype = TREE_TYPE (chartype);
> -      /* Convert a char array to an empty STRING_CST having an array
> -     of the expected type and size.  */
> -      if (!initsize)
> -      initsize = integer_zero_node;
> 
> -      unsigned HOST_WIDE_INT size = tree_to_uhwi (initsize);
> -      init = build_string_literal (size, NULL, chartype, size);
> -      init = TREE_OPERAND (init, 0);
> -      init = TREE_OPERAND (init, 0);
> +      if (TYPE_MAIN_VARIANT (chartype) == char_type_node)

The change to c_getstr I recently committed made it clear that
the function can:

   Return a pointer P to a NUL-terminated string containing
   the sequence of bytes corresponding to the representation
   of the object referred to by SRC (or a subsequence of such
   bytes within it if SRC is a reference to an initialized
   constant array plus some constant offset).

I.e., c_getstr returns a STRING_CST for arbitrary non-string
constants.  This enables optimizations like the by-pieces
expansion of calls to raw memory functions like memcpy, or
the folding of other raw memory calls like memchr with non-
string arguments.

c_getstr relies on string_constant for that.  Restricting
the latter function to just character types prevents these
optimizations for zero-initialized constants of other types.
A test case that shows the difference to the by-pieces
expansion goes something like this:

   const struct { char a[64]; } x = { 0 };

   void f (void *d)
   {
     __builtin_memcpy (d, &x, sizeof x - 1);
   }

A test case for the memchr folding is similar:

   const struct { char a[64]; } x = { 0 };

   int f (void *d)
   {
     return __builtin_memchr (&x, 0, sizeof x) != 0;
   }

The tests I committed with the change didn't exercise any of
this so that's my bad.  I'm still not sure I understand how
the problem with the incomplete type comes up (I haven't had
a chance to look into the recent updates on the bug yet) but
to retain the optimization (and keep the comments in sync
with the code) I think a better solution than restricting
the function to integers is to limit it to complete types.
Beyond that, extending the function to also constant arrays
or nonzero aggregates will also enable the optimization for
those.

Martin

> +    {
> +      /* Convert a char array to an empty STRING_CST having an array
> +         of the expected type and size.  */
> +      if (!initsize)
> +        initsize = integer_zero_node;
> +
> +      unsigned HOST_WIDE_INT size = tree_to_uhwi (initsize);
> +      init = build_string_literal (size, NULL, chartype, size);
> +      init = TREE_OPERAND (init, 0);
> +      init = TREE_OPERAND (init, 0);
> 
> -      *ptr_offset = integer_zero_node;
> +      *ptr_offset = integer_zero_node;
> +    }
>       }
> 
>     if (decl)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] expr: build string_constant only for a char type
  2020-07-27 15:53 ` Martin Sebor
@ 2020-07-27 18:54   ` Martin Liška
  2020-07-27 19:39     ` Martin Sebor
  2020-07-27 20:49   ` Jakub Jelinek
  1 sibling, 1 reply; 10+ messages in thread
From: Martin Liška @ 2020-07-27 18:54 UTC (permalink / raw)
  To: Martin Sebor, gcc-patches; +Cc: Jakub Jelinek

On 7/27/20 5:53 PM, Martin Sebor wrote:
> The tests I committed with the change didn't exercise any of
> this so that's my bad.  I'm still not sure I understand how
> the problem with the incomplete type comes up (I haven't had
> a chance to look into the recent updates on the bug yet) but
> to retain the optimization (and keep the comments in sync
> with the code) I think a better solution than restricting
> the function to integers is to limit it to complete types.
> Beyond that, extending the function to also constant arrays
> or nonzero aggregates will also enable the optimization for
> those.

Hello.

I must admit that I'm not super-familiar with that code I modified.
Can you please assign the PR and propose a proper fix? I can then
test it on chromium and I'm also deferring backport of the patch I installed
to master.

Thanks,
Martin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] expr: build string_constant only for a char type
  2020-07-27 18:54   ` Martin Liška
@ 2020-07-27 19:39     ` Martin Sebor
  0 siblings, 0 replies; 10+ messages in thread
From: Martin Sebor @ 2020-07-27 19:39 UTC (permalink / raw)
  To: Martin Liška, gcc-patches; +Cc: Jakub Jelinek

On 7/27/20 12:54 PM, Martin Liška wrote:
> On 7/27/20 5:53 PM, Martin Sebor wrote:
>> The tests I committed with the change didn't exercise any of
>> this so that's my bad.  I'm still not sure I understand how
>> the problem with the incomplete type comes up (I haven't had
>> a chance to look into the recent updates on the bug yet) but
>> to retain the optimization (and keep the comments in sync
>> with the code) I think a better solution than restricting
>> the function to integers is to limit it to complete types.
>> Beyond that, extending the function to also constant arrays
>> or nonzero aggregates will also enable the optimization for
>> those.
> 
> Hello.
> 
> I must admit that I'm not super-familiar with that code I modified.
> Can you please assign the PR and propose a proper fix? I can then
> test it on chromium and I'm also deferring backport of the patch I 
> installed
> to master.

Sure.  I've been trying to wrap something up and it's been taking
longer than I expected.  I'll look into this as soon as I'm done,
hopefully tomorrow.

Martin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] expr: build string_constant only for a char type
  2020-07-27 15:53 ` Martin Sebor
  2020-07-27 18:54   ` Martin Liška
@ 2020-07-27 20:49   ` Jakub Jelinek
  2020-07-28  7:00     ` Richard Biener
  1 sibling, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2020-07-27 20:49 UTC (permalink / raw)
  To: Martin Sebor; +Cc: Martin Liška, gcc-patches

On Mon, Jul 27, 2020 at 09:53:31AM -0600, Martin Sebor via Gcc-patches wrote:
>   Return a pointer P to a NUL-terminated string containing
>   the sequence of bytes corresponding to the representation
>   of the object referred to by SRC (or a subsequence of such
>   bytes within it if SRC is a reference to an initialized
>   constant array plus some constant offset).
> 
> I.e., c_getstr returns a STRING_CST for arbitrary non-string
> constants.  This enables optimizations like the by-pieces
> expansion of calls to raw memory functions like memcpy, or
> the folding of other raw memory calls like memchr with non-
> string arguments.
> 
> c_getstr relies on string_constant for that.  Restricting
> the latter function to just character types prevents these
> optimizations for zero-initialized constants of other types.
> A test case that shows the difference to the by-pieces
> expansion goes something like this:

Having STRING_CST in the compiler with arbitrary array type is IMHO a very
bad idea, so if you want something like that, you should come up with a
different representation for that, not STRING_CSTs.
Because most of the compiler assumes STRING_CSTs are what it says, string
literals where elements are some kind of characters, don't have to be
narrow, but better should be integral.
Maybe returning a CONSTRUCTOR with no elements with the right type is a
better idea for that, that in the compiler stands for zero initialized
aggregate.

	Jakub


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] expr: build string_constant only for a char type
  2020-07-27 20:49   ` Jakub Jelinek
@ 2020-07-28  7:00     ` Richard Biener
  2020-07-28 10:38       ` Martin Liška
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2020-07-28  7:00 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Martin Sebor, GCC Patches

On Mon, Jul 27, 2020 at 10:49 PM Jakub Jelinek via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Mon, Jul 27, 2020 at 09:53:31AM -0600, Martin Sebor via Gcc-patches wrote:
> >   Return a pointer P to a NUL-terminated string containing
> >   the sequence of bytes corresponding to the representation
> >   of the object referred to by SRC (or a subsequence of such
> >   bytes within it if SRC is a reference to an initialized
> >   constant array plus some constant offset).
> >
> > I.e., c_getstr returns a STRING_CST for arbitrary non-string
> > constants.  This enables optimizations like the by-pieces
> > expansion of calls to raw memory functions like memcpy, or
> > the folding of other raw memory calls like memchr with non-
> > string arguments.
> >
> > c_getstr relies on string_constant for that.  Restricting
> > the latter function to just character types prevents these
> > optimizations for zero-initialized constants of other types.
> > A test case that shows the difference to the by-pieces
> > expansion goes something like this:
>
> Having STRING_CST in the compiler with arbitrary array type is IMHO a very
> bad idea, so if you want something like that, you should come up with a
> different representation for that, not STRING_CSTs.
> Because most of the compiler assumes STRING_CSTs are what it says, string
> literals where elements are some kind of characters, don't have to be
> narrow, but better should be integral.
> Maybe returning a CONSTRUCTOR with no elements with the right type is a
> better idea for that, that in the compiler stands for zero initialized
> aggregate.

It's also a much shorter representation (that also works for strings, btw)
if it is all about all-zero "constants".

Richard.

>         Jakub
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] expr: build string_constant only for a char type
  2020-07-28  7:00     ` Richard Biener
@ 2020-07-28 10:38       ` Martin Liška
  0 siblings, 0 replies; 10+ messages in thread
From: Martin Liška @ 2020-07-28 10:38 UTC (permalink / raw)
  To: Richard Biener, Jakub Jelinek; +Cc: GCC Patches

On 7/28/20 9:00 AM, Richard Biener via Gcc-patches wrote:
> On Mon, Jul 27, 2020 at 10:49 PM Jakub Jelinek via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> On Mon, Jul 27, 2020 at 09:53:31AM -0600, Martin Sebor via Gcc-patches wrote:
>>>    Return a pointer P to a NUL-terminated string containing
>>>    the sequence of bytes corresponding to the representation
>>>    of the object referred to by SRC (or a subsequence of such
>>>    bytes within it if SRC is a reference to an initialized
>>>    constant array plus some constant offset).
>>>
>>> I.e., c_getstr returns a STRING_CST for arbitrary non-string
>>> constants.  This enables optimizations like the by-pieces
>>> expansion of calls to raw memory functions like memcpy, or
>>> the folding of other raw memory calls like memchr with non-
>>> string arguments.
>>>
>>> c_getstr relies on string_constant for that.  Restricting
>>> the latter function to just character types prevents these
>>> optimizations for zero-initialized constants of other types.
>>> A test case that shows the difference to the by-pieces
>>> expansion goes something like this:
>>
>> Having STRING_CST in the compiler with arbitrary array type is IMHO a very
>> bad idea, so if you want something like that, you should come up with a
>> different representation for that, not STRING_CSTs.
>> Because most of the compiler assumes STRING_CSTs are what it says, string
>> literals where elements are some kind of characters, don't have to be
>> narrow, but better should be integral.
>> Maybe returning a CONSTRUCTOR with no elements with the right type is a
>> better idea for that, that in the compiler stands for zero initialized
>> aggregate.
> 
> It's also a much shorter representation (that also works for strings, btw)
> if it is all about all-zero "constants".
> 
> Richard.
> 
>>          Jakub
>>

Based on the upcoming discussion, I decided to backport my current fix to releases/gcc-10
branch in order to fix the problematic ICE in chromium. I'm ready to backport whatever
Martin comes up with.

Martin

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-07-28 10:38 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-27 12:32 [PATCH] expr: build string_constant only for a char type Martin Liška
2020-07-27 13:16 ` Jakub Jelinek
2020-07-27 14:12   ` Martin Liška
2020-07-27 14:14     ` Jakub Jelinek
2020-07-27 15:53 ` Martin Sebor
2020-07-27 18:54   ` Martin Liška
2020-07-27 19:39     ` Martin Sebor
2020-07-27 20:49   ` Jakub Jelinek
2020-07-28  7:00     ` Richard Biener
2020-07-28 10:38       ` Martin Liška

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).