public inbox for gcc-patches@gcc.gnu.org
* [PATCH v2] PR tree-optimization/98335: Improvements to DSE's compute_trims.
@ 2022-03-09 17:43 Roger Sayle
  2022-03-11 11:34 ` Richard Biener
  0 siblings, 1 reply; 3+ messages in thread
From: Roger Sayle @ 2022-03-09 17:43 UTC (permalink / raw)
  To: 'Richard Biener'; +Cc: 'GCC Patches'

[-- Attachment #1: Type: text/plain, Size: 7022 bytes --]


Hi Richard,
Many thanks.  Yes, your proposed ao_ref_alignment is exactly what I was looking for.
Here's the second revision of my patch for PR tree-optimization/98335, which both
introduces and uses ao_ref_alignment, and aligns/trims both head and tail more
intelligently, for example handling the case discussed by Richard and Jeff Law of a
16-byte-aligned object where we wish to avoid trimming (just) the last three bytes.
It relies on the useful property that writing N consecutive bytes typically requires
popcount(N) store instructions, so where possible we arrange for the trimmed store
to begin/end with a run of N' bytes where popcount(N') is one, if that isn't already
the case.
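
To make the store-count property concrete, here is a tiny standalone
illustration (hypothetical demonstration code, not part of the patch;
stores_needed is an invented helper name):

#include <stdio.h>

/* Writing N consecutive bytes with power-of-two sized, aligned stores
   takes popcount (N) instructions.  */
static int stores_needed (unsigned int n) { return __builtin_popcount (n); }

int main (void)
{
  printf ("8 bytes: %d store\n", stores_needed (8));    /* 1 */
  printf ("7 bytes: %d stores\n", stores_needed (7));   /* 3: 4 + 2 + 1 */
  printf ("13 bytes: %d stores\n", stores_needed (13)); /* 3: 8 + 4 + 1 */
  return 0;
}

Hence trimming one dead byte from an aligned 8-byte store, or three tail
bytes from an aligned 16-byte store, triples the number of stores unless
the trim is rounded to preserve alignment.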

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
make -k check with no new failures.  Is this revised version ok for mainline?

2022-03-09  Roger Sayle  <roger@nextmovesoftware.com>
            Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
	PR tree-optimization/98335
	* builtins.cc (get_object_alignment_2): Export.
	* builtins.h (get_object_alignment_2): Likewise.
	* tree-ssa-alias.cc (ao_ref_alignment): New.
	* tree-ssa-alias.h (ao_ref_alignment): Declare.

	* tree-ssa-dse.cc (compute_trims): Improve logic deciding whether
	to align head/tail, writing more bytes but using fewer store insns.
	(maybe_trim_memstar_call): Silence compiler warnings by using
	memset to initialize lendata.

gcc/testsuite/ChangeLog
	PR tree-optimization/98335
	* g++.dg/pr98335.C: New test case.
	* gcc.dg/pr86010.c: New test case.
	* gcc.dg/pr86010-2.c: New test case.

Thanks again for your help.
Roger
--

> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: 08 March 2022 10:44
> To: Roger Sayle <roger@nextmovesoftware.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> Subject: Re: [PATCH] PR tree-optimization/98335: Improvements to DSE's
> compute_trims.
> 
> On Tue, Mar 8, 2022 at 11:10 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Mon, Mar 7, 2022 at 11:04 AM Roger Sayle
> <roger@nextmovesoftware.com> wrote:
> > >
> > >
> > > This patch is the main middle-end piece of a fix for PR
> > > tree-opt/98335, which is a code-quality regression affecting
> > > mainline.  The issue occurs in DSE's (dead store elimination's)
> > > compute_trims function that determines where a store to memory can
> > > be trimmed.  In the testcase given in the PR, this function notices
> > > that the first byte of a DImode store is dead, and replaces the
> > > 8-byte store at (aligned) offset zero, with a 7-byte store at
> > > (unaligned) offset one.  Most architectures can store a power-of-two
> > > number of bytes (up to a maximum) in a single instruction, so writing
> > > 7 bytes requires more instructions than writing 8 bytes.  This patch
> > > follows Jakub Jelinek's suggestion in comment 5 that compute_trims
> > > needs improved heuristics.
> > >
> > > In this patch, the decision of whether and how to align trim_head is
> > > based on the number of bytes being written, the alignment of the
> > > start of the object and where within the object the first byte is
> > > written.  The first tests check whether we're already writing to the
> > > start of the object, and that we're writing three or more bytes.  If
> > > we're only writing one or two bytes, there's no benefit from providing
> > > additional alignment.
> > > Then we determine the alignment of the object, which is either 1, 2,
> > > 4, 8 or 16 byte aligned (capping at 16 guarantees that we never
> > > write more than 7 bytes beyond the minimum required).  If the buffer is
> > > only 1- or 2-byte aligned, there's no benefit from additional alignment.
> > > For the remaining cases, alignment of trim_head is based upon where
> > > within each aligned block (word) the first byte is written.  For
> > > example, storing the last byte (or last half-word) of a word can be
> > > performed with a single insn.
> > >
> > > On x86_64-pc-linux-gnu with -O2 the new test case in the PR goes from:
> > >
> > >         movl    $0, -24(%rsp)
> > >         movabsq $72057594037927935, %rdx
> > >         movl    $0, -21(%rsp)
> > >         andq    -24(%rsp), %rdx
> > >         movq    %rdx, %rax
> > >         salq    $8, %rax
> > >         movb    c(%rip), %al
> > >         ret
> > >
> > > to
> > >
> > >         xorl    %eax, %eax
> > >         movb    c(%rip), %al
> > >         ret
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make
> > > bootstrap and make -k check with no new failures.  I've also added
> > > new testcases for the original motivating PR
> > > tree-optimization/86010, to ensure that those remain optimized (in future).
> > > Ok for mainline?
> >
> > diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc index
> > 2b22a61..080e406 100644
> > --- a/gcc/tree-ssa-dse.cc
> > +++ b/gcc/tree-ssa-dse.cc
> > @@ -405,10 +405,36 @@ compute_trims (ao_ref *ref, sbitmap live, int
> > *trim_head, int *trim_tail,
> >    int first_live = bitmap_first_set_bit (live);
> >    *trim_head = first_live - first_orig;
> >
> > -  /* If more than a word remains, then make sure to keep the
> > -     starting point at least word aligned.  */
> > -  if (last_live - first_live > UNITS_PER_WORD)
> > -    *trim_head &= ~(UNITS_PER_WORD - 1);
> > +  /* If REF is aligned, try to maintain this alignment if it reduces
> > +     the number of (power-of-two sized aligned) writes to memory.
> > +     First check that we're writing >= 3 bytes at a non-zero offset.  */
> > +  if (first_live
> > +      && last_live - first_live >= 2)
> > +    {
> > +      unsigned int align = TYPE_ALIGN_UNIT (TREE_TYPE (ref->base));
> >
> > you can't simply use TYPE_ALIGN_* on ref->base.  You can use
> > get_object_alignment on ref->ref, but ref->ref can be NULL in case the
> > ref was initialized from a builtin call like memcpy.
> >
> > Also ref->base is offset by ref->offset, which you don't seem to
> > account for.  In theory one could export get_object_alignment_2 and if
> > ref->ref is NULL, use that on ref->base, passing addr_p = true, and
> > then adjust the resulting bitpos by ref->offset and fix align
> > accordingly (trimming might also align an access if the original
> > access was offset from known alignment).
> >
> > That said, a helper like ao_ref_alignment () might be useful here.
> 
> Like the attached - feel free to use that.
> 
> Richard.
> 
> >
> > I wonder if we can apply good heuristics to compute_trims without
> > taking into account context, like maybe_trim_complex_store is already
> > limiting itself to useful subsets and the constructor and memstar
> > cases will only benefit if they end up being expanded inline via
> > *_by_pieces, not if expanded as a call.
> >
> > You don't seem to adjust *trim_tail at all; if an aligned 16-byte
> > region is trimmed by 3 at the end, the remaining 13 = 8 + 4 + 1 bytes
> > will result in two extra stores as well, no?
> >



[-- Attachment #2: patchct6a.txt --]
[-- Type: text/plain, Size: 7322 bytes --]

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index d784a57..4c6c293 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -233,7 +233,7 @@ called_as_built_in (tree node)
    If ADDR_P is true we are taking the address of the memory reference EXP
    and thus cannot rely on the access taking place.  */
 
-static bool
+bool
 get_object_alignment_2 (tree exp, unsigned int *alignp,
 			unsigned HOST_WIDE_INT *bitposp, bool addr_p)
 {
diff --git a/gcc/builtins.h b/gcc/builtins.h
index ea10b4b..5ad830c 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -52,6 +52,8 @@ extern bool force_folding_builtin_constant_p;
 extern bool called_as_built_in (tree);
 extern bool get_object_alignment_1 (tree, unsigned int *,
 				    unsigned HOST_WIDE_INT *);
+extern bool get_object_alignment_2 (tree, unsigned int *,
+				    unsigned HOST_WIDE_INT *, bool);
 extern unsigned int get_object_alignment (tree);
 extern bool get_pointer_alignment_1 (tree, unsigned int *,
 				     unsigned HOST_WIDE_INT *);
diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc
index 3e8d245..50bd47b 100644
--- a/gcc/tree-ssa-alias.cc
+++ b/gcc/tree-ssa-alias.cc
@@ -790,6 +790,29 @@ ao_ref_alias_ptr_type (ao_ref *ref)
   return ret;
 }
 
+/* Return the alignment of the access *REF and store it in the *ALIGN
+   and *BITPOS pair.  Returns false if no alignment could be determined.
+   See get_object_alignment_2 for details.  */
+
+bool
+ao_ref_alignment (ao_ref *ref, unsigned int *align,
+		  unsigned HOST_WIDE_INT *bitpos)
+{
+  if (ref->ref)
+    return get_object_alignment_1 (ref->ref, align, bitpos);
+
+  /* When we just have ref->base we cannot use get_object_alignment since
+     that will eventually use the type of the apparent access while for
+     example ao_ref_init_from_ptr_and_range is not careful to adjust that.  */
+  *align = BITS_PER_UNIT;
+  HOST_WIDE_INT offset;
+  if (!ref->offset.is_constant (&offset)
+      || !get_object_alignment_2 (ref->base, align, bitpos, true))
+    return false;
+  *bitpos += (unsigned HOST_WIDE_INT)offset * BITS_PER_UNIT;
+  *bitpos = *bitpos & (*align - 1);
+  return true;
+}
 
 /* Init an alias-oracle reference representation from a gimple pointer
    PTR a range specified by OFFSET, SIZE and MAX_SIZE under the assumption
diff --git a/gcc/tree-ssa-alias.h b/gcc/tree-ssa-alias.h
index 6ae1a65..dfb2127 100644
--- a/gcc/tree-ssa-alias.h
+++ b/gcc/tree-ssa-alias.h
@@ -119,6 +119,8 @@ extern alias_set_type ao_ref_alias_set (ao_ref *);
 extern alias_set_type ao_ref_base_alias_set (ao_ref *);
 extern tree ao_ref_alias_ptr_type (ao_ref *);
 extern tree ao_ref_base_alias_ptr_type (ao_ref *);
+extern bool ao_ref_alignment (ao_ref *, unsigned int *,
+			      unsigned HOST_WIDE_INT *);
 extern bool ptr_deref_may_alias_global_p (tree);
 extern bool ptr_derefs_may_alias_p (tree, tree);
 extern bool ptrs_compare_unequal (tree, tree);
diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
index 2b22a61..49f91e2 100644
--- a/gcc/tree-ssa-dse.cc
+++ b/gcc/tree-ssa-dse.cc
@@ -405,10 +405,56 @@ compute_trims (ao_ref *ref, sbitmap live, int *trim_head, int *trim_tail,
   int first_live = bitmap_first_set_bit (live);
   *trim_head = first_live - first_orig;
 
-  /* If more than a word remains, then make sure to keep the
-     starting point at least word aligned.  */
-  if (last_live - first_live > UNITS_PER_WORD)
-    *trim_head &= ~(UNITS_PER_WORD - 1);
+  /* If REF is aligned, try to maintain this alignment if it reduces
+     the number of (power-of-two sized aligned) writes to memory.  */
+  unsigned int align_bits;
+  unsigned HOST_WIDE_INT bitpos;
+  if ((*trim_head || *trim_tail)
+      && last_live - first_live >= 2
+      && ao_ref_alignment (ref, &align_bits, &bitpos)
+      && align_bits >= 32
+      && bitpos == 0
+      && align_bits % BITS_PER_UNIT == 0)
+    {
+      unsigned int align_units = align_bits / BITS_PER_UNIT;
+      if (align_units > 16)
+	align_units = 16;
+      while ((first_live | (align_units - 1)) > (unsigned int)last_live)
+	align_units >>= 1;
+
+      if (*trim_head)
+	{
+	  unsigned int pos = first_live & (align_units - 1);
+	  for (unsigned int i = 1; i <= align_units; i <<= 1)
+	    {
+	      unsigned int mask = ~(i - 1);
+	      unsigned int bytes = align_units - (pos & mask);
+	      if (wi::popcount (bytes) <= 1)
+		{
+		  *trim_head &= mask;
+		  break;
+		}
+	    }
+	}
+
+      if (*trim_tail)
+	{
+	  unsigned int pos = last_live & (align_units - 1);
+	  for (unsigned int i = 1; i <= align_units; i <<= 1)
+	    {
+	      int mask = i - 1;
+	      unsigned int bytes = (pos | mask) + 1;
+	      if ((last_live | mask) > (last_live + *trim_tail))
+		break;
+	      if (wi::popcount (bytes) <= 1)
+		{
+		  unsigned int extra = (last_live | mask) - last_live;
+		  *trim_tail -= extra;
+		  break;
+		}
+	    }
+	}
+    }
 
   if ((*trim_head || *trim_tail)
       && dump_file && (dump_flags & TDF_DETAILS))
@@ -590,7 +636,8 @@ maybe_trim_memstar_call (ao_ref *ref, sbitmap live, gimple *stmt)
 	     about those bytes, the presence or absence of '\0' bytes
 	     in there will affect whether it acts for the non-trimmed
 	     bytes as memset or memcpy/strncpy.  */
-	  c_strlen_data lendata = { };
+	  c_strlen_data lendata;
+	  memset (&lendata, 0, sizeof (lendata));
 	  int orig_head_trim = head_trim;
 	  tree srcstr = gimple_call_arg (stmt, 1);
 	  if (!get_range_strlen (srcstr, &lendata, /*eltsize=*/1)
diff --git a/gcc/testsuite/g++.dg/pr98335.C b/gcc/testsuite/g++.dg/pr98335.C
new file mode 100644
index 0000000..c54f4d9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr98335.C
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+struct Data {
+  char a;
+  int b;
+};
+
+char c;
+
+Data val(int idx) {
+  return { c };  // { dg-warning "extended initializer" "c++ 98"  { target { c++98_only } } }
+}
+
+/* { dg-final { scan-tree-dump-not " + 1B] = {}" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/pr86010-2.c b/gcc/testsuite/gcc.dg/pr86010-2.c
new file mode 100644
index 0000000..4c82e65
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr86010-2.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+void f (void*);
+
+void g (char *a)
+{
+  __builtin_memset (a, 0, 8);
+  __builtin_memset (a, 0, 8);
+
+  f (a);
+}
+
+void h (char *a)
+{
+  __builtin_memset (a, 0, 8);
+  __builtin_memset (a, 0, 7);
+
+  f (a);
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_memset" 2 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/pr86010.c b/gcc/testsuite/gcc.dg/pr86010.c
new file mode 100644
index 0000000..ac27989
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr86010.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+void f (void*);
+
+void g (void)
+{
+  char a[8];
+  __builtin_memset (a, 0, 8);
+  __builtin_memset (a, 0, 8);
+
+  f (a);
+}
+
+void h (void)
+{
+  char a[8];
+  __builtin_memset (a, 0, 8);
+  __builtin_memset (a, 0, 7);
+
+  f (a);
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_memset" 2 "optimized" } } */
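
For readers tracing the new compute_trims logic above, here is a
standalone sketch of the head-trim rounding (simplified, hypothetical
code: round_trim_head is an invented name, plain unsigned arithmetic
replaces wi::popcount, first_orig is taken as 0, and the bitpos == 0
and alignment guards from the patch are assumed to have passed):

#include <stdio.h>

/* Round the head trim of a store whose live bytes span
   [FIRST_LIVE, LAST_LIVE] and whose ALIGN_UNITS-byte alignment we
   want to preserve, mirroring the loop in the patch.  */
static unsigned int
round_trim_head (unsigned int first_live, unsigned int last_live,
                 unsigned int align_units)
{
  unsigned int trim_head = first_live;
  if (align_units > 16)
    align_units = 16;
  while ((first_live | (align_units - 1)) > last_live)
    align_units >>= 1;

  unsigned int pos = first_live & (align_units - 1);
  for (unsigned int i = 1; i <= align_units; i <<= 1)
    {
      unsigned int mask = ~(i - 1);
      unsigned int bytes = align_units - (pos & mask);
      if (__builtin_popcount (bytes) <= 1)
        {
          trim_head &= mask;
          break;
        }
    }
  return trim_head;
}

int main (void)
{
  /* PR 98335: the first byte of an aligned 8-byte store is dead.
     A 7-byte store would need three insns, so the trim is rounded
     away and the full 8-byte store is kept.  */
  printf ("%u\n", round_trim_head (1, 7, 8));   /* prints 0 */
  /* First 4 bytes of a 16-byte-aligned store dead: a 12-byte store
     needs two insns, so the whole aligned store is kept instead.  */
  printf ("%u\n", round_trim_head (4, 15, 16)); /* prints 0 */
  return 0;
}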


* Re: [PATCH v2] PR tree-optimization/98335: Improvements to DSE's compute_trims.
  2022-03-09 17:43 [PATCH v2] PR tree-optimization/98335: Improvements to DSE's compute_trims Roger Sayle
@ 2022-03-11 11:34 ` Richard Biener
  0 siblings, 0 replies; 3+ messages in thread
From: Richard Biener @ 2022-03-11 11:34 UTC (permalink / raw)
  To: Roger Sayle; +Cc: GCC Patches

On Wed, Mar 9, 2022 at 6:43 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Hi Richard,
> Many thanks.  Yes, your proposed ao_ref_alignment is exactly what I was looking for.
> Here's the second revision of my patch for PR tree-optimization/98335, which both
> introduces and uses ao_ref_alignment, and aligns/trims both head and tail more
> intelligently, for example handling the case discussed by Richard and Jeff Law of a
> 16-byte-aligned object where we wish to avoid trimming (just) the last three bytes.
> It relies on the useful property that writing N consecutive bytes typically requires
> popcount(N) store instructions, so where possible we arrange for the trimmed store
> to begin/end with a run of N' bytes where popcount(N') is one, if that isn't already
> the case.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
> make -k check with no new failures.  Is this revised version ok for mainline?

OK.

I wonder if we can also handle bitpos != 0.  For example, if we have
an access 3 bytes into an 8-byte-aligned object and want to trim
2 bytes at the start, then it would be good to trim only 1 byte so
that the access becomes 4-byte aligned (4 bytes into the 8-byte-aligned
object).  Similarly, trimming 6 bytes should be reduced to trimming
5 bytes.
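
As an illustrative sketch of that rounding (hypothetical code, not
part of the patch; reduce_head_trim is an invented name, and the byte
offset of the access within its aligned object is assumed known):

#include <stdio.h>

/* Reduce TRIM so that POS + TRIM lands on the largest power-of-two
   boundary (up to ALIGN bytes) that is not before POS itself.  */
static unsigned int
reduce_head_trim (unsigned int pos, unsigned int trim, unsigned int align)
{
  for (unsigned int a = align; a > 1; a >>= 1)
    {
      /* Last multiple-of-A boundary at or before POS + TRIM.  */
      unsigned int aligned_start = (pos + trim) & ~(a - 1);
      if (aligned_start >= pos)
        return aligned_start - pos;
    }
  return trim;
}

int main (void)
{
  printf ("%u\n", reduce_head_trim (3, 2, 8)); /* prints 1 */
  printf ("%u\n", reduce_head_trim (3, 6, 8)); /* prints 5 */
  return 0;
}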

Richard.

> 2022-03-09  Roger Sayle  <roger@nextmovesoftware.com>
>             Richard Biener  <rguenther@suse.de>
>
> gcc/ChangeLog
>         PR tree-optimization/98335
>         * builtins.cc (get_object_alignment_2): Export.
>         * builtins.h (get_object_alignment_2): Likewise.
>         * tree-ssa-alias.cc (ao_ref_alignment): New.
>         * tree-ssa-alias.h (ao_ref_alignment): Declare.
>
>         * tree-ssa-dse.cc (compute_trims): Improve logic deciding whether
>         to align head/tail, writing more bytes but using fewer store insns.
>         (maybe_trim_memstar_call): Silence compiler warnings by using
>         memset to initialize lendata.
>
> gcc/testsuite/ChangeLog
>         PR tree-optimization/98335
>         * g++.dg/pr98335.C: New test case.
>         * gcc.dg/pr86010.c: New test case.
>         * gcc.dg/pr86010-2.c: New test case.
>
> Thanks again for your help.
> Roger
> --
>
> > -----Original Message-----
> > From: Richard Biener <richard.guenther@gmail.com>
> > Sent: 08 March 2022 10:44
> > To: Roger Sayle <roger@nextmovesoftware.com>
> > Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> > Subject: Re: [PATCH] PR tree-optimization/98335: Improvements to DSE's
> > compute_trims.
> >
> > On Tue, Mar 8, 2022 at 11:10 AM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Mon, Mar 7, 2022 at 11:04 AM Roger Sayle
> > <roger@nextmovesoftware.com> wrote:
> > > >
> > > >
> > > > This patch is the main middle-end piece of a fix for PR
> > > > tree-opt/98335, which is a code-quality regression affecting
> > > > mainline.  The issue occurs in DSE's (dead store elimination's)
> > > > compute_trims function that determines where a store to memory can
> > > > be trimmed.  In the testcase given in the PR, this function notices
> > > > that the first byte of a DImode store is dead, and replaces the
> > > > 8-byte store at (aligned) offset zero, with a 7-byte store at
> > > > (unaligned) offset one.  Most architectures can store a power-of-two
> > > > number of bytes (up to a maximum) in a single instruction, so writing
> > > > 7 bytes requires more instructions than writing 8 bytes.  This patch
> > > > follows Jakub Jelinek's suggestion in comment 5 that compute_trims
> > > > needs improved heuristics.
> > > >
> > > > In this patch, the decision of whether and how to align trim_head is
> > > > based on the number of bytes being written, the alignment of the
> > > > start of the object and where within the object the first byte is
> > > > written.  The first tests check whether we're already writing to the
> > > > start of the object, and that we're writing three or more bytes.  If
> > > > we're only writing one or two bytes, there's no benefit from providing
> > > > additional alignment.
> > > > Then we determine the alignment of the object, which is either 1, 2,
> > > > 4, 8 or 16 byte aligned (capping at 16 guarantees that we never
> > > > write more than 7 bytes beyond the minimum required).  If the buffer is
> > > > only 1- or 2-byte aligned, there's no benefit from additional alignment.
> > > > For the remaining cases, alignment of trim_head is based upon where
> > > > within each aligned block (word) the first byte is written.  For
> > > > example, storing the last byte (or last half-word) of a word can be
> > > > performed with a single insn.
> > > >
> > > > On x86_64-pc-linux-gnu with -O2 the new test case in the PR goes from:
> > > >
> > > >         movl    $0, -24(%rsp)
> > > >         movabsq $72057594037927935, %rdx
> > > >         movl    $0, -21(%rsp)
> > > >         andq    -24(%rsp), %rdx
> > > >         movq    %rdx, %rax
> > > >         salq    $8, %rax
> > > >         movb    c(%rip), %al
> > > >         ret
> > > >
> > > > to
> > > >
> > > >         xorl    %eax, %eax
> > > >         movb    c(%rip), %al
> > > >         ret
> > > >
> > > > This patch has been tested on x86_64-pc-linux-gnu with make
> > > > bootstrap and make -k check with no new failures.  I've also added
> > > > new testcases for the original motivating PR
> > > > tree-optimization/86010, to ensure that those remain optimized (in future).
> > > > Ok for mainline?
> > >
> > > diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc index
> > > 2b22a61..080e406 100644
> > > --- a/gcc/tree-ssa-dse.cc
> > > +++ b/gcc/tree-ssa-dse.cc
> > > @@ -405,10 +405,36 @@ compute_trims (ao_ref *ref, sbitmap live, int
> > > *trim_head, int *trim_tail,
> > >    int first_live = bitmap_first_set_bit (live);
> > >    *trim_head = first_live - first_orig;
> > >
> > > -  /* If more than a word remains, then make sure to keep the
> > > -     starting point at least word aligned.  */
> > > -  if (last_live - first_live > UNITS_PER_WORD)
> > > -    *trim_head &= ~(UNITS_PER_WORD - 1);
> > > +  /* If REF is aligned, try to maintain this alignment if it reduces
> > > +     the number of (power-of-two sized aligned) writes to memory.
> > > +     First check that we're writing >= 3 bytes at a non-zero offset.  */
> > > +  if (first_live
> > > +      && last_live - first_live >= 2)
> > > +    {
> > > +      unsigned int align = TYPE_ALIGN_UNIT (TREE_TYPE (ref->base));
> > >
> > > you can't simply use TYPE_ALIGN_* on ref->base.  You can use
> > > get_object_alignment on ref->ref, but ref->ref can be NULL in case the
> > > ref was initialized from a builtin call like memcpy.
> > >
> > > Also ref->base is offset by ref->offset, which you don't seem to
> > > account for.  In theory one could export get_object_alignment_2 and if
> > > ref->ref is NULL, use that on ref->base, passing addr_p = true, and
> > > then adjust the resulting bitpos by ref->offset and fix align
> > > accordingly (trimming might also align an access if the original
> > > access was offset from known alignment).
> > >
> > > That said, a helper like ao_ref_alignment () might be useful here.
> >
> > Like the attached - feel free to use that.
> >
> > Richard.
> >
> > >
> > > I wonder if we can apply good heuristics to compute_trims without
> > > taking into account context, like maybe_trim_complex_store is already
> > > limiting itself to useful subsets and the constructor and memstar
> > > cases will only benefit if they end up being expanded inline via
> > > *_by_pieces, not if expanded as a call.
> > >
> > > You don't seem to adjust *trim_tail at all; if an aligned 16-byte
> > > region is trimmed by 3 at the end, the remaining 13 = 8 + 4 + 1 bytes
> > > will result in two extra stores as well, no?
> > >
>
>


* RE: [PATCH v2] PR tree-optimization/98335: Improvements to DSE's compute_trims.
@ 2022-03-10  7:06 Roger Sayle
  0 siblings, 0 replies; 3+ messages in thread
From: Roger Sayle @ 2022-03-10  7:06 UTC (permalink / raw)
  To: 'Richard Biener'; +Cc: 'GCC Patches'


Hi Richard,

> 2022-03-09  Roger Sayle  <roger@nextmovesoftware.com>
>             Richard Biener  <rguenther@suse.de>
> 
> gcc/ChangeLog
> 	PR tree-optimization/98335
> 	* builtins.cc (get_object_alignment_2): Export.
> 	* builtins.h (get_object_alignment_2): Likewise.
> 	* tree-ssa-alias.cc (ao_ref_alignment): New.
> 	* tree-ssa-alias.h (ao_ref_alignment): Declare.
> 
> 	* tree-ssa-dse.cc (compute_trims): Improve logic deciding whether
> 	to align head/tail, writing more bytes but using fewer store insns.

> 	(maybe_trim_memstar_call): Silence compiler warnings by using
> 	memset to initialize lendata.

Please ignore this final change to maybe_trim_memstar_call, which I've
now removed from the patch.  Jason Merrill has now explained that these
warnings are issues with the (RedHat) system compiler on my host system
rather than code cleanliness issues with the GCC source tree.  I just need to
ignore these compiler warnings in stage1 of a bootstrap/build, even in the
source files I'm changing.  Sorry for the noise.  Hopefully the rest looks good.

Cheers,
Roger
--



