public inbox for gcc-patches@gcc.gnu.org
* [PATCH][PR tree-optimization/79090] Fix two minor DSE bugs
@ 2017-01-15  9:34 Jeff Law
  2017-01-16  8:51 ` Richard Biener
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff Law @ 2017-01-15  9:34 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1862 bytes --]


At one time I know I had the max_size == size test in 
valid_ao_ref_for_dse.  But it got lost at some point.  This is what 
caused the Ada failure.

Technically it'd be OK for the potentially dead store to have a variable 
size as long as the later stores covered the entire range of the 
potentially dead store.  I doubt this happens enough to be worth checking.

The ppc64 big endian failures were more interesting.  We had this in the IL:

memmove (dst, src, 0)

The trimming code assumes that there's at least one live byte in the 
store, which obviously isn't the case here.  The net result is we 
compute an incorrect trim and the copy goes wild with incorrect 
addresses and lengths.  This is trivial to fix by validating that the 
store has a nonzero length.

I was a bit curious how often this happens in practice, since such a 
call is trivially dead: about 80 show up during a bootstrap and a few 
dozen in the testsuite.  Given how trivial they are to detect and 
optimize, this patch also removes such calls.  That hunk makes the check 
for zero size in valid_ao_ref_for_dse redundant, but I'd like to keep 
the check -- if we add more builtin support without filtering zero-size 
calls we'd regress again.


I wanted to get this in tonight given the Ada and ppc64 breakage, so I 
didn't do testcase reductions for the ppc test or the zero-length 
optimization (acats covers the Ada breakage).  I'll get testcases done 
Monday.  I'll also finish generating suitable baselines for 
ppc64-linux-gnu over the rest of the weekend to verify whether this 
fixes all the ppc64 regressions or just a subset of them.

Bootstrapped on x86_64-linux-gnu and ppc64-linux-gnu.  Regression tested 
on x86_64-linux-gnu.  Also verified the ppc64 test results are improved 
(anything where sel-sched was faulting ought to be fixed now, maybe 
others).  Installing on the trunk.

Jeff

[-- Attachment #2: P --]
[-- Type: text/plain, Size: 2230 bytes --]

commit 04c5d9cab65e2f5b219f9610621bea06173b9cb8
Author: Jeff Law <law@redhat.com>
Date:   Sun Jan 15 01:37:32 2017 -0700

    	PR tree-optimization/79090
    	* tree-ssa-dse.c (valid_ao_ref_for_dse): Reject zero length and
    	variable length stores.
    	(compute_trims): Delete dead assignment to *trim_tail.
    	(dse_dom_walker::dse_optimize_stmt): Optimize mem* calls with
    	zero length.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 746f6af..36982c6 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2017-01-14  Jeff Law  <law@redhat.com>
+
+	PR tree-optimization/79090
+	* tree-ssa-dse.c (valid_ao_ref_for_dse): Reject zero length and
+	variable length stores.
+	(compute_trims): Delete dead assignment to *trim_tail.
+	(dse_dom_walker::dse_optimize_stmt): Optimize mem* calls with
+	zero length.
+
 2017-01-14  Bernd Schmidt  <bschmidt@redhat.com>
 
 	PR rtl-optimization/78626
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 2e6f8ff..65dcb12 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -129,6 +129,8 @@ valid_ao_ref_for_dse (ao_ref *ref)
 {
   return (ao_ref_base (ref)
 	  && ref->max_size != -1
+	  && ref->size != 0
+	  && ref->max_size == ref->size
 	  && (ref->offset % BITS_PER_UNIT) == 0
 	  && (ref->size % BITS_PER_UNIT) == 0
 	  && (ref->size != -1));
@@ -221,7 +223,6 @@ compute_trims (ao_ref *ref, sbitmap live, int *trim_head, int *trim_tail)
      the REF to compute the trims.  */
 
   /* Now identify how much, if any of the tail we can chop off.  */
-  *trim_tail = 0;
   int last_orig = (ref->size / BITS_PER_UNIT) - 1;
   int last_live = bitmap_last_set_bit (live);
   *trim_tail = (last_orig - last_live) & ~0x1;
@@ -700,6 +701,16 @@ dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator *gsi)
 	  case BUILT_IN_MEMMOVE:
 	  case BUILT_IN_MEMSET:
 	    {
+	      /* Occasionally calls with an explicit length of zero
+		 show up in the IL.  It's pointless to do analysis
+		 on them, they're trivially dead.  */
+	      tree size = gimple_call_arg (stmt, 2);
+	      if (integer_zerop (size))
+		{
+		  delete_dead_call (gsi);
+		  return;
+		}
+
 	      gimple *use_stmt;
 	      enum dse_store_status store_status;
 	      m_byte_tracking_enabled

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH][PR tree-optimization/79090] Fix two minor DSE bugs
  2017-01-15  9:34 [PATCH][PR tree-optimization/79090] Fix two minor DSE bugs Jeff Law
@ 2017-01-16  8:51 ` Richard Biener
  2017-01-16 18:27   ` Jeff Law
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Biener @ 2017-01-16  8:51 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

On Sun, Jan 15, 2017 at 10:34 AM, Jeff Law <law@redhat.com> wrote:
>
> At one time I know I had the max_size == size test in valid_ao_ref_for_dse.
> But it got lost at some point.  This is what caused the Ada failure.
>
> Technically it'd be OK for the potentially dead store to have a variable
> size as long as the later stores covered the entire range of the potentially
> dead store.  I doubt this happens enough to be worth checking.
>
> The ppc64 big endian failures were more interesting.  We had this in the IL:
>
> memmove (dst, src, 0)
>
> The trimming code assumes that there's at least one live byte in the store,
> which obviously isn't the case here.  The net result is we compute an
> incorrect trim and the copy goes wild with incorrect addresses and lengths.
> This is trivial to fix by validating that the store has a nonzero length.
>
> I was a bit curious how often this happened in practice because such a call
> is trivially dead.  ~80 during a bootstrap and a few dozen in the testsuite.
> Given how trivial it is to detect and optimize, this patch includes removal
> of such calls.  This hunk makes the check for zero size in
> valid_ao_ref_for_dse redundant, but I'd like to keep the check -- if we add
> more builtin support without filtering zero size we'd regress again.

Interesting - we do fold memset (..., 0) away, so this means we have an
unfolded memset stmt in the IL before DSE.

> I wanted to get this in tonight given the Ada and ppc64 breakage so I didn't
> do testcase reductions for the ppc test or the zero length optimization
> (acats covers the Ada breakage).   I'll get testcases done Monday.  I'll
> also finish generating suitable baselines for ppc64-linux-gnu over the rest
> of the weekend to verify if this fixes all the ppc64 regressions or just a
> subset of them.
>
> Bootstrapped on x86_64-linux-gnu and ppc64-linux-gnu.  Regression tested on
> x86_64-linux-gnu.  Also verified the ppc64 testresults are improved
> (anything where sel-sched was faulting ought to be fixed now,  maybe
> others).  Installing on the trunk.
>
> Jeff
>
> commit 04c5d9cab65e2f5b219f9610621bea06173b9cb8
> Author: Jeff Law <law@redhat.com>
> Date:   Sun Jan 15 01:37:32 2017 -0700
>
>         PR tree-optimization/79090
>         * tree-ssa-dse.c (valid_ao_ref_for_dse): Reject zero length and
>         variable length stores.
>         (compute_trims): Delete dead assignment to *trim_tail.
>         (dse_dom_walker::dse_optimize_stmt): Optimize mem* calls with
>         zero length.
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 746f6af..36982c6 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,12 @@
> +2017-01-14  Jeff Law  <law@redhat.com>
> +
> +       PR tree-optimization/79090
> +       * tree-ssa-dse.c (valid_ao_ref_for_dse): Reject zero length and
> +       variable length stores.
> +       (compute_trims): Delete dead assignment to *trim_tail.
> +       (dse_dom_walker::dse_optimize_stmt): Optimize mem* calls with
> +       zero length.
> +
>  2017-01-14  Bernd Schmidt  <bschmidt@redhat.com>
>
>         PR rtl-optimization/78626
> diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
> index 2e6f8ff..65dcb12 100644
> --- a/gcc/tree-ssa-dse.c
> +++ b/gcc/tree-ssa-dse.c
> @@ -129,6 +129,8 @@ valid_ao_ref_for_dse (ao_ref *ref)
>  {
>    return (ao_ref_base (ref)
>           && ref->max_size != -1
> +         && ref->size != 0
> +         && ref->max_size == ref->size
>           && (ref->offset % BITS_PER_UNIT) == 0
>           && (ref->size % BITS_PER_UNIT) == 0
>           && (ref->size != -1));
> @@ -221,7 +223,6 @@ compute_trims (ao_ref *ref, sbitmap live, int
> *trim_head, int *trim_tail)
>       the REF to compute the trims.  */
>
>    /* Now identify how much, if any of the tail we can chop off.  */
> -  *trim_tail = 0;
>    int last_orig = (ref->size / BITS_PER_UNIT) - 1;
>    int last_live = bitmap_last_set_bit (live);
>    *trim_tail = (last_orig - last_live) & ~0x1;
> @@ -700,6 +701,16 @@ dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator
> *gsi)
>           case BUILT_IN_MEMMOVE:
>           case BUILT_IN_MEMSET:
>             {
> +             /* Occasionally calls with an explicit length of zero
> +                show up in the IL.  It's pointless to do analysis
> +                on them, they're trivially dead.  */
> +             tree size = gimple_call_arg (stmt, 2);
> +             if (integer_zerop (size))
> +               {
> +                 delete_dead_call (gsi);
> +                 return;
> +               }
> +
>               gimple *use_stmt;
>               enum dse_store_status store_status;
>               m_byte_tracking_enabled
>


* Re: [PATCH][PR tree-optimization/79090] Fix two minor DSE bugs
  2017-01-16  8:51 ` Richard Biener
@ 2017-01-16 18:27   ` Jeff Law
  2017-01-16 22:36     ` Richard Biener
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff Law @ 2017-01-16 18:27 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

On 01/16/2017 01:51 AM, Richard Biener wrote:
> On Sun, Jan 15, 2017 at 10:34 AM, Jeff Law <law@redhat.com> wrote:
>>
>> At one time I know I had the max_size == size test in valid_ao_ref_for_dse.
>> But it got lost at some point.  This is what caused the Ada failure.
>>
>> Technically it'd be OK for the potentially dead store to have a variable
>> size as long as the later stores covered the entire range of the potentially
>> dead store.  I doubt this happens enough to be worth checking.
>>
>> The ppc64 big endian failures were more interesting.  We had this in the IL:
>>
>> memmove (dst, src, 0)
>>
>> The trimming code assumes that there's at least one live byte in the store,
>> which obviously isn't the case here.  The net result is we compute an
>> incorrect trim and the copy goes wild with incorrect addresses and lengths.
>> This is trivial to fix by validating that the store has a nonzero length.
>>
>> I was a bit curious how often this happened in practice because such a call
>> is trivially dead.  ~80 during a bootstrap and a few dozen in the testsuite.
>> Given how trivial it is to detect and optimize, this patch includes removal
>> of such calls.  This hunk makes the check for zero size in
>> valid_ao_ref_for_dse redundant, but I'd like to keep the check -- if we add
>> more builtin support without filtering zero size we'd regress again.
>
> Interesting - we do fold memset (..., 0) away so this means we either
> have an unfolded memset stmt in the IL before DSE.
It's actually exposed by fre3, both in the original test and in the 
reduced testcase.  In the reduced testcase we have this just prior to FRE3:

;;   basic block 3, loop depth 0, count 0, freq 7326, maybe hot
;;    prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
;;    pred:       2 [73.3%]  (TRUE_VALUE,EXECUTABLE)
   _3 = MEM[(const struct vec *)_4].m_num;
   if (_3 != 0)
     goto <bb 4>; [36.64%]
   else
     goto <bb 5>; [63.36%]
;;    succ:       4 [36.6%]  (TRUE_VALUE,EXECUTABLE)
;;                5 [63.4%]  (FALSE_VALUE,EXECUTABLE)

;;   basic block 4, loop depth 0, count 0, freq 2684, maybe hot
;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
;;    pred:       3 [36.6%]  (TRUE_VALUE,EXECUTABLE)
   _6 = vec_av_set.m_vec;
   _7 = _6->m_num;
   _8 = _7 - _3;
   _6->m_num = _8;
   _9 = (long unsigned int) _8;
   _10 = _9 * 4;
   slot.2_11 = slot;
   dest.3_12 = dest;
   memmove (dest.3_12, slot.2_11, _10);
;;    succ:       5 [100.0%]  (FALLTHRU,EXECUTABLE)


_3 has the value _6->m_num.  Thus _8 will have the value 0, which in 
turn makes _10 have the value zero as seen in the .fre3 dump:

;;   basic block 4, loop depth 0, count 0, freq 2684, maybe hot
;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
;;    pred:       3 [36.6%]  (TRUE_VALUE,EXECUTABLE)
   _4->m_num = 0;
   slot.2_11 = slot;
   dest.3_12 = dest;
   memmove (dest.3_12, slot.2_11, 0);

In the full test it's similar.

I don't know if you want to try and catch this in FRE though.

If we detect this in DCE (where it makes reasonable sense) rather than 
DSE, then we catch the dead mem* call about 17 passes earlier and the 
dead argument setup about 20 passes earlier.  In the testcase I looked 
at, I didn't see additional secondary optimizations enabled, but I could 
imagine cases where there would be.  Seems like a gcc-8 thing though.

Jeff



* Re: [PATCH][PR tree-optimization/79090] Fix two minor DSE bugs
  2017-01-16 18:27   ` Jeff Law
@ 2017-01-16 22:36     ` Richard Biener
  2017-01-17  9:15       ` Richard Biener
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Biener @ 2017-01-16 22:36 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

On January 16, 2017 7:27:53 PM GMT+01:00, Jeff Law <law@redhat.com> wrote:
>On 01/16/2017 01:51 AM, Richard Biener wrote:
>> On Sun, Jan 15, 2017 at 10:34 AM, Jeff Law <law@redhat.com> wrote:
>>>
>>> At one time I know I had the max_size == size test in
>valid_ao_ref_for_dse.
>>> But it got lost at some point.  This is what caused the Ada failure.
>>>
>>> Technically it'd be OK for the potentially dead store to have a
>variable
>>> size as long as the later stores covered the entire range of the
>potentially
>>> dead store.  I doubt this happens enough to be worth checking.
>>>
>>> The ppc64 big endian failures were more interesting.  We had this in
>the IL:
>>>
>>> memmove (dst, src, 0)
>>>
>>> The trimming code assumes that there's at least one live byte in the
>store,
>>> which obviously isn't the case here.  The net result is we compute
>an
>>> incorrect trim and the copy goes wild with incorrect addresses and
>lengths.
>>> This is trivial to fix by validating that the store has a nonzero
>length.
>>>
>>> I was a bit curious how often this happened in practice because such
>a call
>>> is trivially dead.  ~80 during a bootstrap and a few dozen in the
>testsuite.
>>> Given how trivial it is to detect and optimize, this patch includes
>removal
>>> of such calls.  This hunk makes the check for zero size in
>>> valid_ao_ref_for_dse redundant, but I'd like to keep the check -- if
>we add
>>> more builtin support without filtering zero size we'd regress again.
>>
>> Interesting - we do fold memset (..., 0) away so this means we either
>> have an unfolded memset stmt in the IL before DSE.
>It's actually exposed by fre3, both in the original test and in the 
>reduced testcase.  In the reduced testcase we have this just prior to
>FRE3:
>
>;;   basic block 3, loop depth 0, count 0, freq 7326, maybe hot
>;;    prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
>;;    pred:       2 [73.3%]  (TRUE_VALUE,EXECUTABLE)
>   _3 = MEM[(const struct vec *)_4].m_num;
>   if (_3 != 0)
>     goto <bb 4>; [36.64%]
>   else
>     goto <bb 5>; [63.36%]
>;;    succ:       4 [36.6%]  (TRUE_VALUE,EXECUTABLE)
>;;                5 [63.4%]  (FALSE_VALUE,EXECUTABLE)
>
>;;   basic block 4, loop depth 0, count 0, freq 2684, maybe hot
>;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
>;;    pred:       3 [36.6%]  (TRUE_VALUE,EXECUTABLE)
>   _6 = vec_av_set.m_vec;
>   _7 = _6->m_num;
>   _8 = _7 - _3;
>   _6->m_num = _8;
>   _9 = (long unsigned int) _8;
>   _10 = _9 * 4;
>   slot.2_11 = slot;
>   dest.3_12 = dest;
>   memmove (dest.3_12, slot.2_11, _10);
>;;    succ:       5 [100.0%]  (FALLTHRU,EXECUTABLE)
>
>
>_3 has the value _6->m_num.  Thus _8 will have the value 0, which in 
>turn makes _10 have the value zero as seen in the .fre3 dump:
>
>;;   basic block 4, loop depth 0, count 0, freq 2684, maybe hot
>;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
>;;    pred:       3 [36.6%]  (TRUE_VALUE,EXECUTABLE)
>   _4->m_num = 0;
>   slot.2_11 = slot;
>   dest.3_12 = dest;
>   memmove (dest.3_12, slot.2_11, 0);
>
>In the full test its similar.
>
>I don't know if you want to try and catch this in FRE though.

Ah, I think I have had patches for this in my tree for a long time...  We're folding calls in a restricted way for some historical reason.

>If we detect in DCE (where it makes reasonable sense) rather than DSE, 
>then we detect the dead mem* about 17 passes earlier and the dead 
>argument setup about 20 passes earlier.  In the testcase I looked at, I
>
>didn't see additional secondary optimizations enabled, but I could 
>imagine cases where it might.  Seems like a gcc-8 thing though.

I'll give it a quick look tomorrow.

Richard.

>Jeff


* Re: [PATCH][PR tree-optimization/79090] Fix two minor DSE bugs
  2017-01-16 22:36     ` Richard Biener
@ 2017-01-17  9:15       ` Richard Biener
  2017-01-17 14:42         ` Jeff Law
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Biener @ 2017-01-17  9:15 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4377 bytes --]

On Mon, Jan 16, 2017 at 11:36 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On January 16, 2017 7:27:53 PM GMT+01:00, Jeff Law <law@redhat.com> wrote:
>>On 01/16/2017 01:51 AM, Richard Biener wrote:
>>> On Sun, Jan 15, 2017 at 10:34 AM, Jeff Law <law@redhat.com> wrote:
>>>>
>>>> At one time I know I had the max_size == size test in
>>valid_ao_ref_for_dse.
>>>> But it got lost at some point.  This is what caused the Ada failure.
>>>>
>>>> Technically it'd be OK for the potentially dead store to have a
>>variable
>>>> size as long as the later stores covered the entire range of the
>>potentially
>>>> dead store.  I doubt this happens enough to be worth checking.
>>>>
>>>> The ppc64 big endian failures were more interesting.  We had this in
>>the IL:
>>>>
>>>> memmove (dst, src, 0)
>>>>
>>>> The trimming code assumes that there's at least one live byte in the
>>store,
>>>> which obviously isn't the case here.  The net result is we compute
>>an
>>>> incorrect trim and the copy goes wild with incorrect addresses and
>>lengths.
>>>> This is trivial to fix by validating that the store has a nonzero
>>length.
>>>>
>>>> I was a bit curious how often this happened in practice because such
>>a call
>>>> is trivially dead.  ~80 during a bootstrap and a few dozen in the
>>testsuite.
>>>> Given how trivial it is to detect and optimize, this patch includes
>>removal
>>>> of such calls.  This hunk makes the check for zero size in
>>>> valid_ao_ref_for_dse redundant, but I'd like to keep the check -- if
>>we add
>>>> more builtin support without filtering zero size we'd regress again.
>>>
>>> Interesting - we do fold memset (..., 0) away so this means we either
>>> have an unfolded memset stmt in the IL before DSE.
>>It's actually exposed by fre3, both in the original test and in the
>>reduced testcase.  In the reduced testcase we have this just prior to
>>FRE3:
>>
>>;;   basic block 3, loop depth 0, count 0, freq 7326, maybe hot
>>;;    prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
>>;;    pred:       2 [73.3%]  (TRUE_VALUE,EXECUTABLE)
>>   _3 = MEM[(const struct vec *)_4].m_num;
>>   if (_3 != 0)
>>     goto <bb 4>; [36.64%]
>>   else
>>     goto <bb 5>; [63.36%]
>>;;    succ:       4 [36.6%]  (TRUE_VALUE,EXECUTABLE)
>>;;                5 [63.4%]  (FALSE_VALUE,EXECUTABLE)
>>
>>;;   basic block 4, loop depth 0, count 0, freq 2684, maybe hot
>>;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
>>;;    pred:       3 [36.6%]  (TRUE_VALUE,EXECUTABLE)
>>   _6 = vec_av_set.m_vec;
>>   _7 = _6->m_num;
>>   _8 = _7 - _3;
>>   _6->m_num = _8;
>>   _9 = (long unsigned int) _8;
>>   _10 = _9 * 4;
>>   slot.2_11 = slot;
>>   dest.3_12 = dest;
>>   memmove (dest.3_12, slot.2_11, _10);
>>;;    succ:       5 [100.0%]  (FALLTHRU,EXECUTABLE)
>>
>>
>>_3 has the value _6->m_num.  Thus _8 will have the value 0, which in
>>turn makes _10 have the value zero as seen in the .fre3 dump:
>>
>>;;   basic block 4, loop depth 0, count 0, freq 2684, maybe hot
>>;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
>>;;    pred:       3 [36.6%]  (TRUE_VALUE,EXECUTABLE)
>>   _4->m_num = 0;
>>   slot.2_11 = slot;
>>   dest.3_12 = dest;
>>   memmove (dest.3_12, slot.2_11, 0);
>>
>>In the full test its similar.
>>
>>I don't know if you want to try and catch this in FRE though.
>
> Ah, I think I have patches for this since a long time in my tree...  We're folding calls in a restricted way for some historical reason.
>
>>If we detect in DCE (where it makes reasonable sense) rather than DSE,
>>then we detect the dead mem* about 17 passes earlier and the dead
>>argument setup about 20 passes earlier.  In the testcase I looked at, I
>>
>>didn't see additional secondary optimizations enabled, but I could
>>imagine cases where it might.  Seems like a gcc-8 thing though.
>
> I'll give it a quick look tomorrow.

The comment before fold_stmt_inplace no longer applies (it seems I
simply forgot to push this change...).  It's better not to keep
unfolded stmts around, so I'll commit this as the last bit of stage3
if testing is fine.

Bootstrap / regtest on x86_64-unknown-linux-gnu in progress.

Richard.

2017-01-17  Richard Biener  <rguenther@suse.de>

        * tree-ssa-pre.c (eliminate_dom_walker::before_dom_children):
        Fold calls regularly.

        * gcc.dg/tree-ssa/ssa-fre-57.c: New testcase.

[-- Attachment #2: p3 --]
[-- Type: application/octet-stream, Size: 2818 bytes --]

2017-01-17  Richard Biener  <rguenther@suse.de>

	* tree-ssa-pre.c (eliminate_dom_walker::before_dom_children):
	Fold calls regularly.

	* gcc.dg/tree-ssa/ssa-fre-57.c: New testcase.

Index: gcc/tree-ssa-pre.c
===================================================================
--- gcc/tree-ssa-pre.c	(revision 244519)
+++ gcc/tree-ssa-pre.c	(working copy)
@@ -4604,30 +4624,22 @@ eliminate_dom_walker::before_dom_childre
 	      && TREE_CODE (gimple_assign_rhs1 (stmt)) == ADDR_EXPR)
 	    recompute_tree_invariant_for_addr_expr (gimple_assign_rhs1 (stmt));
 	  gimple *old_stmt = stmt;
-	  if (is_gimple_call (stmt))
-	    {
-	      /* ???  Only fold calls inplace for now, this may create new
-		 SSA names which in turn will confuse free_scc_vn SSA name
-		 release code.  */
-	      fold_stmt_inplace (&gsi);
-	      /* When changing a call into a noreturn call, cfg cleanup
-		 is needed to fix up the noreturn call.  */
-	      if (!was_noreturn && gimple_call_noreturn_p (stmt))
-		el_to_fixup.safe_push  (stmt);
-	    }
-	  else
-	    {
-	      fold_stmt (&gsi);
-	      stmt = gsi_stmt (gsi);
-	      if ((gimple_code (stmt) == GIMPLE_COND
-		   && (gimple_cond_true_p (as_a <gcond *> (stmt))
-		       || gimple_cond_false_p (as_a <gcond *> (stmt))))
-		  || (gimple_code (stmt) == GIMPLE_SWITCH
-		      && TREE_CODE (gimple_switch_index (
-				      as_a <gswitch *> (stmt)))
-		         == INTEGER_CST))
-		el_todo |= TODO_cleanup_cfg;
-	    }
+	  fold_stmt (&gsi);
+	  stmt = gsi_stmt (gsi);
+	  /* When changing a call into a noreturn call, cfg cleanup
+	     is needed to fix up the noreturn call.  */
+	  if (!was_noreturn
+	      && is_gimple_call (stmt) && gimple_call_noreturn_p (stmt))
+	    el_to_fixup.safe_push  (stmt);
+	  /* When changing a condition or switch into one we know what
+	     edge will be executed, schedule a cfg cleanup.  */
+	  if ((gimple_code (stmt) == GIMPLE_COND
+	       && (gimple_cond_true_p (as_a <gcond *> (stmt))
+		   || gimple_cond_false_p (as_a <gcond *> (stmt))))
+	      || (gimple_code (stmt) == GIMPLE_SWITCH
+		  && TREE_CODE (gimple_switch_index
+				  (as_a <gswitch *> (stmt))) == INTEGER_CST))
+	    el_todo |= TODO_cleanup_cfg;
 	  /* If we removed EH side-effects from the statement, clean
 	     its EH information.  */
 	  if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt))

Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-57.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-57.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-57.c	(working copy)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-fre1" } */
+
+static int n;
+void foo(char *p)
+{
+  n = 0;
+  __builtin_memset (p, 0, n);
+}
+
+/* { dg-final { scan-tree-dump-not "memset" "fre1" } } */


* Re: [PATCH][PR tree-optimization/79090] Fix two minor DSE bugs
  2017-01-17  9:15       ` Richard Biener
@ 2017-01-17 14:42         ` Jeff Law
  2017-01-18 12:45           ` Richard Biener
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff Law @ 2017-01-17 14:42 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

On 01/17/2017 02:15 AM, Richard Biener wrote:
> On Mon, Jan 16, 2017 at 11:36 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On January 16, 2017 7:27:53 PM GMT+01:00, Jeff Law <law@redhat.com> wrote:
>>> On 01/16/2017 01:51 AM, Richard Biener wrote:
>>>> On Sun, Jan 15, 2017 at 10:34 AM, Jeff Law <law@redhat.com> wrote:
>>>>>
>>>>> At one time I know I had the max_size == size test in
>>> valid_ao_ref_for_dse.
>>>>> But it got lost at some point.  This is what caused the Ada failure.
>>>>>
>>>>> Technically it'd be OK for the potentially dead store to have a
>>> variable
>>>>> size as long as the later stores covered the entire range of the
>>> potentially
>>>>> dead store.  I doubt this happens enough to be worth checking.
>>>>>
>>>>> The ppc64 big endian failures were more interesting.  We had this in
>>> the IL:
>>>>>
>>>>> memmove (dst, src, 0)
>>>>>
>>>>> The trimming code assumes that there's at least one live byte in the
>>> store,
>>>>> which obviously isn't the case here.  The net result is we compute
>>> an
>>>>> incorrect trim and the copy goes wild with incorrect addresses and
>>> lengths.
>>>>> This is trivial to fix by validating that the store has a nonzero
>>> length.
>>>>>
>>>>> I was a bit curious how often this happened in practice because such
>>> a call
>>>>> is trivially dead.  ~80 during a bootstrap and a few dozen in the
>>> testsuite.
>>>>> Given how trivial it is to detect and optimize, this patch includes
>>> removal
>>>>> of such calls.  This hunk makes the check for zero size in
>>>>> valid_ao_ref_for_dse redundant, but I'd like to keep the check -- if
>>> we add
>>>>> more builtin support without filtering zero size we'd regress again.
>>>>
>>>> Interesting - we do fold memset (..., 0) away so this means we either
>>>> have an unfolded memset stmt in the IL before DSE.
>>> It's actually exposed by fre3, both in the original test and in the
>>> reduced testcase.  In the reduced testcase we have this just prior to
>>> FRE3:
>>>
>>> ;;   basic block 3, loop depth 0, count 0, freq 7326, maybe hot
>>> ;;    prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
>>> ;;    pred:       2 [73.3%]  (TRUE_VALUE,EXECUTABLE)
>>>   _3 = MEM[(const struct vec *)_4].m_num;
>>>   if (_3 != 0)
>>>     goto <bb 4>; [36.64%]
>>>   else
>>>     goto <bb 5>; [63.36%]
>>> ;;    succ:       4 [36.6%]  (TRUE_VALUE,EXECUTABLE)
>>> ;;                5 [63.4%]  (FALSE_VALUE,EXECUTABLE)
>>>
>>> ;;   basic block 4, loop depth 0, count 0, freq 2684, maybe hot
>>> ;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
>>> ;;    pred:       3 [36.6%]  (TRUE_VALUE,EXECUTABLE)
>>>   _6 = vec_av_set.m_vec;
>>>   _7 = _6->m_num;
>>>   _8 = _7 - _3;
>>>   _6->m_num = _8;
>>>   _9 = (long unsigned int) _8;
>>>   _10 = _9 * 4;
>>>   slot.2_11 = slot;
>>>   dest.3_12 = dest;
>>>   memmove (dest.3_12, slot.2_11, _10);
>>> ;;    succ:       5 [100.0%]  (FALLTHRU,EXECUTABLE)
>>>
>>>
>>> _3 has the value _6->m_num.  Thus _8 will have the value 0, which in
>>> turn makes _10 have the value zero as seen in the .fre3 dump:
>>>
>>> ;;   basic block 4, loop depth 0, count 0, freq 2684, maybe hot
>>> ;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
>>> ;;    pred:       3 [36.6%]  (TRUE_VALUE,EXECUTABLE)
>>>   _4->m_num = 0;
>>>   slot.2_11 = slot;
>>>   dest.3_12 = dest;
>>>   memmove (dest.3_12, slot.2_11, 0);
>>>
>>> In the full test its similar.
>>>
>>> I don't know if you want to try and catch this in FRE though.
>>
>> Ah, I think I have patches for this since a long time in my tree...  We're folding calls in a restricted way for some historical reason.
>>
>>> If we detect in DCE (where it makes reasonable sense) rather than DSE,
>>> then we detect the dead mem* about 17 passes earlier and the dead
>>> argument setup about 20 passes earlier.  In the testcase I looked at, I
>>>
>>> didn't see additional secondary optimizations enabled, but I could
>>> imagine cases where it might.  Seems like a gcc-8 thing though.
>>
>> I'll give it a quick look tomorrow.
>
> The comment before fold_stmt_inplace no longer applies (but it seems I
> simply forgot to push
> this change...).  It's better to not keep unfolded stmts around, so
> I'll commit this as last bit
> of stage3 if testing is fine.
>
> Bootstrap / regtest on x86_64-unknown-linux-gnu in progress.
>
> Richard.
>
> 2017-01-17  Richard Biener  <rguenther@suse.de>
>
>         * tree-ssa-pre.c (eliminate_dom_walker::before_dom_children):
>         Fold calls regularly.
>
>         * gcc.dg/tree-ssa/ssa-fre-57.c: New testcase.
>
Note you'll need the trivial update to the new ssa-dse testcase as it 
verifies removal of the dead memmove.

jeff


* Re: [PATCH][PR tree-optimization/79090] Fix two minor DSE bugs
  2017-01-17 14:42         ` Jeff Law
@ 2017-01-18 12:45           ` Richard Biener
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Biener @ 2017-01-18 12:45 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

On Tue, Jan 17, 2017 at 3:42 PM, Jeff Law <law@redhat.com> wrote:
> On 01/17/2017 02:15 AM, Richard Biener wrote:
>>
>> On Mon, Jan 16, 2017 at 11:36 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>>
>>> On January 16, 2017 7:27:53 PM GMT+01:00, Jeff Law <law@redhat.com>
>>> wrote:
>>>>
>>>> On 01/16/2017 01:51 AM, Richard Biener wrote:
>>>>>
>>>>> On Sun, Jan 15, 2017 at 10:34 AM, Jeff Law <law@redhat.com> wrote:
>>>>>>
>>>>>>
>>>>>> At one time I know I had the max_size == size test in
>>>>>> valid_ao_ref_for_dse.  But it got lost at some point.  This is what
>>>>>> caused the Ada failure.
>>>>>>
>>>>>> Technically it'd be OK for the potentially dead store to have a
>>>>>> variable size as long as the later stores covered the entire range
>>>>>> of the potentially dead store.  I doubt this happens enough to be
>>>>>> worth checking.
>>>>>>
>>>>>> The ppc64 big endian failures were more interesting.  We had this in
>>>>>> the IL:
>>>>>>
>>>>>> memmove (dst, src, 0)
>>>>>>
>>>>>> The trimming code assumes that there's at least one live byte in the
>>>>>> store, which obviously isn't the case here.  The net result is we
>>>>>> compute an incorrect trim and the copy goes wild with incorrect
>>>>>> addresses and lengths.  This is trivial to fix by validating that
>>>>>> the store has a nonzero length.
>>>>>>
>>>>>> I was a bit curious how often this happened in practice because such
>>>>>> a call is trivially dead.  ~80 during a bootstrap and a few dozen in
>>>>>> the testsuite.  Given how trivial it is to detect and optimize, this
>>>>>> patch includes removal of such calls.  This hunk makes the check for
>>>>>> zero size in valid_ao_ref_for_dse redundant, but I'd like to keep
>>>>>> the check -- if we add more builtin support without filtering zero
>>>>>> size we'd regress again.
>>>>>
>>>>>
>>>>> Interesting - we do fold memset (..., 0) away, so this means we
>>>>> have an unfolded memset stmt in the IL before DSE.
>>>>
>>>> It's actually exposed by fre3, both in the original test and in the
>>>> reduced testcase.  In the reduced testcase we have this just prior to
>>>> FRE3:
>>>>
>>>> ;;   basic block 3, loop depth 0, count 0, freq 7326, maybe hot
>>>> ;;    prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
>>>> ;;    pred:       2 [73.3%]  (TRUE_VALUE,EXECUTABLE)
>>>>   _3 = MEM[(const struct vec *)_4].m_num;
>>>>   if (_3 != 0)
>>>>     goto <bb 4>; [36.64%]
>>>>   else
>>>>     goto <bb 5>; [63.36%]
>>>> ;;    succ:       4 [36.6%]  (TRUE_VALUE,EXECUTABLE)
>>>> ;;                5 [63.4%]  (FALSE_VALUE,EXECUTABLE)
>>>>
>>>> ;;   basic block 4, loop depth 0, count 0, freq 2684, maybe hot
>>>> ;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
>>>> ;;    pred:       3 [36.6%]  (TRUE_VALUE,EXECUTABLE)
>>>>   _6 = vec_av_set.m_vec;
>>>>   _7 = _6->m_num;
>>>>   _8 = _7 - _3;
>>>>   _6->m_num = _8;
>>>>   _9 = (long unsigned int) _8;
>>>>   _10 = _9 * 4;
>>>>   slot.2_11 = slot;
>>>>   dest.3_12 = dest;
>>>>   memmove (dest.3_12, slot.2_11, _10);
>>>> ;;    succ:       5 [100.0%]  (FALLTHRU,EXECUTABLE)
>>>>
>>>>
>>>> _3 has the value _6->m_num.  Thus _8 will have the value 0, which in
>>>> turn makes _10 have the value zero as seen in the .fre3 dump:
>>>>
>>>> ;;   basic block 4, loop depth 0, count 0, freq 2684, maybe hot
>>>> ;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
>>>> ;;    pred:       3 [36.6%]  (TRUE_VALUE,EXECUTABLE)
>>>>   _4->m_num = 0;
>>>>   slot.2_11 = slot;
>>>>   dest.3_12 = dest;
>>>>   memmove (dest.3_12, slot.2_11, 0);
>>>>
>>>> In the full test it's similar.
>>>>
>>>> I don't know if you want to try and catch this in FRE though.
>>>
>>>
>>> Ah, I think I have patches for this since a long time in my tree...
>>> We're folding calls in a restricted way for some historical reason.
>>>
>>>> If we detect this in DCE (where it makes reasonable sense) rather
>>>> than DSE, then we detect the dead mem* about 17 passes earlier and
>>>> the dead argument setup about 20 passes earlier.  In the testcase I
>>>> looked at, I didn't see additional secondary optimizations enabled,
>>>> but I could imagine cases where it might.  Seems like a gcc-8 thing
>>>> though.
>>>
>>>
>>> I'll give it a quick look tomorrow.
>>
>>
>> The comment before fold_stmt_inplace no longer applies (but it seems I
>> simply forgot to push this change...).  It's better to not keep
>> unfolded stmts around, so I'll commit this as the last bit of stage3
>> if testing is fine.
>>
>> Bootstrap / regtest on x86_64-unknown-linux-gnu in progress.
>>
>> Richard.
>>
>> 2017-01-17  Richard Biener  <rguenther@suse.de>
>>
>>         * tree-ssa-pre.c (eliminate_dom_walker::before_dom_children):
>>         Fold calls regularly.
>>
>>         * gcc.dg/tree-ssa/ssa-fre-57.c: New testcase.
>>
> Note you'll need the trivial update to the new ssa-dse testcase as it
> verifies removal of the dead memmove.

Yep, the patch had other fallout so I'm postponing it for GCC 8.

Richard.

> jeff
>


Thread overview: 7+ messages
2017-01-15  9:34 [PATCH][PR tree-optimization/79090] Fix two minor DSE bugs Jeff Law
2017-01-16  8:51 ` Richard Biener
2017-01-16 18:27   ` Jeff Law
2017-01-16 22:36     ` Richard Biener
2017-01-17  9:15       ` Richard Biener
2017-01-17 14:42         ` Jeff Law
2017-01-18 12:45           ` Richard Biener
