public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/93946] Bogus redundant store removal
       [not found] <bug-93946-4@http.gcc.gnu.org/bugzilla/>
@ 2020-03-24 17:51 ` sandra at gcc dot gnu.org
  2020-03-25  7:21 ` rguenth at gcc dot gnu.org
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 14+ messages in thread
From: sandra at gcc dot gnu.org @ 2020-03-24 17:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

sandra at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sandra at gcc dot gnu.org

--- Comment #9 from sandra at gcc dot gnu.org ---
Both the new test cases are failing on nios2 at -Os, -O2, and -O3.  I've done
some analysis, but I'm not sure exactly where the problem lies, and whether
this is a problem in the nios2 back end or somewhere else.

long __attribute__((noipa))
foo (struct bb *bv, void *ptr)
{
  struct aa *a = ptr;
  struct bb *b = ptr;
  bv->b.u.f = 1;
  a->a.u.i = 0;
  b->b.u.f = 0;
  return bv->b.u.f;
}

is compiling to

foo:
        movi    r2, 1
        stw     r2, 0(r4)
        ldw     r2, 0(r4)
        stw     zero, 0(r5)
        stw     zero, 4(r5)
        ret

What's going on here is that load instructions have 3-cycle latency on nios2,
so the sched2 pass is moving the "ldw r2, 0(r4)" to load the return value 2
instructions earlier....  ahead of the store instruction to the same location
via the aliased pointer.  :-(

I'm not an expert on the instruction scheduler, and it seems like the target
hooks and machine description syntax are all focused on modelling pipeline
costs in order to minimize stalls, not telling the scheduler that certain
instructions cannot be correctly reordered at all.  Should some other pass be
inserting optimization barriers, or something like that?  I feel like I'm
missing some big-picture expertise of where this needs to be fixed, so any
suggestions to point me in the right direction would be appreciated.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/93946] Bogus redundant store removal
       [not found] <bug-93946-4@http.gcc.gnu.org/bugzilla/>
  2020-03-24 17:51 ` [Bug tree-optimization/93946] Bogus redundant store removal sandra at gcc dot gnu.org
@ 2020-03-25  7:21 ` rguenth at gcc dot gnu.org
  2020-03-27 20:19 ` sandra at gcc dot gnu.org
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-03-25  7:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to sandra from comment #9)
> Both the new test cases are failing on nios2 at -Os, -O2, and -O3.  I've
> done some analysis, but I'm not sure exactly where the problem lies, and
> whether this is a problem in the nios2 back end or somewhere else.
> 
> long __attribute__((noipa))
> foo (struct bb *bv, void *ptr)
> {
>   struct aa *a = ptr;
>   struct bb *b = ptr;
>   bv->b.u.f = 1;
>   a->a.u.i = 0;
>   b->b.u.f = 0;
>   return bv->b.u.f;
> }
> 
> is compiling to
> 
> foo:
>         movi    r2, 1
>         stw     r2, 0(r4)
>         ldw     r2, 0(r4)
>         stw     zero, 0(r5)
>         stw     zero, 4(r5)
>         ret
> 
> What's going on here is that load instructions have 3-cycle latency on
> nios2, so the sched2 pass is moving the "ldw r2, 0(r4)" to load the return
> value 2 instructions earlier....  ahead of the store instruction to the same
> location via the aliased pointer.  :-(
> 
> I'm not an expert on the instruction scheduler, and it seems like the target
> hooks and machine description syntax are all focused on modelling pipeline
> costs in order to minimize stalls, not telling the scheduler that certain
> instructions cannot be correctly reordered at all.  Should some other pass
> be inserting optimization barriers, or something like that?  I feel like I'm
> missing some big-picture expertise of where this needs to be fixed, so any
> suggestions to point me in the right direction would be appreciated.

The instruction scheduler needs to check dependences which should correctly
model the constraints here already.  So you need to start looking at the
RTL before sched2 to see if that's sane (compare to say x86 RTL and look
for obvious signs of errors - some target expanders made errors here in the
past).  Then debug sched2 as to what true/anti/output_dependence tests it
performs and why those tell sched2 moving is OK.

As you can see the fix involved touching many passes so no doubt there can
be more issues elsewhere ...

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/93946] Bogus redundant store removal
       [not found] <bug-93946-4@http.gcc.gnu.org/bugzilla/>
  2020-03-24 17:51 ` [Bug tree-optimization/93946] Bogus redundant store removal sandra at gcc dot gnu.org
  2020-03-25  7:21 ` rguenth at gcc dot gnu.org
@ 2020-03-27 20:19 ` sandra at gcc dot gnu.org
  2020-03-28  7:38 ` rguenther at suse dot de
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 14+ messages in thread
From: sandra at gcc dot gnu.org @ 2020-03-27 20:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #11 from sandra at gcc dot gnu.org ---
RTL before sched2 does look sane and similar to that generated for x86 with
-m32.

I've been trying to step through sched2.  I think that where things get
interesting is the call to true_dependence at sched-deps.c:2663.

Breakpoint 6, true_dependence (mem=0x7ffff742c9a8, mem_mode=E_VOIDmode, 
    x=0x7ffff742cac8)
    at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/alias.c:3056
3056      return true_dependence_1 (mem, mem_mode, NULL_RTX,
(gdb) print debug_rtx(mem)
(mem/j:SI (reg/v/f:SI 5 r5 [orig:48 ptr ] [48]) [1 MEM[(struct aa
*)ptr_1(D)].a.u.i+0 S4 A32])
$19 = void
(gdb) print debug_rtx(x)
(mem/j:SI (reg/v/f:SI 4 r4 [orig:47 bv ] [47]) [1 bv_3(D)->b.u.f+0 S4 A32])
$20 = void

This is making it all the way to the end of true_dependence_1, into
rtx_refs_may_alias_p, and into refs_may_alias_p_1, which is returning false,
which gets propagated back up as the result of true_dependence.  IIUC, this is
what is allowing sched2 to move the read from "x" ahead of the write to "mem".

Before I spend more time on this, am I on the right track here?  And is this
pointing at the problem being in refs_may_alias_p_1 rather than somewhere along
the way e.g. in true_dependence_1?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/93946] Bogus redundant store removal
       [not found] <bug-93946-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2020-03-27 20:19 ` sandra at gcc dot gnu.org
@ 2020-03-28  7:38 ` rguenther at suse dot de
  2020-03-28 22:41 ` sandra at gcc dot gnu.org
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 14+ messages in thread
From: rguenther at suse dot de @ 2020-03-28  7:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #12 from rguenther at suse dot de <rguenther at suse dot de> ---
On March 27, 2020 9:19:33 PM GMT+01:00, "sandra at gcc dot gnu.org"
<gcc-bugzilla@gcc.gnu.org> wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946
>
>--- Comment #11 from sandra at gcc dot gnu.org ---
>RTL before sched2 does look sane and similar to that generated for x86
>with
>-m32.
>
>I've been trying to step through sched2.  I think that where things get
>interesting is the call to true_dependence at sched-deps.c:2663.
>
>Breakpoint 6, true_dependence (mem=0x7ffff742c9a8, mem_mode=E_VOIDmode,
>
>    x=0x7ffff742cac8)
>    at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/alias.c:3056
>3056      return true_dependence_1 (mem, mem_mode, NULL_RTX,
>(gdb) print debug_rtx(mem)
>(mem/j:SI (reg/v/f:SI 5 r5 [orig:48 ptr ] [48]) [1 MEM[(struct aa
>*)ptr_1(D)].a.u.i+0 S4 A32])
>$19 = void
>(gdb) print debug_rtx(x)
>(mem/j:SI (reg/v/f:SI 4 r4 [orig:47 bv ] [47]) [1 bv_3(D)->b.u.f+0 S4
>A32])
>$20 = void
>
>This is making it all the way to the end of true_dependence_1, into
>rtx_refs_may_alias_p, and into refs_may_alias_p_1, which is returning
>false,
>which gets propagated back up as the result of true_dependence.  IIUC,
>this is
>what is allowing sched2 to move the read from "x" ahead of the write to
>"mem".
>
>Before I spend more time on this, am I on the right track here?  And is
>this
>pointing at the problem being in refs_may_alias_p_1 rather than
>somewhere along
>the way e.g. in true_dependence_1?

Those are two stores, right? Then true_dependence is the wrong predicate to
look at it should be output_dependence instead.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/93946] Bogus redundant store removal
       [not found] <bug-93946-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2020-03-28  7:38 ` rguenther at suse dot de
@ 2020-03-28 22:41 ` sandra at gcc dot gnu.org
  2020-03-30  6:43 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 14+ messages in thread
From: sandra at gcc dot gnu.org @ 2020-03-28 22:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #13 from sandra at gcc dot gnu.org ---
Well, no.  The problem is that the scheduler is moving 

        ldw     r2, 0(r4)

ahead of

        stw     zero, 0(r5)

which is incorrect because the pointers in r4 and r5 are aliases.

So at the point of call to true_dependence, I see:

(gdb) frame 1
#1  0x0000000001d1a108 in sched_analyze_2 (deps=0x7fffffffdd50, 
    x=0x7ffff742cac8, insn=0x7ffff7315600)
    at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/sched-deps.c:2663
2663                    if (true_dependence (pending_mem->element (), VOIDmode,
t)
(gdb) print debug_rtx(insn)
(insn 17 10 18 2 (set (reg/i:SI 2 r2)
        (mem/j:SI (reg/v/f:SI 4 r4 [orig:47 bv ] [47]) [1 bv_3(D)->b.u.f+0 S4
A32]))
"/scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/testsuite/gcc.dg/torture/pr93946-1.c":18:1
5 {movsi_internal}
     (expr_list:REG_DEAD (reg/v/f:SI 4 r4 [orig:47 bv ] [47])
        (nil)))
$3 = void
(gdb) print debug_rtx(pending->insn())
(insn 9 8 10 2 (set (mem/j:SI (reg/v/f:SI 5 r5 [orig:48 ptr ] [48]) [1
MEM[(struct aa *)ptr_1(D)].a.u.i+0 S4 A32])
        (const_int 0 [0]))
"/scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/testsuite/gcc.dg/torture/pr93946-1.c":15:12
5 {movsi_internal}
     (nil))
$4 = void

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/93946] Bogus redundant store removal
       [not found] <bug-93946-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2020-03-28 22:41 ` sandra at gcc dot gnu.org
@ 2020-03-30  6:43 ` rguenth at gcc dot gnu.org
  2020-04-06  5:35 ` sandra at gcc dot gnu.org
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-03-30  6:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to sandra from comment #13)
> Well, no.  The problem is that the scheduler is moving 
> 
>         ldw     r2, 0(r4)
> 
> ahead of
> 
>         stw     zero, 0(r5)
> 
> which is incorrect because the pointers in r4 and r5 are aliases.

Ah, so the scheduler needs to call anti_dependence (WAR).

> So at the point of call to true_dependence, I see:
> 
> (gdb) frame 1
> #1  0x0000000001d1a108 in sched_analyze_2 (deps=0x7fffffffdd50, 
>     x=0x7ffff742cac8, insn=0x7ffff7315600)
>     at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/sched-deps.c:2663
> 2663                    if (true_dependence (pending_mem->element (),
> VOIDmode, t)
> (gdb) print debug_rtx(insn)
> (insn 17 10 18 2 (set (reg/i:SI 2 r2)
>         (mem/j:SI (reg/v/f:SI 4 r4 [orig:47 bv ] [47]) [1 bv_3(D)->b.u.f+0
> S4 A32]))
> "/scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/testsuite/gcc.dg/torture/
> pr93946-1.c":18:1 5 {movsi_internal}
>      (expr_list:REG_DEAD (reg/v/f:SI 4 r4 [orig:47 bv ] [47])
>         (nil)))
> $3 = void
> (gdb) print debug_rtx(pending->insn())
> (insn 9 8 10 2 (set (mem/j:SI (reg/v/f:SI 5 r5 [orig:48 ptr ] [48]) [1
> MEM[(struct aa *)ptr_1(D)].a.u.i+0 S4 A32])
>         (const_int 0 [0]))
> "/scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/testsuite/gcc.dg/torture/
> pr93946-1.c":15:12 5 {movsi_internal}
>      (nil))
> $4 = void

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/93946] Bogus redundant store removal
       [not found] <bug-93946-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2020-03-30  6:43 ` rguenth at gcc dot gnu.org
@ 2020-04-06  5:35 ` sandra at gcc dot gnu.org
  2020-04-06  7:49 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 14+ messages in thread
From: sandra at gcc dot gnu.org @ 2020-04-06  5:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #15 from sandra at gcc dot gnu.org ---
Hmmm.  I've gone over this code 2 or 3 times now, and I'm still convinced the
problem is in the alias analysis, not the scheduler.

I've stepped deeper into the code in the debugger, and here is the backtrace
from where I see things going wrong:

#0  indirect_refs_may_alias_p (ref1=0x7ffff74196f0, base1=0x7ffff73fec08, 
    offset1=..., max_size1=..., size1=..., ref1_alias_set=1, 
    base1_alias_set=4, ref2=0x7ffff74195d0, base2=0x7ffff73febb8, offset2=..., 
    max_size2=..., size2=..., ref2_alias_set=1, base2_alias_set=6, tbaa_p=true)
    at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/tree-ssa-alias.c:2122
#1  0x00000000013f8266 in refs_may_alias_p_2 (ref1=0x7fffffffd7d0, 
    ref2=0x7fffffffd790, tbaa_p=true)
    at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/tree-ssa-alias.c:2320
#2  0x00000000013f82bb in refs_may_alias_p_1 (ref1=0x7fffffffd7d0, 
    ref2=0x7fffffffd790, tbaa_p=true)
    at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/tree-ssa-alias.c:2339
#3  0x0000000000b90b06 in rtx_refs_may_alias_p (x=0x7ffff742cac8, 
    mem=0x7ffff742c9a8, tbaa_p=true)
    at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/alias.c:365
#4  0x0000000000b981de in true_dependence_1 (mem=0x7ffff742c9a8, 
    mem_mode=E_SImode, mem_addr=0x7ffff742c7e0, x=0x7ffff742cac8, 
    x_addr=0x7ffff742c798, mem_canonicalized=false)
    at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/alias.c:3048

The code here says

  /* Do type-based disambiguation.  */
  if (base1_alias_set != base2_alias_set
      && !alias_sets_conflict_p (base1_alias_set, base2_alias_set))
    return false;

and the "false" return status gets propagated all the way back up to
true_dependence.

It seems to me that what is going wrong is that it is failing to consider that
two pointer parameters can be aliased no matter what their declared type is,
and no matter what types they are cast to -- e.g. because they point to members
of the same union.  Should ao_ref_base_alias_set be putting everything based on
pointer parameters without restrict semantics into the same alias set, maybe? 
Or should there be some code in indirect_refs_may_alias_p to look for this
situation before it reaches the point of type-based disambiguation where it is
currently failing to DTRT?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/93946] Bogus redundant store removal
       [not found] <bug-93946-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2020-04-06  5:35 ` sandra at gcc dot gnu.org
@ 2020-04-06  7:49 ` rguenth at gcc dot gnu.org
  2020-04-06  8:08 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-04-06  7:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, now looking myself.  RTL expansion creates

(insn 8 7 9 2 (set (mem/j:SI (reg/v/f:SI 47 [ bv ]) [1 bv_3(D)->b.u.f+0 S4
A32])
        (reg:SI 49)) "t.c":12:13 -1
     (nil))
(insn 9 8 10 2 (set (mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct aa
*)ptr_1(D)].a.u.i+0 S4 A32])
        (const_int 0 [0])) "t.c":13:12 -1
     (nil))
(insn 10 9 11 2 (set (mem/j:SI (plus:SI (reg/v/f:SI 48 [ ptr ])
                (const_int 4 [0x4])) [1 MEM[(struct aa *)ptr_1(D)].a.u.i+4 S4
A32])
        (const_int 0 [0])) "t.c":13:12 -1
     (nil))
(insn 11 10 12 2 (set (mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct bb
*)ptr_1(D)].b.u.f+0 S4 A32])
        (const_int 0 [0])) "t.c":14:12 -1
     (nil))
(insn 12 11 13 2 (set (reg:SI 51)
        (mem/j:SI (reg/v/f:SI 47 [ bv ]) [1 bv_3(D)->b.u.f+0 S4 A32]))
"t.c":15:17 -1
     (nil))

where insn 11 is the important one.  Somehow on nios2 the CSE1 removes that
store.

deferring deletion of insn with uid = 11.

and we end up with

(insn 8 7 9 2 (set (mem/j:SI (reg/v/f:SI 47 [ bv ]) [1 bv_3(D)->b.u.f+0 S4
A32])
        (reg:SI 49)) "t.c":12:13 5 {movsi_internal}
     (expr_list:REG_DEAD (reg:SI 49)
        (nil)))
(insn 9 8 10 2 (set (mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct aa
*)ptr_1(D)].a.u.i+0 S4 A32])
        (const_int 0 [0])) "t.c":13:12 5 {movsi_internal}
     (nil))
(insn 10 9 12 2 (set (mem/j:SI (plus:SI (reg/v/f:SI 48 [ ptr ])
                (const_int 4 [0x4])) [1 MEM[(struct aa *)ptr_1(D)].a.u.i+4 S4
A32])
        (const_int 0 [0])) "t.c":13:12 5 {movsi_internal}
     (nil))
(insn 12 10 13 2 (set (reg:SI 51 [ bv_3(D)->b.u.f ])
        (mem/j:SI (reg/v/f:SI 47 [ bv ]) [1 bv_3(D)->b.u.f+0 S4 A32]))
"t.c":15:17 5 {movsi_internal}
     (expr_list:REG_DEAD (reg/v/f:SI 47 [ bv ])
        (nil)))

where there indeed is no scheduling barrier anymore.

I didn't know CSE removes stores or why this only triggers on nios2, it looks
like some DF thing?  Backtrace of the "DSE":

#0  delete_insn (insn=0x7ffff6bc3400)
    at /space/rguenther/src/gcc/gcc/cfgrtl.c:135
#1  0x0000000000b0bfa5 in delete_insn_and_edges (insn=0x7ffff6bc3400)
    at /space/rguenther/src/gcc/gcc/cfgrtl.c:237
#2  0x0000000001a9d8eb in cse_insn (insn=0x7ffff6bc3400)
    at /space/rguenther/src/gcc/gcc/cse.c:5571
#3  0x0000000001aa0b76 in cse_extended_basic_block (ebb_data=0x7fffffffdc90)
    at /space/rguenther/src/gcc/gcc/cse.c:6614
#4  0x0000000001aa10a5 in cse_main (f=0x7ffff6cce310, nregs=52)
    at /space/rguenther/src/gcc/gcc/cse.c:6793

that's

      /* Similarly for no-op moves.  */
      else if (noop_insn)
        {
          if (cfun->can_throw_non_call_exceptions && can_throw_internal (insn))
            cse_cfg_altered = true;
          cse_cfg_altered |= delete_insn_and_edges (insn);
          /* No more processing for this set.  */
          sets[i].rtl = 0;

so appearantly it does redundant store removal as well...

          /* Similarly, lots of targets don't allow no-op
             (set (mem x) (mem x)) moves.  Even (set (reg x) (reg x))
             might be impossible for certain registers (like CC registers).  */
          else if (n_sets == 1
                   && !CALL_P (insn)
                   && (MEM_P (trial) || REG_P (trial))
                   && rtx_equal_p (trial, dest)
                   && !side_effects_p (dest)
                   && (cfun->can_delete_dead_exceptions
                       || insn_nothrow_p (insn)))
            {
              SET_SRC (sets[i].rtl) = trial;
              noop_insn = true;
              break;
            }

where

(gdb) p debug_rtx (insn)
(insn 11 10 12 2 (set (mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct bb
*)ptr_1(D)].b.u.f+0 S4 A32])
        (const_int 0 [0])) "t.c":14:12 5 {movsi_internal}
     (expr_list:REG_DEAD (reg/v/f:SI 48 [ ptr ])
        (nil)))
(gdb) p debug_rtx (trial)
(mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct bb *)ptr_1(D)].b.u.f+0 S4
A32])
$4 = void
(gdb) p debug_rtx (dest)
(mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct bb *)ptr_1(D)].b.u.f+0 S4
A32])
$6 = void

so it might be that the trigger is a target where sizeof(long long) = 2 *
sizeof(long) _and_ we split stores to the larger type
(I tried to pick a set of types where sizeof is the same but
alias-sets are different - otherwise I'd have to cater for big vs.
little-endian).

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/93946] Bogus redundant store removal
       [not found] <bug-93946-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2020-04-06  7:49 ` rguenth at gcc dot gnu.org
@ 2020-04-06  8:08 ` rguenth at gcc dot gnu.org
  2020-04-08  2:30 ` sandra at gcc dot gnu.org
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-04-06  8:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #17 from Richard Biener <rguenth at gcc dot gnu.org> ---
x86 splits large stores only after reload it seems.  Also on x86 GIMPLE
store-merging triggers, merging the two = 0 stores so I need -fno-store-merging
to
even get two stores there.  Still the issue in CSE is obviously there...

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/93946] Bogus redundant store removal
       [not found] <bug-93946-4@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2020-04-06  8:08 ` rguenth at gcc dot gnu.org
@ 2020-04-08  2:30 ` sandra at gcc dot gnu.org
  2020-04-08  6:20 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 14+ messages in thread
From: sandra at gcc dot gnu.org @ 2020-04-08  2:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #18 from sandra at gcc dot gnu.org ---
Hmmm, it looks to me like things are going wrong in the tree fre1 pass too.  I
commented out the redundant zero store in the test case to see what would
happen, like

long __attribute__((noipa))
foo (struct bb *bv, void *ptr)
{
  struct aa *a = ptr;
  struct bb *b = ptr;
  bv->b.u.f = 1;
  a->a.u.i = 0;
  /*  b->b.u.f = 0; */
  return bv->b.u.f;
}

fre1 gets this as input:

__attribute__((noipa, noinline, noclone, no_icf))
foo (struct bb * bv, void * ptr)
{
  struct bb * b;
  struct aa * a;
  long int _8;

  <bb 2> :
  bv_5(D)->b.u.f = 1;
  MEM[(struct aa *)ptr_1(D)].a.u.i = 0;
  _8 = bv_5(D)->b.u.f;
  return _8;

}


and produces this output:

__attribute__((noipa, noinline, noclone, no_icf))
foo (struct bb * bv, void * ptr)
{
  struct bb * b;
  struct aa * a;

  <bb 2> :
  bv_5(D)->b.u.f = 1;
  MEM[(struct aa *)ptr_1(D)].a.u.i = 0;
  return 1;

}

This is with -O2.

Again, it seems like something is not realizing that pointer parameters can be
aliased no matter what their types are.  Or perhaps that is just a symptom of
some other underlying bug.  :-S

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/93946] Bogus redundant store removal
       [not found] <bug-93946-4@http.gcc.gnu.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2020-04-08  2:30 ` sandra at gcc dot gnu.org
@ 2020-04-08  6:20 ` rguenth at gcc dot gnu.org
  2020-04-08 12:04 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-04-08  6:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #19 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to sandra from comment #18)
> Hmmm, it looks to me like things are going wrong in the tree fre1 pass too. 
> I commented out the redundant zero store in the test case to see what would
> happen, like
> 
> long __attribute__((noipa))
> foo (struct bb *bv, void *ptr)
> {
>   struct aa *a = ptr;
>   struct bb *b = ptr;
>   bv->b.u.f = 1;
>   a->a.u.i = 0;
>   /*  b->b.u.f = 0; */
>   return bv->b.u.f;
> }
> 
> fre1 gets this as input:
> 
> __attribute__((noipa, noinline, noclone, no_icf))
> foo (struct bb * bv, void * ptr)
> {
>   struct bb * b;
>   struct aa * a;
>   long int _8;
> 
>   <bb 2> :
>   bv_5(D)->b.u.f = 1;
>   MEM[(struct aa *)ptr_1(D)].a.u.i = 0;
>   _8 = bv_5(D)->b.u.f;
>   return _8;
> 
> }
> 
> 
> and produces this output:
> 
> __attribute__((noipa, noinline, noclone, no_icf))
> foo (struct bb * bv, void * ptr)
> {
>   struct bb * b;
>   struct aa * a;
> 
>   <bb 2> :
>   bv_5(D)->b.u.f = 1;
>   MEM[(struct aa *)ptr_1(D)].a.u.i = 0;
>   return 1;
> 
> }
> 
> This is with -O2.
> 
> Again, it seems like something is not realizing that pointer parameters can
> be aliased no matter what their types are.  Or perhaps that is just a
> symptom of some other underlying bug.  :-S

No, it's a feature and called type-based alias-analysis.  The "redundant"
store is what eventually aliases with the load, the other store to the
same location can be disambiguated using TBAA.  So the second store is
not really redudnant - while it has no effect on the bits in the memory
it alters the memory state by changing the effective type of the memory
location to one compatible with the following load (or since we're dealing
with unions, it changes the active union member).  This is why we cannot
elide the "redudnant" store.  We can elide the earlier store as dead
though (since there's no intermediate use).

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/93946] Bogus redundant store removal
       [not found] <bug-93946-4@http.gcc.gnu.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2020-04-08  6:20 ` rguenth at gcc dot gnu.org
@ 2020-04-08 12:04 ` rguenth at gcc dot gnu.org
  2020-04-08 17:28 ` cvs-commit at gcc dot gnu.org
  2020-04-09  2:25 ` sandra at gcc dot gnu.org
  13 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-04-08 12:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #20 from Richard Biener <rguenth at gcc dot gnu.org> ---
So for the CSE issue we go through the equivalence chain and find

(gdb) p debug_rtx (p->exp)
(mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct aa *)ptr_1(D)].a.u.i+0 S4
A32])

5076              if (GET_CODE (dest) == code && rtx_equal_p (p->exp, dest))
5077                src_related = dest;
...
5121              if (rtx_equal_p (src_related, dest))
5122                src_related_cost = src_related_regcost = -1;

the first issue is that we're recording dest as src_related here losing the
opportunity to do a validity check later on.  If we fix that we can do the
usual validity check, copied from DSE:

diff --git a/gcc/cse.c b/gcc/cse.c
index 3e8724b3fed..f07bbdbebad 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -5074,7 +5074,7 @@ cse_insn (rtx_insn *insn)
             to prefer it.  Copy it to src_related.  The code below will
             then give it a negative cost.  */
          if (GET_CODE (dest) == code && rtx_equal_p (p->exp, dest))
-           src_related = dest;
+           src_related = p->exp;
        }

       /* Find the cheapest valid equivalent, trying all the available
@@ -5332,7 +5332,16 @@ cse_insn (rtx_insn *insn)
                   && rtx_equal_p (trial, dest)
                   && !side_effects_p (dest)
                   && (cfun->can_delete_dead_exceptions
-                      || insn_nothrow_p (insn)))
+                      || insn_nothrow_p (insn))
+                  /* We can only remove the later store if the earlier aliases
+                     at least all accesses the later one.  */
+                  && (!MEM_P (trial)
+                      || ((MEM_ALIAS_SET (dest) == MEM_ALIAS_SET (trial)
+                           || alias_set_subset_of (MEM_ALIAS_SET (dest),
+                                                   MEM_ALIAS_SET (trial)))
+                           && (!MEM_EXPR (trial)
+                               || refs_same_for_tbaa_p (MEM_EXPR (trial),
+                                                        MEM_EXPR (dest))))))
            {
              SET_SRC (sets[i].rtl) = trial;
              noop_insn = true;

that gives us for the testcase

foo:
        movi    r2, 1
        stw     r2, 0(r4)
        stw     zero, 0(r5)
        ldw     r2, 0(r4)
        stw     zero, 4(r5)
        ret

which looks correct to me.  I'm going to test the above on x86_64-linux.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/93946] Bogus redundant store removal
       [not found] <bug-93946-4@http.gcc.gnu.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2020-04-08 12:04 ` rguenth at gcc dot gnu.org
@ 2020-04-08 17:28 ` cvs-commit at gcc dot gnu.org
  2020-04-09  2:25 ` sandra at gcc dot gnu.org
  13 siblings, 0 replies; 14+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-04-08 17:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #21 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:dd9ca9d770a18ce4b16d867f49fef3293b483ff5

commit r10-7635-gdd9ca9d770a18ce4b16d867f49fef3293b483ff5
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Apr 8 14:04:35 2020 +0200

    rtl-optimization/93946 - fix TBAA for redundant store removal in CSE

    It turns out RTL CSE tries to remove redundant stores but fails to
    do the usual validity check what such a change is TBAA neutral to
    later loads.

    This now triggers with the PR93946 testcases on nios2.

    2020-04-08  Richard Biener  <rguenther@suse.de>

            PR rtl-optimization/93946
            * cse.c (cse_insn): Record the tabled expression in
            src_related.  Verify a redundant store removal is valid.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/93946] Bogus redundant store removal
       [not found] <bug-93946-4@http.gcc.gnu.org/bugzilla/>
                   ` (12 preceding siblings ...)
  2020-04-08 17:28 ` cvs-commit at gcc dot gnu.org
@ 2020-04-09  2:25 ` sandra at gcc dot gnu.org
  13 siblings, 0 replies; 14+ messages in thread
From: sandra at gcc dot gnu.org @ 2020-04-09  2:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #22 from sandra at gcc dot gnu.org ---
My nios2-elf test results look good now with this patch.  Thanks!

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-04-09  2:25 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-93946-4@http.gcc.gnu.org/bugzilla/>
2020-03-24 17:51 ` [Bug tree-optimization/93946] Bogus redundant store removal sandra at gcc dot gnu.org
2020-03-25  7:21 ` rguenth at gcc dot gnu.org
2020-03-27 20:19 ` sandra at gcc dot gnu.org
2020-03-28  7:38 ` rguenther at suse dot de
2020-03-28 22:41 ` sandra at gcc dot gnu.org
2020-03-30  6:43 ` rguenth at gcc dot gnu.org
2020-04-06  5:35 ` sandra at gcc dot gnu.org
2020-04-06  7:49 ` rguenth at gcc dot gnu.org
2020-04-06  8:08 ` rguenth at gcc dot gnu.org
2020-04-08  2:30 ` sandra at gcc dot gnu.org
2020-04-08  6:20 ` rguenth at gcc dot gnu.org
2020-04-08 12:04 ` rguenth at gcc dot gnu.org
2020-04-08 17:28 ` cvs-commit at gcc dot gnu.org
2020-04-09  2:25 ` sandra at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).