public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug rtl-optimization/67443] New: DSE removes required store instruction
@ 2015-09-03  7:57 krebbel at gcc dot gnu.org
  2015-09-03  8:19 ` [Bug rtl-optimization/67443] [5/6 regression] " rguenth at gcc dot gnu.org
                   ` (21 more replies)
  0 siblings, 22 replies; 23+ messages in thread
From: krebbel at gcc dot gnu.org @ 2015-09-03  7:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

            Bug ID: 67443
           Summary: DSE removes required store instruction
           Product: gcc
           Version: 5.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krebbel at gcc dot gnu.org
  Target Milestone: ---

Created attachment 36289
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36289&action=edit
Autoreduced testcase

This bug has been reported on the Debian mailing list. It occurs when building
the mozilla javascript engine with GCC 5 on S/390:

https://lists.debian.org/debian-s390/2015/08/msg00006.html

Bisecting shows that the failure occurred after this commit:

commit 90f3e775d6b7bec70e883579beb49b456c135a09
Author: wmi <wmi@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Thu Jan 22 17:59:23 2015 +0000

    2015-01-22  Wei Mi  <wmi@google.com>

            PR rtl-optimization/64557
            * dse.c (record_store): Call get_addr for mem_addr.
            (check_mem_read_rtx): Likewise.

Building the attached testcase with GCC before and after the commit shows that
a single stc (store character) instruction is missing after the patch.

Build options: -O2 -march=z900 -fPIC -std=gnu++11


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
  2015-09-03  7:57 [Bug rtl-optimization/67443] New: DSE removes required store instruction krebbel at gcc dot gnu.org
@ 2015-09-03  8:19 ` rguenth at gcc dot gnu.org
  2015-09-08  5:02 ` wmi at google dot com
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-09-03  8:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |5.3
            Summary|[5 regression] DSE removes  |[5/6 regression] DSE
                   |required store instruction  |removes required store
                   |                            |instruction


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
  2015-09-03  7:57 [Bug rtl-optimization/67443] New: DSE removes required store instruction krebbel at gcc dot gnu.org
  2015-09-03  8:19 ` [Bug rtl-optimization/67443] [5/6 regression] " rguenth at gcc dot gnu.org
@ 2015-09-08  5:02 ` wmi at google dot com
  2015-09-08  5:33 ` wmi at google dot com
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: wmi at google dot com @ 2015-09-08  5:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #1 from wmi at google dot com ---
Seems the patch makes some problem exposed. 

For the testcase 1.cxx below:

typedef struct A {
  unsigned i : 8;
  unsigned j : 24;
} A;

void foo(A *a) {
  a->i = 3;
  a->j = 5;
}

The rtl generated by s390x-ibm-linux-g++ seems wrong.

~/workarea/gcc-r227524/build/install/bin/s390x-ibm-linux-g++ -O2 -S 1.cxx
-fdump-rtl-expand-details-blocks

(note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 2 4 3 2 (set (reg/v/f:DI 60 [ a ])
        (reg:DI 2 %r2 [ a ])) 4.cxx:6 -1
     (nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 8 2 (set (mem/j:QI (reg/v/f:DI 60 [ a ]) [1 a_2(D)->i+0 S1 A32])
        (const_int 3 [0x3])) 4.cxx:7 -1
     (nil))
(insn 8 6 9 2 (set (reg:SI 62)
        (mem/j:SI (reg/v/f:DI 60 [ a ]) [1 a_2(D)->j+-1 S4 A32])) 4.cxx:8 -1
     (nil))
(insn 9 8 10 2 (parallel [
            (set (reg:SI 63)
                (and:SI (reg:SI 62)
                    (const_int -16777216 [0xffffffffff000000])))
            (clobber (reg:CC 33 %cc))
        ]) 4.cxx:8 -1
     (nil))
(insn 10 9 11 2 (parallel [
            (set (reg:SI 64)
                (ior:SI (reg:SI 63)
                    (const_int 5 [0x5])))
            (clobber (reg:CC 33 %cc))
        ]) 4.cxx:8 -1
     (nil))
(insn 11 10 0 2 (set (mem/j:SI (reg/v/f:DI 60 [ a ]) [1 a_2(D)->j+-1 S4 A32])
        (reg:SI 64)) 4.cxx:8 -1
     (nil))
;;  succ:       EXIT [100.0%]  (FALLTHRU)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
  2015-09-03  7:57 [Bug rtl-optimization/67443] New: DSE removes required store instruction krebbel at gcc dot gnu.org
  2015-09-03  8:19 ` [Bug rtl-optimization/67443] [5/6 regression] " rguenth at gcc dot gnu.org
  2015-09-08  5:02 ` wmi at google dot com
@ 2015-09-08  5:33 ` wmi at google dot com
  2015-10-14 16:27 ` vogt at linux dot vnet.ibm.com
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: wmi at google dot com @ 2015-09-08  5:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #2 from wmi at google dot com ---
Another problem is found in true_dependence_1 in alias.c. true_mem_addr or
true_x_addr got after calling get_addr may be used as inputs of
memrefs_conflict_p. However memrefs_conflict_p expects to use VALUE type nodes
as its inputs, so the values of the memory addresses can be comparable. Only
find_base_term and base_alias_check should use true_mem_addr/true_x_addr in
true_dependence_1.

This problem is not a correctness issue, but may affect the effectiveness of
dse/postreload...


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
  2015-09-03  7:57 [Bug rtl-optimization/67443] New: DSE removes required store instruction krebbel at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2015-09-08  5:33 ` wmi at google dot com
@ 2015-10-14 16:27 ` vogt at linux dot vnet.ibm.com
  2015-10-14 16:40 ` vogt at linux dot vnet.ibm.com
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: vogt at linux dot vnet.ibm.com @ 2015-10-14 16:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

Dominik Vogt <vogt at linux dot vnet.ibm.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vogt at linux dot vnet.ibm.com

--- Comment #3 from Dominik Vogt <vogt at linux dot vnet.ibm.com> ---
I think the Rtl in comment 1 ist correct.  Note that "i" is stored at
0x00000000.xx000000 and "j" is stored at 0x00000000.000000xx.  That is the
reason for the rather confusing mask in insn 9.  Your test program compiles and
runs fine for me.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
  2015-09-03  7:57 [Bug rtl-optimization/67443] New: DSE removes required store instruction krebbel at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2015-10-14 16:27 ` vogt at linux dot vnet.ibm.com
@ 2015-10-14 16:40 ` vogt at linux dot vnet.ibm.com
  2015-10-14 16:41 ` vogt at linux dot vnet.ibm.com
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: vogt at linux dot vnet.ibm.com @ 2015-10-14 16:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #4 from Dominik Vogt <vogt at linux dot vnet.ibm.com> ---
Now this is the result of my efforts of reducing the original large test
program to a minimum.

---
struct s_t 
{ 
  unsigned f1: 8; 
  unsigned f2: 24; 
}; 
bool foo(int a, int **pp, s_t **pps, void *s, int **x) 
{ 
  s_t *ps = *pps; 
  int *gt; 
  int y; 
  int z; 

  __asm__ __volatile__ ("" : : : "0","1","2","3","6","7","14"); 
  unsigned i = **pp; 
  if (**pp >= 999 && **x < 77) 
    { 
      ps->f1 = **x; 
      ps->f2 = **pp; 
    } 
  __asm__ __volatile__ ("" : : : "0","1","2","3","4","5","6","7","8"); 
  do { y &= *gt; } while (y != 0); 
  __asm__ __volatile__ ("" : : : "memory", "2","3","4","5"); 
  do { z &= *gt; } while (z != 0); 
  return i && x && ps && s; 
} 
---

Without having looked at the Rtl yet, I strongly suspect that the uninitialised
variables y and z trigger the problem.  Originally this was inlined code of a
function returning int or bool but without a return statement.  The __asm__
__volatile__ lines are replacements for functions calls to simplify the test. 
Whether the "stc" is removed or not seems to depend on which registers are
available and on how many variables are still in use.

See attached .tar.bz2 contains the test program, the assembler output and the
rtl output of -da.  If you need more information, just ask.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
  2015-09-03  7:57 [Bug rtl-optimization/67443] New: DSE removes required store instruction krebbel at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2015-10-14 16:40 ` vogt at linux dot vnet.ibm.com
@ 2015-10-14 16:41 ` vogt at linux dot vnet.ibm.com
  2015-10-14 21:32 ` vogt at linux dot vnet.ibm.com
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: vogt at linux dot vnet.ibm.com @ 2015-10-14 16:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #5 from Dominik Vogt <vogt at linux dot vnet.ibm.com> ---
Created attachment 36509
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36509&action=edit
test program and debug output

test program and debug output


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
  2015-09-03  7:57 [Bug rtl-optimization/67443] New: DSE removes required store instruction krebbel at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2015-10-14 16:41 ` vogt at linux dot vnet.ibm.com
@ 2015-10-14 21:32 ` vogt at linux dot vnet.ibm.com
  2015-10-14 21:35 ` vogt at linux dot vnet.ibm.com
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: vogt at linux dot vnet.ibm.com @ 2015-10-14 21:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #8 from Dominik Vogt <vogt at linux dot vnet.ibm.com> ---
Snippet of the assembly code from the test program posted in comment 4:

good compiler:
==============

# r1 is 0x******XX (XX = value to store in the f1 field)
# r3 is the target address (ps)
# r9 is 0x**YYYYYY (YYYYYY = value to store in the f2 field)

  l       %r2,.L25-.L24(%r13) # r2 := (uint32)0xff000000
  stc     %r1,0(%r3)          # 8-bit-store *(char *)ps := (r1 & 0xff)
                              # -> ps: XX ** ** **
  lr      %r1,%r9             # 32-bit-load r1 := r9 -> r1 = 0x**YYYYYY
  n       %r2,0(%r3)          # 32-bit-and r2 := r2 & *(uint32 *)ps
                              # -> r2 = 0xXX000000
  nilh    %r1,255             # 16-bit-and of bits 32-47 of r1 with 0x00ff,
i.e.
                              # it zeroes the top 8 bits -> r1 = 0x00YYYYYY 
  or      %r1,%r2             # 32-bit or r1 := r1 | r2 -> 0xXX00YYYY
  st      %r1,0(%r3)          # 32-bit store *(uint32)ps := r1
                              # -> ps: XX YY YY YY

.L24:
.L25:
  .long   -16777216

bad compiler:
=============

The stc instruction is missing, so the resulting structure has a bad value in
f1.

--

This is the modified program I used for debugging, compiled with

  $ g++ -O2 -march=z900 -fPIC -std=gnu++11 t67443.cxx

---
#include <stdio.h> 

__attribute__ ((noinline)) bool f(void *s) { return s ? true : false; } 
int *gt; 
struct s_t 
{ 
  unsigned f1: 8; 
  unsigned f2: 24; 
}; 
__attribute__ ((noinline)) 
bool foo(int a, int **pp, s_t **pps, void *s, int **x) 
{ 
  s_t *ps = *pps; 

  if (!f(x)) 
    return false; 
  unsigned i = **pp; 
  volatile register long r2 __asm__ ("r2") = 1; 
  if (**pp >= 999 && **x < 77) 
    { 
      ps->f1 = **x; 
      ps->f2 = **pp; 
    } 
  f(s); 
  int y; 
  do { y &= *gt; } while (y != 0); 
  __asm__ __volatile__ ("" : : : "memory", "r2","r3","r4","r5"); 
  int z; 
  do { z &= *gt; } while (z != 0); 
  return i && x && ps && s; 
} 

int main(void) 
{ 
  int i1 = 999; 
  int *pi1 = &i1; 
  s_t st; 
  s_t *pst = &st; 
  int x = 55; 
  int *px = &x; 
  int t = 0; 

  gt = &t; 
  foo(1, &pi1, &pst, 0, &px); 
  fprintf(stderr, "f1 %u, f2 %u\n", (unsigned int)st.f1, (unsigned int)st.f2); 
} 
---

This prints "55 999" for the good compiler (correct) and "255 999" for the bad
compiler (wrong).


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
  2015-09-03  7:57 [Bug rtl-optimization/67443] New: DSE removes required store instruction krebbel at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2015-10-14 21:32 ` vogt at linux dot vnet.ibm.com
@ 2015-10-14 21:35 ` vogt at linux dot vnet.ibm.com
  2015-10-15 10:04 ` vogt at linux dot vnet.ibm.com
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: vogt at linux dot vnet.ibm.com @ 2015-10-14 21:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #9 from Dominik Vogt <vogt at linux dot vnet.ibm.com> ---
>   or      %r1,%r2             # 32-bit or r1 := r1 | r2 -> 0xXX00YYYY
                                                                ^^^^
This should be

  or      %r1,%r2             # 32-bit or r1 := r1 | r2 -> 0xXXYYYYYY
                                                              ^^^^

Sorry for the typo.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
  2015-09-03  7:57 [Bug rtl-optimization/67443] New: DSE removes required store instruction krebbel at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2015-10-14 21:35 ` vogt at linux dot vnet.ibm.com
@ 2015-10-15 10:04 ` vogt at linux dot vnet.ibm.com
  2015-10-15 10:13 ` vogt at linux dot vnet.ibm.com
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: vogt at linux dot vnet.ibm.com @ 2015-10-15 10:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #10 from Dominik Vogt <vogt at linux dot vnet.ibm.com> ---
This is what's happening:

Before the dse2 pass, there is a store instruction (25) followed by a read
instruction later (29):

---
# store first byte
(insn 25 135 136 4 (set (mem/j:QI (reg/v/f:DI 2 %r2 [orig:47 ps ] [47]) [2
ps_7\
->f1+0 S1 A32]) 
        (reg:QI 1 %r1 [orig:51 D.2017+3 ] [51])) t67443.cxx:17 74 {*movqi} 
     (nil))

(insn 136 25 137 4 (set (reg/v/f:DI 1 %r1 [orig:47 ps ] [47]) 
        (reg/v/f:DI 3 %r3 [orig:47 ps ] [47])) t67443.cxx:18 63 {*movdi_64} 
     (nil)) 

# overwrites r2
(insn 137 136 29 4 (set (reg:SI 2 %r2 [68]) 
        (mem/u/c:SI (unspec:DI [ 
                    (symbol_ref/u:DI ("*.LC0") [flags 0x2]) 
                    (reg:DI 13 %r13) 
                ] UNSPEC_LTREF) [3  S4 A32])) t67443.cxx:18 67 {*movsi_zarch} 
     (nil)) 

# write three bytes
(insn 29 137 139 4 (parallel [ 
            (set (reg:SI 2 %r2 [68]) 
                (and:SI (reg:SI 2 %r2 [68]) 
                    (mem/j:SI (reg/v/f:DI 1 %r1 [orig:47 ps ] [47]) [2
ps_7->f2\
+-1 S4 A32]))) 
            (clobber (reg:CC 33 %cc)) 
        ]) t67443.cxx:18 454 {*andsi3_zarch} 
     (nil)) 
---

With the unpatched code, the a "value" expression is put in the mem_addr field
of the store_info for the "set" (similar for the read in insn 29):

store (value:DI 3:3 @0xb7049598/0xb70516a8) 
read  (value:DI 3:3 @0xb7049598/0xb70516a8) 

With the patched code, the mem_addr is resolved to a register expression r2 for
the store and r1 for the read:

store (reg/v/f:DI 2 %r2 [orig:47 ps ] [47])
read  (reg/v/f:DI 1 %r1 [orig:47 ps ] [47])

Then, canon_true_dependence is called with these terms and calls get_addr on
them again.  In the unpatched case, the same address value is resolved to the
same register expression:

(reg/v/f:DI 1 %r1 [orig:47 ps ] [47])

With the patched code, there's nothing left to do for get_addr, so there are
two different register expressions that describe the same address.  In the end,
memrefs_conflict_p() gets called with the two different register expressions
and fails to detect that they refer to the same address (which they do *not* at
insn 29 because r2 has been overwritten in between).

So, I'd say the new call of get_addr() in record_store() is too early because
it may replace the address value by an expression that is no longer valid when
the memory is read later.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
  2015-09-03  7:57 [Bug rtl-optimization/67443] New: DSE removes required store instruction krebbel at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2015-10-15 10:04 ` vogt at linux dot vnet.ibm.com
@ 2015-10-15 10:13 ` vogt at linux dot vnet.ibm.com
  2015-10-17 19:25 ` wmi at google dot com
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: vogt at linux dot vnet.ibm.com @ 2015-10-15 10:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #11 from Dominik Vogt <vogt at linux dot vnet.ibm.com> ---
This is the bug report for the problem that the patch fixed:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64557


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
  2015-09-03  7:57 [Bug rtl-optimization/67443] New: DSE removes required store instruction krebbel at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2015-10-15 10:13 ` vogt at linux dot vnet.ibm.com
@ 2015-10-17 19:25 ` wmi at google dot com
  2015-10-18 18:55 ` wmi at google dot com
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: wmi at google dot com @ 2015-10-17 19:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #12 from wmi at google dot com ---
Yes, I agree it is a problem that memrefs_conflict_p doesn't take effect. But I
am still wondering even if memrefs_conflict_p doesn't take effect, the alias
oracle query in rtx_refs_may_alias_p should have returned may-alias for the
load and store. Why rtx_refs_may_alias_p failed to do that?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
  2015-09-03  7:57 [Bug rtl-optimization/67443] New: DSE removes required store instruction krebbel at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2015-10-17 19:25 ` wmi at google dot com
@ 2015-10-18 18:55 ` wmi at google dot com
  2015-10-18 19:54 ` vogt at linux dot vnet.ibm.com
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: wmi at google dot com @ 2015-10-18 18:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #13 from wmi at google dot com ---
Use the extracted testcase vogt contributed. Here is some digging about why
rtx_refs_may_alias_p returns noalias for the load and store:

(gdb) c
Continuing.

Breakpoint 3, rtx_refs_may_alias_p (x=0x7ffff57fe768, mem=0x7ffff57fe708,
tbaa_p=true)
    at ../../src/gcc/alias.c:385
385       if (!ao_ref_from_mem (&ref1, x)

******
(gdb) p print_rtl_single(stderr, x)
(mem/j:SI (reg/v/f:DI 1 %r1 [orig:64 ps ] [64]) [5 ps_8->f2+-1 S4 A32])
$1 = 1
(gdb) p print_rtl_single(stderr, mem)
(mem/j:QI (reg/v/f:DI 2 %r2 [orig:64 ps ] [64]) [5 ps_8->f1+0 S1 A32])
******

rtx_refs_may_alias_p(x, mem, true) returns no_alias for "x" and "mem". 

>From RTL representation, x's starting address is ps_8->f2+-1, size is 4 //See
[5 ps_8->f2+-1 S4 A32]
mem's starting address is ps_8->f1+0, size is 1 // see [5 ps_8->f1+0 S1 A32]
So x and mem are aliased with each other.

******
(gdb) p ref1
$3 = {ref = 0x7ffff568d9f0, base = 0x7ffff57d7c30, offset = 8, size = 24,
max_size = 24, ref_alias_set = 5,
  base_alias_set = -1, volatile_p = false}
(gdb) p ref2
$4 = {ref = 0x7ffff568d9c0, base = 0x7ffff57d7f50, offset = 0, size = 8,
max_size = 8, ref_alias_set = 5,
  base_alias_set = -1, volatile_p = false}
(gdb) p debug_generic_expr(ref1.base)
*ps_8
$6 = void
(gdb) p debug_generic_expr(ref2.base)
*ps_8
$7 = void
******

rtx_refs_may_alias_p(x, mem, true) calls refs_may_alias_p_1(&ref1, &ref2, ...)
as its helper func. For ref1 and ref2, they have the same base -- *ps_8, but
they have non-overlapping accessing ranges -- ref1 from 8 to 8+24, ref2 from 0
to 8, so ref1 has no-alias with ref2. 

The major difference is between ref1 and x. ref1 is initialized using
MEM_EXPR(x) (MEM_EXPR(x) is ps_8->f2). So ref1 has its offset to be 8 and its
size to be 24. However, x has starting address to be ps_8->f2-1 and size to be
32 bits. Usually ref1's offset and size will be adjusted according to
MEM_SIZE(x) and MEM_OFFSET(x). However, because of the if (...) clause below,
ao_ref_from_mem returns true without adjusting ref1->offset and ref1->size.

(gdb) p debug_generic_expr(MEM_EXPR(x))
ps_8->f2
(gdb) p MEM_OFFSET(x)
$11 = -1
(gdb) p MEM_SIZE(x)
$12 = 4

ao_ref_from_mem (ao_ref *ref, const_rtx mem)
{
  tree expr = MEM_EXPR (mem);
  ...
  ao_ref_init (ref, expr);
  base = ao_ref_base (ref);
  ...
  /* If the base decl is a parameter we can have negative MEM_OFFSET in
     case of promoted subregs on bigendian targets.  Trust the MEM_EXPR
     here.  */
  if (MEM_OFFSET (mem) < 0
      && (MEM_SIZE (mem) + MEM_OFFSET (mem)) * BITS_PER_UNIT == ref->size)
    return true; 

  ref->offset += MEM_OFFSET (mem) * BITS_PER_UNIT;
  ref->size = MEM_SIZE (mem) * BITS_PER_UNIT;

  ...
}

I don't understand the code above well -- why can we trust MEM_EXPR instead of
relying on MEM_OFFSET and MEM_SIZE? It seems not the case for the testcase
here.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
  2015-09-03  7:57 [Bug rtl-optimization/67443] New: DSE removes required store instruction krebbel at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2015-10-18 18:55 ` wmi at google dot com
@ 2015-10-18 19:54 ` vogt at linux dot vnet.ibm.com
  2015-10-18 20:58 ` vogt at linux dot vnet.ibm.com
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: vogt at linux dot vnet.ibm.com @ 2015-10-18 19:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #14 from Dominik Vogt <vogt at linux dot vnet.ibm.com> ---
This is the thread discussing the patch that introduced the if-condition above
to ao_ref_from_mem:
https://gcc.gnu.org/ml/gcc-patches/2009-09/msg01638.html


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
  2015-09-03  7:57 [Bug rtl-optimization/67443] New: DSE removes required store instruction krebbel at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2015-10-18 19:54 ` vogt at linux dot vnet.ibm.com
@ 2015-10-18 20:58 ` vogt at linux dot vnet.ibm.com
  2015-10-20  9:13 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: vogt at linux dot vnet.ibm.com @ 2015-10-18 20:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #15 from Dominik Vogt <vogt at linux dot vnet.ibm.com> ---
The original problem was that a 4-byte load (or whatever) from memory M into an
8-byte-register on bigendian machines is represented by a memory reference of
size 8 starting at M-4?  This problem was present since 138bc75d-0d04-0410-96
was applied, but never triggered because there was another code path that
identified the alias.  With the get_addr-Patch that safety net is gone and that
now triggers the bug uin the old patch.

The mistake that patch seems to make is to assume that a negative MEM_OFFEST
always indicates a "promoted subreg on" a "bigendian target" while there are
actually other situations where this can occur:

(1) Store a byte at address p; address p is loaded into register r<n> first.
(2) Some code forces register r<n> to be reused for other purposes.
(3) Load address p to a register r<m>; r<n> is still used by something else.
(4) Read bytes 1-2 (or more) from address p+1; this results in a multi-byte
read from address p; p is loaded into register r<m> (m != n) because r<n> is
still in use.

Changing the condition to

  if (MEM_OFFSET (mem) < 0
      && (MEM_SIZE (mem) + MEM_OFFSET (mem)) * BITS_PER_UNIT == ref->size
      && MEM_SIZE (mem) + MEM_OFFSET (mem) > 0
      && (-MEM_OFFSET (mem)) % (MEM_SIZE (mem) + MEM_OFFSET (mem)) == 0)
   return true;

fixes that (but possibly causes problems elsewhere).  The changed code just
assumes that "aligned" (-offset == size / 2, probably should take the memory
addres into account) references with negative offset are are caused by
"promoted subregs on bigendian targets" and "unaligned" references are not. 
It's in no way obvious to me whether this "unaligned" check covers exactly all
relevant cases.  Probably not, but at least it makes the test program work.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
  2015-09-03  7:57 [Bug rtl-optimization/67443] New: DSE removes required store instruction krebbel at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2015-10-18 20:58 ` vogt at linux dot vnet.ibm.com
@ 2015-10-20  9:13 ` rguenth at gcc dot gnu.org
  2015-10-20 13:13 ` vogt at linux dot vnet.ibm.com
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-10-20  9:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
I see multiple issues with ao_ref_from_mem connected with MEM_OFFSET/MEM_SIZE.
I don't remember exactly what was the issue with the promoted parameters
but it may have been with ranges_overlap_p not dealing with signed offsets
properly (that has been fixed meanwhile).

So I'd start with simply removing

  /* If the base decl is a parameter we can have negative MEM_OFFSET in
     case of promoted subregs on bigendian targets.  Trust the MEM_EXPR
     here.  */
  if (MEM_OFFSET (mem) < 0
      && (MEM_SIZE (mem) + MEM_OFFSET (mem)) * BITS_PER_UNIT == ref->size)
    return true;

One issue that remains is that when MEM_OFFSET/MEM_SIZE gets us outside
of the MEM_EXPR object (its offset/size) we have to strip
handled-component-refs
until MEM_OFFSET/MEM_SIZE are again within bounds (or conservatively reset
ref->ref to NULL_TREE).  Either the code adjusting the mem attributes
MEM_OFFSET/MEM_SIZE is supposed to do that already (no idea) or we'll run into
issues.

Basically MEM_SIZE must be <= ref->size, MEM_OFFSET may never be negative
unless ref->offset is zero and if MEM_OFFSET is positive then
MEM_OFFSET + MEM_SIZE must be <= ref->size.

It's of course a tricky area ;)

I believe that

  /* If MEM_OFFSET and MEM_SIZE get us outside of the base object of
     the MEM_EXPR punt.  This happens for STRICT_ALIGNMENT targets a lot.  */
  if (MEM_EXPR (mem) != get_spill_slot_decl (false)
      && (ref->offset < 0
          || (DECL_P (ref->base)
              && (DECL_SIZE (ref->base) == NULL_TREE
                  || TREE_CODE (DECL_SIZE (ref->base)) != INTEGER_CST
                  || wi::ltu_p (wi::to_offset (DECL_SIZE (ref->base)),
                                ref->offset + ref->size)))))
    return false;

will ultimatively "save" us here as well (very conservatively though).

So does

Index: gcc/alias.c
===================================================================
--- gcc/alias.c (revision 229022)
+++ gcc/alias.c (working copy)
@@ -339,15 +339,16 @@ ao_ref_from_mem (ao_ref *ref, const_rtx
       || !MEM_SIZE_KNOWN_P (mem))
     return true;

-  /* If the base decl is a parameter we can have negative MEM_OFFSET in
-     case of promoted subregs on bigendian targets.  Trust the MEM_EXPR
-     here.  */
+  /* If MEM_OFFSET/MEM_SIZE get us outside of ref->offset/ref->max_size
+     drop ref->ref.  */
   if (MEM_OFFSET (mem) < 0
-      && (MEM_SIZE (mem) + MEM_OFFSET (mem)) * BITS_PER_UNIT == ref->size)
-    return true;
+      || (ref->max_size != -1
+         && ((MEM_OFFSET (mem) + MEM_SIZE (mem)) * BITS_PER_UNIT
+             > ref->max_size)))
+    ref->ref = NULL_TREE;

-  /* Otherwise continue and refine size and offset we got from analyzing
-     MEM_EXPR by using MEM_SIZE and MEM_OFFSET.  */
+  /* Refine size and offset we got from analyzing MEM_EXPR by using
+     MEM_SIZE and MEM_OFFSET.  */

   ref->offset += MEM_OFFSET (mem) * BITS_PER_UNIT;
   ref->size = MEM_SIZE (mem) * BITS_PER_UNIT;

fix the issue?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
  2015-09-03  7:57 [Bug rtl-optimization/67443] New: DSE removes required store instruction krebbel at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2015-10-20  9:13 ` rguenth at gcc dot gnu.org
@ 2015-10-20 13:13 ` vogt at linux dot vnet.ibm.com
  2015-10-20 13:32 ` rguenther at suse dot de
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: vogt at linux dot vnet.ibm.com @ 2015-10-20 13:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #17 from Dominik Vogt <vogt at linux dot vnet.ibm.com> ---
> So does [patch] fix the issue?

Yes.  The testsuite on 64-bit and 31-bit systems has no regressions with the
patch either.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
  2015-09-03  7:57 [Bug rtl-optimization/67443] New: DSE removes required store instruction krebbel at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2015-10-20 13:13 ` vogt at linux dot vnet.ibm.com
@ 2015-10-20 13:32 ` rguenther at suse dot de
  2015-10-21  0:11 ` vogt at linux dot vnet.ibm.com
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: rguenther at suse dot de @ 2015-10-20 13:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #18 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 20 Oct 2015, vogt at linux dot vnet.ibm.com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443
> 
> --- Comment #17 from Dominik Vogt <vogt at linux dot vnet.ibm.com> ---
> > So does [patch] fix the issue?
> 
> Yes.  The testsuite on 64-bit and 31-bit systems has no regressions with the
> patch either.

Ok, I tested on ppc64 (big-endian) with C,C++,Fortran and see no
obvious regressions either.

Will queue testing on x86_64.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
  2015-09-03  7:57 [Bug rtl-optimization/67443] New: DSE removes required store instruction krebbel at gcc dot gnu.org
                   ` (16 preceding siblings ...)
  2015-10-20 13:32 ` rguenther at suse dot de
@ 2015-10-21  0:11 ` vogt at linux dot vnet.ibm.com
  2015-10-21  0:12 ` vogt at linux dot vnet.ibm.com
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: vogt at linux dot vnet.ibm.com @ 2015-10-21  0:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #19 from Dominik Vogt <vogt at linux dot vnet.ibm.com> ---
This is a cleaner test case for s390/s390x.  If there was a way to tell gcc
"all registers except the first three argument registers are not available",
the test should be fairly easy to convert to other targets.  I'll try to port
it to x86 tomorrow.

--- pr67443.c ---
/* Test case for PR/67443.  */

/* { dg-do run { target s390*-*-* } } */
/* { dg-prune-output "call-clobbered register used for global register
variable" } */
/* { dg-options "-march=z900 -fPIC -fomit-frame-pointer -O3" } */

#include <assert.h>

/* Block all registers except the first three argument registers.  */
register long r0 asm ("r0");
register long r1 asm ("r1");
register long r5 asm ("r5");
register long r6 asm ("r6");
register long r7 asm ("r7");
register long r8 asm ("r8");
register long r9 asm ("r9");
register long r10 asm ("r10");
register long r11 asm ("r11");

struct s_t
{
  unsigned f1 : 8;
  unsigned f2 : 24;
};

__attribute__ ((noinline))
void foo (struct s_t *ps, int c, int i)
{
  /* Uses r2 as address register.  */
  ps->f1 = c;
  /* The calculation of the value is so expensive that it's cheaper to spill ps
     to the stack and reload it later (into a different register).
     ==> Uses r4 as address register.*/
  ps->f2 = i + i % 3;
  /* If dead store elimination fails to detect that the address in r2 during
     the first assignment is an alias of the address in r4 during the second
     assignment, it eliminates the first assignment and the f1 field is not
     written (bug).  */
}

int main (void)
{
  struct s_t s = { 0x01u, 0x020304u };

  foo (&s, 0, 0);
  assert (s.f1 == 0&& s.f2 == 0);

  return 0;
}
-----------------


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
  2015-09-03  7:57 [Bug rtl-optimization/67443] New: DSE removes required store instruction krebbel at gcc dot gnu.org
                   ` (17 preceding siblings ...)
  2015-10-21  0:11 ` vogt at linux dot vnet.ibm.com
@ 2015-10-21  0:12 ` vogt at linux dot vnet.ibm.com
  2015-10-23 10:27 ` vogt at linux dot vnet.ibm.com
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: vogt at linux dot vnet.ibm.com @ 2015-10-21  0:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #20 from Dominik Vogt <vogt at linux dot vnet.ibm.com> ---
Created attachment 36553
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36553&action=edit
Dejagnu test case for s390/s390x.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
  2015-09-03  7:57 [Bug rtl-optimization/67443] New: DSE removes required store instruction krebbel at gcc dot gnu.org
                   ` (18 preceding siblings ...)
  2015-10-21  0:12 ` vogt at linux dot vnet.ibm.com
@ 2015-10-23 10:27 ` vogt at linux dot vnet.ibm.com
  2015-10-26 15:25 ` rguenth at gcc dot gnu.org
  2015-10-26 15:26 ` [Bug rtl-optimization/67443] [5 " rguenth at gcc dot gnu.org
  21 siblings, 0 replies; 23+ messages in thread
From: vogt at linux dot vnet.ibm.com @ 2015-10-23 10:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #21 from Dominik Vogt <vogt at linux dot vnet.ibm.com> ---
Trying to write a test case for Power, I was unable to force the bug to show in
the first place.  Although I can force the addresses used into different
registers, dead store elimination on power does not kill the first store.  ==>
No test case for Power.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
  2015-09-03  7:57 [Bug rtl-optimization/67443] New: DSE removes required store instruction krebbel at gcc dot gnu.org
                   ` (19 preceding siblings ...)
  2015-10-23 10:27 ` vogt at linux dot vnet.ibm.com
@ 2015-10-26 15:25 ` rguenth at gcc dot gnu.org
  2015-10-26 15:26 ` [Bug rtl-optimization/67443] [5 " rguenth at gcc dot gnu.org
  21 siblings, 0 replies; 23+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-10-26 15:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #22 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Mon Oct 26 15:24:45 2015
New Revision: 229372

URL: https://gcc.gnu.org/viewcvs?rev=229372&root=gcc&view=rev
Log:
2015-10-26  Richard Biener  <rguenther@suse.de>
        Dominik Vogt  <vogt@linux.vnet.ibm.com>

        PR middle-end/67443
        * alias.c (ao_ref_from_mem): Remove promoted subreg handling.
        Properly prune ref->ref for accesses outside of ref.

        * gcc.target/s390/pr67443.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.target/s390/pr67443.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/alias.c
    trunk/gcc/testsuite/ChangeLog


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/67443] [5 regression] DSE removes required store instruction
  2015-09-03  7:57 [Bug rtl-optimization/67443] New: DSE removes required store instruction krebbel at gcc dot gnu.org
                   ` (20 preceding siblings ...)
  2015-10-26 15:25 ` rguenth at gcc dot gnu.org
@ 2015-10-26 15:26 ` rguenth at gcc dot gnu.org
  21 siblings, 0 replies; 23+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-10-26 15:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2015-10-26
      Known to work|                            |6.0
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
            Summary|[5/6 regression] DSE        |[5 regression] DSE removes
                   |removes required store      |required store instruction
                   |instruction                 |
     Ever confirmed|0                           |1

--- Comment #23 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed on trunk sofar.


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2015-10-26 15:26 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-03  7:57 [Bug rtl-optimization/67443] New: DSE removes required store instruction krebbel at gcc dot gnu.org
2015-09-03  8:19 ` [Bug rtl-optimization/67443] [5/6 regression] " rguenth at gcc dot gnu.org
2015-09-08  5:02 ` wmi at google dot com
2015-09-08  5:33 ` wmi at google dot com
2015-10-14 16:27 ` vogt at linux dot vnet.ibm.com
2015-10-14 16:40 ` vogt at linux dot vnet.ibm.com
2015-10-14 16:41 ` vogt at linux dot vnet.ibm.com
2015-10-14 21:32 ` vogt at linux dot vnet.ibm.com
2015-10-14 21:35 ` vogt at linux dot vnet.ibm.com
2015-10-15 10:04 ` vogt at linux dot vnet.ibm.com
2015-10-15 10:13 ` vogt at linux dot vnet.ibm.com
2015-10-17 19:25 ` wmi at google dot com
2015-10-18 18:55 ` wmi at google dot com
2015-10-18 19:54 ` vogt at linux dot vnet.ibm.com
2015-10-18 20:58 ` vogt at linux dot vnet.ibm.com
2015-10-20  9:13 ` rguenth at gcc dot gnu.org
2015-10-20 13:13 ` vogt at linux dot vnet.ibm.com
2015-10-20 13:32 ` rguenther at suse dot de
2015-10-21  0:11 ` vogt at linux dot vnet.ibm.com
2015-10-21  0:12 ` vogt at linux dot vnet.ibm.com
2015-10-23 10:27 ` vogt at linux dot vnet.ibm.com
2015-10-26 15:25 ` rguenth at gcc dot gnu.org
2015-10-26 15:26 ` [Bug rtl-optimization/67443] [5 " rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).