public inbox for gcc-bugs@sourceware.org
* [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
@ 2014-10-30 10:04 ` belagod at gcc dot gnu.org
  2014-10-30 10:10 ` rguenth at gcc dot gnu.org
                   ` (34 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: belagod at gcc dot gnu.org @ 2014-10-30 10:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

Tejas Belagod <belagod at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |aarch64
             Status|UNCONFIRMED                 |NEW
      Known to work|                            |4.9.1
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2014-10-30
     Ever confirmed|0                           |1
            Summary|[4.9 Regression][AArch64]   |[5.0 Regression][AArch64]
                   |Failure to constant fold.   |Failure to constant fold.
      Known to fail|                            |5.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
  2014-10-30 10:04 ` [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold belagod at gcc dot gnu.org
@ 2014-10-30 10:10 ` rguenth at gcc dot gnu.org
  2014-11-04 11:41 ` belagod at gcc dot gnu.org
                   ` (33 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-10-30 10:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
How does it look before RTL expansion with 4.9 vs. 5.0?



* [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
  2014-10-30 10:04 ` [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold belagod at gcc dot gnu.org
  2014-10-30 10:10 ` rguenth at gcc dot gnu.org
@ 2014-11-04 11:41 ` belagod at gcc dot gnu.org
  2014-11-04 16:32 ` belagod at gcc dot gnu.org
                   ` (32 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: belagod at gcc dot gnu.org @ 2014-11-04 11:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #2 from Tejas Belagod <belagod at gcc dot gnu.org> ---
foo.c.optimized:

5.0:

;;    prev block 0, next block 1, flags: (NEW, REACHABLE)
;;    pred:       ENTRY [100.0%]  (FALLTHRU,EXECUTABLE)
  # .MEM_4 = VDEF <.MEM_3(D)>
  aD.1380 = *.LC0D.1387;
  # VUSE <.MEM_4>
  vect__6.6_13 = MEM[(intD.7 *)&aD.1380];
  # VUSE <.MEM_4>
  vect__6.6_10 = MEM[(intD.7 *)&aD.1380 + 16B];
  _27 = BIT_FIELD_REF <vect__6.6_13, 32, 0>;
  _16 = BIT_FIELD_REF <vect__6.6_10, 32, 0>;
  _15 = _16 + _27;
  _18 = BIT_FIELD_REF <vect__6.6_13, 32, 32>;
  _14 = BIT_FIELD_REF <vect__6.6_10, 32, 32>;
  _5 = _14 + _18;
  _12 = BIT_FIELD_REF <vect__6.6_13, 32, 64>;
  _2 = BIT_FIELD_REF <vect__6.6_10, 32, 64>;
  _29 = _2 + _12;
  _30 = BIT_FIELD_REF <vect__6.6_13, 32, 96>;
  _31 = BIT_FIELD_REF <vect__6.6_10, 32, 96>;
  _32 = _30 + _31;
  vect_sum_7.7_17 = {_15, _5, _29, _32};
  stmp_sum_7.8_19 = _15;
  stmp_sum_7.8_20 = _5;
  stmp_sum_7.8_21 = stmp_sum_7.8_19 + stmp_sum_7.8_20;
  stmp_sum_7.8_22 = _29;
  stmp_sum_7.8_23 = stmp_sum_7.8_21 + _29;
  stmp_sum_7.8_24 = _32;
  stmp_sum_7.8_25 = stmp_sum_7.8_23 + _32;
  vect_sum_7.9_26 = stmp_sum_7.8_25;
  # .MEM_9 = VDEF <.MEM_4>
  aD.1380 ={v} {CLOBBER};
  # VUSE <.MEM_9>
  return vect_sum_7.9_26;
;;    succ:       EXIT [100.0%] 


Very strange that the vectorizer seems to be kicking in even with
-mgeneral-regs-only.

4.9.2:

;;   basic block 2, loop depth 0, count 0, freq 1111, maybe hot
;;    prev block 0, next block 1, flags: (NEW, REACHABLE)
;;    pred:       ENTRY [100.0%]  (FALLTHRU,EXECUTABLE)
  # .MEM_4 = VDEF <.MEM_3(D)>
  aD.1374[0] = 0;
  # .MEM_5 = VDEF <.MEM_4>
  aD.1374[1] = 1;
  # .MEM_6 = VDEF <.MEM_5>
  aD.1374[2] = 2;
  # .MEM_7 = VDEF <.MEM_6>
  aD.1374[3] = 3;
  # .MEM_8 = VDEF <.MEM_7>
  aD.1374[4] = 4;
  # .MEM_9 = VDEF <.MEM_8>
  aD.1374[5] = 5;
  # .MEM_10 = VDEF <.MEM_9>
  aD.1374[6] = 6;
  # VUSE <.MEM_10>
  _20 = aD.1374[0];
  # VUSE <.MEM_10>
  _29 = aD.1374[1];
  sum_30 = _20 + _29;
  # VUSE <.MEM_10>
  _36 = aD.1374[2];
  sum_37 = sum_30 + _36;
  # VUSE <.MEM_10>
  _43 = aD.1374[3];
  sum_44 = sum_37 + _43;
  # VUSE <.MEM_10>
  _50 = aD.1374[4];
  sum_51 = sum_44 + _50;
  # VUSE <.MEM_10>
  _57 = aD.1374[5];
  sum_58 = sum_51 + _57;
  # VUSE <.MEM_10>
  _64 = aD.1374[6];
  sum_65 = sum_58 + _64;
  sum_14 = sum_65 + 7;
  # .MEM_17 = VDEF <.MEM_10>
  aD.1374 ={v} {CLOBBER};
  # VUSE <.MEM_17>
  return sum_14;
;;    succ:       EXIT [100.0%] 

4.9's much saner.
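(The testcase itself isn't quoted in this thread; judging from the dumps
above, with eight int elements 0..7 summed into a returned scalar, a
minimal reproducer would look something like the following. The function
name and initializer are guesses, not the original foo.c.)

```c
/* Hypothetical reconstruction of the testcase, inferred from the GIMPLE
   dumps above (not the original foo.c).  The whole function should
   constant fold down to "return 28".  */
int
foo (void)
{
  int a[] = { 0, 1, 2, 3, 4, 5, 6, 7 };
  int sum = 0;
  for (int i = 0; i < 8; i++)
    sum += a[i];
  return sum;
}
```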



* [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2014-11-04 11:41 ` belagod at gcc dot gnu.org
@ 2014-11-04 16:32 ` belagod at gcc dot gnu.org
  2014-11-04 20:58 ` rguenth at gcc dot gnu.org
                   ` (31 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: belagod at gcc dot gnu.org @ 2014-11-04 16:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #3 from Tejas Belagod <belagod at gcc dot gnu.org> ---
When I try 5.0 with -fno-tree-vectorize, I get:

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  # .MEM_4 = VDEF <.MEM_3(D)>
  aD.2496 = *.LC0D.2503;
  # VUSE <.MEM_4>
  _10 = aD.2496[0];
  # VUSE <.MEM_4>
  _22 = aD.2496[1];
  sum_23 = _10 + _22;
  # VUSE <.MEM_4>
  _29 = aD.2496[2];
  sum_30 = sum_23 + _29;
  # VUSE <.MEM_4>
  _36 = aD.2496[3];
  sum_37 = sum_30 + _36;
  # VUSE <.MEM_4>
  _43 = aD.2496[4];
  sum_44 = sum_37 + _43;
  # VUSE <.MEM_4>
  _50 = aD.2496[5];
  sum_51 = sum_44 + _50;
  # VUSE <.MEM_4>
  _57 = aD.2496[6];
  sum_58 = sum_51 + _57;
  # VUSE <.MEM_4>
  _6 = aD.2496[7];
  sum_7 = _6 + sum_58;
  # .MEM_9 = VDEF <.MEM_4>
  aD.2496 ={v} {CLOBBER};
  # VUSE <.MEM_9>
  return sum_7;
;;    succ:       EXIT

This:

  # .MEM_4 = VDEF <.MEM_3(D)>
  aD.2496 = *.LC0D.2503;

is what's mainly different from 4.9. 5.0 seems to use a TImode load to
initialize the stack with the const array.

(insn 10 9 11 (set (mem/c:TI (reg:DI 91) [1 aD.2496+0 S16 A128])
        (reg:TI 93)) foo.c:4 -1
     (nil))

(insn 11 10 12 (set (reg:TI 94)
        (mem/u/c:TI (plus:DI (reg:DI 92)
                (const_int 16 [0x10])) [0  S16 A32])) foo.c:4 -1
     (nil))

(insn 12 11 0 (set (mem/c:TI (plus:DI (reg:DI 91)
                (const_int 16 [0x10])) [1 aD.2496+16 S16 A128])
        (reg:TI 94)) foo.c:4 -1
     (nil))

;; sum_23 = _10 + _22;

(insn 13 12 14 (set (reg:SI 95)
        (mem/c:SI (plus:DI (reg/f:DI 68 virtual-stack-vars)
                (const_int -32 [0xffffffffffffffe0])) [1 aD.2496+0 S4 A128])) foo.c:9 -1
     (nil))


When DSE wants to optimize it away, it fails to extract SI values from the
TImode stores:

**scanning insn=14
cselib lookup (reg/f:DI 64 sfp) => 3:3
cselib value 6:4299 0x2f6de50 (plus:DI (reg/f:DI 64 sfp)
    (const_int -28 [0xffffffffffffffe4]))

cselib lookup (plus:DI (reg/f:DI 64 sfp)
        (const_int -28 [0xffffffffffffffe4])) => 6:4299
  mem: (plus:DI (reg/f:DI 64 sfp)
    (const_int -28 [0xffffffffffffffe4]))

   after canon_rtx address: (plus:DI (reg/f:DI 64 sfp)
    (const_int -28 [0xffffffffffffffe4]))
  gid=0 offset=-28 
 processing const load gid=0[-28..-24)
trying to replace SImode load in insn 14 from TImode store in insn 10
(lshiftrt:DI (reg:DI 105)
    (const_int 32 [0x20]))

Hot cost: 8 (final)
 -- could not extract bits of stored value
removing from active insn=10 has store
mems_found = 0, cannot_delete = true
cselib lookup (mem/c:SI (plus:DI (reg/f:DI 64 sfp)
            (const_int -28 [0xffffffffffffffe4])) [1 aD.2496+4 S4 A32]) => 0:0

**scanning insn=15
....



* [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2014-11-04 16:32 ` belagod at gcc dot gnu.org
@ 2014-11-04 20:58 ` rguenth at gcc dot gnu.org
  2014-11-20 12:44 ` rguenth at gcc dot gnu.org
                   ` (30 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-11-04 20:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org
   Target Milestone|---                         |5.0

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
I will have a look.  Note that 4.9 simply didn't vectorize this.  And note
that, unfortunately, only FRE/PRE have a chance to optimize this, but they
do not run that late.

Jakub wanted to enable FRE late for some other PR.

Tejas, can you try

Index: passes.def
===================================================================
--- passes.def  (revision 217035)
+++ passes.def  (working copy)
@@ -255,7 +255,7 @@
       NEXT_PASS (pass_reassoc);
       NEXT_PASS (pass_strength_reduction);
       NEXT_PASS (pass_tracer);
-      NEXT_PASS (pass_dominator);
+      NEXT_PASS (pass_fre);
       NEXT_PASS (pass_strlen);
       NEXT_PASS (pass_vrp);
       /* The only const/copy propagation opportunities left after

?



* [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2014-11-04 20:58 ` rguenth at gcc dot gnu.org
@ 2014-11-20 12:44 ` rguenth at gcc dot gnu.org
  2014-11-20 16:21 ` belagod at gcc dot gnu.org
                   ` (29 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-11-20 12:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Should be fixed now - see PR63677.

*** This bug has been marked as a duplicate of bug 63677 ***



* [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2014-11-20 12:44 ` rguenth at gcc dot gnu.org
@ 2014-11-20 16:21 ` belagod at gcc dot gnu.org
  2014-11-20 16:49 ` pinskia at gcc dot gnu.org
                   ` (28 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: belagod at gcc dot gnu.org @ 2014-11-20 16:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #7 from Tejas Belagod <belagod at gcc dot gnu.org> ---
I tried this, but it still doesn't seem to fold for aarch64.

So, here is the DOM trace for aarch64:

Optimizing statement a = *.LC0;
LKUP STMT a = *.LC0 with .MEM_3(D)
LKUP STMT *.LC0 = a with .MEM_3(D)
Optimizing statement vectp_a.5_1 = &a;
LKUP STMT vectp_a.5_1 = &a
==== ASGN vectp_a.5_1 = &a
Optimizing statement vect__6.6_13 = MEM[(int *)vectp_a.5_1];
  Replaced 'vectp_a.5_1' with constant '&aD.2604'
LKUP STMT vect__6.6_13 = MEM[(int *)&a] with .MEM_4
2>>> STMT vect__6.6_13 = MEM[(int *)&a] with .MEM_4
Optimizing statement vect_sum_7.7_6 = vect__6.6_13;
LKUP STMT vect_sum_7.7_6 = vect__6.6_13
==== ASGN vect_sum_7.7_6 = vect__6.6_13
Optimizing statement vectp_a.4_7 = vectp_a.5_1 + 16;
  Replaced 'vectp_a.5_1' with constant '&aD.2604'
LKUP STMT vectp_a.4_7 = &a pointer_plus_expr 16
2>>> STMT vectp_a.4_7 = &a pointer_plus_expr 16
==== ASGN vectp_a.4_7 = &MEM[(void *)&a + 16B]
Optimizing statement ivtmp_8 = 1;
LKUP STMT ivtmp_8 = 1
==== ASGN ivtmp_8 = 1
Optimizing statement vect__6.6_10 = MEM[(int *)vectp_a.4_7];
  Replaced 'vectp_a.4_7' with constant '&MEM[(voidD.39 *)&aD.2604 + 16B]'
  Folded to: vect__6.6_10 = MEM[(int *)&a + 16B];
LKUP STMT vect__6.6_10 = MEM[(int *)&a + 16B] with .MEM_4
2>>> STMT vect__6.6_10 = MEM[(int *)&a + 16B] with .MEM_4
Optimizing statement vect_sum_7.7_17 = vect_sum_7.7_6 + vect__6.6_10;
  Replaced 'vect_sum_7.7_6' with variable 'vect__6.6_13'
gimple_simplified to vect_sum_7.7_17 = vect__6.6_10 + vect__6.6_13;
  Folded to: vect_sum_7.7_17 = vect__6.6_10 + vect__6.6_13;
LKUP STMT vect_sum_7.7_17 = vect__6.6_10 plus_expr vect__6.6_13
2>>> STMT vect_sum_7.7_17 = vect__6.6_10 plus_expr vect__6.6_13
...

In x86's case, by this time, the constant vectors have been propagated and
folded into a constant vector:

Optimizing statement vect_cst_.12_23 = { 0, 1, 2, 3 };
LKUP STMT vect_cst_.12_23 = { 0, 1, 2, 3 }
==== ASGN vect_cst_.12_23 = { 0, 1, 2, 3 }
Optimizing statement vect_cst_.11_32 = { 4, 5, 6, 7 };
LKUP STMT vect_cst_.11_32 = { 4, 5, 6, 7 }
==== ASGN vect_cst_.11_32 = { 4, 5, 6, 7 }
Optimizing statement vectp.14_2 = &a[0];
LKUP STMT vectp.14_2 = &a[0]
==== ASGN vectp.14_2 = &a[0]
Optimizing statement MEM[(int *)vectp.14_2] = vect_cst_.12_23;
  Replaced 'vectp.14_2' with constant '&aD.1831[0]'
  Replaced 'vect_cst_.12_23' with constant '{ 0, 1, 2, 3 }'
  Folded to: MEM[(int *)&a] = { 0, 1, 2, 3 };
LKUP STMT MEM[(int *)&a] = { 0, 1, 2, 3 } with .MEM_3(D)
LKUP STMT { 0, 1, 2, 3 } = MEM[(int *)&a] with .MEM_3(D)
LKUP STMT { 0, 1, 2, 3 } = MEM[(int *)&a] with .MEM_25
2>>> STMT { 0, 1, 2, 3 } = MEM[(int *)&a] with .MEM_25
Optimizing statement vectp.14_21 = vectp.14_2 + 16;
  Replaced 'vectp.14_2' with constant '&aD.1831[0]'
LKUP STMT vectp.14_21 = &a[0] pointer_plus_expr 16
2>>> STMT vectp.14_21 = &a[0] pointer_plus_expr 16
==== ASGN vectp.14_21 = &MEM[(void *)&a + 16B]
Optimizing statement MEM[(int *)vectp.14_21] = vect_cst_.11_32;
  Replaced 'vectp.14_21' with constant '&MEM[(voidD.41 *)&aD.1831 + 16B]'
  Replaced 'vect_cst_.11_32' with constant '{ 4, 5, 6, 7 }'
  Folded to: MEM[(int *)&a + 16B] = { 4, 5, 6, 7 };
LKUP STMT MEM[(int *)&a + 16B] = { 4, 5, 6, 7 } with .MEM_25
LKUP STMT { 4, 5, 6, 7 } = MEM[(int *)&a + 16B] with .MEM_25
LKUP STMT { 4, 5, 6, 7 } = MEM[(int *)&a + 16B] with .MEM_19
2>>> STMT { 4, 5, 6, 7 } = MEM[(int *)&a + 16B] with .MEM_19
Optimizing statement vectp_a.5_22 = &a;
LKUP STMT vectp_a.5_22 = &a
==== ASGN vectp_a.5_22 = &a
Optimizing statement vect__13.6_20 = MEM[(int *)vectp_a.5_22];
  Replaced 'vectp_a.5_22' with constant '&aD.1831'
LKUP STMT vect__13.6_20 = MEM[(int *)&a] with .MEM_19
FIND: { 0, 1, 2, 3 }
  Replaced redundant expr '# VUSE <.MEM_19>
MEM[(intD.6 *)&aD.1831]' with '{ 0, 1, 2, 3 }'
==== ASGN vect__13.6_20 = { 0, 1, 2, 3 }
Optimizing statement vect_sum_14.7_13 = vect__13.6_20;
  Replaced 'vect__13.6_20' with constant '{ 0, 1, 2, 3 }'
LKUP STMT vect_sum_14.7_13 = { 0, 1, 2, 3 }
==== ASGN vect_sum_14.7_13 = { 0, 1, 2, 3 }
....

While the MEM[vect_ptr + CST] gets replaced correctly by 'a', it doesn't seem
to figure out that the literal pool load 'a = *.LC0' is nothing but

 vect_cst_.12_23 = { 0, 1, 2, 3 }; and vect_cst_.11_32 = { 4, 5, 6, 7 };

which is the only major difference between how the const vector is initialized
on x86 and aarch64.  Is DOM not able to understand 'a = *.LC0'?



* [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2014-11-20 16:21 ` belagod at gcc dot gnu.org
@ 2014-11-20 16:49 ` pinskia at gcc dot gnu.org
  2014-11-21  8:48 ` rguenth at gcc dot gnu.org
                   ` (27 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: pinskia at gcc dot gnu.org @ 2014-11-20 16:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Tejas Belagod from comment #7)
> I tried this, but it still doesn't seem to fold for aarch64.
> 
> So, here is the DOM trace for aarch64:
> 
> Optimizing statement a = *.LC0;

Why do we get LC0 in the first place?  It seems like it is happening because of
some cost model issue with MOVECOST.



* [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2014-11-20 16:49 ` pinskia at gcc dot gnu.org
@ 2014-11-21  8:48 ` rguenth at gcc dot gnu.org
  2014-11-21 10:17 ` rguenth at gcc dot gnu.org
                   ` (26 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-11-21  8:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |ASSIGNED
         Resolution|DUPLICATE                   |---
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ah, ISTR this from another bug where I wanted to investigate but never got
along doing that ...

Well, for DOM the issue here is that it doesn't handle the aggregate init

      a = *.LC0;

followed by a partial read from a.

But I wanted to analyze why FRE can't handle this either - sth I'll now do
with a modified pass pipeline (though fixing that won't help you here).



* [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2014-11-21  8:48 ` rguenth at gcc dot gnu.org
@ 2014-11-21 10:17 ` rguenth at gcc dot gnu.org
  2014-11-21 10:26 ` belagod at gcc dot gnu.org
                   ` (25 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-11-21 10:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ok, so I have a patch that works until we get to try constant folding
a vector load from an array of elements.  There both native_encode_expr
and fold_ctor_reference fail (the former because it doesn't handle
CONSTRUCTORs yet, the latter because it doesn't handle multi-field/index
results).  The idea was to get CONSTRUCTOR support into native_encode_expr
which would then handle the fully-constant cases (but not initializers
with constant addresses like &global-var).

Of course it wouldn't help as FRE isn't run late.  Hah.
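(What native_encode_expr would need to do for a CONSTRUCTOR, and what DSE
fails to do earlier in the thread when extracting an SImode value out of a
TImode store, is essentially the same byte-image manipulation.  A rough
illustration in Python follows; this is not GCC code, and it assumes
little-endian layout with 32-bit int elements.)

```python
import struct

def encode_int_array(values):
    """Encode a constant int array into its in-memory byte image,
    analogous to what native_encode_expr would need to do for a
    CONSTRUCTOR of 32-bit integer elements (little-endian assumed)."""
    return b"".join(struct.pack("<i", v) for v in values)

def read_si(image, offset):
    """Extract a 32-bit (SImode-like) value at a byte offset, the
    operation a pass needs when a narrow load reads part of a wider
    constant store."""
    return struct.unpack("<i", image[offset:offset + 4])[0]

image = encode_int_array([0, 1, 2, 3, 4, 5, 6, 7])
# Reading element 1 back out of the byte image at offset 4:
assert read_si(image, 4) == 1
```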



* [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2014-11-21 10:17 ` rguenth at gcc dot gnu.org
@ 2014-11-21 10:26 ` belagod at gcc dot gnu.org
  2014-11-21 10:36 ` rguenther at suse dot de
                   ` (24 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: belagod at gcc dot gnu.org @ 2014-11-21 10:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #11 from Tejas Belagod <belagod at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #8)
> (In reply to Tejas Belagod from comment #7)
> > I tried this, but it still doesn't seem to fold for aarch64.
> > 
> > So, here is the DOM trace for aarch64:
> > 
> > Optimizing statement a = *.LC0;
> 
> Why do we get LC0 in the first place?  It seems like it is happening because
> of some cost model issue with MOVECOST.
> 

Can the cost model affect something as early as gimple?



* [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2014-11-21 10:26 ` belagod at gcc dot gnu.org
@ 2014-11-21 10:36 ` rguenther at suse dot de
  2014-11-21 10:54 ` belagod at gcc dot gnu.org
                   ` (23 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: rguenther at suse dot de @ 2014-11-21 10:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #12 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 21 Nov 2014, belagod at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679
> 
> --- Comment #11 from Tejas Belagod <belagod at gcc dot gnu.org> ---
> (In reply to Andrew Pinski from comment #8)
> > (In reply to Tejas Belagod from comment #7)
> > > I tried this, but it still doesn't seem to fold for aarch64.
> > > 
> > > So, here is the DOM trace for aarch64:
> > > 
> > > Optimizing statement a = *.LC0;
> > 
> > Why do we get LC0 in the first place?  It seems like it is happening because
> > of some cost model issue with MOVECOST.
> > 
> 
> Can the cost model affect something as early as gimple?

Through CLEAR_RATIO and can_move_by_pieces (and for complex stuff
initializer_constant_valid_p).  I think it's mostly can_move_by_pieces
here.



* [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2014-11-21 10:36 ` rguenther at suse dot de
@ 2014-11-21 10:54 ` belagod at gcc dot gnu.org
  2014-11-21 11:25 ` jgreenhalgh at gcc dot gnu.org
                   ` (22 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: belagod at gcc dot gnu.org @ 2014-11-21 10:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #13 from Tejas Belagod <belagod at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #12)
> On Fri, 21 Nov 2014, belagod at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679
> > 
> > --- Comment #11 from Tejas Belagod <belagod at gcc dot gnu.org> ---
> > (In reply to Andrew Pinski from comment #8)
> > > (In reply to Tejas Belagod from comment #7)
> > > > I tried this, but it still doesn't seem to fold for aarch64.
> > > > 
> > > > So, here is the DOM trace for aarch64:
> > > > 
> > > > Optimizing statement a = *.LC0;
> > > 
> > > Why do we get LC0 in the first place?  It seems like it is happening because
> > > of some cost model issue with MOVECOST.
> > > 
> > 
> > Can the cost model affect something as early as gimple?
> 
> Through CLEAR_RATIO and can_move_by_pieces (and for complex stuff
> initializer_constant_valid_p).  I think it's mostly can_move_by_pieces
> here.

Ah, jgreenhalgh just did some move_by_pieces restructuring recently.



* [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (12 preceding siblings ...)
  2014-11-21 10:54 ` belagod at gcc dot gnu.org
@ 2014-11-21 11:25 ` jgreenhalgh at gcc dot gnu.org
  2014-11-21 18:20 ` jgreenhalgh at gcc dot gnu.org
                   ` (21 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: jgreenhalgh at gcc dot gnu.org @ 2014-11-21 11:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #14 from jgreenhalgh at gcc dot gnu.org ---
Yes, we turn move_by_pieces off for AArch64 so we can use our own expander for
block moves. This has a number of negative side-effects where optimizers decide
that if you're not using move_by_pieces, block moves must be expensive - this
is bogus! (see Joern's email on the thread hookizing BY_PIECES_P
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00197.html ).



* [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (13 preceding siblings ...)
  2014-11-21 11:25 ` jgreenhalgh at gcc dot gnu.org
@ 2014-11-21 18:20 ` jgreenhalgh at gcc dot gnu.org
  2014-11-24  8:52 ` rguenther at suse dot de
                   ` (20 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: jgreenhalgh at gcc dot gnu.org @ 2014-11-21 18:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #15 from jgreenhalgh at gcc dot gnu.org ---
I wonder whether the call to can_move_by_pieces in gimplify.c is bogus. It
seems to me that gimplify.c really wants to know whether it is a good idea to
scalarize the constructor copy - nothing to do with whether we will copy it by
pieces or as a block or otherwise.

If that is what gimplify.c wants to ask, then we can use the SRA parameters I
added last month to get a better answer.

The patch would look something like the below.  It won't "fix" this testcase,
but it would allow you to revert to the 4.9 behaviour by tweaking the
parameter value.

It *feels* like the right thing to do, but I don't know the code and I
might be wrong. An alternate approach would be to introduce a new target
hook which returns true if scalarizing a copy is smarter than leaving it
as an aggregate, but that sounds so close to what SRA is supposed to control
as to end up indistinguishable from this patch.

Any thoughts?  Or should I just propose this patch on gcc-patches?  (It passes
an x86_64 bootstrap with no issues.)

---
2014-11-21  James Greenhalgh  <james.greenhalgh@arm.com>

    * gimplify.c (gimplify_init_constructor): Scalarize
    constructor copy based on SRA parameters, rather than
    can_move_by_pieces.
---

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 8e3dd83..be51ce7 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -70,6 +70,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "omp-low.h"
 #include "gimple-low.h"
 #include "cilk.h"
+#include "params.h"

 #include "langhooks-def.h"    /* FIXME: for lhd_set_decl_assembler_name */
 #include "tree-pass.h"        /* FIXME: only for PROP_gimple_any */
@@ -3895,7 +3896,6 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
                       DECL_ATTRIBUTES (current_function_decl))))
       {
         HOST_WIDE_INT size = int_size_in_bytes (type);
-        unsigned int align;

         /* ??? We can still get unbounded array types, at least
            from the C++ front end.  This seems wrong, but attempt
@@ -3907,20 +3907,19 @@ gimplify_init_constructor (tree *expr_p, gimple_seq
*pre_p, gimple_seq *post_p,
           TREE_TYPE (ctor) = type = TREE_TYPE (object);
           }

-        /* Find the maximum alignment we can assume for the object.  */
-        /* ??? Make use of DECL_OFFSET_ALIGN.  */
-        if (DECL_P (object))
-          align = DECL_ALIGN (object);
-        else
-          align = TYPE_ALIGN (type);
-
         /* Do a block move either if the size is so small as to make
            each individual move a sub-unit move on average, or if it
-           is so large as to make individual moves inefficient.  */
+           is so large as to make individual moves inefficient.  Reuse
+           the same costs logic as we use in the SRA passes.  */
+           unsigned max_scalarization_size
+             = optimize_function_for_size_p (cfun)
+               ? PARAM_VALUE (PARAM_SRA_MAX_SCALARIZATION_SIZE_SIZE)
+               : PARAM_VALUE (PARAM_SRA_MAX_SCALARIZATION_SIZE_SPEED);
+
          if (size > 0
              && num_nonzero_elements > 1
              && (size < num_nonzero_elements
-                 || !can_move_by_pieces (size, align)))
+                 || size > max_scalarization_size))
           {
         if (notify_temp_creation)
           return GS_ERROR;


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (14 preceding siblings ...)
  2014-11-21 18:20 ` jgreenhalgh at gcc dot gnu.org
@ 2014-11-24  8:52 ` rguenther at suse dot de
  2014-11-24 11:16 ` belagod at gcc dot gnu.org
                   ` (19 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: rguenther at suse dot de @ 2014-11-24  8:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #16 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 21 Nov 2014, jgreenhalgh at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679
> 
> --- Comment #15 from jgreenhalgh at gcc dot gnu.org ---
> I wonder whether the call to can_move_by_pieces in gimplify.c is bogus. It
> seems to me that gimplify.c really wants to know whether it is a good idea to
> scalarize the constructor copy - nothing to do with whether we will copy it by
> pieces or as a block or otherwise.
> 
> If that is what gimplify.c wants to ask, then we can use the SRA parameters I
> added last month to get a better answer.
> 
> The patch would look something like the below, it won't "fix" this testcase -
> but it would allow you to revert to the 4.9 behaviour by tweaking the parameter
> value.
> 
> It *feels* like the right thing to do, but I don't know the code and I
> might be wrong. An alternate approach would be to introduce a new target
> hook which returns true if scalarizing a copy is smarter than leaving it
> as an aggregate, but that sounds so close to what SRA is supposed to control
> as to end up indistinguishable from this patch.
> 
> Any thoughts? Or should I just propose this patch on gcc-patches. (It passes an
> x86_64 bootstrap with no issues).

Certainly removing the alignment is not going to fly - we'd generate
very bad code for strict-align targets for initializing packed
structs by pieces for example.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (15 preceding siblings ...)
  2014-11-24  8:52 ` rguenther at suse dot de
@ 2014-11-24 11:16 ` belagod at gcc dot gnu.org
  2014-11-24 11:31 ` rguenther at suse dot de
                   ` (18 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: belagod at gcc dot gnu.org @ 2014-11-24 11:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #17 from Tejas Belagod <belagod at gcc dot gnu.org> ---
> -
>  	    /* Do a block move either if the size is so small as to make
>  	       each individual move a sub-unit move on average, or if it
> -	       is so large as to make individual moves inefficient.  */
> +	       is so large as to make individual moves inefficient.  Reuse
> +	       the same costs logic as we use in the SRA passes.  */
> +            unsigned max_scalarization_size
> +	      = optimize_function_for_size_p (cfun)
> +	        ? PARAM_VALUE (PARAM_SRA_MAX_SCALARIZATION_SIZE_SIZE)
> +		: PARAM_VALUE (PARAM_SRA_MAX_SCALARIZATION_SIZE_SPEED);
> +
>  	    if (size > 0
>  		&& num_nonzero_elements > 1
>  		&& (size < num_nonzero_elements
> -		    || !can_move_by_pieces (size, align)))
> +		    || size > max_scalarization_size))
>  	      {
>  		if (notify_temp_creation)
>  		  return GS_ERROR;

I think both move_by_pieces and SRA can co-exist here:

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 8e3dd83..be51ce7 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -70,6 +70,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "omp-low.h"
 #include "gimple-low.h"
 #include "cilk.h"
+#include "params.h"

 #include "langhooks-def.h"    /* FIXME: for lhd_set_decl_assembler_name */
 #include "tree-pass.h"        /* FIXME: only for PROP_gimple_any */
@@ -3895,7 +3896,6 @@ gimplify_init_constructor (tree *expr_p, gimple_seq
*pre_p, gimple_seq *post_p,
                       DECL_ATTRIBUTES (current_function_decl))))
       {
         HOST_WIDE_INT size = int_size_in_bytes (type);
        unsigned int align;

         /* ??? We can still get unbounded array types, at least
            from the C++ front end.  This seems wrong, but attempt
@@ -3907,20 +3907,19 @@ gimplify_init_constructor (tree *expr_p, gimple_seq
*pre_p, gimple_seq *post_p,
           TREE_TYPE (ctor) = type = TREE_TYPE (object);
           }

        /* Find the maximum alignment we can assume for the object.  */
        /* ??? Make use of DECL_OFFSET_ALIGN.  */
        if (DECL_P (object))
          align = DECL_ALIGN (object);
        else
          align = TYPE_ALIGN (type);

         /* Do a block move either if the size is so small as to make
            each individual move a sub-unit move on average, or if it
-           is so large as to make individual moves inefficient.  */
+           is so large as to make individual moves inefficient.  Reuse
+           the same costs logic as we use in the SRA passes.  */
+           unsigned max_scalarization_size
+             = optimize_function_for_size_p (cfun)
+               ? PARAM_VALUE (PARAM_SRA_MAX_SCALARIZATION_SIZE_SIZE)
+               : PARAM_VALUE (PARAM_SRA_MAX_SCALARIZATION_SIZE_SPEED);
+
          if (size > 0
              && num_nonzero_elements > 1
              && (size < num_nonzero_elements
+                 || size > max_scalarization_size
                  || !can_move_by_pieces (size, align)))
           {
         if (notify_temp_creation)
           return GS_ERROR;

If it isn't profitable to do an SRA, we can fall back to the backend hook to
move it by pieces. This way, I think we'll have more opportunity for
optimization.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Bug tree-optimization/64031] (un-)conditional execution state is not preserved by PRE/sink
From: rguenth at gcc dot gnu.org @ 2014-11-24 11:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64031

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-11-24
             Blocks|                            |53947
            Summary|Vectorization of max/min is |(un-)conditional execution
                   |not robust enough           |state is not preserved by
                   |                            |PRE/sink
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is that PRE optimizes this to

  f2_11 = f2_10 * f2_10;
  if (f2_10 < f2_11)
    goto <bb 5>;
  else
    goto <bb 4>;

  <bb 4>:
  pretmp_25 = f2_11 * f2_11;

  <bb 5>:
  # prephitmp_26 = PHI <f2_11(3), pretmp_25(4)>
  *_9 = prephitmp_26;

and f2_11 * f2_11 may trap thus ifcvt refuses to execute it unconditionally
(but only PRE made it executed conditionally).

Thus "confirmed" that both PRE and code sinking can make stmts executed
conditionally while they were not so before which can pessimize transforms
done by later passes such as LIM and if-conversion.
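A rough C sketch of the computation in the GIMPLE above (hypothetical, not the
PR 64031 testcase) might look like this; the key point is that once PRE moves
the second multiplication onto one branch, if-conversion refuses to speculate
it because a floating-point multiply may trap:

```c
/* Hypothetical reduction of the GIMPLE shown above.  The first multiply
   corresponds to f2_11 = f2_10 * f2_10; the second, sq * sq, matches
   pretmp_25 = f2_11 * f2_11 and ends up conditionally executed.  */
static float
square_or_fourth_power (float f2)
{
  float sq = f2 * f2;              /* f2_11 = f2_10 * f2_10 */
  return f2 < sq ? sq : sq * sq;   /* PHI <f2_11, f2_11 * f2_11> */
}
```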


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (16 preceding siblings ...)
  2014-11-24 11:16 ` belagod at gcc dot gnu.org
@ 2014-11-24 11:31 ` rguenther at suse dot de
  2014-11-24 12:01 ` jgreenhalgh at gcc dot gnu.org
                   ` (17 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: rguenther at suse dot de @ 2014-11-24 11:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #18 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 24 Nov 2014, belagod at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679
> 
> --- Comment #17 from Tejas Belagod <belagod at gcc dot gnu.org> ---
> > -
> >  	    /* Do a block move either if the size is so small as to make
> >  	       each individual move a sub-unit move on average, or if it
> > -	       is so large as to make individual moves inefficient.  */
> > +	       is so large as to make individual moves inefficient.  Reuse
> > +	       the same costs logic as we use in the SRA passes.  */
> > +            unsigned max_scalarization_size
> > +	      = optimize_function_for_size_p (cfun)
> > +	        ? PARAM_VALUE (PARAM_SRA_MAX_SCALARIZATION_SIZE_SIZE)
> > +		: PARAM_VALUE (PARAM_SRA_MAX_SCALARIZATION_SIZE_SPEED);
> > +
> >  	    if (size > 0
> >  		&& num_nonzero_elements > 1
> >  		&& (size < num_nonzero_elements
> > -		    || !can_move_by_pieces (size, align)))
> > +		    || size > max_scalarization_size))
> >  	      {
> >  		if (notify_temp_creation)
> >  		  return GS_ERROR;
> 
> I think both move_by_pieces and SRA can co-exist here:
> 
> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index 8e3dd83..be51ce7 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -70,6 +70,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "omp-low.h"
>  #include "gimple-low.h"
>  #include "cilk.h"
> +#include "params.h"
> 
>  #include "langhooks-def.h"    /* FIXME: for lhd_set_decl_assembler_name */
>  #include "tree-pass.h"        /* FIXME: only for PROP_gimple_any */
> @@ -3895,7 +3896,6 @@ gimplify_init_constructor (tree *expr_p, gimple_seq
> *pre_p, gimple_seq *post_p,
>                        DECL_ATTRIBUTES (current_function_decl))))
>        {
>          HOST_WIDE_INT size = int_size_in_bytes (type);
>         unsigned int align;
> 
>          /* ??? We can still get unbounded array types, at least
>             from the C++ front end.  This seems wrong, but attempt
> @@ -3907,20 +3907,19 @@ gimplify_init_constructor (tree *expr_p, gimple_seq
> *pre_p, gimple_seq *post_p,
>            TREE_TYPE (ctor) = type = TREE_TYPE (object);
>            }
> 
>         /* Find the maximum alignment we can assume for the object.  */
>         /* ??? Make use of DECL_OFFSET_ALIGN.  */
>         if (DECL_P (object))
>           align = DECL_ALIGN (object);
>         else
>           align = TYPE_ALIGN (type);
> 
>          /* Do a block move either if the size is so small as to make
>             each individual move a sub-unit move on average, or if it
> -           is so large as to make individual moves inefficient.  */
> +           is so large as to make individual moves inefficient.  Reuse
> +           the same costs logic as we use in the SRA passes.  */
> +           unsigned max_scalarization_size
> +             = optimize_function_for_size_p (cfun)
> +               ? PARAM_VALUE (PARAM_SRA_MAX_SCALARIZATION_SIZE_SIZE)
> +               : PARAM_VALUE (PARAM_SRA_MAX_SCALARIZATION_SIZE_SPEED);
> +
>           if (size > 0
>               && num_nonzero_elements > 1
>               && (size < num_nonzero_elements
> +                 || size > max_scalarization_size
>                   || !can_move_by_pieces (size, align)))
>            {
>          if (notify_temp_creation)
>            return GS_ERROR;
> 
> If it isn't profitable to do an SRA, we can fall-back to the backend hook to
> move it by pieces. This way, I think we'll have move opportunity for
> optimization.

But that wouldn't fix the AARCH64 case as the backend says "no" here
anyway?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Bug target/64045] New: fortran.dg/pr45636.f90 fails for AArch64 - memcpy and memset are not combined
From: kugan at gcc dot gnu.org @ 2014-11-24 11:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64045

            Bug ID: 64045
           Summary: fortran.dg/pr45636.f90 fails for AArch64 - memcpy and
                    memset are not combined
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kugan at gcc dot gnu.org

In current trunk, fortran.dg/pr45636.f90 fails for AArch64 as memcpy and memset
are not combined.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (17 preceding siblings ...)
  2014-11-24 11:31 ` rguenther at suse dot de
@ 2014-11-24 12:01 ` jgreenhalgh at gcc dot gnu.org
  2014-11-24 12:19 ` rguenther at suse dot de
                   ` (16 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: jgreenhalgh at gcc dot gnu.org @ 2014-11-24 12:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #19 from jgreenhalgh at gcc dot gnu.org ---
(In reply to rguenther@suse.de from comment #16)
> Certainly removing the alignment is not going to fly - we'd generate
> very bad code for strict-align targets for initializing packed
> structs by pieces for example.

Surely this is already true?

The alignment here is what we can assume for the entire aggregate - the
previous check was can_move_by_pieces, which doesn't check the components of
the aggregate.

For a well-aligned aggregate of the appropriate size, can_move_by_pieces will
return true, and we'll initialize the packed struct by its components
regardless of the component alignment.

Or am I missing something?
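As a hypothetical illustration of the case being discussed (not the PR's
testcase): in a packed struct, the whole-aggregate alignment that the
can_move_by_pieces check sees says nothing about the alignment of the
individual members a scalarized copy would store to:

```c
#include <stddef.h>

/* A packed struct: the aggregate itself has alignment 1, yet the 'int'
   member sits at offset 1, below its natural 4-byte alignment.  A check
   based only on the aggregate alignment never sees this.  */
struct __attribute__ ((packed)) pk
{
  char c;
  int i;   /* misaligned relative to the natural alignment of int */
};

/* Offset of the misaligned member.  */
size_t pk_int_offset (void) { return offsetof (struct pk, i); }

/* Size of the packed aggregate: no padding between members.  */
size_t pk_size (void) { return sizeof (struct pk); }
```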


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (18 preceding siblings ...)
  2014-11-24 12:01 ` jgreenhalgh at gcc dot gnu.org
@ 2014-11-24 12:19 ` rguenther at suse dot de
  2014-11-24 13:45 ` rguenth at gcc dot gnu.org
                   ` (15 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: rguenther at suse dot de @ 2014-11-24 12:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #20 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 24 Nov 2014, jgreenhalgh at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679
> 
> --- Comment #19 from jgreenhalgh at gcc dot gnu.org ---
> (In reply to rguenther@suse.de from comment #16)
> > Certainly removing the alignment is not going to fly - we'd generate
> > very bad code for strict-align targets for initializing packed
> > structs by pieces for example.
> 
> Surely this is already true?
> 
> The alignment here is what we can assume for the entire aggregate - the
> previous check was can_move_by_pieces, which doesn't check the components of
> the aggregate.
> 
> For a well-aligned aggreagate of the appropriate size, can_move_by_pieces will
> return true, and we'll initialize the packed struct by its components
> regardless of the component alignment.
> 
> Or am I missing something?

I thought that AArch64 fails to do the init by pieces exactly because
can_move_by_pieces says so.  Adding another condition that may also
reject it won't help, no?

Richard.

>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (19 preceding siblings ...)
  2014-11-24 12:19 ` rguenther at suse dot de
@ 2014-11-24 13:45 ` rguenth at gcc dot gnu.org
  2014-11-24 13:45 ` rguenth at gcc dot gnu.org
                   ` (14 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-11-24 13:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu.org

--- Comment #22 from Richard Biener <rguenth at gcc dot gnu.org> ---
CCing Martin for the thoughts on SRA.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (20 preceding siblings ...)
  2014-11-24 13:45 ` rguenth at gcc dot gnu.org
@ 2014-11-24 13:45 ` rguenth at gcc dot gnu.org
  2014-11-24 14:07 ` rguenth at gcc dot gnu.org
                   ` (13 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-11-24 13:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
           Assignee|rguenth at gcc dot gnu.org         |unassigned at gcc dot gnu.org

--- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> ---
I'm not tackling exactly this bug, still planning to commit the SCCVN
improvement.

Note that generally I believe we do lower to piecewise init way too early
(see bugs about failing to re-combine those piecewise stores).  In fact
this lowering is exactly something I would expect SRA to perform after
cost/benefit analysis.  Enabling FRE to optimize things the same as if
initialized piecewise should make that possible.  It would also enable
the possibility to simply replace references to the target aggregate with
references to the constant pool entry if we can compute the target aggregate
is never changed besides its initialization.

Thus "real" work should probably concentrate on getting SRA to decompose 'a'
(SRA currently does not handle arrays at all, besides arrays of size 1).
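A small sketch (hypothetical function name, not the PR's testcase) of the kind
of read that could be folded to a constant once FRE can look through the
aggregate initialization, whether it is emitted as a block copy from the
constant pool (a = *.LC0) or as piecewise stores:

```c
/* Reads from a locally initialized, never-modified array.  If the
   optimizers can see through the initialization, a call with a constant
   index folds to a constant instead of loading from memory.  */
static int
read_from_initialized_array (int idx)
{
  const int a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  return a[idx];
}
```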


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (21 preceding siblings ...)
  2014-11-24 13:45 ` rguenth at gcc dot gnu.org
@ 2014-11-24 14:07 ` rguenth at gcc dot gnu.org
  2015-02-07 10:55 ` [Bug target/63679] [5 " jakub at gcc dot gnu.org
                   ` (12 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-11-24 14:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #23 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Mon Nov 24 14:07:18 2014
New Revision: 218019

URL: https://gcc.gnu.org/viewcvs?rev=218019&root=gcc&view=rev
Log:
2014-11-24  Richard Biener  <rguenther@suse.de>

    PR tree-optimization/63679
    * tree-ssa-sccvn.c: Include ipa-ref.h, plugin-api.h and cgraph.h.
    (copy_reference_ops_from_ref): Fix non-constant ADDR_EXPR case
    to properly leave off at -1.
    (fully_constant_vn_reference_p): Generalize folding from
    constant initializers.
    (vn_reference_lookup_3): When looking through aggregate copies
    handle offsetted reads and try simplifying the result to
    a constant.
    * gimple-fold.h (fold_ctor_reference): Export.
    * gimple-fold.c (fold_ctor_reference): Likewise.

    * gcc.dg/tree-ssa/ssa-fre-42.c: New testcase.
    * gcc.dg/tree-ssa/20030807-5.c: Avoid folding read from global to zero.
    * gcc.target/i386/ssetype-1.c: Likewise.
    * gcc.target/i386/ssetype-3.c: Likewise.
    * gcc.target/i386/ssetype-5.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-42.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/gimple-fold.c
    trunk/gcc/gimple-fold.h
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/tree-ssa/20030807-5.c
    trunk/gcc/testsuite/gcc.target/i386/ssetype-1.c
    trunk/gcc/testsuite/gcc.target/i386/ssetype-3.c
    trunk/gcc/testsuite/gcc.target/i386/ssetype-5.c
    trunk/gcc/tree-ssa-sccvn.c


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Bug target/63679] [5 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (22 preceding siblings ...)
  2014-11-24 14:07 ` rguenth at gcc dot gnu.org
@ 2015-02-07 10:55 ` jakub at gcc dot gnu.org
  2015-02-09  9:08 ` rguenth at gcc dot gnu.org
                   ` (11 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-02-07 10:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[5.0 Regression][AArch64]   |[5 Regression][AArch64]
                   |Failure to constant fold.   |Failure to constant fold.

--- Comment #24 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So fixed?


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Bug target/63679] [5 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (23 preceding siblings ...)
  2015-02-07 10:55 ` [Bug target/63679] [5 " jakub at gcc dot gnu.org
@ 2015-02-09  9:08 ` rguenth at gcc dot gnu.org
  2015-02-09 10:10 ` jakub at gcc dot gnu.org
                   ` (10 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-02-09  9:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2

--- Comment #25 from Richard Biener <rguenth at gcc dot gnu.org> ---
No, aarch64 still commits the initializer to memory so the patch doesn't help
it for the testcase in the end.  It still improves things a bit.

I'm quite sure we can't fix this in a target independent way for GCC 5, thus
the only chance is to make aarch64 not commit the initializer to memory.
That is also the real regression (aarch64 changed to commit the initializer
to memory).

aarch64 is a secondary target only, thus this missed-optimization shouldn't
block the release.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Bug target/63679] [5 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (24 preceding siblings ...)
  2015-02-09  9:08 ` rguenth at gcc dot gnu.org
@ 2015-02-09 10:10 ` jakub at gcc dot gnu.org
  2015-02-09 12:20 ` belagod at gcc dot gnu.org
                   ` (9 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-02-09 10:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #26 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I'd say it is a bug in the backend, if you want to override some expansion,
you'd better add some target hook for that, rather than messing up with
MOVE_BY_PIECES and setting it to clearly bogus values.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Bug target/63679] [5 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (25 preceding siblings ...)
  2015-02-09 10:10 ` jakub at gcc dot gnu.org
@ 2015-02-09 12:20 ` belagod at gcc dot gnu.org
  2015-02-09 13:17 ` rguenther at suse dot de
                   ` (8 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: belagod at gcc dot gnu.org @ 2015-02-09 12:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #27 from Tejas Belagod <belagod at gcc dot gnu.org> ---
We'd want to scalarize this early, preferably in SRA, as it gives passes like
vectorization a chance to vectorize more loops. I checked that
sra-max-scalarization-Osize{-Ospeed} had no effect on scalarizing 'a = *.LC0',
and that's one of the cost functions that affects scalarization. Also, isn't it
difficult to decide whether to scalarize an aggregate based on how it might be
optimized in future passes, since accurately predicting the transformations it
could go through in subsequent passes is not easy?


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Bug target/63679] [5 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (26 preceding siblings ...)
  2015-02-09 12:20 ` belagod at gcc dot gnu.org
@ 2015-02-09 13:17 ` rguenther at suse dot de
  2015-02-09 13:34 ` belagod at gcc dot gnu.org
                   ` (7 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: rguenther at suse dot de @ 2015-02-09 13:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #28 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 9 Feb 2015, belagod at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679
> 
> --- Comment #27 from Tejas Belagod <belagod at gcc dot gnu.org> ---
> We'd want to scalarize this early preferably in SRA as it gives a chance to
> passes like vectorization to vectorize more loops. I checked that
> sra-max-scalarization-Osize{-Ospeed} had no effect on scalarizing 'a = *.LC0'

because SRA can't scalarize 'a = *.LC0'.  But yes, ideally we'd change
gimplification to never decompose initializers but have SRA do it.
But that's of course not a GCC 5 thing.

It has the advantage of splitting the initialization only when it is
(likely) profitable and otherwise leave it to the target to decide
how to expand the initialization (and it opens up the possibility
to directly use a constant-pool entry if the data is readonly).

> and that's one of the cost functions that affects scalarization. Also, 
> isn't it difficult to decide scalarization of aggregates based on how it 
> might be optimized in the future passes as accurately predicting the 
> transformations it could go through in subsequent passes is not easy?

Of course.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Bug target/63679] [5 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (27 preceding siblings ...)
  2015-02-09 13:17 ` rguenther at suse dot de
@ 2015-02-09 13:34 ` belagod at gcc dot gnu.org
  2015-03-12 16:53 ` [Bug target/63679] [5 / 6 " ramana at gcc dot gnu.org
                   ` (6 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: belagod at gcc dot gnu.org @ 2015-02-09 13:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #29 from Tejas Belagod <belagod at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #28)
> On Mon, 9 Feb 2015, belagod at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679
> > 
> > --- Comment #27 from Tejas Belagod <belagod at gcc dot gnu.org> ---
> > We'd want to scalarize this early preferably in SRA as it gives a chance to
> > passes like vectorization to vectorize more loops. I checked that
> > sra-max-scalarization-Osize{-Ospeed} had no effect on scalarizing 'a = *.LC0'
> 
> because SRA can't scalarize 'a = *.LC0'.  But yes, ideally we'd change
> gimplification to never decompose initializers but have SRA do it.
> But that's of course not a GCC 5 thing.
> 
> It has the advantage of splitting the initialization only when it is
> (likely) profitable and otherwise leave it to the target to decide
> how to expand the initialization (and it opens up the possibility
> to directly use a constant-pool entry if the data is readonly).

Which cost function(s) control this profitability of early splitting?


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Bug target/63679] [5 / 6 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (28 preceding siblings ...)
  2015-02-09 13:34 ` belagod at gcc dot gnu.org
@ 2015-03-12 16:53 ` ramana at gcc dot gnu.org
  2015-07-28 17:15 ` [Bug target/63679] [5/6 " alalaw01 at gcc dot gnu.org
                   ` (5 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: ramana at gcc dot gnu.org @ 2015-03-12 16:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

Ramana Radhakrishnan <ramana at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ramana at gcc dot gnu.org
   Target Milestone|5.0                         |6.0
            Summary|[5 Regression][AArch64]     |[5 / 6 Regression][AArch64]
                   |Failure to constant fold.   |Failure to constant fold.

--- Comment #31 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #30)
> On Mon, 9 Feb 2015, belagod at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679
> > 
> > --- Comment #29 from Tejas Belagod <belagod at gcc dot gnu.org> ---
> > (In reply to rguenther@suse.de from comment #28)
> > > On Mon, 9 Feb 2015, belagod at gcc dot gnu.org wrote:
> > > 
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679
> > > > 
> > > > --- Comment #27 from Tejas Belagod <belagod at gcc dot gnu.org> ---
> > > > We'd want to scalarize this early preferably in SRA as it gives a chance to
> > > > passes like vectorization to vectorize more loops. I checked that
> > > > sra-max-scalarization-Osize{-Ospeed} had no effect on scalarizing 'a = *.LC0'
> > > 
> > > because SRA can't scalarize 'a = *.LC0'.  But yes, ideally we'd change
> > > gimplification to never decompose initializers but have SRA do it.
> > > But that's of course not a GCC 5 thing.
> > > 
> > > It has the advantage of splitting the initialization only when it is
> > > (likely) profitable and otherwise leave it to the target to decide
> > > how to expand the initialization (and it opens up the possibility
> > > to directly use a constant-pool entry if the data is readonly).
> > 
> > Which cost function(s) control this profitability of early splitting?
> 
> See gimplify_init_constructor and callees.

 Given all the comments above, this sounds like a 6.0 fix - I'm just making this
a 6.0 target. We can always change it back if someone can fix it in time for 5.

Ramana


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Bug target/63679] [5/6 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (29 preceding siblings ...)
  2015-03-12 16:53 ` [Bug target/63679] [5 / 6 " ramana at gcc dot gnu.org
@ 2015-07-28 17:15 ` alalaw01 at gcc dot gnu.org
  2015-07-28 18:36 ` pinskia at gcc dot gnu.org
                   ` (4 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: alalaw01 at gcc dot gnu.org @ 2015-07-28 17:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

alalaw01 at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alalaw01 at gcc dot gnu.org

--- Comment #32 from alalaw01 at gcc dot gnu.org ---
Is the SRA approach going to work? I have hacked up my SRA so that it generates
this:

foo ()
{
  int sum;
  int i;
  const int a[8];
  unsigned int i.0_7;
  int _8;
  unsigned int i.0_19;

  <bb 2>:
  MEM[(int[8] *)&a] = 0;
  MEM[(int[8] *)&a + 4B] = 1;
  MEM[(int[8] *)&a + 8B] = 2;
  MEM[(int[8] *)&a + 12B] = 3;
  MEM[(int[8] *)&a + 16B] = 4;
  MEM[(int[8] *)&a + 20B] = 5;
  MEM[(int[8] *)&a + 24B] = 6;
  MEM[(int[8] *)&a + 28B] = 7;
  i.0_19 = 0;
  if (i.0_19 != 8)
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 3>:
  # i_20 = PHI <i_10(3), 0(2)>
  # sum_21 = PHI <sum_9(3), 0(2)>
  _8 = a[i_20];
  sum_9 = sum_21 + _8;
  i_10 = i_20 + 1;
  i.0_7 = (unsigned int) i_10;
  if (i.0_7 != 8)
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 4>:
  # sum_22 = PHI <sum_9(3), 0(2)>
  a ={v} {CLOBBER};
  return sum_22;
}

the vectorizer then transforms to:
...
  <bb 2>:
  MEM[(int[8] *)&a] = 0;
  MEM[(int[8] *)&a + 4B] = 1;
  MEM[(int[8] *)&a + 8B] = 2;
  MEM[(int[8] *)&a + 12B] = 3;
  MEM[(int[8] *)&a + 16B] = 4;
  MEM[(int[8] *)&a + 20B] = 5;
  MEM[(int[8] *)&a + 24B] = 6;
  MEM[(int[8] *)&a + 28B] = 7;

  <bb 3>:
  # i_20 = PHI <0(2), i_10(4)>
  # sum_21 = PHI <0(2), sum_9(4)>
  # ivtmp_19 = PHI <8(2), ivtmp_22(4)>
  # vectp_a.1_1 = PHI <&a(2), vectp_a.1_2(4)>
  # vect_sum_9.4_17 = PHI <{ 0, 0, 0, 0 }(2), vect_sum_9.4_23(4)>
  # ivtmp_27 = PHI <0(2), ivtmp_28(4)>
  vect__8.3_18 = MEM[(int *)vectp_a.1_1];
  _8 = a[i_20];
  vect_sum_9.4_23 = vect__8.3_18 + vect_sum_9.4_17;
  sum_9 = _8 + sum_21;
  i_10 = i_20 + 1;
  ivtmp_22 = ivtmp_19 - 1;
  vectp_a.1_2 = vectp_a.1_1 + 16;
  ivtmp_28 = ivtmp_27 + 1;
  if (ivtmp_28 < 2)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  goto <bb 3>;

  <bb 5>:
  # sum_7 = PHI <sum_9(3)>
  # vect_sum_9.4_24 = PHI <vect_sum_9.4_23(3)>
  stmp_sum_9.5_25 = [reduc_plus_expr] vect_sum_9.4_24;
  vect_sum_9.6_26 = stmp_sum_9.5_25 + 0;
  a ={v} {CLOBBER};
  return vect_sum_9.6_26;

}

and the optimized tree is:

foo ()
{
  int vect_sum_9.6;
  int stmp_sum_9.5;
  vector(4) int vect_sum_9.4;
  const vector(4) int vect__8.3;
  const int a[8];

  <bb 2>:
  MEM[(int[8] *)&a] = { 0, 1, 2, 3 };
  MEM[(int[8] *)&a + 16B] = { 4, 5, 6, 7 };
  vect__8.3_20 = MEM[(int *)&a];
  vect__8.3_18 = MEM[(int *)&a + 16B];
  vect_sum_9.4_23 = vect__8.3_18 + vect__8.3_20;
  stmp_sum_9.5_25 = [reduc_plus_expr] vect_sum_9.4_23;
  vect_sum_9.6_26 = stmp_sum_9.5_25;
  a ={v} {CLOBBER};
  return vect_sum_9.6_26;
}

final assembly is:
        ldr     q1, .LC1
        sub     sp, sp, #32
        ldr     q0, .LC2
        add     sp, sp, 32
        add     v0.4s, v0.4s, v1.4s
        addv    s0, v0.4s
        umov    w0, v0.s[0]
        ret
which is a slight improvement, but not really what we are looking for...
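For comparison, the result one would hope for after full constant folding (a hand-written sketch, not actual compiler output) is just:

```asm
        mov     w0, 28          // 0+1+2+3+4+5+6+7, fully folded
        ret
```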


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Bug target/63679] [5/6 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (30 preceding siblings ...)
  2015-07-28 17:15 ` [Bug target/63679] [5/6 " alalaw01 at gcc dot gnu.org
@ 2015-07-28 18:36 ` pinskia at gcc dot gnu.org
  2015-07-29  7:23 ` rguenther at suse dot de
                   ` (3 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: pinskia at gcc dot gnu.org @ 2015-07-28 18:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #33 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to alalaw01 from comment #32)
> and the optimized tree is:
> 
> foo ()
> {
>   int vect_sum_9.6;
>   int stmp_sum_9.5;
>   vector(4) int vect_sum_9.4;
>   const vector(4) int vect__8.3;
>   const int a[8];
> 
>   <bb 2>:
>   MEM[(int[8] *)&a] = { 0, 1, 2, 3 };
>   MEM[(int[8] *)&a + 16B] = { 4, 5, 6, 7 };
>   vect__8.3_20 = MEM[(int *)&a];
>   vect__8.3_18 = MEM[(int *)&a + 16B];

So a missing constant prop here.


>   vect_sum_9.4_23 = vect__8.3_18 + vect__8.3_20;

Also most likely we don't have much constant folding for this.

>   stmp_sum_9.5_25 = [reduc_plus_expr] vect_sum_9.4_23;

Or even this.

But once those are resolved, this whole function will resolve itself :).


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Bug target/63679] [5/6 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (31 preceding siblings ...)
  2015-07-28 18:36 ` pinskia at gcc dot gnu.org
@ 2015-07-29  7:23 ` rguenther at suse dot de
  2015-07-29 17:50 ` alalaw01 at gcc dot gnu.org
                   ` (2 subsequent siblings)
  35 siblings, 0 replies; 36+ messages in thread
From: rguenther at suse dot de @ 2015-07-29  7:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #34 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 28 Jul 2015, pinskia at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679
> 
> --- Comment #33 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
> (In reply to alalaw01 from comment #32)
> > and the optimized tree is:
> > 
> > foo ()
> > {
> >   int vect_sum_9.6;
> >   int stmp_sum_9.5;
> >   vector(4) int vect_sum_9.4;
> >   const vector(4) int vect__8.3;
> >   const int a[8];
> > 
> >   <bb 2>:
> >   MEM[(int[8] *)&a] = { 0, 1, 2, 3 };
> >   MEM[(int[8] *)&a + 16B] = { 4, 5, 6, 7 };
> >   vect__8.3_20 = MEM[(int *)&a];
> >   vect__8.3_18 = MEM[(int *)&a + 16B];
> 
> So a missing constant prop here.

Yes, but DOM should have handled this.

> 
> >   vect_sum_9.4_23 = vect__8.3_18 + vect__8.3_20;
> 
> Also most likely we don't have much constant folding for this.

We do.

> >   stmp_sum_9.5_25 = [reduc_plus_expr] vect_sum_9.4_23;
> 
> Or even this.

Likewise.

> But once those are resolved, this whole function will resolve itself :).

RTL CSE should catch the constant prop, I think, but I doubt we have
much vector-op simplify_rtx support.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Bug target/63679] [5/6 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (32 preceding siblings ...)
  2015-07-29  7:23 ` rguenther at suse dot de
@ 2015-07-29 17:50 ` alalaw01 at gcc dot gnu.org
  2015-08-03 15:38 ` alalaw01 at gcc dot gnu.org
  2015-08-04  9:30 ` rguenther at suse dot de
  35 siblings, 0 replies; 36+ messages in thread
From: alalaw01 at gcc dot gnu.org @ 2015-07-29 17:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #35 from alalaw01 at gcc dot gnu.org ---
So it should be happening in dom2. On x86, input to dom2 is

  vect_cst_.9_31 = { 0, 1, 2, 3 };
[...]MEM[(int *)&a] = vect_cst_.9_31;
[...]vect__13.3_20 = MEM[(int *)&a];

resulting in:

Optimizing statement vect_cst_.9_31 = { 0, 1, 2, 3 };
LKUP STMT vect_cst_.9_31 = { 0, 1, 2, 3 }
==== ASGN vect_cst_.9_31 = { 0, 1, 2, 3 }
...
Optimizing statement MEM[(int *)&a] = vect_cst_.9_31;
  Replaced 'vect_cst_.9_31' with constant '{ 0, 1, 2, 3 }'
LKUP STMT MEM[(int *)&a] = { 0, 1, 2, 3 } with .MEM_3(D)
LKUP STMT { 0, 1, 2, 3 } = MEM[(int *)&a] with .MEM_3(D)
LKUP STMT { 0, 1, 2, 3 } = MEM[(int *)&a] with .MEM_17
2>>> STMT { 0, 1, 2, 3 } = MEM[(int *)&a] with .MEM_17
...
Optimizing statement vect__13.3_20 = MEM[(int *)&a];
LKUP STMT vect__13.3_20 = MEM[(int *)&a] with .MEM_21
FIND: { 0, 1, 2, 3 }
  Replaced redundant expr 'MEM[(int *)&a]' with '{ 0, 1, 2, 3 }'

My version has input to dom2:

  vect_cst_.8_27 = { 0, 1, 2, 3 };
[...]MEM[(int[8] *)&a] = vect_cst_.8_27;
[...]vect__8.3_20 = MEM[(int *)&a];

Optimizing statement vect_cst_.8_27 = { 0, 1, 2, 3 };
LKUP STMT vect_cst_.8_27 = { 0, 1, 2, 3 }
==== ASGN vect_cst_.8_27 = { 0, 1, 2, 3 }
...
Optimizing statement MEM[(int[8] *)&a] = vect_cst_.8_27;
  Replaced 'vect_cst_.8_27' with constant '{ 0, 1, 2, 3 }'
LKUP STMT MEM[(int[8] *)&a] = { 0, 1, 2, 3 } with .MEM_3(D)
LKUP STMT { 0, 1, 2, 3 } = MEM[(int[8] *)&a] with .MEM_3(D)
LKUP STMT { 0, 1, 2, 3 } = MEM[(int[8] *)&a] with .MEM_17
2>>> STMT { 0, 1, 2, 3 } = MEM[(int[8] *)&a] with .MEM_17
...
Optimizing statement vect__8.3_20 = MEM[(int *)&a];
LKUP STMT vect__8.3_20 = MEM[(int *)&a] with .MEM_21
2>>> STMT vect__8.3_20 = MEM[(int *)&a] with .MEM_21

So it looks like MEM[(int *)&a] and MEM[(int[8] *)&a] are hashing differently,
and hence dom2 is not finding the match.

Could be that I need my SRA to output something closer to
  a[1] = 1;
where I currently have
  MEM[(int[8] *)&a + 4B] = 1;
but I also feel that those two statements hashing differently is not really
helpful!


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Bug target/63679] [5/6 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (33 preceding siblings ...)
  2015-07-29 17:50 ` alalaw01 at gcc dot gnu.org
@ 2015-08-03 15:38 ` alalaw01 at gcc dot gnu.org
  2015-08-04  9:30 ` rguenther at suse dot de
  35 siblings, 0 replies; 36+ messages in thread
From: alalaw01 at gcc dot gnu.org @ 2015-08-03 15:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #37 from alalaw01 at gcc dot gnu.org ---
Hmmm, no it's not the hashing - that pretty much ignores all types. It's the
comparison in hashable_expr_equal_p, which just uses operand_equal_p,
specifically this part (in fold-const.c):

    case MEM_REF:
          /* Require equal access sizes, and similar pointer types.
             We can have incomplete types for array references of
             variable-sized arrays from the Fortran frontend
             though.  Also verify the types are compatible.  */
          if (!((TYPE_SIZE (TREE_TYPE (arg0)) == TYPE_SIZE (TREE_TYPE (arg1))
                   || (TYPE_SIZE (TREE_TYPE (arg0))
                       && TYPE_SIZE (TREE_TYPE (arg1))
                       && operand_equal_p (TYPE_SIZE (TREE_TYPE (arg0)),
                                           TYPE_SIZE (TREE_TYPE (arg1)), flags)))
                  && types_compatible_p (TREE_TYPE (arg0), TREE_TYPE (arg1))
                  && ((flags & OEP_ADDRESS_OF)
                      || (alias_ptr_types_compatible_p
                            (TREE_TYPE (TREE_OPERAND (arg0, 1)),
                             TREE_TYPE (TREE_OPERAND (arg1, 1)))
                          && (MR_DEPENDENCE_CLIQUE (arg0)
                              == MR_DEPENDENCE_CLIQUE (arg1))
                          && (MR_DEPENDENCE_BASE (arg0)
                              == MR_DEPENDENCE_BASE (arg1))
                          && (TYPE_ALIGN (TREE_TYPE (arg0))
                            == TYPE_ALIGN (TREE_TYPE (arg1)))))))

specifically, a pointer to int, and a pointer to an array of int, are not
alias_ptr_types_compatible_p. (I'm not clear that they should be, either!?)


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Bug target/63679] [5/6 Regression][AArch64] Failure to constant fold.
       [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
                   ` (34 preceding siblings ...)
  2015-08-03 15:38 ` alalaw01 at gcc dot gnu.org
@ 2015-08-04  9:30 ` rguenther at suse dot de
  35 siblings, 0 replies; 36+ messages in thread
From: rguenther at suse dot de @ 2015-08-04  9:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

--- Comment #38 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 3 Aug 2015, alalaw01 at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679
> 
> --- Comment #37 from alalaw01 at gcc dot gnu.org ---
> Hmmm, no it's not the hashing - that pretty much ignores all types. It's the
> comparison in hashable_expr_equal_p, which just uses operand_equal_p,
> specifically this part (in fold-const.c):
> 
>     case MEM_REF:
>           /* Require equal access sizes, and similar pointer types.
>              We can have incomplete types for array references of
>              variable-sized arrays from the Fortran frontend
>              though.  Also verify the types are compatible.  */
>           if (!((TYPE_SIZE (TREE_TYPE (arg0)) == TYPE_SIZE (TREE_TYPE (arg1))
>                    || (TYPE_SIZE (TREE_TYPE (arg0))
>                        && TYPE_SIZE (TREE_TYPE (arg1))
>                        && operand_equal_p (TYPE_SIZE (TREE_TYPE (arg0)),
>                                            TYPE_SIZE (TREE_TYPE (arg1)),
> flags)))
>                   && types_compatible_p (TREE_TYPE (arg0), TREE_TYPE (arg1))
>                   && ((flags & OEP_ADDRESS_OF)
>                       || (alias_ptr_types_compatible_p
>                             (TREE_TYPE (TREE_OPERAND (arg0, 1)),
>                              TREE_TYPE (TREE_OPERAND (arg1, 1)))
>                           && (MR_DEPENDENCE_CLIQUE (arg0)
>                               == MR_DEPENDENCE_CLIQUE (arg1))
>                           && (MR_DEPENDENCE_BASE (arg0)
>                               == MR_DEPENDENCE_BASE (arg1))
>                           && (TYPE_ALIGN (TREE_TYPE (arg0))
>                             == TYPE_ALIGN (TREE_TYPE (arg1)))))))
> 
> specifically, a pointer to int, and a pointer to an array of int, are not
> alias_ptr_types_compatible_p. (I'm not clear that they should be, either!?)

As said, neither the hashing nor operand_equal_p is a perfect fit for
the constraints on equality that DOM needs to put on memory references.


^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2015-08-04  9:30 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-63679-4@http.gcc.gnu.org/bugzilla/>
2014-10-30 10:04 ` [Bug target/63679] [5.0 Regression][AArch64] Failure to constant fold belagod at gcc dot gnu.org
2014-10-30 10:10 ` rguenth at gcc dot gnu.org
2014-11-04 11:41 ` belagod at gcc dot gnu.org
2014-11-04 16:32 ` belagod at gcc dot gnu.org
2014-11-04 20:58 ` rguenth at gcc dot gnu.org
2014-11-20 12:44 ` rguenth at gcc dot gnu.org
2014-11-20 16:21 ` belagod at gcc dot gnu.org
2014-11-20 16:49 ` pinskia at gcc dot gnu.org
2014-11-21  8:48 ` rguenth at gcc dot gnu.org
2014-11-21 10:17 ` rguenth at gcc dot gnu.org
2014-11-21 10:26 ` belagod at gcc dot gnu.org
2014-11-21 10:36 ` rguenther at suse dot de
2014-11-21 10:54 ` belagod at gcc dot gnu.org
2014-11-21 11:25 ` jgreenhalgh at gcc dot gnu.org
2014-11-21 18:20 ` jgreenhalgh at gcc dot gnu.org
2014-11-24  8:52 ` rguenther at suse dot de
2014-11-24 11:16 ` belagod at gcc dot gnu.org
2014-11-24 11:31 ` rguenther at suse dot de
2014-11-24 12:01 ` jgreenhalgh at gcc dot gnu.org
2014-11-24 12:19 ` rguenther at suse dot de
2014-11-24 13:45 ` rguenth at gcc dot gnu.org
2014-11-24 13:45 ` rguenth at gcc dot gnu.org
2014-11-24 14:07 ` rguenth at gcc dot gnu.org
2015-02-07 10:55 ` [Bug target/63679] [5 " jakub at gcc dot gnu.org
2015-02-09  9:08 ` rguenth at gcc dot gnu.org
2015-02-09 10:10 ` jakub at gcc dot gnu.org
2015-02-09 12:20 ` belagod at gcc dot gnu.org
2015-02-09 13:17 ` rguenther at suse dot de
2015-02-09 13:34 ` belagod at gcc dot gnu.org
2015-03-12 16:53 ` [Bug target/63679] [5 / 6 " ramana at gcc dot gnu.org
2015-07-28 17:15 ` [Bug target/63679] [5/6 " alalaw01 at gcc dot gnu.org
2015-07-28 18:36 ` pinskia at gcc dot gnu.org
2015-07-29  7:23 ` rguenther at suse dot de
2015-07-29 17:50 ` alalaw01 at gcc dot gnu.org
2015-08-03 15:38 ` alalaw01 at gcc dot gnu.org
2015-08-04  9:30 ` rguenther at suse dot de
