[Bug tree-optimization/102383] New: Missing optimization for PRE after enable O2 vectorization

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/102383] New: Missing optimization for PRE after enable O2 vectorization
From: crazylht at gmail dot com @ 2021-09-17  2:35 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102383

            Bug ID: 102383
           Summary: Missing optimization for PRE after enable O2
                    vectorization
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
  Target Milestone: ---
              Host: x86_64-pc-linux-gnu
            Target: x86_64-*-* i?86-*-*

The testcase is from gcc.dg/tree-ssa/predcom-1.c:

void abort (void);

unsigned fib[1000];

__attribute__ ((noinline))
void count_fib(void)
{
  int i;

  fib[0] = 0;
  fib[1] = 1;
  for (i = 2; i < 1000; i++)
    fib[i] = (fib[i-1] + fib[i - 2]) & 0xffff;
}

git diff of the .optimized dumps without (novectorize) and with (vectorize) vectorization:

diff --git a/../novectorize/predcom-1.c.248t.optimized b/./predcom-1.c.248t.optimized
index 9e4783d..7846af6 100644
--- a/../novectorize/predcom-1.c.248t.optimized
+++ b/./predcom-1.c.248t.optimized
@@ -5,53 +5,57 @@ Removing basic block 5
 __attribute__((noinline))
 void count_fib ()
 {
-  sizetype ivtmp.13;
+  sizetype ivtmp.16;
+  unsigned int fib_I_lsm1.6;
   unsigned int fib_I_lsm0.5;
   int i;
-  unsigned int _2;
-  unsigned int _4;
   unsigned int _5;
   unsigned int _6;
-  unsigned int prephitmp_21;
-  unsigned int prephitmp_24;
-  unsigned int _41;
+  unsigned int _19;
+  unsigned int _20;
+  unsigned int _21;
+  unsigned int _37;
+  int _38;
+  unsigned int _46;
   unsigned int _47;
-  unsigned int _48;
-  unsigned int _59;
-  int _65;
-  unsigned int pretmp_66;
+  int _54;
+  unsigned int _55;
+  unsigned int _56;
+  unsigned int _57;

   <bb 2> [local count: 10737416]:
-  MEM <unsigned long> [(unsigned int *)&fib] = 4294967296;
+  MEM <vector(2) unsigned int> [(unsigned int *)&fib] = { 0, 1 };

   <bb 3> [local count: 10737417]:
-  # prephitmp_21 = PHI <1(2), _48(3)>
-  # prephitmp_24 = PHI <0(2), _6(3)>
-  # fib_I_lsm0.5_38 = PHI <1(2), _48(3)>
-  # ivtmp.13_7 = PHI <4(2), ivtmp.13_8(3)>
-  _5 = prephitmp_21 + prephitmp_24;
+  # fib_I_lsm0.5_32 = PHI <0(2), _6(3)>
+  # fib_I_lsm1.6_33 = PHI <1(2), _47(3)>
+  # ivtmp.16_11 = PHI <4(2), ivtmp.16_10(3)>
+  _5 = fib_I_lsm0.5_32 + fib_I_lsm1.6_33;
   _6 = _5 & 65535;
-  MEM[(unsigned int *)&fib + -8B + ivtmp.13_7 * 4] = _6;
-  _47 = _6 + fib_I_lsm0.5_38;
-  _48 = _47 & 65535;
-  MEM[(unsigned int *)&fib + -4B + ivtmp.13_7 * 4] = _48;
-  ivtmp.13_8 = ivtmp.13_7 + 2;
-  if (ivtmp.13_8 != 1000)
+  MEM[(unsigned int *)&fib + -8B + ivtmp.16_11 * 4] = _6;
+  _46 = _6 + fib_I_lsm1.6_33;
+  _47 = _46 & 65535;
+  MEM[(unsigned int *)&fib + -4B + ivtmp.16_11 * 4] = _47;
+  ivtmp.16_10 = ivtmp.16_11 + 2;
+  if (ivtmp.16_10 != 1000)
     goto <bb 3>; [98.00%]
   else
     goto <bb 4>; [2.00%]

   <bb 4> [local count: 10737416]:
-  i_51 = (int) ivtmp.13_7;
-  _41 = _6 + _48;
-  _59 = _41 & 65535;
-  fib[i_51] = _59;
-  i_61 = i_51 + 1;
-  _65 = i_51 + -1;
-  pretmp_66 = fib[_65];
-  _2 = _59 + pretmp_66;
-  _4 = _2 & 65535;
-  fib[i_61] = _4;
+  i_50 = (int) ivtmp.16_11;
+  _38 = i_50 + -1;
+  _37 = fib[_38]; ----- missing optimization here
+  _54 = i_50 + -2;
+  _55 = fib[_54]; ----- and here.
+  _56 = _37 + _55;
+  _57 = _56 & 65535;
+  fib[i_50] = _57;
+  i_59 = i_50 + 1;
+  _19 = fib[_38];
+  _20 = _19 + _57;
+  _21 = _20 & 65535;
+  fib[i_59] = _21;
   return;

 }
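Both loop bodies already carry the two previous values in PHIs; the difference is in bb 4. The novectorize dump reuses _6 and _48 there and reloads only fib[i-1] (pretmp_66), while the vectorize dump reloads fib[i-1] and fib[i-2] (_37, _55) and then loads fib[_38] a second time (_19). As a source-level illustration of the fully scalar-replaced form, here is a hand-written C sketch (not GCC output; fib_ref, fib_pc, count_fib_ref, count_fib_predcom and tables_match are hypothetical names) that keeps the two previous values in scalars and unrolls by 2, roughly the shape predcom aims for:

```c
#include <assert.h>
#include <string.h>

#define N 1000

unsigned fib_ref[N];  /* filled by the original loop */
unsigned fib_pc[N];   /* filled by the hand-transformed loop */

/* The loop from predcom-1.c: each iteration reloads fib[i-1] and fib[i-2]. */
static void count_fib_ref(void)
{
  fib_ref[0] = 0;
  fib_ref[1] = 1;
  for (int i = 2; i < N; i++)
    fib_ref[i] = (fib_ref[i - 1] + fib_ref[i - 2]) & 0xffff;
}

/* Hand-written analogue of the predcom result: the two previous values
   stay in scalars across iterations and the body is unrolled by 2, so the
   loop only stores to fib_pc[] and never reloads from it. */
static void count_fib_predcom(void)
{
  unsigned a = 0;  /* holds fib[i-2] at the top of each iteration */
  unsigned b = 1;  /* holds fib[i-1] at the top of each iteration */
  int i;

  fib_pc[0] = a;
  fib_pc[1] = b;
  for (i = 2; i + 1 < N; i += 2) {
    a = (a + b) & 0xffff;  /* fib[i]   = fib[i-2] + fib[i-1] */
    fib_pc[i] = a;
    b = (b + a) & 0xffff;  /* fib[i+1] = fib[i-1] + fib[i]   */
    fib_pc[i + 1] = b;
  }
  if (i < N) {             /* leftover element when N is odd */
    fib_pc[i] = (a + b) & 0xffff;
    i++;
  }
  assert(i == N);
}

/* Runs both versions and reports whether they produced the same table. */
static int tables_match(void)
{
  count_fib_ref();
  count_fib_predcom();
  return memcmp(fib_ref, fib_pc, sizeof fib_ref) == 0;
}
```

Calling tables_match() from a main function should return 1. The point of the sketch is the epilogue: because a and b already hold the last two values, nothing after the loop needs to go back to memory, which is what the reloads in the vectorize dump fail to exploit.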


* [Bug tree-optimization/102383] Missing optimization for PRE after enable O2 vectorization
From: crazylht at gmail dot com @ 2021-09-17  3:15 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102383

--- Comment #1 from Hongtao.liu <crazylht at gmail dot com> ---
Possibly a similar issue for gfortran.dg/pr77498.f (not quite sure).


* [Bug tree-optimization/102383] Missing optimization for PRE after enable O2 vectorization
From: rguenth at gcc dot gnu.org @ 2021-09-17  7:06 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102383

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
                 CC|                            |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-09-17

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is that we tame PRE because it tends to inhibit vectorization.

      /* Inhibit the use of an inserted PHI on a loop header when
         the address of the memory reference is a simple induction
         variable.  In other cases the vectorizer won't do anything
         anyway (either it's loop invariant or a complicated
         expression).  */
      if (sprime
          && TREE_CODE (sprime) == SSA_NAME
          && do_pre
          && (flag_tree_loop_vectorize || flag_tree_parallelize_loops > 1)
          && loop_outer (b->loop_father)
          && has_zero_uses (sprime)
          && bitmap_bit_p (inserted_exprs, SSA_NAME_VERSION (sprime))
          && gimple_assign_load_p (stmt))

The heuristic would either need to become much more elaborate (doing more
checks on whether vectorization is likely), or we could make the behavior
depend on the cost model as well, for example by excluding the very-cheap
cost model here. That might reduce the performance benefit seen from
-O2 default vectorization, though.
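To make the shape of the heuristic and the cost-model idea concrete, here is a simplified C model of the decision. Everything in it is hypothetical: struct pre_use, enum cost_model_kind and the functions are stand-ins invented for illustration, not GCC internals, and the real code performs further checks (per its comment, whether the address is a simple induction variable) that this sketch omits. "Excluding very-cheap" is read here as not inhibiting PRE when the very-cheap cost model is in use, which is one possible interpretation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the vectorizer cost models; not GCC's enum. */
enum cost_model_kind { COST_UNLIMITED, COST_DYNAMIC, COST_CHEAP, COST_VERY_CHEAP };

/* Hypothetical summary of one candidate use; each bool mirrors one
   conjunct of the quoted condition. */
struct pre_use {
  bool leader_is_ssa_name;   /* sprime && TREE_CODE (sprime) == SSA_NAME */
  bool do_pre;               /* running PRE, not just FRE */
  bool vect_or_parallelize;  /* flag_tree_loop_vectorize || flag_tree_parallelize_loops > 1 */
  bool inside_loop;          /* loop_outer (b->loop_father) */
  bool leader_unused;        /* has_zero_uses (sprime) */
  bool inserted_by_pre;      /* bitmap_bit_p (inserted_exprs, ...) */
  bool stmt_is_load;         /* gimple_assign_load_p (stmt) */
  enum cost_model_kind cost_model;
};

/* Current behaviour (simplified): when every conjunct holds, the PHI that
   PRE inserted on the loop header is not used and the load stays. */
static bool inhibit_current(const struct pre_use *u)
{
  return u->leader_is_ssa_name && u->do_pre && u->vect_or_parallelize
         && u->inside_loop && u->leader_unused && u->inserted_by_pre
         && u->stmt_is_load;
}

/* One reading of the suggestion: do not tame PRE under very-cheap. */
static bool inhibit_proposed(const struct pre_use *u)
{
  return inhibit_current(u) && u->cost_model != COST_VERY_CHEAP;
}

/* Small self-check of the two predicates. */
static void check_heuristic(void)
{
  struct pre_use u = { true, true, true, true, true, true, true, COST_DYNAMIC };
  assert(inhibit_current(&u) && inhibit_proposed(&u));
  u.cost_model = COST_VERY_CHEAP;
  assert(inhibit_current(&u) && !inhibit_proposed(&u));
}
```

The trade-off mentioned above shows up directly: under the proposed predicate, loops compiled with the very-cheap model keep the PRE result, which helps cases like count_fib but can cost vectorization opportunities that the inhibition was protecting.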

IIRC we suggested enabling predictive commoning at -O2 but avoiding
unroll factors > 1 when it was not explicitly enabled.
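For contrast with the unrolled form seen in the novectorize dump, here is a hand-written sketch of what predictive commoning with an unroll factor of 1 amounts to for count_fib: the two previous values are rotated through scalars with plain copies, so the loop body is not duplicated, typically at the cost of a register move per iteration that unrolling would avoid. The names fib_u1, count_fib_unroll1 and check_unroll1 are hypothetical; this is an illustration, not GCC output.

```c
#include <assert.h>

#define N 1000

unsigned fib_u1[N];

/* Unroll factor 1: carry fib[i-2] and fib[i-1] in scalars and rotate
   them each iteration instead of reloading them from fib_u1[]. */
static void count_fib_unroll1(void)
{
  unsigned prev2 = 0;  /* fib[i-2] */
  unsigned prev1 = 1;  /* fib[i-1] */

  fib_u1[0] = prev2;
  fib_u1[1] = prev1;
  for (int i = 2; i < N; i++) {
    unsigned cur = (prev1 + prev2) & 0xffff;
    fib_u1[i] = cur;
    prev2 = prev1;     /* rotation: the copies an unroll factor > 1 removes */
    prev1 = cur;
  }
}

/* Quick self-check against known values. */
static void check_unroll1(void)
{
  count_fib_unroll1();
  assert(fib_u1[10] == 55);
  assert(fib_u1[25] == (75025u & 0xffff));  /* first value changed by the mask */
}
```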

Note that the issue for this testcase is that without PRE, predcom
behaves differently (though the testcase comment suggests that we'd
have to undo PRE).


* [Bug tree-optimization/102383] Missing optimization for PRE after enable O2 vectorization
From: crazylht at gmail dot com @ 2021-09-17  7:56 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102383

--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
The issue also exists at -O3.


* [Bug tree-optimization/102383] Missing optimization for PRE after enable O2 vectorization
From: linkw at gcc dot gnu.org @ 2021-09-17  8:12 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102383

Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |linkw at gcc dot gnu.org

--- Comment #4 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)
> The issue is that we tame PRE because it tends to inhibit vectorization.
> 
>       /* Inhibit the use of an inserted PHI on a loop header when
>          the address of the memory reference is a simple induction
>          variable.  In other cases the vectorizer won't do anything
>          anyway (either it's loop invariant or a complicated
>          expression).  */
>       if (sprime
>           && TREE_CODE (sprime) == SSA_NAME
>           && do_pre
>           && (flag_tree_loop_vectorize || flag_tree_parallelize_loops > 1)
>           && loop_outer (b->loop_father)
>           && has_zero_uses (sprime)
>           && bitmap_bit_p (inserted_exprs, SSA_NAME_VERSION (sprime))
>           && gimple_assign_load_p (stmt))
> 
> the heuristic would either need to become much more elaborate (do more
> checks whether vectorization is likely) or we could make the behavior
> depend on the cost model as well, for example exclude very-cheap here.
> That might have an influence on the performance benefit seen from
> -O2 default vectorization though.
> 
> IIRC we suggested to enable predictive commoning at -O2 but avoid
> unroll factors > 1 when it was not explicitely enabled.
> 

Yeah, it's PR100794. I also collected some data for different approaches at
that time. Recently I opened another issue, PR102054, which is also related to
our restricting PRE because of loop vectorization.


* [Bug tree-optimization/102383] Missing optimization for PRE after enable O2 vectorization
From: crazylht at gmail dot com @ 2023-11-01  4:10 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102383

--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---
It's fixed in GCC 12.1.


* [Bug tree-optimization/102383] Missing optimization for PRE after enable O2 vectorization
From: rguenth at gcc dot gnu.org @ 2023-11-02 13:20 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102383

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Indeed, it was fixed by r12-1275-g4db34072d5336d, which enables predcom when
vectorization is enabled.
