[Bug rtl-optimization/50037] New: Unroll factor exceeds max trip count

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug rtl-optimization/50037] New: Unroll factor exceeds max trip count
@ 2011-08-10 14:44 enkovich.gnu at gmail dot com
  2011-08-10 15:14 ` [Bug rtl-optimization/50037] " rguenth at gcc dot gnu.org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: enkovich.gnu at gmail dot com @ 2011-08-10 14:44 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50037

             Bug #: 50037
           Summary: Unroll factor exceeds max trip count
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: enkovich.gnu@gmail.com


Created attachment 24971
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24971
Reproducer

Here is a small loop on which GCC performs inefficient unroll:

for ( count = ((*(hdrptr)) & 0xf) * 2; count > 0; count--, addr++ )
    sum += *addr;

This loop has maximum 30 iterations. If we use -O3 then this loop is
vectorized. Resulting loop has maximum 30 / 8 = 3 iteration. Also vectorizer
generates prologue and epilogue loops. Each of them has maximum 7 iterations.

If we add -funroll-loops option then each of 3 generated by vectorizer loops is
unrolled with unroll factor 8. It creates a lot of code which is never executed
and also decreases performance due to additional checks and branches.

Target: x86_64-unknown-linux-gnu
Configured with: ../gcc1/configure --prefix=/export/gcc-perf/install
--enable-languages=c,c++,fortran
Thread model: posix
gcc version 4.7.0 20110615 (experimental) (GCC)
COLLECT_GCC_OPTIONS='-O3' '-funroll-loops' '-S' '-v' '-mtune=generic'
'-march=x86-64'
 /export/gcc-perf/install/libexec/gcc/x86_64-unknown-linux-gnu/4.7.0/cc1 -quiet
-v unroll_test.c -quiet -dumpbase unroll_test.c -mtune=generic -march=x86-64
-auxbase unroll_test -O3 -version -funroll-loops -o unroll_test.s
GNU C (GCC) version 4.7.0 20110615 (experimental) (x86_64-unknown-linux-gnu)
        compiled by GNU C version 4.4.3, GMP version 4.3.1, MPFR version 2.4.2,
MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug rtl-optimization/50037] Unroll factor exceeds max trip count
  2011-08-10 14:44 [Bug rtl-optimization/50037] New: Unroll factor exceeds max trip count enkovich.gnu at gmail dot com
@ 2011-08-10 15:14 ` rguenth at gcc dot gnu.org
  2011-08-10 15:33 ` enkovich.gnu at gmail dot com
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-08-10 15:14 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50037

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2011-08-10
                 CC|                            |rguenth at gcc dot gnu.org
     Ever Confirmed|0                           |1

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-08-10 15:13:58 UTC ---
Confirmed (I think we have some dups here).  We are not able to determine
the number of iterations anymore after the vectorizer has messed with the
loop structures.

Possibly simplifying the loop guards to use multiple test and branches
instead of globbing everything into a single .. | .. | .. chain could
recover this.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug rtl-optimization/50037] Unroll factor exceeds max trip count
  2011-08-10 14:44 [Bug rtl-optimization/50037] New: Unroll factor exceeds max trip count enkovich.gnu at gmail dot com
  2011-08-10 15:14 ` [Bug rtl-optimization/50037] " rguenth at gcc dot gnu.org
@ 2011-08-10 15:33 ` enkovich.gnu at gmail dot com
  2011-08-10 15:42 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: enkovich.gnu at gmail dot com @ 2011-08-10 15:33 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50037

--- Comment #2 from Ilya Enkovich <enkovich.gnu at gmail dot com> 2011-08-10 15:33:22 UTC ---
I wouldn't blame vectorizer here. Following loop is unrolled with unroll factor
8 even if vectorizer is disabled:

for ( count = ((*(hdrptr)) & 0x3) * 2; count > 0; count--, addr++ )
    sum += *addr;

BTW prologue loops generated by vectorizer also compute iterations count using
'AND' expression. Therefore we may frequently get prologue loops unrolled which
is never profitable if we use such huge unroll factor.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug rtl-optimization/50037] Unroll factor exceeds max trip count
  2011-08-10 14:44 [Bug rtl-optimization/50037] New: Unroll factor exceeds max trip count enkovich.gnu at gmail dot com
  2011-08-10 15:14 ` [Bug rtl-optimization/50037] " rguenth at gcc dot gnu.org
  2011-08-10 15:33 ` enkovich.gnu at gmail dot com
@ 2011-08-10 15:42 ` rguenth at gcc dot gnu.org
  2011-08-10 16:57 ` matz at gcc dot gnu.org
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-08-10 15:42 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50037

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rakdver at gcc dot gnu.org

--- Comment #3 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-08-10 15:42:19 UTC ---
Confirmed with

int foo (int n, int *addr)
{
  int count, sum;
  for ( count = n & 0x3; count > 0; count--, addr++ )
    sum += *addr;
  return sum;
}

and -O2 -funroll-loops.

Using

int foo (int n, int *addr)
{
  int count, sum;
  for ( count = n & 0x3; count >= 0; count--, addr++ )
    sum += *addr;
  return sum;
}

it works.  It looks like RTL number of iteration analysis is confused
by the adjustment

(insn 28 27 29 3 (parallel [
            (set (reg:SI 92)
                (plus:SI (reg/v:SI 77 [ count ])
                    (const_int -1 [0xffffffffffffffff])))
            (clobber (reg:CC 17 flags))
        ]) t.c:1 251 {*addsi_1}
     (expr_list:REG_DEAD (reg/v:SI 77 [ count ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))

neither works with

int foo (int n, int *addr)
{
  int count, sum;
  for ( count = 0; count < n & 0x3; count++, addr++ )
    sum += *addr;
  return sum;
}

or

int foo (int n, int *addr)
{
  int count, sum;
  for ( count = 0; count <= n & 0x3; count++, addr++ )
    sum += *addr;
  return sum;
}


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug rtl-optimization/50037] Unroll factor exceeds max trip count
  2011-08-10 14:44 [Bug rtl-optimization/50037] New: Unroll factor exceeds max trip count enkovich.gnu at gmail dot com
                   ` (2 preceding siblings ...)
  2011-08-10 15:42 ` rguenth at gcc dot gnu.org
@ 2011-08-10 16:57 ` matz at gcc dot gnu.org
  2011-08-11 11:23 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: matz at gcc dot gnu.org @ 2011-08-10 16:57 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50037

Michael Matz <matz at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |matz at gcc dot gnu.org

--- Comment #4 from Michael Matz <matz at gcc dot gnu.org> 2011-08-10 16:56:47 UTC ---
(In reply to comment #3)
> Using
> 
> int foo (int n, int *addr)
> {
>   int count, sum;
>   for ( count = n & 0x3; count >= 0; count--, addr++ )
>     sum += *addr;
>   return sum;
> }
> 
> it works.

Not for me.  It's still unrolled by 7.

> neither works with
> 
> int foo (int n, int *addr)
> {
>   int count, sum;
>   for ( count = 0; count < n & 0x3; count++, addr++ )

Careful.  Operator precedence.  count < (n & 3)

Doesn't work with those examples either, though.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug rtl-optimization/50037] Unroll factor exceeds max trip count
  2011-08-10 14:44 [Bug rtl-optimization/50037] New: Unroll factor exceeds max trip count enkovich.gnu at gmail dot com
                   ` (3 preceding siblings ...)
  2011-08-10 16:57 ` matz at gcc dot gnu.org
@ 2011-08-11 11:23 ` rguenth at gcc dot gnu.org
  2011-08-11 11:41 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-08-11 11:23 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50037

--- Comment #5 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-08-11 11:21:34 UTC ---
(In reply to comment #4)
> (In reply to comment #3)
> > Using
> > 
> > int foo (int n, int *addr)
> > {
> >   int count, sum;
> >   for ( count = n & 0x3; count >= 0; count--, addr++ )
> >     sum += *addr;
> >   return sum;
> > }
> > 
> > it works.
> 
> Not for me.  It's still unrolled by 7.

Ah, but something cleans up the jump sequence afterwards.  At least.
Doesn't happen for the original testcase either.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug rtl-optimization/50037] Unroll factor exceeds max trip count
  2011-08-10 14:44 [Bug rtl-optimization/50037] New: Unroll factor exceeds max trip count enkovich.gnu at gmail dot com
                   ` (4 preceding siblings ...)
  2011-08-11 11:23 ` rguenth at gcc dot gnu.org
@ 2011-08-11 11:41 ` rguenth at gcc dot gnu.org
  2011-08-11 12:14 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-08-11 11:41 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50037

--- Comment #6 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-08-11 11:39:44 UTC ---
It probably doesn't help that tree IVOPTs replaces the nice induction variable
with a pointer one:

  # BLOCK 2 freq:900
  # PRED: ENTRY [100.0%]  (fallthru,exec)
  count_5 = n_4(D) & 3;
  D.2721_17 = (long unsigned int) count_5;
  D.2722_16 = D.2721_17 * 4;
  D.2723_15 = (long unsigned int) addr_6(D);
  D.2724_21 = D.2723_15 + 4;
  D.2725_22 = D.2724_21 + D.2722_16;
  D.2726_23 = (int *) D.2725_22;
  # SUCC: 3 [100.0%]  (fallthru,exec)

  # BLOCK 3 freq:9100
  # PRED: 3 [90.1%]  (dfs_back,true,exec) 2 [100.0%]  (fallthru,exec)
  # addr_18 = PHI <addr_11(3), addr_6(D)(2)>
  # sum_20 = PHI <sum_9(3), sum_7(D)(2)>
  D.2703_8 = MEM[base: addr_18, offset: 0B];
  sum_9 = D.2703_8 + sum_20;
  addr_11 = addr_18 + 4;
  if (addr_11 != D.2726_23)
    goto <bb 3>;
  else
    goto <bb 4>;
  # SUCC: 3 [90.1%]  (dfs_back,true,exec) 4 [9.9%]  (false,exec)

but even without IVOPTs RTL has difficulties and unrolls 8 times:

Loop 1 is simple:
  simple exit 3 -> 4
  number of iterations: (reg/v:SI 62 [ count ])
  upper bound: -2

Most canonical testcase:

int foo (int n, int *addr)
{
  int count, sum;
  n = n & 0x3;
  for (count = 0; count < n; count++)
    sum += addr[count];
  return sum;
}

with IVOPTs on (which preserves a count != n exit test and count):

Loop 1 is simple:
  simple exit 4 -> 5
  does not roll if: (expr_list:REG_DEP_TRUE (le:SI (and:SI (reg/v:SI 86 [ n ])
            (const_int 3 [0x3]))
        (const_int 0 [0]))
    (expr_list:REG_DEP_TRUE (eq:SI (and:SI (reg/v:SI 86 [ n ])
                (const_int 3 [0x3]))
            (const_int -2147483648 [0xffffffff80000000]))
        (nil)))
  number of iterations: (subreg:SI (plus:DI (not:DI (reg:DI 83 [ ivtmp.11 ]))
        (sign_extend:DI (reg/v:SI 76 [ n ]))) 0)
  upper bound: 4294967295

it looks like we do not look for the definition of n in the exit test

(insn 31 29 32 4 (set (reg:CCGC 17 flags)
        (compare:CCGC (reg/v:SI 76 [ n ])
            (subreg:SI (reg:DI 83 [ ivtmp.11 ]) 0))) t.c:5 6 {*cmpsi_1}
     (expr_list:REG_DEAD (reg/v:SI 76 [ n ])
        (nil)))

which would show

(insn 23 19 24 2 (parallel [
            (set (reg/v:SI 76 [ n ])
                (and:SI (reg/v:SI 86 [ n ])
                    (const_int 3 [0x3])))
            (clobber (reg:CC 17 flags))
        ]) t.c:4 378 {*andsi_1}
     (expr_list:REG_DEAD (reg/v:SI 86 [ n ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug rtl-optimization/50037] Unroll factor exceeds max trip count
  2011-08-10 14:44 [Bug rtl-optimization/50037] New: Unroll factor exceeds max trip count enkovich.gnu at gmail dot com
                   ` (5 preceding siblings ...)
  2011-08-11 11:41 ` rguenth at gcc dot gnu.org
@ 2011-08-11 12:14 ` rguenth at gcc dot gnu.org
  2011-08-15  9:12 ` enkovich.gnu at gmail dot com
  2021-12-18 23:15 ` pinskia at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-08-11 12:14 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50037

--- Comment #7 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-08-11 12:12:56 UTC ---
The following patch makes us handle the canonical testcase on the tree level,
but not yet the original testcase (because of the * 2).

We should really preserve value-range information from VRP.

Index: gcc/tree-ssa-loop-niter.c
===================================================================
--- gcc/tree-ssa-loop-niter.c   (revision 177649)
+++ gcc/tree-ssa-loop-niter.c   (working copy)
@@ -126,6 +126,32 @@ determine_value_range (tree type, tree v
       return;
     }

+  /* Check if we can determine a value-range from VARs definition.
+     ???  We should simply preserve VRP information.  */
+  if (TREE_CODE (var) == SSA_NAME)
+    {
+      gimple def_stmt = SSA_NAME_DEF_STMT (var);
+      if (is_gimple_assign (def_stmt)
+         && gimple_assign_rhs_code (def_stmt) == BIT_AND_EXPR)
+       {
+         tree rhs2 = gimple_assign_rhs2 (def_stmt);
+         if (TREE_CODE (rhs2) == INTEGER_CST
+             && tree_int_cst_sgn (rhs2) > 0)
+           {
+             double_int maxdi;
+             if (TREE_INT_CST_HIGH (rhs2) == 0)
+               maxdi = double_int_mask (HOST_BITS_PER_WIDE_INT
+                                        - clz_hwi (TREE_INT_CST_LOW (rhs2)));
+             else
+               maxdi = double_int_mask (HOST_BITS_PER_DOUBLE_INT
+                                        - clz_hwi (TREE_INT_CST_LOW (rhs2)));
+             mpz_set_si (min, 0);
+             mpz_set_double_int (max, maxdi, true);
+             return;
+           }
+       }
+    }
+
   /* If the computation may wrap, we know nothing about the value, except for
      the range of the type.  */
   get_type_static_bounds (type, min, max);


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug rtl-optimization/50037] Unroll factor exceeds max trip count
  2011-08-10 14:44 [Bug rtl-optimization/50037] New: Unroll factor exceeds max trip count enkovich.gnu at gmail dot com
                   ` (6 preceding siblings ...)
  2011-08-11 12:14 ` rguenth at gcc dot gnu.org
@ 2011-08-15  9:12 ` enkovich.gnu at gmail dot com
  2021-12-18 23:15 ` pinskia at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: enkovich.gnu at gmail dot com @ 2011-08-15  9:12 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50037

--- Comment #8 from Ilya Enkovich <enkovich.gnu at gmail dot com> 2011-08-15 09:06:18 UTC ---
This patch did not work for me. Tried on following loop (-O2 -funroll-loops):

  for ( count = ((*(hdrptr)) & 0x7); count > 0; count--, addr++ )
    sum += *addr;

No multiplication by 2 but still have the same unroll.

I also was hoping this patch would prevent unroll of prologue loop generated by
vectorizer.It uses '& 7' expression for iterations computation but this loop
also uses MIN expression to limit number of iteration and is still unrolled.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug rtl-optimization/50037] Unroll factor exceeds max trip count
  2011-08-10 14:44 [Bug rtl-optimization/50037] New: Unroll factor exceeds max trip count enkovich.gnu at gmail dot com
                   ` (7 preceding siblings ...)
  2011-08-15  9:12 ` enkovich.gnu at gmail dot com
@ 2021-12-18 23:15 ` pinskia at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-18 23:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50037

--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
On the trunk, I see no loop at all for the original testcase at -O3.
At -O2 -ftree-vectorize, I do see the loop but when I add -funroll-loops, the
loop is completely unrolled at the gimple level.

So maybe the vectorizer is much better now really.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-12-18 23:15 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-10 14:44 [Bug rtl-optimization/50037] New: Unroll factor exceeds max trip count enkovich.gnu at gmail dot com
2011-08-10 15:14 ` [Bug rtl-optimization/50037] " rguenth at gcc dot gnu.org
2011-08-10 15:33 ` enkovich.gnu at gmail dot com
2011-08-10 15:42 ` rguenth at gcc dot gnu.org
2011-08-10 16:57 ` matz at gcc dot gnu.org
2011-08-11 11:23 ` rguenth at gcc dot gnu.org
2011-08-11 11:41 ` rguenth at gcc dot gnu.org
2011-08-11 12:14 ` rguenth at gcc dot gnu.org
2011-08-15  9:12 ` enkovich.gnu at gmail dot com
2021-12-18 23:15 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).