public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/41881]  New: Complete unrolling (inner) versus vectorization of reduction
@ 2009-10-30 20:30 drow at gcc dot gnu dot org
  2009-10-30 22:20 ` [Bug tree-optimization/41881] " rguenth at gcc dot gnu dot org
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: drow at gcc dot gnu dot org @ 2009-10-30 20:30 UTC (permalink / raw)
  To: gcc-bugs

Here are two fairly straightforward ways to write the same operation
(they compute the same result when n is a multiple of 8):

#define TYPE int

TYPE fun1(TYPE *x, TYPE *y, unsigned int n)
{
  int i, j;
  TYPE dot = 0;

  for (i = 0; i < n; i++)
    dot += *(x++) * *(y++);

  return dot;
}

TYPE fun2(TYPE *x, TYPE *y, unsigned int n)
{
  int i, j;
  TYPE dot = 0;

  for (i = 0; i < n / 8; i++)
    for (j = 0; j < 8; j++)
      dot += *(x++) * *(y++);

  return dot;
}

GCC 4.3 can vectorize both of them.  GCC 4.4 can only vectorize fun1.  I
believe this vectorizer dump shows why:

reduc.c:17: note: === vect_analyze_scalar_cycles ===
reduc.c:17: note: Analyze phi: dot_103 = PHI <dot_110(5), 0(3)>

reduc.c:17: note: Access function of PHI: {0, +, ((((((D.1621_32 + D.1621_43) + D.1621_54) + D.1621_65) + D.1621_76) + D.1621_87) + D.1621_98) + D.1621_109}_1
reduc.c:17: note: step: ((((((D.1621_32 + D.1621_43) + D.1621_54) + D.1621_65) + D.1621_76) + D.1621_87) + D.1621_98) + D.1621_109,  init: 0
reduc.c:17: note: step unknown.

The cunrolli pass (which cannot be disabled) has completely unrolled the
inner loop.  The vectorizer's SLP support cannot handle the unrolled version
of the loop.

Also observed on ARM NEON with TYPE == float.
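
The relationship between the two forms can be checked with a small sketch.
The names dot_flat and dot_blocked are hypothetical, and array indexing
stands in for the pointer increments of fun1 and fun2.  It only illustrates
that the blocked form covers the first (n / 8) * 8 elements, so the two
agree when n is a multiple of 8:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name: single loop over all n elements, the shape of fun1.  */
int dot_flat(const int *x, const int *y, unsigned int n)
{
  unsigned int i;
  int dot = 0;

  assert(x != NULL && y != NULL);
  for (i = 0; i < n; i++)
    dot += x[i] * y[i];

  return dot;
}

/* Hypothetical name: blocked loop nest, the shape of fun2.  Only the
   first (n / 8) * 8 elements contribute; any remainder is ignored.  */
int dot_blocked(const int *x, const int *y, unsigned int n)
{
  unsigned int i, j;
  int dot = 0;

  assert(x != NULL && y != NULL);
  for (i = 0; i < n / 8; i++)
    for (j = 0; j < 8; j++)
      dot += x[i * 8 + j] * y[i * 8 + j];

  return dot;
}
```

With n a multiple of 8 both functions return the same value; with any other
n the blocked form silently drops the last n % 8 products.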


-- 
           Summary: Complete unrolling (inner) versus vectorization of
                    reduction
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: drow at gcc dot gnu dot org
GCC target triplet: x86_64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41881



* [Bug tree-optimization/41881] Complete unrolling (inner) versus vectorization of reduction
  2009-10-30 20:30 [Bug tree-optimization/41881] New: Complete unrolling (inner) versus vectorization of reduction drow at gcc dot gnu dot org
@ 2009-10-30 22:20 ` rguenth at gcc dot gnu dot org
  2009-10-30 22:21 ` rguenth at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-10-30 22:20 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from rguenth at gcc dot gnu dot org  2009-10-30 22:20 -------
You could use -O2 -ftree-vectorize.

Another fairly straightforward way to write the operation is:

TYPE fun3(TYPE *x, TYPE *y, unsigned int n)
{
  int i, j;
  TYPE dot = 0;

  for (i = 0; i < n / 8; i++)
    {
      dot += *(x++) * *(y++);
      dot += *(x++) * *(y++);
      dot += *(x++) * *(y++);
      dot += *(x++) * *(y++);
      dot += *(x++) * *(y++);
      dot += *(x++) * *(y++);
      dot += *(x++) * *(y++);
      dot += *(x++) * *(y++);
    }

  return dot;
}



* [Bug tree-optimization/41881] Complete unrolling (inner) versus vectorization of reduction
  2009-10-30 20:30 [Bug tree-optimization/41881] New: Complete unrolling (inner) versus vectorization of reduction drow at gcc dot gnu dot org
  2009-10-30 22:20 ` [Bug tree-optimization/41881] " rguenth at gcc dot gnu dot org
@ 2009-10-30 22:21 ` rguenth at gcc dot gnu dot org
  2009-10-30 22:42 ` drow at gcc dot gnu dot org
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-10-30 22:21 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from rguenth at gcc dot gnu dot org  2009-10-30 22:21 -------
See also PR41647.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |41647
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
           Keywords|                            |missed-optimization
   Last reconfirmed|0000-00-00 00:00:00         |2009-10-30 22:21:43
               date|                            |



* [Bug tree-optimization/41881] Complete unrolling (inner) versus vectorization of reduction
  2009-10-30 20:30 [Bug tree-optimization/41881] New: Complete unrolling (inner) versus vectorization of reduction drow at gcc dot gnu dot org
  2009-10-30 22:20 ` [Bug tree-optimization/41881] " rguenth at gcc dot gnu dot org
  2009-10-30 22:21 ` rguenth at gcc dot gnu dot org
@ 2009-10-30 22:42 ` drow at gcc dot gnu dot org
  2009-10-30 22:50 ` rguenth at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: drow at gcc dot gnu dot org @ 2009-10-30 22:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from drow at gcc dot gnu dot org  2009-10-30 22:41 -------
Subject: Re: Complete unrolling (inner) versus vectorization of reduction

On Fri, Oct 30, 2009 at 10:20:46PM -0000, rguenth at gcc dot gnu dot org wrote:
> You could use -O2 -ftree-vectorize.

No:

static bool
gate_tree_complete_unroll_inner (void)
{
  return optimize >= 2;
}



* [Bug tree-optimization/41881] Complete unrolling (inner) versus vectorization of reduction
  2009-10-30 20:30 [Bug tree-optimization/41881] New: Complete unrolling (inner) versus vectorization of reduction drow at gcc dot gnu dot org
                   ` (2 preceding siblings ...)
  2009-10-30 22:42 ` drow at gcc dot gnu dot org
@ 2009-10-30 22:50 ` rguenth at gcc dot gnu dot org
  2010-08-10 16:02 ` [Bug tree-optimization/41881] [4.5/4.6 regression] " drow at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-10-30 22:50 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from rguenth at gcc dot gnu dot org  2009-10-30 22:50 -------
But:

static unsigned int
tree_complete_unroll_inner (void)
{
  unsigned ret = 0;

  loop_optimizer_init (LOOPS_NORMAL
                       | LOOPS_HAVE_RECORDED_EXITS);
  if (number_of_loops () > 1)
    {
      scev_initialize ();
      ret = tree_unroll_loops_completely (optimize >= 3, false);

so the pass is not allowed to increase the code size of the loop body
unless compiling with -O3.  Unrolling the loop in your example does
increase code size, so at -O2 the inner loop should be left alone.
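
As a toy model of the two quoted conditions (not GCC's code): the pass gate
is optimize >= 2, and growth is permitted only when optimize >= 3.  The
function names and the size_before/size_after parameters below are
hypothetical, and the single size comparison is an assumption that stands
in for GCC's real, more involved size estimate:

```c
#include <assert.h>
#include <stdbool.h>

/* Same test as the quoted gate_tree_complete_unroll_inner.  */
bool cunrolli_gate(int optimize)
{
  return optimize >= 2;
}

/* Same expression as the first argument in the quoted call to
   tree_unroll_loops_completely.  */
bool cunrolli_may_increase_size(int optimize)
{
  return optimize >= 3;
}

/* Hypothetical helper: decide whether an inner loop would be completely
   unrolled, given assumed size estimates before and after unrolling.
   This is a simplification, not GCC's actual heuristic.  */
bool cunrolli_would_unroll(int optimize, unsigned int size_before,
                           unsigned int size_after)
{
  assert(optimize >= 0);

  if (!cunrolli_gate(optimize))
    return false;                /* pass does not run below -O2 */
  if (size_after <= size_before)
    return true;                 /* no growth: allowed whenever the pass runs */
  return cunrolli_may_increase_size(optimize);  /* growth needs -O3 */
}
```

Under this model a loop whose unrolled body is larger is left alone at -O2
and unrolled at -O3, which matches the suggestion to try
-O2 -ftree-vectorize.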



* [Bug tree-optimization/41881] [4.5/4.6 regression] Complete unrolling (inner) versus vectorization of reduction
  2009-10-30 20:30 [Bug tree-optimization/41881] New: Complete unrolling (inner) versus vectorization of reduction drow at gcc dot gnu dot org
                   ` (3 preceding siblings ...)
  2009-10-30 22:50 ` rguenth at gcc dot gnu dot org
@ 2010-08-10 16:02 ` drow at gcc dot gnu dot org
  2010-08-11  9:29 ` rguenth at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: drow at gcc dot gnu dot org @ 2010-08-10 16:02 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from drow at gcc dot gnu dot org  2010-08-10 16:01 -------
Verified on x86_64 using:

gcc-4.3 -O3 -o 43.s -S reduc.c -ftree-vectorizer-verbose=1
[two loops vectorized]
gcc-4.4 -O3 -o 43.s -S reduc.c -ftree-vectorizer-verbose=1
[one loop vectorized]


-- 

drow at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |irar at gcc dot gnu dot org
      Known to fail|                            |4.4.4
      Known to work|                            |4.3.5
            Summary|Complete unrolling (inner)  |[4.5/4.6 regression]
                   |versus vectorization of     |Complete unrolling (inner)
                   |reduction                   |versus vectorization of
                   |                            |reduction



* [Bug tree-optimization/41881] [4.5/4.6 regression] Complete unrolling (inner) versus vectorization of reduction
  2009-10-30 20:30 [Bug tree-optimization/41881] New: Complete unrolling (inner) versus vectorization of reduction drow at gcc dot gnu dot org
                   ` (4 preceding siblings ...)
  2010-08-10 16:02 ` [Bug tree-optimization/41881] [4.5/4.6 regression] " drow at gcc dot gnu dot org
@ 2010-08-11  9:29 ` rguenth at gcc dot gnu dot org
  2010-08-11 10:24 ` irar at il dot ibm dot com
  2010-08-30 15:51 ` rguenth at gcc dot gnu dot org
  7 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-08-11  9:29 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from rguenth at gcc dot gnu dot org  2010-08-11 09:28 -------
I think that SLP doesn't handle reduction.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.5.2



* [Bug tree-optimization/41881] [4.5/4.6 regression] Complete unrolling (inner) versus vectorization of reduction
  2009-10-30 20:30 [Bug tree-optimization/41881] New: Complete unrolling (inner) versus vectorization of reduction drow at gcc dot gnu dot org
                   ` (5 preceding siblings ...)
  2010-08-11  9:29 ` rguenth at gcc dot gnu dot org
@ 2010-08-11 10:24 ` irar at il dot ibm dot com
  2010-08-30 15:51 ` rguenth at gcc dot gnu dot org
  7 siblings, 0 replies; 12+ messages in thread
From: irar at il dot ibm dot com @ 2010-08-11 10:24 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from irar at il dot ibm dot com  2010-08-11 10:24 -------
(In reply to comment #6)
> I think that SLP doesn't handle reduction.
> 

Not all kinds of reduction. We handle independent reductions of the form:

#a1 = phi <a0, a2>
#b1 = phi <b0, b2>
...
a2 = a1 + x
b2 = b1 + y

Here we instead have a single reduction chain, which is not handled:
#a1 = phi <a0, a9>
...
a2 = a1 + x
...
a3 = a2 + y
...

a9 = a8 + z
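
In C terms, the two shapes look roughly like the sketch below.  The names
dot_chain and dot_split are hypothetical, and four statements are used
instead of eight for brevity.  dot_chain is the shape of the unrolled loop
in this bug (one accumulator, each statement depending on the previous
one); dot_split uses independent accumulators, the shape described above as
handled.  Whether a particular GCC release vectorizes either function has
not been checked here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name: single reduction chain, a1 -> a2 -> a3 -> a4.
   Processes the first (n / 4) * 4 elements.  */
int dot_chain(const int *x, const int *y, unsigned int n)
{
  unsigned int i;
  int dot = 0;

  assert(x != NULL && y != NULL);
  for (i = 0; i + 4 <= n; i += 4)
    {
      dot += x[i] * y[i];
      dot += x[i + 1] * y[i + 1];
      dot += x[i + 2] * y[i + 2];
      dot += x[i + 3] * y[i + 3];
    }

  return dot;
}

/* Hypothetical name: four independent reductions, combined after the
   loop.  Processes the same elements as dot_chain.  */
int dot_split(const int *x, const int *y, unsigned int n)
{
  unsigned int i;
  int a = 0, b = 0, c = 0, d = 0;

  assert(x != NULL && y != NULL);
  for (i = 0; i + 4 <= n; i += 4)
    {
      a += x[i] * y[i];
      b += x[i + 1] * y[i + 1];
      c += x[i + 2] * y[i + 2];
      d += x[i + 3] * y[i + 3];
    }

  return a + b + c + d;
}
```

For int the two return the same value as long as no intermediate sum
overflows.  For float the split form adds the products in a different
order, so results can differ in rounding.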



* [Bug tree-optimization/41881] [4.5/4.6 regression] Complete unrolling (inner) versus vectorization of reduction
  2009-10-30 20:30 [Bug tree-optimization/41881] New: Complete unrolling (inner) versus vectorization of reduction drow at gcc dot gnu dot org
                   ` (6 preceding siblings ...)
  2010-08-11 10:24 ` irar at il dot ibm dot com
@ 2010-08-30 15:51 ` rguenth at gcc dot gnu dot org
  7 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-08-30 15:51 UTC (permalink / raw)
  To: gcc-bugs



-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2



* [Bug tree-optimization/41881] [4.5/4.6 regression] Complete unrolling (inner) versus vectorization of reduction
       [not found] <bug-41881-4@http.gcc.gnu.org/bugzilla/>
  2010-12-16 13:12 ` rguenth at gcc dot gnu.org
  2012-01-05 17:15 ` jakub at gcc dot gnu.org
@ 2012-07-02 11:38 ` rguenth at gcc dot gnu.org
  2 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-07-02 11:38 UTC (permalink / raw)
  To: gcc-bugs

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.5.4                       |4.6.4

--- Comment #13 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-07-02 11:37:51 UTC ---
The 4.5 branch is being closed, adjusting target milestone.



* [Bug tree-optimization/41881] [4.5/4.6 regression] Complete unrolling (inner) versus vectorization of reduction
       [not found] <bug-41881-4@http.gcc.gnu.org/bugzilla/>
  2010-12-16 13:12 ` rguenth at gcc dot gnu.org
@ 2012-01-05 17:15 ` jakub at gcc dot gnu.org
  2012-07-02 11:38 ` rguenth at gcc dot gnu.org
  2 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2012-01-05 17:15 UTC (permalink / raw)
  To: gcc-bugs

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org
            Summary|[4.5/4.6/4.7 regression]    |[4.5/4.6 regression]
                   |Complete unrolling (inner)  |Complete unrolling (inner)
                   |versus vectorization of     |versus vectorization of
                   |reduction                   |reduction

--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-01-05 17:14:11 UTC ---
Fixed on the trunk, thanks.



* [Bug tree-optimization/41881] [4.5/4.6 regression] Complete unrolling (inner) versus vectorization of reduction
       [not found] <bug-41881-4@http.gcc.gnu.org/bugzilla/>
@ 2010-12-16 13:12 ` rguenth at gcc dot gnu.org
  2012-01-05 17:15 ` jakub at gcc dot gnu.org
  2012-07-02 11:38 ` rguenth at gcc dot gnu.org
  2 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2010-12-16 13:12 UTC (permalink / raw)
  To: gcc-bugs

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.5.2                       |4.5.3

--- Comment #8 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-12-16 13:03:20 UTC ---
GCC 4.5.2 is being released, adjusting target milestone.


