public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/41881] New: Complete unrolling (inner) versus vectorization of reduction
@ 2009-10-30 20:30 drow at gcc dot gnu dot org
2009-10-30 22:20 ` [Bug tree-optimization/41881] " rguenth at gcc dot gnu dot org
` (7 more replies)
0 siblings, 8 replies; 12+ messages in thread
From: drow at gcc dot gnu dot org @ 2009-10-30 20:30 UTC (permalink / raw)
To: gcc-bugs
Here are two pretty straightforward ways to write the same operation:
#define TYPE int

TYPE fun1(TYPE *x, TYPE *y, unsigned int n)
{
  int i, j;
  TYPE dot = 0;

  for (i = 0; i < n; i++)
    dot += *(x++) * *(y++);

  return dot;
}

TYPE fun2(TYPE *x, TYPE *y, unsigned int n)
{
  int i, j;
  TYPE dot = 0;

  for (i = 0; i < n / 8; i++)
    for (j = 0; j < 8; j++)
      dot += *(x++) * *(y++);

  return dot;
}
GCC 4.3 can vectorize both of them. GCC 4.4 can only vectorize fun1. I figure
this is why:
reduc.c:17: note: === vect_analyze_scalar_cycles ===
reduc.c:17: note: Analyze phi: dot_103 = PHI <dot_110(5), 0(3)>
reduc.c:17: note: Access function of PHI: {0, +, ((((((D.1621_32 + D.1621_43) + D.1621_54) + D.1621_65) + D.1621_76) + D.1621_87) + D.1621_98) + D.1621_109}_1
reduc.c:17: note: step: ((((((D.1621_32 + D.1621_43) + D.1621_54) + D.1621_65) + D.1621_76) + D.1621_87) + D.1621_98) + D.1621_109, init: 0
reduc.c:17: note: step unknown.
The cunrolli pass (which cannot be disabled) has completely unrolled the inner
loop, and the vectorizer's SLP support cannot handle the unrolled version of
the loop.
Also observed on ARM NEON with TYPE == float.
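
For reference, a minimal sketch for checking that the two loop shapes compute the same value. It is not part of the original report: the names dot_flat, dot_blocked and variants_agree are hypothetical, and array indexing replaces the pointer increments of fun1 and fun2. Note that the blocked form only processes n / 8 full blocks, so the two agree only when n is a multiple of 8.

```c
#define TYPE int

/* Same shape as fun1: one flat loop over all n elements. */
static TYPE dot_flat(const TYPE *x, const TYPE *y, unsigned int n)
{
    TYPE dot = 0;
    for (unsigned int i = 0; i < n; i++)
        dot += x[i] * y[i];
    return dot;
}

/* Same shape as fun2: an outer loop over blocks of 8 and an inner
   loop with a constant trip count of 8.  The n % 8 trailing elements
   are not processed. */
static TYPE dot_blocked(const TYPE *x, const TYPE *y, unsigned int n)
{
    TYPE dot = 0;
    for (unsigned int i = 0; i < n / 8; i++)
        for (unsigned int j = 0; j < 8; j++)
            dot += x[i * 8 + j] * y[i * 8 + j];
    return dot;
}

/* Returns 1 if both variants agree on a fixed 16-element input. */
static int variants_agree(void)
{
    TYPE x[16], y[16];
    for (int i = 0; i < 16; i++) {
        x[i] = i + 1;
        y[i] = 2;
    }
    return dot_flat(x, y, 16) == dot_blocked(x, y, 16);
}
```

Calling variants_agree() from a small main and compiling once per compiler version gives a quick sanity check that any difference between versions is in code generation only, not in the result.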
--
Summary: Complete unrolling (inner) versus vectorization of
reduction
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: drow at gcc dot gnu dot org
GCC target triplet: x86_64-linux
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41881
* [Bug tree-optimization/41881] Complete unrolling (inner) versus vectorization of reduction
2009-10-30 20:30 [Bug tree-optimization/41881] New: Complete unrolling (inner) versus vectorization of reduction drow at gcc dot gnu dot org
@ 2009-10-30 22:20 ` rguenth at gcc dot gnu dot org
2009-10-30 22:21 ` rguenth at gcc dot gnu dot org
` (6 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-10-30 22:20 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from rguenth at gcc dot gnu dot org 2009-10-30 22:20 -------
You could use -O2 -ftree-vectorize.
Another pretty straightforward way to write the operation is:
TYPE fun3(TYPE *x, TYPE *y, unsigned int n)
{
  int i, j;
  TYPE dot = 0;

  for (i = 0; i < n / 8; i++)
    {
      dot += *(x++) * *(y++);
      dot += *(x++) * *(y++);
      dot += *(x++) * *(y++);
      dot += *(x++) * *(y++);
      dot += *(x++) * *(y++);
      dot += *(x++) * *(y++);
      dot += *(x++) * *(y++);
      dot += *(x++) * *(y++);
    }

  return dot;
}
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41881
* [Bug tree-optimization/41881] Complete unrolling (inner) versus vectorization of reduction
2009-10-30 20:30 [Bug tree-optimization/41881] New: Complete unrolling (inner) versus vectorization of reduction drow at gcc dot gnu dot org
2009-10-30 22:20 ` [Bug tree-optimization/41881] " rguenth at gcc dot gnu dot org
@ 2009-10-30 22:21 ` rguenth at gcc dot gnu dot org
2009-10-30 22:42 ` drow at gcc dot gnu dot org
` (5 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-10-30 22:21 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from rguenth at gcc dot gnu dot org 2009-10-30 22:21 -------
See also PR41647.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
BugsThisDependsOn| |41647
Status|UNCONFIRMED |NEW
Ever Confirmed|0 |1
Keywords| |missed-optimization
Last reconfirmed date|0000-00-00 00:00:00 |2009-10-30 22:21:43
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41881
* [Bug tree-optimization/41881] Complete unrolling (inner) versus vectorization of reduction
2009-10-30 20:30 [Bug tree-optimization/41881] New: Complete unrolling (inner) versus vectorization of reduction drow at gcc dot gnu dot org
2009-10-30 22:20 ` [Bug tree-optimization/41881] " rguenth at gcc dot gnu dot org
2009-10-30 22:21 ` rguenth at gcc dot gnu dot org
@ 2009-10-30 22:42 ` drow at gcc dot gnu dot org
2009-10-30 22:50 ` rguenth at gcc dot gnu dot org
` (4 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: drow at gcc dot gnu dot org @ 2009-10-30 22:42 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from drow at gcc dot gnu dot org 2009-10-30 22:41 -------
Subject: Re: Complete unrolling (inner) versus vectorization of reduction
On Fri, Oct 30, 2009 at 10:20:46PM -0000, rguenth at gcc dot gnu dot org wrote:
> You could use -O2 -ftree-vectorize.
No, the pass is still enabled at -O2:
static bool
gate_tree_complete_unroll_inner (void)
{
  return optimize >= 2;
}
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41881
* [Bug tree-optimization/41881] Complete unrolling (inner) versus vectorization of reduction
2009-10-30 20:30 [Bug tree-optimization/41881] New: Complete unrolling (inner) versus vectorization of reduction drow at gcc dot gnu dot org
` (2 preceding siblings ...)
2009-10-30 22:42 ` drow at gcc dot gnu dot org
@ 2009-10-30 22:50 ` rguenth at gcc dot gnu dot org
2010-08-10 16:02 ` [Bug tree-optimization/41881] [4.5/4.6 regression] " drow at gcc dot gnu dot org
` (3 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-10-30 22:50 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from rguenth at gcc dot gnu dot org 2009-10-30 22:50 -------
But:
static unsigned int
tree_complete_unroll_inner (void)
{
  unsigned ret = 0;
  loop_optimizer_init (LOOPS_NORMAL
                       | LOOPS_HAVE_RECORDED_EXITS);
  if (number_of_loops () > 1)
    {
      scev_initialize ();
      ret = tree_unroll_loops_completely (optimize >= 3, false);
So the pass will not allow the loop body to increase in code size unless
compiling with -O3. Unrolling the loop in your example does increase code
size.
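
The two conditions quoted in this thread can be restated side by side. This is only a sketch: the helper names below are hypothetical and are not GCC functions, and the reading of the first argument of tree_unroll_loops_completely as "size growth allowed" is taken from the explanation above.

```c
#include <stdbool.h>

/* Hypothetical restatement of gate_tree_complete_unroll_inner:
   the pass runs at -O2 and above. */
static bool cunrolli_enabled(int optimize)
{
    return optimize >= 2;
}

/* Hypothetical restatement of the first argument passed to
   tree_unroll_loops_completely: unrolling may increase code size
   only at -O3 and above. */
static bool cunrolli_may_increase_size(int optimize)
{
    return optimize >= 3;
}
```

At -O2 the pass is enabled but may not grow the code; at -O3 it is enabled and may grow the code, which is the case the original report hits.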
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41881
* [Bug tree-optimization/41881] [4.5/4.6 regression] Complete unrolling (inner) versus vectorization of reduction
2009-10-30 20:30 [Bug tree-optimization/41881] New: Complete unrolling (inner) versus vectorization of reduction drow at gcc dot gnu dot org
` (3 preceding siblings ...)
2009-10-30 22:50 ` rguenth at gcc dot gnu dot org
@ 2010-08-10 16:02 ` drow at gcc dot gnu dot org
2010-08-11 9:29 ` rguenth at gcc dot gnu dot org
` (2 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: drow at gcc dot gnu dot org @ 2010-08-10 16:02 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from drow at gcc dot gnu dot org 2010-08-10 16:01 -------
Verified on x86_64 using:
gcc-4.3 -O3 -o 43.s -S reduc.c -ftree-vectorizer-verbose=1
[two loops vectorized]
gcc-4.4 -O3 -o 43.s -S reduc.c -ftree-vectorizer-verbose=1
[one loop vectorized]
--
drow at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |irar at gcc dot gnu dot org
Known to fail| |4.4.4
Known to work| |4.3.5
Summary|Complete unrolling (inner) |[4.5/4.6 regression]
|versus vectorization of |Complete unrolling (inner)
|reduction |versus vectorization of
| |reduction
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41881
* [Bug tree-optimization/41881] [4.5/4.6 regression] Complete unrolling (inner) versus vectorization of reduction
2009-10-30 20:30 [Bug tree-optimization/41881] New: Complete unrolling (inner) versus vectorization of reduction drow at gcc dot gnu dot org
` (4 preceding siblings ...)
2010-08-10 16:02 ` [Bug tree-optimization/41881] [4.5/4.6 regression] " drow at gcc dot gnu dot org
@ 2010-08-11 9:29 ` rguenth at gcc dot gnu dot org
2010-08-11 10:24 ` irar at il dot ibm dot com
2010-08-30 15:51 ` rguenth at gcc dot gnu dot org
7 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-08-11 9:29 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from rguenth at gcc dot gnu dot org 2010-08-11 09:28 -------
I think that SLP doesn't handle reduction.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |4.5.2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41881
* [Bug tree-optimization/41881] [4.5/4.6 regression] Complete unrolling (inner) versus vectorization of reduction
2009-10-30 20:30 [Bug tree-optimization/41881] New: Complete unrolling (inner) versus vectorization of reduction drow at gcc dot gnu dot org
` (5 preceding siblings ...)
2010-08-11 9:29 ` rguenth at gcc dot gnu dot org
@ 2010-08-11 10:24 ` irar at il dot ibm dot com
2010-08-30 15:51 ` rguenth at gcc dot gnu dot org
7 siblings, 0 replies; 12+ messages in thread
From: irar at il dot ibm dot com @ 2010-08-11 10:24 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from irar at il dot ibm dot com 2010-08-11 10:24 -------
(In reply to comment #6)
> I think that SLP doesn't handle reduction.
>
Not all kinds of reduction. We handle
#a1 = phi <a0, a2>
#b1 = phi <b0, b2>
...
a2 = a1 + x
b2 = b1 + y
Here we also have a single chain of additions feeding one phi:
#a1 = phi <a0, a9>
...
a2 = a1 + x
...
a3 = a2 + y
...
a9 = a8 + z
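
The two shapes above might look like this at the C source level. This is an illustrative sketch only: the function names are hypothetical, and a chain of four additions stands in for the eight produced by unrolling the inner loop.

```c
/* Shape 1: two independent reductions, each with one addition per
   iteration (a2 = a1 + x, b2 = b1 + y). */
static void two_reductions(const int *x, const int *y, unsigned int n,
                           int *a_out, int *b_out)
{
    int a = 0, b = 0;
    for (unsigned int i = 0; i < n; i++) {
        a = a + x[i];
        b = b + y[i];
    }
    *a_out = a;
    *b_out = b;
}

/* Shape 2: a single reduction whose value is updated several times
   per iteration, each addition depending on the previous one
   (a2 = a1 + x, a3 = a2 + y, ...).  Only full groups of four
   elements are processed. */
static int reduction_chain(const int *x, unsigned int n)
{
    int a = 0;
    for (unsigned int i = 0; i + 4 <= n; i += 4) {
        a = a + x[i];
        a = a + x[i + 1];
        a = a + x[i + 2];
        a = a + x[i + 3];
    }
    return a;
}
```

In the first function each accumulator has its own phi node; in the second, one phi node is fed by the last addition in the chain, which is the case described as not handled here.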
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41881
* [Bug tree-optimization/41881] [4.5/4.6 regression] Complete unrolling (inner) versus vectorization of reduction
2009-10-30 20:30 [Bug tree-optimization/41881] New: Complete unrolling (inner) versus vectorization of reduction drow at gcc dot gnu dot org
` (6 preceding siblings ...)
2010-08-11 10:24 ` irar at il dot ibm dot com
@ 2010-08-30 15:51 ` rguenth at gcc dot gnu dot org
7 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-08-30 15:51 UTC (permalink / raw)
To: gcc-bugs
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P3 |P2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41881
* [Bug tree-optimization/41881] [4.5/4.6 regression] Complete unrolling (inner) versus vectorization of reduction
[not found] <bug-41881-4@http.gcc.gnu.org/bugzilla/>
2010-12-16 13:12 ` rguenth at gcc dot gnu.org
2012-01-05 17:15 ` jakub at gcc dot gnu.org
@ 2012-07-02 11:38 ` rguenth at gcc dot gnu.org
2 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-07-02 11:38 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41881
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.5.4 |4.6.4
--- Comment #13 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-07-02 11:37:51 UTC ---
The 4.5 branch is being closed, adjusting target milestone.
* [Bug tree-optimization/41881] [4.5/4.6 regression] Complete unrolling (inner) versus vectorization of reduction
[not found] <bug-41881-4@http.gcc.gnu.org/bugzilla/>
2010-12-16 13:12 ` rguenth at gcc dot gnu.org
@ 2012-01-05 17:15 ` jakub at gcc dot gnu.org
2012-07-02 11:38 ` rguenth at gcc dot gnu.org
2 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2012-01-05 17:15 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41881
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
Summary|[4.5/4.6/4.7 regression] |[4.5/4.6 regression]
|Complete unrolling (inner) |Complete unrolling (inner)
|versus vectorization of |versus vectorization of
|reduction |reduction
--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-01-05 17:14:11 UTC ---
Fixed on the trunk, thanks.
* [Bug tree-optimization/41881] [4.5/4.6 regression] Complete unrolling (inner) versus vectorization of reduction
[not found] <bug-41881-4@http.gcc.gnu.org/bugzilla/>
@ 2010-12-16 13:12 ` rguenth at gcc dot gnu.org
2012-01-05 17:15 ` jakub at gcc dot gnu.org
2012-07-02 11:38 ` rguenth at gcc dot gnu.org
2 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2010-12-16 13:12 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41881
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.5.2 |4.5.3
--- Comment #8 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-12-16 13:03:20 UTC ---
GCC 4.5.2 is being released, adjusting target milestone.