public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/41464] New: vector loads are unnecessarily split into high and low loads
From: nmiell at comcast dot net @ 2009-09-24 23:14 UTC (permalink / raw)
To: gcc-bugs
gcc (GCC) 4.4.1 20090725 (Red Hat 4.4.1-2)
The testcase (built with -Wall -O3):
#include <math.h>
void MulPi(float * __attribute__((aligned(16))) i,
           float * __attribute__((aligned(16))) f, int n)
{
    for (int j = 0; j < n; j++)
        f[j] = (float) M_PI * i[j];
}
produces the following for the vectorized version of the loop:
.L7:
        movaps  %xmm1, %xmm0            # zero XMM0
        incl    %ecx
        movlps  (%rdi,%rax), %xmm0      # load the low half into XMM0
        movhps  8(%rdi,%rax), %xmm0     # load the high half into XMM0
        mulps   %xmm2, %xmm0            # multiply by pi
        movaps  %xmm0, (%rsi,%rax)      # store to memory
        addq    $16, %rax
        cmpl    %r8d, %ecx
        jb      .L7
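For readers less used to vectorizer output: each pass through .L7 handles one 16-byte vector, i.e. four floats (hence the addq $16). A rough scalar sketch of that loop shape in C follows. The name MulPi_blocked and the explicit tail loop are assumptions made for illustration; this is not what GCC literally emits.

```c
#include <assert.h>
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Hypothetical helper: same result as MulPi, written so that the
   four-floats-per-iteration structure of the vector loop is visible. */
void MulPi_blocked(const float *in, float *out, int n)
{
    const float pi = (float) M_PI;
    int j = 0;

    assert(n >= 0);

    /* Main loop: one iteration corresponds to one 16-byte vector. */
    for (; j + 4 <= n; j += 4) {
        out[j + 0] = pi * in[j + 0];
        out[j + 1] = pi * in[j + 1];
        out[j + 2] = pi * in[j + 2];
        out[j + 3] = pi * in[j + 3];
    }

    /* Tail loop: the leftover n % 4 elements, handled one at a time. */
    for (; j < n; j++)
        out[j] = pi * in[j];
}
```

The complaint in this report concerns only the load inside the main loop, which is done as two 8-byte halves rather than as a single 16-byte access.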
--
Summary: vector loads are unnecessarily split into high and low
loads
Product: gcc
Version: 4.4.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: nmiell at comcast dot net
GCC build triplet: x86_64-linux-gnu
GCC host triplet: x86_64-linux-gnu
GCC target triplet: x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464
From: rguenth at gcc dot gnu dot org @ 2009-09-25 9:06 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from rguenth at gcc dot gnu dot org 2009-09-25 09:06 -------
The interesting thing is that data-ref analysis sees 128bit alignment but
the vectorizer still produces
vect_var_.24_59 = M*vect_p.20_57{misalignment: 0};
D.2564_12 = *D.2563_11;
vect_var_.25_61 = vect_var_.24_59 * vect_cst_.26_60;
D.2565_13 = D.2564_12 * 2.2999999523162841796875e+0;
M*vect_p.27_64{misalignment: 0} = vect_var_.25_61;
thus, unknown misalignment.
(instantiate_scev
(instantiate_below = 3)
(evolution_loop = 1)
(chrec = {i_10(D), +, 4}_1)
(res = {i_10(D), +, 4}_1))
base_address: i_10(D)
offset from base address: 0
constant offset from base address: 0
step: 4
aligned to: 128
base_object: *i_10(D)
Creating dr for *D.2562_7
(res = {f_6(D), +, 4}_1))
base_address: f_6(D)
offset from base address: 0
constant offset from base address: 0
step: 4
aligned to: 128
base_object: *f_6(D)
t2.i:5: note: === vect_enhance_data_refs_alignment ===
t2.i:5: note: Vectorizing an unaligned access.
t2.i:5: note: Vectorizing an unaligned access.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |irar at gcc dot gnu dot org
Status|UNCONFIRMED |NEW
Ever Confirmed|0 |1
Keywords| |missed-optimization
Last reconfirmed|0000-00-00 00:00:00 |2009-09-25 09:06:12
date| |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464
From: nmiell at comcast dot net @ 2009-09-25 17:12 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from nmiell at comcast dot net 2009-09-25 17:12 -------
Even if it thinks the arrays aren't aligned, that doesn't explain the
completely unnecessary zeroing of XMM0 or the choice of the load high/low
instructions over MOVUPS.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464
From: ubizjak at gmail dot com @ 2009-09-25 17:34 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from ubizjak at gmail dot com 2009-09-25 17:33 -------
(In reply to comment #2)
> Even if it thinks the arrays aren't aligned, that doesn't explain the
> completely unnecessary zeroing of XMM0 or the choice of the load high/low
> instructions over MOVUPS.
This is by design, see config/i386/i386.c, ix86_expand_vector_move_misalign and
the comment above this function.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464
From: irar at il dot ibm dot com @ 2009-09-27 8:06 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from irar at il dot ibm dot com 2009-09-27 08:06 -------
(In reply to comment #1)
> The interesting thing is that data-ref analysis sees 128bit alignment but
> the vectorizer still produces
> vect_var_.24_59 = M*vect_p.20_57{misalignment: 0};
> D.2564_12 = *D.2563_11;
> vect_var_.25_61 = vect_var_.24_59 * vect_cst_.26_60;
> D.2565_13 = D.2564_12 * 2.2999999523162841796875e+0;
> M*vect_p.27_64{misalignment: 0} = vect_var_.25_61;
> thus, unknown misalignment.
> (instantiate_scev
> (instantiate_below = 3)
> (evolution_loop = 1)
> (chrec = {i_10(D), +, 4}_1)
> (res = {i_10(D), +, 4}_1))
> base_address: i_10(D)
> offset from base address: 0
> constant offset from base address: 0
> step: 4
> aligned to: 128
> base_object: *i_10(D)
> Creating dr for *D.2562_7
> (res = {f_6(D), +, 4}_1))
> base_address: f_6(D)
> offset from base address: 0
> constant offset from base address: 0
> step: 4
> aligned to: 128
> base_object: *f_6(D)
> t2.i:5: note: === vect_enhance_data_refs_alignment ===
> t2.i:5: note: Vectorizing an unaligned access.
> t2.i:5: note: Vectorizing an unaligned access.
"aligned to" refers to the offset misalignment and not to the misalignment of
base.
attribute aligned works only for arrays, i.e., declarations, and not for
pointer arguments. For pointers the vectorizer only checks TYPE_ALIGN_UNIT of
the base type.
Ira
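A small sketch of the distinction described above, using the user-visible __alignof__ operator as a rough stand-in for the alignment the compiler attaches to a type. The typedef name aligned_float, the array buf, and the helper functions are hypothetical names chosen for illustration; the attribute syntax is the GCC/Clang extension used in the testcase.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical typedef that carries a user-defined alignment in the type. */
typedef float aligned_float __attribute__((aligned(16)));

/* An array declaration: here the attribute applies to the object itself. */
static float buf[8] __attribute__((aligned(16)));

/* Alignment in bytes guaranteed by the plain scalar type (typically 4). */
unsigned plain_float_align(void)
{
    return (unsigned) __alignof__(float);
}

/* Alignment in bytes guaranteed by the typedef with the attribute. */
unsigned aligned_float_align(void)
{
    return (unsigned) __alignof__(aligned_float);
}

/* Nonzero if the declared array really starts on a 16-byte boundary. */
int buf_is_16_aligned(void)
{
    return ((uintptr_t) buf % 16) == 0;
}
```

A check that looks only at the scalar type float sees the smaller value, which is consistent with the "Vectorizing an unaligned access" notes in the dump.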
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464
From: rguenther at suse dot de @ 2009-09-27 9:43 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from rguenther at suse dot de 2009-09-27 09:43 -------
Subject: Re: vector loads are unnecessarily split into high and low loads
On Sun, 27 Sep 2009, irar at il dot ibm dot com wrote:
> ------- Comment #4 from irar at il dot ibm dot com 2009-09-27 08:06 -------
> (In reply to comment #1)
> > The interesting thing is that data-ref analysis sees 128bit alignment but
> > the vectorizer still produces
> > vect_var_.24_59 = M*vect_p.20_57{misalignment: 0};
> > D.2564_12 = *D.2563_11;
> > vect_var_.25_61 = vect_var_.24_59 * vect_cst_.26_60;
> > D.2565_13 = D.2564_12 * 2.2999999523162841796875e+0;
> > M*vect_p.27_64{misalignment: 0} = vect_var_.25_61;
> > thus, unknown misalignment.
> > (instantiate_scev
> > (instantiate_below = 3)
> > (evolution_loop = 1)
> > (chrec = {i_10(D), +, 4}_1)
> > (res = {i_10(D), +, 4}_1))
> > base_address: i_10(D)
> > offset from base address: 0
> > constant offset from base address: 0
> > step: 4
> > aligned to: 128
> > base_object: *i_10(D)
> > Creating dr for *D.2562_7
> > (res = {f_6(D), +, 4}_1))
> > base_address: f_6(D)
> > offset from base address: 0
> > constant offset from base address: 0
> > step: 4
> > aligned to: 128
> > base_object: *f_6(D)
> > t2.i:5: note: === vect_enhance_data_refs_alignment ===
> > t2.i:5: note: Vectorizing an unaligned access.
> > t2.i:5: note: Vectorizing an unaligned access.
>
> "aligned to" refers to the offset misalignment and not to the misalignment of
> base.
Hmm, I believe it refers to base + offset + constant offset.
> attribute aligned works only for arrays, i.e., declarations, and not for
> pointer arguments.
I have to check that - I believe that in principle it should work.
> For pointers the vectorizer only checks TYPE_ALIGN_UNIT of
> the base type.
That should be ok. I guess I have to see what's going on here.
Richard.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464
From: irar at il dot ibm dot com @ 2009-09-27 9:56 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from irar at il dot ibm dot com 2009-09-27 09:56 -------
(In reply to comment #5)
> >
> > "aligned to" refers to the offset misalignment and not to the misalignment of
> > base.
> Hmm, I believe it refers to base + offset + constant offset.
tree-data-ref.h:
  /* Alignment information.  ALIGNED_TO is set to the largest power of two
     that divides OFFSET.  */
  tree aligned_to;
tree-data-ref.c:
  DR_ALIGNED_TO (dr) = size_int (highest_pow2_factor (offset_iv.base));
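To make "largest power of two that divides OFFSET" concrete, here is a standalone sketch in C. It is not GCC's highest_pow2_factor; the function name largest_pow2_divisor and the cap parameter are assumptions for illustration. The cap is there because every power of two divides 0, so some upper bound has to be returned for a zero offset (the dumps above print 128 for an offset of 0, which would be consistent with such a bound).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest power of two that divides offset,
   limited to cap.  For offset == 0 the answer is the cap itself. */
uint64_t largest_pow2_divisor(uint64_t offset, uint64_t cap)
{
    /* The cap must itself be a nonzero power of two. */
    assert(cap != 0 && (cap & (cap - 1)) == 0);

    if (offset == 0)
        return cap;

    /* Isolate the lowest set bit: that bit is the largest power of two
       dividing offset. */
    uint64_t low = offset & (~offset + 1);

    return low < cap ? low : cap;
}
```

For example, an offset of 24 gives 8 and an offset of 4 gives 4. Note that the value depends only on the offset, so it says nothing about how the base pointer is aligned.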
> > attribute aligned works only for arrays, i.e., declarations, and not for
> > pointer arguments.
> I have to check that - I believe that in principle it should work.
> > For pointers the vectorizer only checks TYPE_ALIGN_UNIT of
> > the base type.
> That should be ok.
But we need TYPE_ALIGN_UNIT to be 16, and we are checking scalar type here, so
without user defined alignment it will be 4.
Ira
> I guess I have to see what's going on here.
> Richard.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464
From: rguenth at gcc dot gnu dot org @ 2010-01-24 11:52 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from rguenth at gcc dot gnu dot org 2010-01-24 11:52 -------
*** Bug 42846 has been marked as a duplicate of this bug. ***
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |bredelin at ucla dot edu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464
From: rguenth at gcc dot gnu dot org @ 2010-01-24 12:10 UTC (permalink / raw)
To: gcc-bugs
------- Comment #8 from rguenth at gcc dot gnu dot org 2010-01-24 12:08 -------
In the testcase from PR42846 one issue is that
base_address: p__3(D)
offset from base address: 0
constant offset from base address: 0
step: 4
aligned to: 128
base_object: *(const aligned_real * restrict) p__3(D)
only in the base object we see the cast to the aligned pointer, but it is
stripped for the base address in the innermost loop.
So in the end all this boils down to the Frontend / middle-end issue of
weak handling of aligned types.
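Since the alignment carried by the pointer type can get lost, one alternative is to state the alignment on the pointer value inside the function. The sketch below uses __builtin_assume_aligned, a builtin available in later GCC releases (the 4.7 series onward, as far as I know) and in Clang. It is not part of the testcase in this report, and the name MulPi_assume_aligned is hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Hypothetical variant of MulPi: the 16-byte alignment is asserted on the
   pointer values rather than encoded in the parameter types. */
void MulPi_assume_aligned(const float *i, float *f, int n)
{
    /* Caller contract, checked when assertions are enabled. */
    assert(((uintptr_t) i % 16) == 0);
    assert(((uintptr_t) f % 16) == 0);

    /* The builtin returns its first argument; the compiler may assume
       the result is aligned to the given number of bytes. */
    const float *ia = __builtin_assume_aligned(i, 16);
    float *fa = __builtin_assume_aligned(f, 16);

    for (int j = 0; j < n; j++)
        fa[j] = (float) M_PI * ia[j];
}
```

Passing a pointer that is not actually 16-byte aligned breaks the promise made to the compiler, so the checks at the top are worth keeping in debug builds.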
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464
From: bredelin at ucla dot edu @ 2010-02-04 20:29 UTC (permalink / raw)
To: gcc-bugs
------- Comment #9 from bredelin at ucla dot edu 2010-02-04 20:29 -------
In reply to comment #8
> So in the end all this boils down to the Frontend / middle-end issue of
> weak handling of aligned types.
Would you mind giving a general idea of what the outlook for improvement on
this front is?
Also, this is interesting:
http://eigen.tuxfamily.org/index.php?title=Benchmark
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464
From: pinskia at gcc dot gnu.org @ 2021-12-13 0:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |4.9.0
Known to work| |4.9.0
Known to fail| |4.4.7, 4.5.3, 4.7.1, 4.8.1,
| |4.8.5
Resolution|--- |FIXED
Status|NEW |RESOLVED
--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
4.8.5 did:
movaps xmm0, xmm1
add ecx, 1
movlps xmm0, QWORD PTR [rdi+rax]
movhps xmm0, QWORD PTR [rdi+8+rax]
mulps xmm0, xmm2
movlps QWORD PTR [rsi+rax], xmm0
movhps QWORD PTR [rsi+8+rax], xmm0
But 4.9.0 has:
movaps xmm0, XMMWORD PTR [rbp+0+r9]
add r10d, 1
mulps xmm0, xmm1
movups XMMWORD PTR [rax+r9], xmm0
So all fixed for GCC 4.9.0.
4.9.0
vect__11.13_94 = MEM[base: vectp_i.12_90, index: ivtmp.28_28, offset: 0B];
vect__12.14_96 = vect__11.13_94 * { 3.1415927410125732421875e+0,
3.1415927410125732421875e+0, 3.1415927410125732421875e+0,
3.1415927410125732421875e+0 };
MEM[base: vectp_f.17_97, index: ivtmp.28_28, offset: 0B] = vect__12.14_96;