public inbox for gcc-prs@sourceware.org
help / color / mirror / Atom feed
* optimization/10469: [3.3, 3.4] constant V4SF loads get moved inside loop
@ 2003-04-23 20:46 rguenth
0 siblings, 0 replies; only message in thread
From: rguenth @ 2003-04-23 20:46 UTC (permalink / raw)
To: gcc-gnats
>Number: 10469
>Category: optimization
>Synopsis: [3.3, 3.4] constant V4SF loads get moved inside loop
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: unassigned
>State: open
>Class: pessimizes-code
>Submitter-Id: net
>Arrival-Date: Wed Apr 23 20:46:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator: Richard Guenther
>Release: gcc-3.3 (GCC) 3.3 20030423 (prerelease), gcc-3.4 (GCC) 3.4 20030422 (experimental)
>Organization:
>Environment:
ia32 (sse2), powerpc (altivec)
>Description:
For the following code, while good() creates perfectly optimal code for the loop, bad() moves the fv initialization inside the loop body as can be seen from the asm snippets below:
typedef float v4sf __attribute__((mode(V4SF)));
void good(float *r, float f, int cnt)
{
float fv[4] __attribute__((aligned(__alignof__(v4sf)))) = { f, f, f, f };
while (cnt--) {
*(v4sf *)r = *(v4sf *)fv;
r += 4;
}
}
void bad(float *r, float f, int cnt)
{
v4sf fv = { f, f, f, f };
while (cnt--) {
*(v4sf *)r = fv;
r += 4;
}
}
powerpc asm, generated with gcc-3.3 -O2 -S -fverbose-asm -maltivec simd.c:
good:
cmpwi 0,4,0 # cnt
stwu 1,-32(1)
addi 4,4,-1 # cnt, cnt
stfs 1,20(1) # fv, f
stfs 1,8(1) # fv, f
stfs 1,12(1) # fv, f
stfs 1,16(1) # fv, f
beq- 0,.L7
addi 4,4,1 # cnt
addi 9,1,8
mtctr 4
lvx 0,0,9
.L8:
stvx 0,0,3 # * r
addi 3,3,16 # r, r
bdnz .L8
bad:
stwu 1,-32(1)
cmpwi 0,4,0 # cnt
addi 4,4,-1 # cnt, cnt
stfs 1,8(1)
lwz 9,8(1)
mr 10,9
mr 11,9
mr 12,9
beq- 0,.L15
addi 4,4,1 # cnt
mtctr 4
.L16:
addi 8,1,16
stw 9,0(8) # fv
stw 10,4(8) # fv
stw 11,8(8) # fv
stw 12,12(8) # fv
lvx 0,0,8
stvx 0,0,3 # * r
addi 3,3,16 # r, r
bdnz .L16
for ia32 similar things happen, not as bad, but
good:
...
movaps -24(%ebp), %xmm0
.L5:
subl $1, %edx # cnt
movaps %xmm0, (%ecx) # * r
addl $16, %ecx # r
cmpl $-1, %edx # cnt
jne .L5
bad:
...
.L12:
movaps -24(%ebp), %xmm0 # fv
subl $1, %eax # cnt
movaps %xmm0, (%edx) # * r
addl $16, %edx # r
cmpl $-1, %eax # cnt
jne .L12
For gcc 3.4 similar things happen (ia32 only tested):
good:
movaps -24(%ebp), %xmm0 #, tmp62
.L4:
subl $1, %edx #, cnt
movaps %xmm0, (%ecx) # tmp62,* r
addl $16, %ecx #, r
cmpl $-1, %edx #, cnt
jne .L4 #,
bad:
jmp .L14 #
.L15:
movaps -24(%ebp), %xmm0 # fv,
movaps %xmm0, (%edx) #,* r
addl $16, %edx #, r
.L14:
subl $1, %eax #, cnt
cmpl $-1, %eax #, cnt
jne .L15 #,
So the more natural way to write the code pessimizes it without appearant reason.
>How-To-Repeat:
Compile the testcase with SSE2 or Altivec support on ia32/powerpc.
>Fix:
A workaround is to use a temporary array, as given in the example.
>Release-Note:
>Audit-Trail:
>Unformatted:
^ permalink raw reply [flat|nested] only message in thread
only message in thread, other threads:[~2003-04-23 20:46 UTC | newest]
Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-04-23 20:46 optimization/10469: [3.3, 3.4] constant V4SF loads get moved inside loop rguenth
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).