public inbox for gcc-bugs@sourceware.org
* [Bug target/43118] New: vld4 and vst4 intrinsics are not handled correctly
From: samuel dot rodal at nokia dot com @ 2010-02-19 10:52 UTC (permalink / raw)
To: gcc-bugs
The vldX and vstX variants of the NEON intrinsics, where X > 1, seem to make the
compiler generate an excessive amount of code.
Example:
#include <arm_neon.h>
void blend1(uint8_t *src, uint8_t *dst)
{
uint8x8_t temp = vld1_u8(src);
vst1_u8(dst, temp);
}
generates the expected, minimal code:
vld1.8 {d16}, [r0]
vst1.8 {d16}, [r1]
bx lr
Whereas:
void blend4(uint8_t *src, uint8_t *dst)
{
uint8x8x4_t temp = vld4_u8(src);
vst4_u8(dst, temp);
}
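generates the following, which spills d16-d19 to the stack and copies the
32 bytes through several stack temporaries before reloading them for the store: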
stmfd sp!, {r4, r5, r6}
.save {r4, r5, r6}
.LCFI4:
.pad #132
sub sp, sp, #132
.LCFI5:
vld4.8 {d16-d19}, [r0]
add r6, sp, #64
vstmia r6, {d16-d19}
mov r5, r1
ldmia r6!, {r0, r1, r2, r3}
add ip, sp, #96
mov r4, ip
stmia r4!, {r0, r1, r2, r3}
ldmia r6, {r0, r1, r2, r3}
stmia r4, {r0, r1, r2, r3}
ldmia ip!, {r0, r1, r2, r3}
add ip, sp, #32
mov r6, ip
stmia r6!, {r0, r1, r2, r3}
ldmia r4, {r0, r1, r2, r3}
stmia r6, {r0, r1, r2, r3}
ldmia ip!, {r0, r1, r2, r3}
mov r4, sp
stmia r4!, {r0, r1, r2, r3}
ldmia r6, {r0, r1, r2, r3}
stmia r4, {r0, r1, r2, r3}
vldmia sp, {d16-d19}
vst4.8 {d16-d19}, [r5]
add sp, sp, #132
ldmfd sp!, {r4, r5, r6}
bx lr
Compile flags used were "-mfloat-abi=softfp -mfpu=neon -O3".
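For reference, vld4_u8 reads 32 bytes as eight interleaved 4-byte structures and de-interleaves them into four 8-lane vectors, and vst4_u8 performs the inverse interleaving store, so blend4 is semantically just a 32-byte copy and ideally needs nothing more than one vld4.8/vst4.8 pair. The portable C sketch below models only that data movement so it can run on any host; model_u8x8x4, model_vld4_u8, model_vst4_u8 and model_blend4 are hypothetical names, not part of <arm_neon.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for uint8x8x4_t: four vectors of eight bytes.
   Only the data layout is modelled, not the NEON register type. */
typedef struct {
    uint8_t val[4][8];
} model_u8x8x4;

/* Scalar model of vld4_u8: element j of structure i (byte 4*i + j)
   lands in lane i of vector j. */
static model_u8x8x4 model_vld4_u8(const uint8_t *src)
{
    model_u8x8x4 r;
    assert(src != NULL);
    for (int i = 0; i < 8; i++)          /* structure index == lane */
        for (int j = 0; j < 4; j++)      /* element index == vector */
            r.val[j][i] = src[4 * i + j];
    return r;
}

/* Scalar model of vst4_u8: the inverse, re-interleaving the lanes. */
static void model_vst4_u8(uint8_t *dst, model_u8x8x4 v)
{
    assert(dst != NULL);
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 4; j++)
            dst[4 * i + j] = v.val[j][i];
}

/* Scalar model of the reporter's blend4: a load/store round trip. */
static void model_blend4(const uint8_t *src, uint8_t *dst)
{
    model_vst4_u8(dst, model_vld4_u8(src));
}
```

Loading the bytes 0..31 gives val[1] = {1, 5, 9, ..., 29}, and model_blend4 leaves dst byte-for-byte identical to src, which is why the stack traffic in the listing above is pure overhead.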
--
Summary: vld4 and vst4 intrinsics are not handled correctly
Product: gcc
Version: 4.4.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: samuel dot rodal at nokia dot com
GCC build triplet: arm-none-linux-gnueabi-gcc
GCC host triplet: arm-none-linux-gnueabi-gcc
GCC target triplet: arm-none-linux-gnueabi-gcc
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43118
* [Bug target/43118] vld4 and vst4 intrinsics are not handled correctly
From: rguenth at gcc dot gnu dot org @ 2010-02-19 11:08 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from rguenth at gcc dot gnu dot org 2010-02-19 11:08 -------
Likely because of the union in
__extension__ static __inline void __attribute__ ((__always_inline__))
vst4_u8 (uint8_t * __a, uint8x8x4_t __b)
{
union { uint8x8x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
__builtin_neon_vst4v8qi ((__builtin_neon_qi *) __a, __bu.__o);
}
which does copy-initialization of __bu. Also try GCC 4.5.
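The union pattern above can be reproduced in portable C to show where the copies come from: the brace initializer copy-initializes the first union member from the by-value argument, and the other member is then read back as the opaque type the builtin expects. This is a minimal sketch under stated assumptions: u8x8x4_model and oi_model are hypothetical stand-ins for uint8x8x4_t and __builtin_neon_oi, and store_oi_model merely copies the 32 bytes rather than interleaving them like the real __builtin_neon_vst4v8qi. Whether a given GCC version removes the resulting temporaries depends on its optimizers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins: a struct of four 8-byte vectors (like
   uint8x8x4_t) and an opaque 32-byte type (like __builtin_neon_oi). */
typedef struct { uint8_t val[4][8]; } u8x8x4_model;
typedef struct { uint64_t words[4]; } oi_model;

/* Type punning through a union only makes sense if the sizes match. */
static_assert(sizeof(u8x8x4_model) == sizeof(oi_model),
              "punned types must have the same size");

/* Hypothetical consumer that, like the builtin, only accepts the opaque
   type.  It copies the raw bytes; the real store would interleave. */
static void store_oi_model(uint8_t *dst, oi_model o)
{
    memcpy(dst, &o, sizeof o);
}

/* Same shape as vst4_u8 in comment #1: b is passed by value, copied into
   the union by the initializer, then passed on by value as bu.o.  Each
   of these aggregate copies is a temporary the optimizer must remove. */
static void vst4_via_union_model(uint8_t *dst, u8x8x4_model b)
{
    union { u8x8x4_model i; oi_model o; } bu = { b };
    store_oi_model(dst, bu.o);
}
```

Because the union only reinterprets the same 32 bytes, storing a value whose vectors hold 0..7, 8..15, 16..23 and 24..31 writes the bytes 0..31 in order.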
* [Bug target/43118] vld4 and vst4 intrinsics are not handled correctly
From: ramana at gcc dot gnu dot org @ 2010-02-19 13:46 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from ramana at gcc dot gnu dot org 2010-02-19 13:45 -------
Trunk behaves similarly; I wonder if this is similar to PR 41021.
Here's what trunk generates.
push {r4, r5, r6, r7}
vld4.8 {d16-d19}, [r0]
sub sp, sp, #96
mov r7, r1
vstmia sp, {d16-d19}
mov r6, sp
add r5, sp, #64
add ip, sp, #32
ldmia r6!, {r0, r1, r2, r3}
mov r4, r5
stmia r5!, {r0, r1, r2, r3}
ldmia r6, {r0, r1, r2, r3}
stmia r5, {r0, r1, r2, r3}
ldmia r4!, {r0, r1, r2, r3}
stmia ip!, {r0, r1, r2, r3}
ldmia r4, {r0, r1, r2, r3}
stmia ip, {r0, r1, r2, r3}
add r3, sp, #32
vldmia r3, {d16-d19}
vst4.8 {d16-d19}, [r7]
add sp, sp, #96
pop {r4, r5, r6, r7}
bx lr
--
ramana at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever Confirmed|0 |1
Known to fail| |4.4.3 4.5.0
Last reconfirmed date|0000-00-00 00:00:00 |2010-02-19 13:45:57
* [Bug target/43118] vld4 and vst4 intrinsics are not handled correctly
From: drow at false dot org @ 2010-02-22 21:14 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from drow at gcc dot gnu dot org 2010-02-22 21:14 -------
Subject: Re: vld4 and vst4 intrinsics are not handled correctly
On Fri, Feb 19, 2010 at 11:08:18AM -0000, rguenth at gcc dot gnu dot org wrote:
> Likely because of the union in
>
> __extension__ static __inline void __attribute__ ((__always_inline__))
> vst4_u8 (uint8_t * __a, uint8x8x4_t __b)
> {
> union { uint8x8x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
> __builtin_neon_vst4v8qi ((__builtin_neon_qi *) __a, __bu.__o);
> }
>
> which does copy-initialization of __bu.
Right. FYI, my best idea to date of how to fix this is to convert the
multiple-vector types (like uint8x8x4_t) to builtin types. At that
point we can use the neon_reinterpret patterns to do the necessary
type punning without involving __builtin_neon_oi and the union.
* [Bug target/43118] vld4 and vst4 intrinsics are not handled correctly
From: rguenth at gcc dot gnu dot org @ 2010-02-23 10:42 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from rguenth at gcc dot gnu dot org 2010-02-23 10:42 -------
(In reply to comment #3)
> Subject: Re: vld4 and vst4 intrinsics are not handled
> correctly
>
> On Fri, Feb 19, 2010 at 11:08:18AM -0000, rguenth at gcc dot gnu dot org wrote:
> > Likely because of the union in
> >
> > __extension__ static __inline void __attribute__ ((__always_inline__))
> > vst4_u8 (uint8_t * __a, uint8x8x4_t __b)
> > {
> > union { uint8x8x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
> > __builtin_neon_vst4v8qi ((__builtin_neon_qi *) __a, __bu.__o);
> > }
> >
> > which does copy-initialization of __bu.
>
> Right. FYI, my best idea to date of how to fix this is to convert the
> multiple-vector types (like uint8x8x4_t) to builtin types. At that
> point we can use the neon_reinterpret patterns to do the necessary
> type punning without involving __builtin_neon_oi and the union.
Ideally we'd be able to get rid of the extra temporary at the tree level.
Value numbering can in theory do that, but I suppose the testcase at
hand is too obfuscated for it to succeed.
* [Bug target/43118] vld4 and vst4 intrinsics are not handled correctly
From: ramana at gcc dot gnu dot org @ 2010-03-20 14:53 UTC (permalink / raw)
To: gcc-bugs
--
ramana at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
Keywords| |missed-optimization
* [Bug target/43118] vld4 and vst4 intrinsics are not handled correctly
From: justin dot lebar+bug at gmail dot com @ 2010-04-28 21:56 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from justin dot lebar+bug at gmail dot com 2010-04-28 21:56 -------
Is there a workaround for this, short of writing inline assembly?
* [Bug target/43118] vld4 and vst4 intrinsics are not handled correctly
From: generalruzzmo at gmail dot com @ 2010-09-15 20:54 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from generalruzzmo at gmail dot com 2010-09-15 20:54 -------
This bug is bugging me too.
* [Bug target/43118] vld4 and vst4 intrinsics are not handled correctly
From: rsandifo at gcc dot gnu.org @ 2011-07-19 8:42 UTC (permalink / raw)
To: gcc-bugs
rsandifo@gcc.gnu.org <rsandifo at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
CC| |rsandifo at gcc dot gnu.org
Resolution| |FIXED
--- Comment #8 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> 2011-07-19 08:41:04 UTC ---
(In reply to comment #7)
> A recent version of 4.6.1 at O1 appears to give me . That would indicate this
> is fixed in trunk.
Yeah, the bug was fixed as part of the load-lanes stuff.
Since it isn't a regression, and since the fix is too
invasive to backport, I hope it's OK to close as fixed.
* [Bug target/43118] vld4 and vst4 intrinsics are not handled correctly
From: ramana at gcc dot gnu.org @ 2011-07-08 11:58 UTC (permalink / raw)
To: gcc-bugs
Ramana Radhakrishnan <ramana at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |ramana at gcc dot gnu.org
Known to fail| |
--- Comment #7 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> 2011-07-08 11:57:04 UTC ---
A recent version of 4.6.1 at -O1 appears to give me the code below. That would
indicate this is fixed in trunk.
blend4:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vld4.8 {d16-d19}, [r0]
vst4.8 {d16-d19}, [r1]
bx lr
.size blend4, .-blend4
.ident "GCC: (GNU) 4.7.0 20110616 (experimental)"
.section .note.GNU-stack,"",%progbits
Ramana