public inbox for gcc-bugs@sourceware.org
* [Bug target/60408] New: ARM: inefficient code for vget_lane_f32 intrinsic
@ 2014-03-04 10:57 mans at mansr dot com
2014-03-04 11:37 ` [Bug target/60408] " ktkachov at gcc dot gnu.org
2015-03-23 17:04 ` wilson at tuliptree dot org
0 siblings, 2 replies; 3+ messages in thread
From: mans at mansr dot com @ 2014-03-04 10:57 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60408
Bug ID: 60408
Summary: ARM: inefficient code for vget_lane_f32 intrinsic
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: mans at mansr dot com
Consider this trivial function:
#include <arm_neon.h>

float foo(float32x2_t v)
{
    return vget_lane_f32(v, 0) + vget_lane_f32(v, 1);
}
Compiling with gcc 4.9 trunk from 2014-03-02 yields this (non-code output
removed):
$ gcc -O3 -march=armv7-a -mfpu=neon -S -o - test.c
foo:
        vmov.32 r3, d0[0]
        vmov.32 r2, d0[1]
        fmsr    s15, r3
        fmsr    s0, r2
        fadds   s0, s0, s15
        bx      lr
A simple "fadds s0, s0, s1" is what one would expect from code like this.
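As the output above shows, v arrives in d0, and d0 aliases s0 and s1, so both lanes are already sitting in single-precision registers on entry. For looking at the same computation without arm_neon.h, the sketch below restates foo using GCC's generic vector extension. The names v2sf and foo_generic are made up for illustration and are not part of the report, and nothing in this thread establishes whether this form goes through the same vec_extract pattern on ARM; treat it only as a portable restatement of what the function computes.

```c
#include <assert.h>

/* Hypothetical stand-in for float32x2_t: two packed floats, 8 bytes total. */
typedef float v2sf __attribute__((vector_size(8)));

static_assert(sizeof(v2sf) == 2 * sizeof(float), "two packed float lanes");

/* Same computation as foo() in the report: lane 0 plus lane 1. */
float foo_generic(v2sf v)
{
    return v[0] + v[1];
}
```

Compiling this with -O3 -S for the target of interest gives something to compare against the intrinsic version.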
* [Bug target/60408] ARM: inefficient code for vget_lane_f32 intrinsic
From: ktkachov at gcc dot gnu.org @ 2014-03-04 11:37 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60408
ktkachov at gcc dot gnu.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-03-04
                 CC|                            |ktkachov at gcc dot gnu.org
     Ever confirmed|0                           |1
--- Comment #1 from ktkachov at gcc dot gnu.org ---
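Confirmed. This is in part due to the "r" constraint in the vec_extract<mode>
pattern in neon.md, which lets the extracted lane go through a core register
(the vmov.32/fmsr pairs in the output above).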
* [Bug target/60408] ARM: inefficient code for vget_lane_f32 intrinsic
From: wilson at tuliptree dot org @ 2015-03-23 17:04 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60408
--- Comment #3 from Jim Wilson <wilson at tuliptree dot org> ---
Even if we could fix the vec_extract constraints, we still end up with 3
instructions, as the optimizer can't do anything interesting with the
vec_extract RTL.
For a 32-bit SFmode value though, we can just use a subreg instead of a vector
extract. The ARM port models the vector registers as 32-bit registers, so a
subreg for a 32-bit mode will always be valid. Using a subreg instead of a
vector extract here, I get 2 instructions.
        vmov.f32 s15, s0
        vadd.f32 s0, s1, s15
The extra vmov is there because the register allocator thinks it needs a temp,
since inputs and outputs partially overlap. That is a harder problem to fix.
Subregs should also work for 64-bit modes.
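The difference between a vector extract and a subreg can be pictured at the C level: one form asks for "element N of the vector", the other just reads the 32-bit piece at a fixed offset inside the 64-bit value. The sketch below is only an analogy, assuming GCC's generic vector extension, where element i sits at byte offset i * sizeof(float). It is not the experimental patch, and it says nothing about how RTL subregs behave on big-endian ARM. The names v2sf, lane_extract and lane_subword are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for float32x2_t: two packed floats, 8 bytes total. */
typedef float v2sf __attribute__((vector_size(8)));

static_assert(sizeof(v2sf) == 2 * sizeof(float), "two packed float lanes");

/* Lane read written as a vector element extract. */
float lane_extract(v2sf v, int lane)
{
    return v[lane];
}

/* The same read written as "take the 32-bit piece at byte offset 4 * lane",
   which is the C-level picture of viewing part of a wider value in place. */
float lane_subword(v2sf v, int lane)
{
    float out;

    assert(lane == 0 || lane == 1);
    memcpy(&out, (const char *)&v + (size_t)lane * sizeof out, sizeof out);
    return out;
}
```

Both functions return the same value for either lane; the second form never mentions a vector operation, which is roughly what lets later passes treat the lane as an ordinary 32-bit value.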
I have an experimental patch which is mostly untested. I don't know if this
works for both big-endian and little-endian. I don't know if this works for
all 32-bit modes and all vector types. Etc. All I know is that it seems to
work for this testcase.