public inbox for gcc-bugs@sourceware.org
* [Bug target/60408] New: ARM: inefficient code for vget_lane_f32 intrinsic
@ 2014-03-04 10:57 mans at mansr dot com
  2014-03-04 11:37 ` [Bug target/60408] " ktkachov at gcc dot gnu.org
  2015-03-23 17:04 ` wilson at tuliptree dot org
  0 siblings, 2 replies; 3+ messages in thread
From: mans at mansr dot com @ 2014-03-04 10:57 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60408

            Bug ID: 60408
           Summary: ARM: inefficient code for vget_lane_f32 intrinsic
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mans at mansr dot com

Consider this trivial function:

#include <arm_neon.h>
float foo(float32x2_t v)
{
    return vget_lane_f32(v, 0) + vget_lane_f32(v, 1);
}

Compiling with gcc 4.9 trunk from 2014-03-02 yields this (non-code output
removed):

$ gcc -O3 -march=armv7-a -mfpu=neon -S -o - test.c
foo:
        vmov.32 r3, d0[0]
        vmov.32 r2, d0[1]
        fmsr    s15, r3
        fmsr    s0, r2
        fadds   s0, s0, s15
        bx      lr

A simple "fadds s0, s0, s1" is what one would expect from code like this.
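In the listing above, `vmov.32` copies each lane into a core register and `fmsr` copies it straight back into a single-precision register. Both moves preserve the bits exactly, so the pair contributes nothing to the result. The portable C sketch below models that round trip with `memcpy`, assuming IEEE-754 binary32 `float`. `v2f32`, `to_core_reg` and `from_core_reg` are hypothetical stand-ins, not NEON intrinsics or GCC internals.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for float32x2_t: two single-precision lanes. */
typedef struct { float lane[2]; } v2f32;

/* Models "vmov.32 rN, d0[i]": copy a lane's bits into a 32-bit core register. */
static uint32_t to_core_reg(v2f32 v, int i)
{
    assert(i == 0 || i == 1);
    uint32_t r;
    memcpy(&r, &v.lane[i], sizeof r);
    return r;
}

/* Models "fmsr sN, rN": copy the core register's bits back into a VFP register. */
static float from_core_reg(uint32_t r)
{
    float s;
    memcpy(&s, &r, sizeof s);
    return s;
}

/* What GCC 4.9 emitted: two bit-exact round trips, then "fadds s0, s0, s15". */
float foo_roundtrip(v2f32 v)
{
    float s15 = from_core_reg(to_core_reg(v, 0));
    float s0  = from_core_reg(to_core_reg(v, 1));
    return s0 + s15;
}

/* What the reporter expects: add the lanes directly ("fadds s0, s0, s1"). */
float foo_direct(v2f32 v)
{
    return v.lane[0] + v.lane[1];
}
```

With lanes {1.5f, 2.25f}, both functions return 3.75f; they differ only in the extra copies, mirroring the four transfer instructions in GCC's output.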



* [Bug target/60408] ARM: inefficient code for vget_lane_f32 intrinsic
From: ktkachov at gcc dot gnu.org @ 2014-03-04 11:37 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60408

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-03-04
                 CC|                            |ktkachov at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed. This is in part due to the "r" constraint in the vec_extract<mode>
pattern in neon.md, which forces the extracted lane into a core register.



* [Bug target/60408] ARM: inefficient code for vget_lane_f32 intrinsic
From: wilson at tuliptree dot org @ 2015-03-23 17:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60408

--- Comment #3 from Jim Wilson <wilson at tuliptree dot org> ---
Even if we could fix the vec_extract constraints, we would still end up with 3
instructions, as the optimizer can't do anything interesting with the
vec_extract RTL.

For a 32-bit SFmode value though, we can just use a subreg instead of a vector
extract.  The ARM port models the vector registers as 32-bit registers, so a
subreg for a 32-bit mode will always be valid.  Using a subreg instead of a
vector extract here, I get 2 instructions:

        vmov.f32 s15, s0
        vadd.f32 s0, s1, s15

The extra vmov is there because the register allocator thinks it needs a
temporary, since the inputs and outputs partially overlap.  That is a harder
problem to fix.
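Two register-file facts carry this comment. With the port's 32-bit register model, lane i of a V2SF value in dN is simply s(2N+i), so an SFmode subreg needs no extract instruction. An SFmode result in s0 also covers only half of the V2SF argument in d0, which is the partial overlap the allocator reacts to. The minimal C model below covers both, assuming little-endian lane order (the comment itself leaves big-endian open) and the s0-s31 overlay on d0-d15. All names are hypothetical; this is neither GCC's internal API nor its allocator logic.

```c
#include <assert.h>

/* Hypothetical model of the VFP/NEON register file as 32-bit units:
   s0 = unit 0, d0 = units 0-1 (s0:s1), d1 = units 2-3 (s2:s3), ...
   Only d0-d15 overlay s registers; little-endian lane order is assumed. */
enum { SF_BYTES = 4, NUM_S_REGS = 32, NUM_D_OVERLAID = 16 };

typedef struct { int first; int count; } reg_span;

static reg_span s_reg(int n)
{
    assert(n >= 0 && n < NUM_S_REGS);
    reg_span r = { n, 1 };
    return r;
}

static reg_span d_reg(int n)
{
    assert(n >= 0 && n < NUM_D_OVERLAID);
    reg_span r = { 2 * n, 2 };
    return r;
}

/* Resolve an SFmode subreg at byte offset `offset` of a V2SF value held in
   double register dN (lane i sits at offset 4*i) to the s register it names. */
static reg_span sf_subreg_of_d(int d, int offset)
{
    assert(offset == 0 || offset == SF_BYTES);
    return s_reg(d_reg(d).first + offset / SF_BYTES);
}

typedef enum { NO_OVERLAP, PARTIAL_OVERLAP, FULL_OVERLAP } overlap_kind;

/* Classify how two register spans overlap. */
static overlap_kind classify_overlap(reg_span a, reg_span b)
{
    int a_end = a.first + a.count;
    int b_end = b.first + b.count;
    int lo = a.first > b.first ? a.first : b.first;
    int hi = a_end < b_end ? a_end : b_end;
    if (lo >= hi)
        return NO_OVERLAP;
    if (a.first == b.first && a.count == b.count)
        return FULL_OVERLAP;
    return PARTIAL_OVERLAP;
}
```

For the testcase, `sf_subreg_of_d(0, 4)` resolves to s1, matching the `s1` operand of the vadd above, and `classify_overlap(s_reg(0), d_reg(0))` reports PARTIAL_OVERLAP.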

Subregs should also work for 64-bit modes.
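The remark about 64-bit modes is read here, as an assumption, as extracting a 64-bit lane from a 128-bit vector. Since each NEON quad register qN overlays d(2N) and d(2N+1), a DImode or DFmode subreg at byte offset 0 or 8 always names a whole D register. The sketch below shows that mapping, again assuming little-endian lane order; `q_subreg_to_d_reg` is hypothetical.

```c
#include <assert.h>

enum { DI_BYTES = 8, NUM_Q_REGS = 16 };

/* The D register named by a 64-bit (DImode/DFmode) subreg at byte offset
   `offset` of quad register qN: qN overlays d(2N) and d(2N+1).
   Assumes little-endian lane numbering. */
static int q_subreg_to_d_reg(int q_reg, int offset)
{
    assert(q_reg >= 0 && q_reg < NUM_Q_REGS);
    assert(offset == 0 || offset == DI_BYTES);
    return 2 * q_reg + offset / DI_BYTES;
}
```

For example, the upper 64-bit lane of q2 (offset 8) resolves to d5.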

I have an experimental patch which is mostly untested.  I don't know if this
works for both big-endian and little-endian.  I don't know if this works for
all 32-bit modes and all vector types.  Etc.  All I know is that it seems to
work for this testcase.

