public inbox for gcc-bugs@sourceware.org
* [Bug target/65375] New: poor codegen for ld[234]/st[234]
@ 2015-03-10  8:15 kugan at gcc dot gnu.org
  2015-03-10  8:16 ` [Bug target/65375] " kugan at gcc dot gnu.org
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: kugan at gcc dot gnu.org @ 2015-03-10  8:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65375

            Bug ID: 65375
           Summary: poor codegen for ld[234]/st[234]
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kugan at gcc dot gnu.org

#include <arm_neon.h>
void hello_vst2(float* fout, float *fin)
{
float32x4x2_t a;
a = vld2q_f32 (fin);
vst2q_f32 (fout, a);
}


with aarch64-none-linux-gnu-gcc  -O2 -ffast-math -funsafe-math-optimizations
produces:

    .cpu generic+fp+simd
    .file    "neon.c"
    .text
    .align    2
    .p2align 3,,7
    .global    hello_vst2
    .type    hello_vst2, %function
hello_vst2:
    ld2    {v0.4s - v1.4s}, [x1]
    sub    sp, sp, #32
    umov    x1, v0.d[0]
    umov    x2, v0.d[1]
    str    q1, [sp, 16]
    mov    x5, x1
    stp    x5, x2, [sp]
    ld1    {v0.16b - v1.16b}, [sp]
    st2    {v0.4s - v1.4s}, [x0]
    add    sp, sp, 32
    ret
    .size    hello_vst2, .-hello_vst2
    .ident    "GCC: (GNU) 5.0.0 20150305 (experimental)"
    .section    .note.GNU-stack,"",%progbits
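
For reference, here is a minimal scalar sketch of what the test case computes, assuming the usual de-interleaving semantics of vld2q_f32 and the matching re-interleave of vst2q_f32. The names f32x4x2_model, model_vld2q_f32, model_vst2q_f32 and model_hello_vst2 are hypothetical and exist only for this illustration; they are not part of arm_neon.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar stand-in for float32x4x2_t: two vectors of four lanes. */
typedef struct {
    float val[2][4];
} f32x4x2_model;

/* Model of vld2q_f32: de-interleave 8 floats into two 4-lane vectors. */
static f32x4x2_model model_vld2q_f32(const float *src)
{
    f32x4x2_model r;
    assert(src != NULL);
    for (size_t lane = 0; lane < 4; ++lane) {
        r.val[0][lane] = src[2 * lane];      /* even-indexed elements */
        r.val[1][lane] = src[2 * lane + 1];  /* odd-indexed elements */
    }
    return r;
}

/* Model of vst2q_f32: re-interleave the two vectors into 8 floats. */
static void model_vst2q_f32(float *dst, f32x4x2_model v)
{
    assert(dst != NULL);
    for (size_t lane = 0; lane < 4; ++lane) {
        dst[2 * lane]     = v.val[0][lane];
        dst[2 * lane + 1] = v.val[1][lane];
    }
}

/* Scalar equivalent of hello_vst2: load a pair, store it straight back. */
static void model_hello_vst2(float *fout, const float *fin)
{
    model_vst2q_f32(fout, model_vld2q_f32(fin));
}
```

Under this model the function is just a copy of eight floats, which is why a single ld2 followed by a single st2 would be the expected output, with no stack traffic in between.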


* [Bug target/65375] poor codegen for ld[234]/st[234]
From: kugan at gcc dot gnu.org @ 2015-03-10  8:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65375

--- Comment #1 from kugan at gcc dot gnu.org ---
arm-none-linux-gnueabi-gcc  -O2 -ffast-math -funsafe-math-optimizations
-mfpu=neon produces just:

hello_vst2:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    vld2.32    {d16-d19}, [r1]
    vst2.32    {d16-d19}, [r0]
    bx    lr


* [Bug target/65375] poor codegen for ld[234]/st[234]
From: kugan at gcc dot gnu.org @ 2015-03-10  8:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65375

--- Comment #2 from kugan at gcc dot gnu.org ---
aarch64-none-linux-gnu-gcc -O2 -ffast-math -funsafe-math-optimizations
-fno-split-wide-types produces:

    ld2    {v2.4s - v3.4s}, [x1]
    orr    v0.16b, v2.16b, v2.16b
    orr    v1.16b, v3.16b, v3.16b
    st2    {v0.4s - v1.4s}, [x0]
    ret


* [Bug target/65375] poor codegen for ld[234]/st[234]
From: kugan at gcc dot gnu.org @ 2015-03-10  8:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65375

kugan at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |aarch64
           Assignee|unassigned at gcc dot gnu.org      |kugan at gcc dot gnu.org

--- Comment #3 from kugan at gcc dot gnu.org ---
aarch64-none-linux-gnu-gcc  -v
Using built-in specs.
COLLECT_GCC=/home/kugan/work/builds/gcc-fsf-gcc/tools/bin/aarch64-none-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/home/kugan/work/builds/gcc-fsf-gcc/tools/libexec/gcc/aarch64-none-linux-gnu/5.0.0/lto-wrapper
Target: aarch64-none-linux-gnu
Configured with: /home/kugan/work/sources/gcc-fsf/gcc/configure
--target=aarch64-none-linux-gnu
--prefix=/home/kugan/work/builds/gcc-fsf-gcc/tools
--with-sysroot=/home/kugan/work/builds/gcc-fsf-gcc/sysroot-aarch64-none-linux-gnu
--disable-libssp --disable-libgomp --disable-libmudflap
--enable-languages=c,c++,fortran
Thread model: posix
gcc version 5.0.0 20150305 (experimental) (GCC)


* [Bug target/65375] poor codegen for ld[234]/st[234]
From: pinskia at gcc dot gnu.org @ 2015-03-10  8:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65375

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
;; _6 = __builtin_aarch64_get_qregoiv4sf (__o_5, 0);

(insn 8 7 0 (set (reg:V4SF 74 [ D.16774 ])
        (subreg:V4SF (reg/v:OI 73 [ __o ]) 0))
/data1/src/gcc-cavium/toolchain-thunder/thunderx-tools/lib/gcc/aarch64-thunderx-linux-gnu/5.0.0/include/arm_neon.h:15586
-1
     (nil))

;; _7 = __builtin_aarch64_get_qregoiv4sf (__o_5, 1);

(insn 9 8 0 (set (reg:V4SF 75 [ D.16774 ])
        (subreg:V4SF (reg/v:OI 73 [ __o ]) 16))
/data1/src/gcc-cavium/toolchain-thunder/thunderx-tools/lib/gcc/aarch64-thunderx-linux-gnu/5.0.0/include/arm_neon.h:15587
-1
     (nil))


Actually, maybe we should use POI (a partial integer mode) here: with a
partial integer mode, subreg splitting would not do anything.
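
To make the two subreg offsets in the RTL above concrete, here is a rough model in portable C. It assumes the OI-mode value is simply 32 contiguous bytes and that each V4SF half sits at byte offset 0 or 16, as the (subreg:V4SF (reg:OI ...) 0) and (subreg:V4SF (reg:OI ...) 16) expressions suggest. The types oi_model and v4sf_model and the function model_get_qreg are hypothetical names for this sketch, not GCC internals.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for an OImode value: 256 bits, no visible structure. */
typedef struct {
    unsigned char bytes[32];
} oi_model;

/* Hypothetical stand-in for a V4SF value: four single-precision lanes. */
typedef struct {
    float lane[4];
} v4sf_model;

_Static_assert(sizeof(v4sf_model) == 16, "model assumes 4-byte float");

/* Model of __builtin_aarch64_get_qregoiv4sf (o, index): read the 16-byte
   half at byte offset 16 * index, mirroring the subreg offsets 0 and 16. */
static v4sf_model model_get_qreg(const oi_model *o, int index)
{
    v4sf_model r;
    assert(index == 0 || index == 1);
    memcpy(&r, o->bytes + 16 * index, sizeof r);
    return r;
}
```

Each call reads only part of the wide value, which is the partial access pattern under discussion here.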


* [Bug target/65375] aarch64: poor codegen for vld2q_f32 and vst2q_f32
From: mkuvyrkov at gcc dot gnu.org @ 2015-04-13 16:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65375

--- Comment #5 from Maxim Kuvyrkov <mkuvyrkov at gcc dot gnu.org> ---
Kugan and Jim Wilson posted a patch for this on March 26th.


* [Bug target/65375] aarch64: poor codegen for vld2q_f32 and vst2q_f32
From: jgreenhalgh at gcc dot gnu.org @ 2015-04-14  8:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65375

James Greenhalgh <jgreenhalgh at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
                 CC|                            |jgreenhalgh at gcc dot gnu.org
         Resolution|---                         |FIXED

--- Comment #6 from James Greenhalgh <jgreenhalgh at gcc dot gnu.org> ---
So, fixed then?


* [Bug target/65375] aarch64: poor codegen for vld2q_f32 and vst2q_f32
From: mkuvyrkov at gcc dot gnu.org @ 2015-04-14  8:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65375

Maxim Kuvyrkov <mkuvyrkov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |ASSIGNED
   Last reconfirmed|                            |2015-04-14
         Resolution|FIXED                       |---
     Ever confirmed|0                           |1

--- Comment #7 from Maxim Kuvyrkov <mkuvyrkov at gcc dot gnu.org> ---
The patch is not approved yet.


* [Bug target/65375] aarch64: poor codegen for vld2q_f32 and vst2q_f32
From: kugan at gcc dot gnu.org @ 2015-04-14  9:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65375

--- Comment #8 from kugan at gcc dot gnu.org ---
The patch is at https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00857.html and
has not been approved yet.


* [Bug target/65375] aarch64: poor codegen for vld2q_f32 and vst2q_f32
From: wilson at gcc dot gnu.org @ 2015-06-23 15:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65375

--- Comment #10 from Jim Wilson <wilson at gcc dot gnu.org> ---
Improved, but not completely resolved.  We still get unnecessary orr
instructions, same as in comment 2.  This is partly an issue with the register
allocator not handling partially overlapping register reads/writes very well. 
We already have a few other bugs for that.  This is also partly an issue with
how the aarch64 builtins work, via __builtin_aarch64_[gs]et_qregoiv4sf which
create the partially overlapping register reads/writes.  The ARM builtins don't
work this way, they use a union for type punning, and hence don't have the same
problem.
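
As a rough illustration of the union approach mentioned above, the sketch below converts between a structured pair of vectors and an opaque 32-byte value by writing one union member and reading the other. It is only a model under stated assumptions: pair_model, opaque32_model, pun_model, opaque_to_pair and pair_to_opaque are hypothetical names, and this is not the actual text of either port's arm_neon.h.

```c
#include <assert.h>

/* Hypothetical stand-in for float32x4x2_t. */
typedef struct {
    float val[2][4];
} pair_model;

/* Hypothetical stand-in for the opaque 256-bit builtin type. */
typedef struct {
    unsigned char bytes[32];
} opaque32_model;

_Static_assert(sizeof(pair_model) == sizeof(opaque32_model),
               "both views must cover the same 32 bytes");

/* Both views share storage, so the whole value changes type in one step. */
typedef union {
    pair_model     as_pair;
    opaque32_model as_opaque;
} pun_model;

/* Reinterpret the opaque value as a pair of vectors. Reading a member other
   than the one last written reinterprets the bytes; GCC documents this use
   of unions as supported. */
static pair_model opaque_to_pair(opaque32_model o)
{
    pun_model u;
    u.as_opaque = o;
    return u.as_pair;
}

/* The reverse direction, as needed before a store-style builtin. */
static opaque32_model pair_to_opaque(pair_model p)
{
    pun_model u;
    u.as_pair = p;
    return u.as_opaque;
}
```

The point of the contrast is that the union moves the whole 32-byte value at once, whereas a get/set style interface touches one 16-byte half at a time.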


* [Bug target/65375] aarch64: poor codegen for vld2q_f32 and vst2q_f32
From: ramana at gcc dot gnu.org @ 2015-06-24  9:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65375

--- Comment #11 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> ---
(In reply to Jim Wilson from comment #10)
> Improved, but not completely resolved.  We still get unnecessary orr
> instructions, same as in comment 2.  This is partly an issue with the
> register allocator not handling partially overlapping register reads/writes
> very well.  We already have a few other bugs for that.  This is also partly
> an issue with how the aarch64 builtins work, via
> __builtin_aarch64_[gs]et_qregoiv4sf which create the partially overlapping
> register reads/writes.  The ARM builtins don't work this way, they use a
> union for type punning, and hence don't have the same problem.

Both the ARM and the AArch64 ports have issues with partially overlapping
register reads / writes, especially with the vzip / vuzp style intrinsics in
the AArch32 world, or even the larger vld3/4 intrinsics in both ARM and AArch64
states. It would be nice to fix that finally.

If that is the only issue left in this ticket, maybe we should just park this
example in that ticket (IIRC PR43725) and close this one out?

regards
Ramana


* [Bug target/65375] aarch64: poor codegen for vld2q_f32 and vst2q_f32
From: kugan at gcc dot gnu.org @ 2015-06-24  9:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65375

kugan at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #12 from kugan at gcc dot gnu.org ---
Fixed in trunk except for the additional orr instructions (overlapping register
reads / writes). As Ramana mentioned, that is a known problem and is tracked in
PR43725.


* [Bug target/65375] aarch64: poor codegen for vld2q_f32 and vst2q_f32
From: ramana at gcc dot gnu.org @ 2015-06-25 20:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65375

--- Comment #13 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> ---
Or indeed PR 63277...

