[Bug target/43725] Poor instructions selection, scheduling and registers allocation for ARM NEON intrinsics

public inbox for gcc-bugs@sourceware.org

* [Bug target/43725] Poor instructions selection, scheduling and registers allocation for ARM NEON intrinsics
       [not found] <bug-43725-4@http.gcc.gnu.org/bugzilla/>
@ 2010-09-29 20:50 ` rearnsha at gcc dot gnu.org
  2010-10-04 23:00 ` siarhei.siamashka at gmail dot com
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2010-09-29 20:50 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725

Richard Earnshaw <rearnsha at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2010-05-11 07:35:23         |2010-09-29 7:35:23
               date|                            |
                 CC|                            |rearnsha at gcc dot gnu.org

--- Comment #1 from Richard Earnshaw <rearnsha at gcc dot gnu.org> 2010-09-29 16:28:17 UTC ---
So the compiler is correct not to be using vld1 for this code.  The memory
format of int32x4_t is defined to be the format of a neon register that has
been filled from an array of int32 values and then stored to memory using VSTM
(or equivalent sequence).  The implication of all this is that int32x4_t does
not (necessarily) have the same memory layout as int32_t[4].

arm_neon.h provides intrinsics for filling neon registers from arrays in
memory, and in this case I think you should be using these directly.  That is,
your macro should be modified to contain:

#define X(n) {int32x4_t v; v = vld1q_s32((const int32_t*)&p[n]); \
              v = vaddq_s32(v, a); v = vorrq_s32(v, b); \
              vst1q_s32((int32_t*)&p[n], v);}
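
For illustration only, here is a minimal sketch of the same operation written as a function instead of a macro. The name `add_or_block` and its parameters are hypothetical and not part of the testcase. It uses the arm_neon.h load/store intrinsics when `__ARM_NEON` is defined and falls back to plain scalar code elsewhere, so it can be compiled on any host:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: for every group of four int32 values in p,
   compute (p + a) | b lane by lane and write the result back. */
void add_or_block(int32_t *p, const int32_t a[4], const int32_t b[4],
                  size_t nvec)
{
    assert(p != NULL && a != NULL && b != NULL);
#if defined(__ARM_NEON)
    /* Explicit array loads and stores, as suggested for the X() macro. */
    int32x4_t va = vld1q_s32(a);
    int32x4_t vb = vld1q_s32(b);
    for (size_t n = 0; n < nvec; n++) {
        int32x4_t v = vld1q_s32(p + 4 * n);
        v = vaddq_s32(v, va);
        v = vorrq_s32(v, vb);
        vst1q_s32(p + 4 * n, v);
    }
#else
    /* Scalar fallback; unsigned arithmetic avoids signed-overflow UB. */
    for (size_t i = 0; i < 4 * nvec; i++) {
        uint32_t sum = (uint32_t)p[i] + (uint32_t)a[i % 4];
        p[i] = (int32_t)(sum | (uint32_t)b[i % 4]);
    }
#endif
}
```

Because the data is passed as `int32_t` arrays, the memory layout is that of `int32_t[4]` on both paths, which is the layout the vld1/vst1 intrinsics are defined for.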

There are still problems after doing this, however.  In particular the compiler
is not correctly tracking alias information for the load/store intrinsics,
which means it is unable to move stores past loads to reduce stalls in the
pipeline.

The stack wastage appears to be fixed in trunk gcc; at least I don't see any
stack allocation for your testcase.

I haven't looked into the scheduling issues at this time.

From: siarhei.siamashka at gmail dot com @ 2010-10-04 23:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725

--- Comment #2 from Siarhei Siamashka <siarhei.siamashka at gmail dot com> 2010-10-04 22:59:56 UTC ---
(In reply to comment #1)
> So the compiler is correct not to be using vld1 for this code.  The memory
> format of int32x4_t is defined to be the format of a neon register that has
> been filled from an array of int32 values and then stored to memory using VSTM
> (or equivalent sequence).  The implication of all this is that int32x4_t does
> not (necessarily) have the same memory layout as int32_t[4].

Could you elaborate on this? Specifically about the case when memory format for
VSTM and VST1 may differ.

I thought that the VST1 instruction could always be used as a replacement for
VSTM; it is just a little less convenient in some cases because it lacks some
of the more advanced addressing modes. Moreover, VSTM is a VFP instruction and
VST1 is a NEON one. So I guess mixing VSTM with true NEON instructions may
additionally be a bad idea (for performance reasons on Cortex-A9 or other
processors?).

There also used to be FLDMX/FSTMX instructions, but they are deprecated now. I
believe they existed specifically to reserve the use of normal VFP load/store
instructions for floating point data formats only, but later this turned out to
be unnecessary.

> arm_neon.h provides intrinsics for filling neon registers from arrays in
> memory, and in this case I think you should be using these directly.  That is,
> your macro should be modified to contain:
> 
> #define X(n) {int32x4_t v; v = vld1q_s32((const int32_t*)&p[n]); v =
> vaddq_s32(v, a); v = vorrq_s32(v, b); vst1q_s32 ((int32_t*)&p[n], v);}

I'm sorry, but this looks like a completely unjustified limitation to me. Why
should intrinsics be so much more difficult and less intuitive to use than just
inline assembly? Additionally, gcc allows normal arithmetic operations to be
used on vector data types, something like:

void x(int32x4_t a, int32x4_t b, int32x4_t *p)
{
        #define X(n) p[n] += a; p[n] |= b;
        X(0); X(1); X(2); X(3); X(4); X(5); X(6); X(7);
        X(8); X(9); X(10); X(11); X(12);
}
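
A host-independent sketch of the same idea, assuming only the GCC/Clang `vector_size` extension: `v4si` is a hypothetical stand-in for `int32x4_t` (on ARM one would use the arm_neon.h type directly), and `add_or_vectors` is an illustrative name, not something from the testcase.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for int32x4_t, built with the GCC/Clang
   vector extension: four 32-bit lanes in 16 bytes. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Apply "p[n] += a; p[n] |= b;" to count vectors, using the ordinary
   arithmetic operators that the compiler accepts on vector types. */
void add_or_vectors(v4si a, v4si b, v4si *p, size_t count)
{
    assert(p != NULL || count == 0);
    for (size_t n = 0; n < count; n++) {
        p[n] += a;   /* lane-wise addition */
        p[n] |= b;   /* lane-wise bitwise OR */
    }
}
```

Here the vectors are read and written through `v4si *`, so the compiler is free to pick whatever load/store instructions match its own definition of the vector type's memory format.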

> There are still problems after doing this, however.  In particular the compiler
> is not correctly tracking alias information for the load/store intrinsics,
> which means it is unable to move stores past loads to reduce stalls in the
> pipeline.

OK, thanks for the explanation.

> The stack wastage appears to be fixed in trunk gcc; at least I don't see any
> stack allocation for your testcase.

Yes, looks like it got a little bit better. Anyway stack allocation shows up
again after adding just a few more of these X() macros:
... X(13); X(14); X(15); X(16); ...

From: joseph at codesourcery dot com @ 2010-10-04 23:46 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725

--- Comment #3 from joseph at codesourcery dot com <joseph at codesourcery dot com> 2010-10-04 23:45:57 UTC ---
On Mon, 4 Oct 2010, siarhei.siamashka at gmail dot com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725
> 
> --- Comment #2 from Siarhei Siamashka <siarhei.siamashka at gmail dot com> 2010-10-04 22:59:56 UTC ---
> (In reply to comment #1)
> > So the compiler is correct not to be using vld1 for this code.  The memory
> > format of int32x4_t is defined to be the format of a neon register that has
> > been filled from an array of int32 values and then stored to memory using VSTM
> > (or equivalent sequence).  The implication of all this is that int32x4_t does
> > not (necessarily) have the same memory layout as int32_t[4].
> 
> Could you elaborate on this? Specifically about the case when memory format for
> VSTM and VST1 may differ.

Big-endian.

I previously explained the issues with big-endian NEON vectors in GCC at 
length:

http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00409.html
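
To make the difference concrete, here is a simplified byte-level model. It is a sketch under stated assumptions, not a quotation of the architecture manual: it assumes an element-wise store (VST1.32-like) writes the four 32-bit lanes in ascending lane order, while a doubleword store (VSTM-like) writes each 64-bit half of the register as a single 64-bit integer in the memory byte order. The type `qreg_model` and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit register as four 32-bit lanes;
   lane 0 holds the least significant 32 bits. */
typedef struct { uint32_t lane[4]; } qreg_model;

/* Element-wise store model: lanes are written in ascending order,
   each lane in the selected byte order. */
void store_elementwise(uint8_t *dst, const qreg_model *q, int big_endian)
{
    assert(dst != NULL && q != NULL);
    for (int l = 0; l < 4; l++)
        for (int i = 0; i < 4; i++) {
            int shift = big_endian ? 8 * (3 - i) : 8 * i;
            dst[4 * l + i] = (uint8_t)(q->lane[l] >> shift);
        }
}

/* Doubleword store model: each 64-bit half is written as one 64-bit
   integer in the selected byte order. */
void store_doublewords(uint8_t *dst, const qreg_model *q, int big_endian)
{
    assert(dst != NULL && q != NULL);
    for (int d = 0; d < 2; d++) {
        uint64_t v = ((uint64_t)q->lane[2 * d + 1] << 32) | q->lane[2 * d];
        for (int i = 0; i < 8; i++) {
            int shift = big_endian ? 8 * (7 - i) : 8 * i;
            dst[8 * d + i] = (uint8_t)(v >> shift);
        }
    }
}

/* Returns 1 when both models produce the same 16-byte memory image. */
int layouts_match(int big_endian)
{
    qreg_model q = { { 0x11111111u, 0x22222222u, 0x33333333u, 0x44444444u } };
    uint8_t a[16], b[16];
    store_elementwise(a, &q, big_endian);
    store_doublewords(b, &q, big_endian);
    return memcmp(a, b, sizeof a) == 0;
}
```

In this model the two images are identical for little-endian, while for big-endian the doubleword store places lane 1 before lane 0 and lane 3 before lane 2.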


From: ramana at gcc dot gnu.org @ 2010-10-05  7:16 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725

--- Comment #4 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> 2010-10-05 07:16:35 UTC ---
(In reply to comment #2)
> (In reply to comment #1)
> > So the compiler is correct not to be using vld1 for this code.  The memory
> > format of int32x4_t is defined to be the format of a neon register that has
> > been filled from an array of int32 values and then stored to memory using VSTM
> > (or equivalent sequence).  The implication of all this is that int32x4_t does
> > not (necessarily) have the same memory layout as int32_t[4].
> 
> Could you elaborate on this? Specifically about the case when memory format for
> VSTM and VST1 may differ.
> 
> I thought that VST1 instruction could be always used as a replacement for VSTM,
> it is just a little bit less convenient in some cases because it is lacking
> some more advanced addressing modes. Moreover, VSTM is VFP instruction and VST1
> is NEON one. So I guess mixing VSTM with true NEON instructions may be
> additionally a bad idea (for performance reasons on Cortex-A9 or other
> processors?).

The ARM ARM states that VLDM / VSTM and VLDR / VSTR for 64-bit values are
compliant with VFPv2 / VFPv3 and Advanced SIMD, i.e. they can be executed by
both units. Thus, AFAIK, there should be no performance regressions on the A9
for VLDM and VSTM / VLDR and VSTR of 64-bit registers interleaved with other
NEON instructions.


cheers
Ramana


From: siarhei.siamashka at gmail dot com @ 2010-10-08 14:13 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725

--- Comment #5 from Siarhei Siamashka <siarhei.siamashka at gmail dot com> 2010-10-08 14:13:08 UTC ---
(In reply to comment #3)
> On Mon, 4 Oct 2010, siarhei.siamashka at gmail dot com wrote:
> 
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725
> > 
> > --- Comment #2 from Siarhei Siamashka <siarhei.siamashka at gmail dot com> 2010-10-04 22:59:56 UTC ---
> > (In reply to comment #1)
> > > So the compiler is correct not to be using vld1 for this code.  The memory
> > > format of int32x4_t is defined to be the format of a neon register that has
> > > been filled from an array of int32 values and then stored to memory using VSTM
> > > (or equivalent sequence).  The implication of all this is that int32x4_t does
> > > not (necessarily) have the same memory layout as int32_t[4].
> > 
> > Could you elaborate on this? Specifically about the case when memory format for
> > VSTM and VST1 may differ.
> 
> Big-endian.

OK, I see. Looks like VLDM/VSTM instructions could be replaced with VLD1/VST1
(by artificially forcing element size to 64) in almost all cases except when
SCTLR.A == 1 due to unwanted alignment traps potentially happening in this
case.

But the question is whether it is really necessary to suffer from a performance
penalty on little endian systems?

> I previously explained the issues with big-endian NEON vectors in GCC at 
> length:
> 
> http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00409.html

Thanks for the link, something seems to be seriously overengineered. Looks like
you brought a problem upon yourself and now are trying to valiantly solve it.

Does (efficient) support of NEON intrinsics on big endian systems even have any
practical value? Maybe it makes sense to get a reasonable performance at least
on little endian systems first. To me it looks like you are just running after
two hares...

From: siarhei.siamashka at gmail dot com @ 2011-06-29 13:35 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725

--- Comment #6 from Siarhei Siamashka <siarhei.siamashka at gmail dot com> 2011-06-29 13:35:13 UTC ---
Created attachment 24630
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24630
test.c

Attached a slightly updated testcase, which demonstrates unnecessary spills to
the stack even with more recent versions of gcc, as explained in comment 2
earlier (the number of uses of the X() macro is just slightly increased).


From: m.zakirov at samsung dot com @ 2014-07-09 12:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725

Marat Zakirov <m.zakirov at samsung dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |joseph at codesourcery dot com,
                   |                            |m.zakirov at samsung dot com

--- Comment #7 from Marat Zakirov <m.zakirov at samsung dot com> ---
Another NEON register allocation issue.

Code:

#include <arm_neon.h>
#include <inttypes.h>

extern  uint16x8x4_t m0;
extern  uint16x8x4_t m1;

void foo(uint16_t * in_ptr)
{
    uint16x8x4_t t0, t1;
    t0 = vld4q_u16((uint16_t *)&in_ptr[0 ]);
    t1 = vld4q_u16((uint16_t *)&in_ptr[64]);
    t0.val[0] *= 333;
    t0.val[1] *= 333;
    t0.val[2] *= 333;
    t0.val[3] *= 333;
    t1.val[0] *= 333;
    t1.val[1] *= 333;
    t1.val[2] *= 333;
    t1.val[3] *= 333;
    m0 = t0;
    m1 = t1;
}

Asm file:

       .vsave {d8, d9, d10, d11, d12, d13, d14, d15}
        add     r1, r0, #160
        vld4.16 {d8, d10, d12, d14}, [r0]
        add     r0, r0, #32
        .pad #64
        sub     sp, sp, #64
        vld4.16 {d16, d18, d20, d22}, [r2]
        movw    r3, #:lower16:m1
        movw    r2, #:lower16:m0
        vldr    d6, .L3
        vldr    d7, .L3+8
        movt    r3, #:upper16:m1
        movt    r2, #:upper16:m0
        vld4.16 {d9, d11, d13, d15}, [r0]
        vld4.16 {d17, d19, d21, d23}, [r1]
        vmul.i16        q12, q3, q4
        vstmia  sp, {d16-d23}      <<< *
        vld1.64 {d4-d5}, [sp:64]   <<< *
        vmul.i16        q13, q3, q5  <<< **
        vmul.i16        q9, q3, q9   
        vmul.i16        q14, q3, q6  <<< **
        vmul.i16        q10, q3, q10
        vmul.i16        q8, q3, q2   <<< **, ***
        vmul.i16        q15, q3, q7  <<< **
        vmul.i16        q11, q3, q11
        vstmia  r2, {d24-d31}
        vstmia  r3, {d16-d23}
        add     sp, sp, #64
        @ sp needed
        fldmfdd sp!, {d8-d15}
        bx      lr

So my questions are:
1) Why do we need the instructions marked * and why did the compiler use q2 in ***?
2) Why didn't the compiler reuse registers q5, q6, q2, q7 in **?

Command line:

cc1 -quiet -v t.c -quiet -dumpbase t.c -mfpu=neon -mcpu=cortex-a15
-mfloat-abi=softfp -marm -mtls-dialect=gnu -auxbase-strip t.s -O3
-Wno-error=unused-local-typedefs -version -fdump-tree-all -fdump-rtl-all
-funwind-tables -o t.s

gcc version = 4.10.0
--build=x86_64-pc-linux-gnu
--host=x86_64-pc-linux-gnu
--target=arm-v7a15v5r2-linux-gnueabi

--Marat


From: m.zakirov at samsung dot com @ 2014-07-29 11:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725

--- Comment #8 from Marat Zakirov <m.zakirov at samsung dot com> ---
UPDATE

With a small fix you can get much better code:

transpose_16x16:
        .fnstart
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        add     r2, r0, #128
        vld4.16 {d24, d26, d28, d30}, [r0]
        add     r1, r0, #160
        vld4.16 {d16, d18, d20, d22}, [r2]
        add     r0, r0, #32
        movw    r3, #:lower16:m1
        vldr    d6, .L2
        vldr    d7, .L2+8
        movw    r2, #:lower16:m0
        movt    r3, #:upper16:m1
        movt    r2, #:upper16:m0
        vld4.16 {d25, d27, d29, d31}, [r0]
        vld4.16 {d17, d19, d21, d23}, [r1]
        vmul.i16        q12, q3, q12
        vmul.i16        q8, q3, q8
        vmul.i16        q13, q3, q13
        vmul.i16        q9, q3, q9
        vmul.i16        q14, q3, q14
        vmul.i16        q10, q3, q10
        vmul.i16        q15, q3, q15
        vmul.i16        q11, q3, q11
        vstmia  r2, {d24-d31}
        vstmia  r3, {d16-d23}
        bx      lr
.L3:

About fix:

I discovered that the GCC register allocator (RA) has only 'weak' support for
stream (in my case NEON) registers. The RA treats stream registers as
unsplittable ranges, so if one register of a range becomes free, GCC does not
reuse it until the whole range becomes free.

That is actually OK, but...

I found that the GCC CSE pass performs partial substitution on register ranges,
and this leads to a terrible increase in register pressure.

Example

Before CSE

a = b
a0 = a0 * 3
a1 = a1 * 3
a2 = a2 * 3
a3 = a3 * 3

After

a = b
a0 = b0 * 3 
a1 = a1 * 3 <<< *
a2 = a2 * 3
a3 = a3 * 3

CSE does not substitute b1 for a1 because at point (*) a0 has already been
redefined, so a != b as a whole. Yet a1 = b1 still holds; unfortunately, CSE
does not know how to handle parts of register ranges, just like the RA. And I
am not sure that it really is 'unfortunately', because

a0 = b0 * 3
a1 = b1 * 3
a2 = b2 * 3
a3 = b3 * 3

also requires twice as many stream registers as it really needs.

My solution here is to forbid CSE for XImode registers.
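
The register pressure argument above can be sketched with a toy model. This is an illustration under my own assumptions, not GCC code: every register tuple is treated as one indivisible allocation unit that stays occupied from its first definition (or from function entry, if it is used before being defined) until its last use. The type `insn`, the three instruction tables and `peak_live_tuples` are hypothetical names.

```c
#include <assert.h>

enum { TUPLE_A = 0, TUPLE_B = 1, NUM_TUPLES = 2, NONE = -1 };

/* One instruction: the tuple it (partly) defines and the tuple it uses. */
typedef struct { int def; int use; } insn;

/* a = b; a0..a3 = a0..a3 * 3; final store of a */
const insn before_cse[] = {
    {TUPLE_A, TUPLE_B},
    {TUPLE_A, TUPLE_A}, {TUPLE_A, TUPLE_A},
    {TUPLE_A, TUPLE_A}, {TUPLE_A, TUPLE_A},
    {NONE, TUPLE_A},
};

/* a = b; a0 = b0 * 3; a1..a3 = a1..a3 * 3; final store of a */
const insn after_partial_cse[] = {
    {TUPLE_A, TUPLE_B},
    {TUPLE_A, TUPLE_B},
    {TUPLE_A, TUPLE_A}, {TUPLE_A, TUPLE_A}, {TUPLE_A, TUPLE_A},
    {NONE, TUPLE_A},
};

/* a0..a3 = b0..b3 * 3; final store of a */
const insn fully_substituted[] = {
    {TUPLE_A, TUPLE_B}, {TUPLE_A, TUPLE_B},
    {TUPLE_A, TUPLE_B}, {TUPLE_A, TUPLE_B},
    {NONE, TUPLE_A},
};

/* Peak number of tuples that are occupied at the same time. */
int peak_live_tuples(const insn *code, int n)
{
    int first[NUM_TUPLES], last[NUM_TUPLES];
    for (int t = 0; t < NUM_TUPLES; t++) {
        first[t] = n;    /* n means "not seen yet" */
        last[t] = -1;
    }
    for (int i = 0; i < n; i++) {
        int use = code[i].use, def = code[i].def;
        assert(use < NUM_TUPLES && def < NUM_TUPLES);
        if (use != NONE) {
            if (first[use] == n)
                first[use] = -1;   /* used before defined: live on entry */
            last[use] = i;
        }
        if (def != NONE && first[def] == n)
            first[def] = i;
    }
    int peak = 0;
    /* Point p lies just after instruction p; p == -1 is function entry. */
    for (int p = -1; p < n - 1; p++) {
        int live = 0;
        for (int t = 0; t < NUM_TUPLES; t++)
            if (first[t] <= p && last[t] > p)
                live++;
        if (live > peak)
            peak = live;
    }
    return peak;
}
```

Under this model the sequence before CSE needs one tuple (the copy can share a register range), while both the partially and the fully substituted sequences keep a and b occupied at the same time and need two.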


From: m.zakirov at samsung dot com @ 2014-07-29 11:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725

--- Comment #9 from Marat Zakirov <m.zakirov at samsung dot com> ---
I used the following patch:

diff --git a/gcc/cse.c b/gcc/cse.c
index 34f9364..a9e0442 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -2862,6 +2862,9 @@ canon_reg (rtx x, rtx insn)
            || ! REGNO_QTY_VALID_P (REGNO (x)))
          return x;

+        if (GET_MODE (x) == XImode)
+          return x;
+
        q = REG_QTY (REGNO (x));
        ent = &qty_table[q];
        first = ent->first_reg;
diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index 547fcd6..eadc729 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -1317,6 +1317,9 @@ forward_propagate_and_simplify (df_ref use, rtx def_insn, rtx def_set)
   if (!new_rtx)
     return false;

+  if (GET_MODE (reg) == XImode)
+    return false;
+
   return try_fwprop_subst (use, loc, new_rtx, def_insn, set_reg_equal);
 }


From: mkuvyrkov at gcc dot gnu.org @ 2014-08-20 16:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725

Maxim Kuvyrkov <mkuvyrkov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |mkuvyrkov at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org      |mkuvyrkov at gcc dot gnu.org


From: pinskia at gcc dot gnu.org @ 2021-09-27  7:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

From: ramana at gcc dot gnu dot org @ 2010-05-11  7:35 UTC (permalink / raw)
  To: gcc-bugs



-- 

ramana at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
           Keywords|                            |missed-optimization
   Last reconfirmed|0000-00-00 00:00:00         |2010-05-11 07:35:23
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725

