public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] ARM/NEON: vld1q_dup_s64 builtin
@ 2012-05-09 10:18 Christophe Lyon
  2012-05-10 11:41 ` Ramana Radhakrishnan
  0 siblings, 1 reply; 13+ messages in thread
From: Christophe Lyon @ 2012-05-09 10:18 UTC (permalink / raw)
  To: gcc-patches

Hello,

On ARM+Neon, the expansion of vld1q_dup_s64() and vld1q_dup_u64() builtins currently fails to load the second vector element.

Here is a small patch to address this problem:

2012-05-07  Christophe Lyon <christophe.lyon@st.com>

     * gcc/config/arm/neon.md (neon_vld1_dup): Fix vld1q_dup_s64.

Index: gcc/config/arm/neon.md
===================================================================
--- gcc/config/arm/neon.md    (revision 2659)
+++ gcc/config/arm/neon.md    (revision 2660)
@@ -4203,7 +4203,7 @@
    if (GET_MODE_NUNITS (<MODE>mode) > 2)
      return "vld1.<V_sz_elem>\t{%e0[], %f0[]}, %A1";
    else
-    return "vld1.<V_sz_elem>\t%h0, %A1";
+    return "vld1.<V_sz_elem>\t%e0, %A1 \;vmov\t%f0, %e0";
  }
    [(set (attr "neon_type")
        (if_then_else (gt (const_string "<V_mode_nunits>") (const_string "1"))

OK?

Thanks,

Christophe.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] ARM/NEON: vld1q_dup_s64 builtin
  2012-05-09 10:18 [PATCH] ARM/NEON: vld1q_dup_s64 builtin Christophe Lyon
@ 2012-05-10 11:41 ` Ramana Radhakrishnan
  2012-05-10 15:32   ` Christophe Lyon
  0 siblings, 1 reply; 13+ messages in thread
From: Ramana Radhakrishnan @ 2012-05-10 11:41 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc-patches

On 9 May 2012 11:18, Christophe Lyon <christophe.lyon@st.com> wrote:
> Hello,
>
> On ARM+Neon, the expansion of vld1q_dup_s64() and vld1q_dup_u64() builtins
> currently fails to load the second vector element.

Thanks for the patch but this is not acceptable as it stands today.
You need to set the length attributes in this case to 8 for the
appropriate alternative at the very least. You also don't mention how
this patch was tested. Alternatively it might be worth splitting the
vld1q_*64 case into a 64 bit load into a (subreg:DI (V2DI reg)  0 )
followed by a subreg to subreg move which should end up having the
same effect . That splitting would allow for better instruction
scheduling. In addition it would be nice to have a testcase in
gcc.target/arm .

As a follow up patch I'd like these patterns merged with the vdup_n
patterns in neon.md (allowing them to grow a memory operand variant)
which should then allow merging of (I think)

scalarval = scalar_load ()
vreg = vdup ( scalarval)

into

vreg = vld1_dup_n ( scalar_address).

Thanks,
Ramana

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] ARM/NEON: vld1q_dup_s64 builtin
  2012-05-10 11:41 ` Ramana Radhakrishnan
@ 2012-05-10 15:32   ` Christophe Lyon
  2012-05-10 15:52     ` Julian Brown
  2012-05-11 14:48     ` Ramana Radhakrishnan
  0 siblings, 2 replies; 13+ messages in thread
From: Christophe Lyon @ 2012-05-10 15:32 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: gcc-patches

On 10.05.2012 13:41, Ramana Radhakrishnan wrote:
> On 9 May 2012 11:18, Christophe Lyon<christophe.lyon@st.com>  wrote:
>> Hello,
>>
>> On ARM+Neon, the expansion of vld1q_dup_s64() and vld1q_dup_u64() builtins
>> currently fails to load the second vector element.
> Thanks for the patch but this is not acceptable as it stands today.
> You need to set the length attributes in this case to 8 for the
> appropriate alternative at the very least.
OK I'll look at this.

> You also don't mention how this patch was tested.
I used the testsuite I developed some time ago to test all the Neon builtins, which I posted last year on the qemu mailing-list. With the current GCCs, this bug is the only remaining one I could detect.

>   Alternatively it might be worth splitting the
> vld1q_*64 case into a 64 bit load into a (subreg:DI (V2DI reg)  0 )
> followed by a subreg to subreg move which should end up having the
> same effect . That splitting would allow for better instruction
> scheduling.
Are you aware of examples of similar cases I could use as a model?

>   In addition it would be nice to have a testcase in
> gcc.target/arm .
Well. Prior to sending my patch I did look at that directory, but I supposed that such a test ought to belong to the neon/ subdir where the tests are described as autogenerated. Any doc on how to do that?

Thanks,

Christophe.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] ARM/NEON: vld1q_dup_s64 builtin
  2012-05-10 15:32   ` Christophe Lyon
@ 2012-05-10 15:52     ` Julian Brown
  2012-05-11 14:48     ` Ramana Radhakrishnan
  1 sibling, 0 replies; 13+ messages in thread
From: Julian Brown @ 2012-05-10 15:52 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Ramana Radhakrishnan, gcc-patches

On Thu, 10 May 2012 17:31:43 +0200
Christophe Lyon <christophe.lyon@st.com> wrote:

> On 10.05.2012 13:41, Ramana Radhakrishnan wrote:
> > On 9 May 2012 11:18, Christophe Lyon<christophe.lyon@st.com>  wrote:
> >> Hello,
> >>
> >> On ARM+Neon, the expansion of vld1q_dup_s64() and vld1q_dup_u64()
> >> builtins currently fails to load the second vector element.
> > Thanks for the patch but this is not acceptable as it stands today.
> > You need to set the length attributes in this case to 8 for the
> > appropriate alternative at the very least.
> OK I'll look at this.
> 
> > You also don't mention how this patch was tested.
> I used the testsuite I developed some time ago to test all the Neon
> builtins, which I posted last year on the qemu mailing-list. With the
> current GCCs, this bug is the only remaining one I could detect.
> 
> >   Alternatively it might be worth splitting the
> > vld1q_*64 case into a 64 bit load into a (subreg:DI (V2DI reg)  0 )
> > followed by a subreg to subreg move which should end up having the
> > same effect . That splitting would allow for better instruction
> > scheduling.
> Are you aware of examples of similar cases I could use as a model?
> 
> >   In addition it would be nice to have a testcase in
> > gcc.target/arm .
> Well. Prior to sending my patch I did look at that directory, but I
> supposed that such a test ought to belong to the neon/ subdir where
> the tests are described as autogenerated. Any doc on how to do that?

I'd recommend not to autogenerate such a test, FWIW -- the
autogenerated neon tests aren't very good. I think a manually-written
execute test would be better in this case.

If you do try autogenerating tests, look at "Disassembles_as" in
neon.ml, and neon-testgen.ml.

Julian

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] ARM/NEON: vld1q_dup_s64 builtin
  2012-05-10 15:32   ` Christophe Lyon
  2012-05-10 15:52     ` Julian Brown
@ 2012-05-11 14:48     ` Ramana Radhakrishnan
  2012-05-16 13:51       ` Christophe Lyon
  1 sibling, 1 reply; 13+ messages in thread
From: Ramana Radhakrishnan @ 2012-05-11 14:48 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc-patches

>
>
>> You also don't mention how this patch was tested.
>
> I used the testsuite I developed some time ago to test all the Neon
> builtins, which I posted last year on the qemu mailing-list. With the
> current GCCs, this bug is the only remaining one I could detect.
>

Fair enough.


>
>>  Alternatively it might be worth splitting the
>> vld1q_*64 case into a 64 bit load into a (subreg:DI (V2DI reg)  0 )
>> followed by a subreg to subreg move which should end up having the
>> same effect . That splitting would allow for better instruction
>> scheduling.
>
> Are you aware of examples of similar cases I could use as a model?

I would change the iterator from VQX to VQ in the pattern above (you
can also simplify the setting of neon_type in that case as well as
change that to be a vec_duplicate as below and get rid of any
lingering definitions of UNSPEC_VLD1_DUP if they exist), define a
separate pattern that expressed this as a define_insn_and_split as
below.

 (define_insn_and_split "neon_vld1_dupv2di"
   [(set (match_operand:V2DI 0 "s_register_operand" "=w")
     (vec_duplicate:V2DI (match_operand:DI 1 "neon_struct_operand" "Um")))]
   "TARGET_NEON"
   "#"
   "&& reload_completed"
   [(const_int 0)]
   {
    rtx tmprtx = gen_lowpart (DImode, operands[0]);
    emit_insn (gen_neon_vld1_dupdi (tmprtx, operands[1]));
    emit_move_insn (gen_highpart (DImode, operands[0]), tmprtx );
    DONE;
    }
(set_attr "length" "8")
(set_attr "neon_type" "<fromearlierpattern">)
)

Do you want to try this and see what you get ?

>
>
>>  In addition it would be nice to have a testcase in
>> gcc.target/arm .
>
> Well. Prior to sending my patch I did look at that directory, but I supposed
> that such a test ought to belong to the neon/ subdir where the tests are
> described as autogenerated. Any doc on how to do that?

 I'd rather have an extra regression test in gcc.target/arm that was a
run time test. for e.g. take a look at gcc.target/arm/neon-vadds64.c .

Ramana

>
> Thanks,
>
> Christophe.
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] ARM/NEON: vld1q_dup_s64 builtin
  2012-05-11 14:48     ` Ramana Radhakrishnan
@ 2012-05-16 13:51       ` Christophe Lyon
  2012-05-18 22:46         ` Ramana Radhakrishnan
  0 siblings, 1 reply; 13+ messages in thread
From: Christophe Lyon @ 2012-05-16 13:51 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: gcc-patches

On 11.05.2012 16:48, Ramana Radhakrishnan wrote:
> I would change the iterator from VQX to VQ in the pattern above (you
> can also simplify the setting of neon_type in that case as well as
> change that to be a vec_duplicate as below and get rid of any
> lingering definitions of UNSPEC_VLD1_DUP if they exist), define a
> separate pattern that expressed this as a define_insn_and_split as
> below.
>
>   (define_insn_and_split "neon_vld1_dupv2di"
>     [(set (match_operand:V2DI 0 "s_register_operand" "=w")
>       (vec_duplicate:V2DI (match_operand:DI 1 "neon_struct_operand" "Um")))]
>     "TARGET_NEON"
>     "#"
>     "&&  reload_completed"
>     [(const_int 0)]
>     {
>      rtx tmprtx = gen_lowpart (DImode, operands[0]);
>      emit_insn (gen_neon_vld1_dupdi (tmprtx, operands[1]));
>      emit_move_insn (gen_highpart (DImode, operands[0]), tmprtx );
>      DONE;
>      }
> (set_attr "length" "8")
> (set_attr "neon_type" "<fromearlierpattern">)
> )
>
> Do you want to try this and see what you get ?

Thanks for this example and suggestion, it does work.

> I'd rather have an extra regression test in gcc.target/arm that was a run time test. for e.g. take a look at gcc.target/arm/neon-vadds64.c . 

Here is an updated patch:

2012-05-16  Christophe Lyon <christophe.lyon@st.com>

     * gcc/config/arm/neon.md (neon_vld1_dup): Restrict to VQ
     operands.
     (neon_vld1_dupv2di): New, fixes vld1q_dup_s64.
     * gcc/testsuite/gcc.target/arm/neon-vld1_dupQ.c: New test.

Index: gcc/testsuite/gcc.target/arm/neon-vld1_dupQ.c
===================================================================
--- gcc.orig/gcc/testsuite/gcc.target/arm/neon-vld1_dupQ.c    (revision 0)
+++ gcc.new/gcc/testsuite/gcc.target/arm/neon-vld1_dupQ.c    (revision 0)
@@ -0,0 +1,24 @@
+/* Test the `vld1q_s64' ARM Neon intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-options "-O0" } */
+/* { dg-add-options arm_neon } */
+
+#include "arm_neon.h"
+#include <stdlib.h>
+
+int main (void)
+{
+  int64x1_t input[2] = {(int64x1_t)0x0123456776543210LL,
+            (int64x1_t)0x89abcdeffedcba90LL};
+  int64x1_t output[2] = {0, 0};
+  int64x2_t var = vld1q_dup_s64(input);
+
+  vst1q_s64(output, var);
+  if (output[0] != (int64x1_t)0x0123456776543210LL)
+    abort();
+  if (output[1] != (int64x1_t)0x0123456776543210LL)
+    abort();
+  return 0;
+}
Index: gcc/config/arm/neon.md
===================================================================
--- gcc.orig/gcc/config/arm/neon.md    (revision 2659)
+++ gcc.new/gcc/config/arm/neon.md    (working copy)
@@ -4195,20 +4195,32 @@
  )

  (define_insn "neon_vld1_dup<mode>"
-  [(set (match_operand:VQX 0 "s_register_operand" "=w")
-        (unspec:VQX [(match_operand:<V_elem> 1 "neon_struct_operand" "Um")]
+  [(set (match_operand:VQ 0 "s_register_operand" "=w")
+        (unspec:VQ [(match_operand:<V_elem> 1 "neon_struct_operand" "Um")]
                      UNSPEC_VLD1_DUP))]
    "TARGET_NEON"
  {
-  if (GET_MODE_NUNITS (<MODE>mode) > 2)
      return "vld1.<V_sz_elem>\t{%e0[], %f0[]}, %A1";
-  else
-    return "vld1.<V_sz_elem>\t%h0, %A1";
  }
    [(set (attr "neon_type")
-      (if_then_else (gt (const_string "<V_mode_nunits>") (const_string "1"))
-                    (const_string "neon_vld2_2_regs_vld1_vld2_all_lanes")
-                    (const_string "neon_vld1_1_2_regs")))]
+      (const_string "neon_vld2_2_regs_vld1_vld2_all_lanes"))]
+)
+
+(define_insn_and_split "neon_vld1_dupv2di"
+   [(set (match_operand:V2DI 0 "s_register_operand" "=w")
+    (vec_duplicate:V2DI (match_operand:DI 1 "neon_struct_operand" "Um")))]
+   "TARGET_NEON"
+   "#"
+   "&& reload_completed"
+   [(const_int 0)]
+   {
+    rtx tmprtx = gen_lowpart (DImode, operands[0]);
+    emit_insn (gen_neon_vld1_dupdi (tmprtx, operands[1]));
+    emit_move_insn (gen_highpart (DImode, operands[0]), tmprtx );
+    DONE;
+    }
+  [(set_attr "length" "8")
+   (set (attr "neon_type") (const_string "neon_vld2_2_regs_vld1_vld2_all_lanes"))]
  )

  (define_expand "vec_store_lanes<mode><mode>"



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] ARM/NEON: vld1q_dup_s64 builtin
  2012-05-16 13:51       ` Christophe Lyon
@ 2012-05-18 22:46         ` Ramana Radhakrishnan
  2012-05-21  9:16           ` Christophe Lyon
  0 siblings, 1 reply; 13+ messages in thread
From: Ramana Radhakrishnan @ 2012-05-18 22:46 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc-patches

On 16 May 2012 14:51, Christophe Lyon <christophe.lyon@st.com> wrote:
> On 11.05.2012 16:48, Ramana Radhakrishnan wrote:
>>
>> I would change the iterator from VQX to VQ in the pattern above (you
>> can also simplify the setting of neon_type in that case as well as
>> change that to be a vec_duplicate as below and get rid of any
>> lingering definitions of UNSPEC_VLD1_DUP if they exist), define a
>> separate pattern that expressed this as a define_insn_and_split as
>> below.
>>
>>  (define_insn_and_split "neon_vld1_dupv2di"
>>    [(set (match_operand:V2DI 0 "s_register_operand" "=w")
>>      (vec_duplicate:V2DI (match_operand:DI 1 "neon_struct_operand"
>> "Um")))]
>>    "TARGET_NEON"
>>    "#"
>>    "&&  reload_completed"
>>    [(const_int 0)]
>>    {
>>     rtx tmprtx = gen_lowpart (DImode, operands[0]);
>>     emit_insn (gen_neon_vld1_dupdi (tmprtx, operands[1]));
>>     emit_move_insn (gen_highpart (DImode, operands[0]), tmprtx );
>>     DONE;
>>     }
>> (set_attr "length" "8")
>> (set_attr "neon_type" "<fromearlierpattern">)
>> )
>>
>> Do you want to try this and see what you get ?
>
>
> Thanks for this example and suggestion, it does work.
>
>
>> I'd rather have an extra regression test in gcc.target/arm that was a run
>> time test. for e.g. take a look at gcc.target/arm/neon-vadds64.c .
>
>
> Here is an updated patch:

I tried applying your patch but ran into trouble with patch not liking
this . My suspicion is mailer munging white spaces in some form -
Could you send the patch as an attachment please rather than inline in
your mail ?

regards,
Ramana
> 2012-05-16  Christophe Lyon <christophe.lyon@st.com>
>
>    * gcc/config/arm/neon.md (neon_vld1_dup): Restrict to VQ
>    operands.
>    (neon_vld1_dupv2di): New, fixes vld1q_dup_s64.
>    * gcc/testsuite/gcc.target/arm/neon-vld1_dupQ.c: New test.
>
> Index: gcc/testsuite/gcc.target/arm/neon-vld1_dupQ.c
> ===================================================================
> --- gcc.orig/gcc/testsuite/gcc.target/arm/neon-vld1_dupQ.c    (revision 0)
> +++ gcc.new/gcc/testsuite/gcc.target/arm/neon-vld1_dupQ.c    (revision 0)
> @@ -0,0 +1,24 @@
> +/* Test the `vld1q_s64' ARM Neon intrinsic.  */
> +
> +/* { dg-do run } */
> +/* { dg-require-effective-target arm_neon_hw } */
> +/* { dg-options "-O0" } */
> +/* { dg-add-options arm_neon } */
> +
> +#include "arm_neon.h"
> +#include <stdlib.h>
> +
> +int main (void)
> +{
> +  int64x1_t input[2] = {(int64x1_t)0x0123456776543210LL,
> +            (int64x1_t)0x89abcdeffedcba90LL};
> +  int64x1_t output[2] = {0, 0};
> +  int64x2_t var = vld1q_dup_s64(input);
> +
> +  vst1q_s64(output, var);
> +  if (output[0] != (int64x1_t)0x0123456776543210LL)
> +    abort();
> +  if (output[1] != (int64x1_t)0x0123456776543210LL)
> +    abort();
> +  return 0;
> +}
> Index: gcc/config/arm/neon.md
> ===================================================================
> --- gcc.orig/gcc/config/arm/neon.md    (revision 2659)
> +++ gcc.new/gcc/config/arm/neon.md    (working copy)
> @@ -4195,20 +4195,32 @@
>  )
>
>  (define_insn "neon_vld1_dup<mode>"
> -  [(set (match_operand:VQX 0 "s_register_operand" "=w")
> -        (unspec:VQX [(match_operand:<V_elem> 1 "neon_struct_operand" "Um")]
> +  [(set (match_operand:VQ 0 "s_register_operand" "=w")
> +        (unspec:VQ [(match_operand:<V_elem> 1 "neon_struct_operand" "Um")]
>                     UNSPEC_VLD1_DUP))]
>   "TARGET_NEON"
>  {
> -  if (GET_MODE_NUNITS (<MODE>mode) > 2)
>
>     return "vld1.<V_sz_elem>\t{%e0[], %f0[]}, %A1";
> -  else
>
> -    return "vld1.<V_sz_elem>\t%h0, %A1";
>  }
>   [(set (attr "neon_type")
> -      (if_then_else (gt (const_string "<V_mode_nunits>") (const_string
> "1"))
> -                    (const_string "neon_vld2_2_regs_vld1_vld2_all_lanes")
> -                    (const_string "neon_vld1_1_2_regs")))]
> +      (const_string "neon_vld2_2_regs_vld1_vld2_all_lanes"))]
> +)
> +
> +(define_insn_and_split "neon_vld1_dupv2di"
> +   [(set (match_operand:V2DI 0 "s_register_operand" "=w")
> +    (vec_duplicate:V2DI (match_operand:DI 1 "neon_struct_operand" "Um")))]
> +   "TARGET_NEON"
> +   "#"
> +   "&& reload_completed"
> +   [(const_int 0)]
> +   {
> +    rtx tmprtx = gen_lowpart (DImode, operands[0]);
> +    emit_insn (gen_neon_vld1_dupdi (tmprtx, operands[1]));
> +    emit_move_insn (gen_highpart (DImode, operands[0]), tmprtx );
> +    DONE;
> +    }
> +  [(set_attr "length" "8")
> +   (set (attr "neon_type") (const_string
> "neon_vld2_2_regs_vld1_vld2_all_lanes"))]
>  )
>
>  (define_expand "vec_store_lanes<mode><mode>"
>
>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] ARM/NEON: vld1q_dup_s64 builtin
  2012-05-18 22:46         ` Ramana Radhakrishnan
@ 2012-05-21  9:16           ` Christophe Lyon
  2012-05-25 14:48             ` Christophe Lyon
  0 siblings, 1 reply; 13+ messages in thread
From: Christophe Lyon @ 2012-05-21  9:16 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 368 bytes --]

> I tried applying your patch but ran into trouble with patch not liking
> this . My suspicion is mailer munging white spaces in some form -
> Could you send the patch as an attachment please rather than inline in
> your mail ?
>
> regards,
> Ramana
>
Here it is, as an attachment. Note however that this patch is against GCC-4.6.3.

Thanks for testing.

Christophe.


[-- Attachment #2: gcc-4.6.3-arm-vld1q_dup_s64.patch --]
[-- Type: text/plain, Size: 2860 bytes --]

2012-05-16  Christophe Lyon  <christophe.lyon@st.com>

	* gcc/config/arm/neon.md (neon_vld1_dup): Restrict to VQ
	operands.
	(neon_vld1_dupv2di): New, fixes vld1q_dup_s64.
	* gcc/testsuite/gcc.target/arm/neon-vld1_dupQ.c: New test.

 2012-04-25  Christophe Lyon  <christophe.lyon@st.com>
 
 	Fix codex #161546:
Index: gcc/testsuite/gcc.target/arm/neon-vld1_dupQ.c
===================================================================
--- gcc.orig/gcc/testsuite/gcc.target/arm/neon-vld1_dupQ.c	(revision 0)
+++ gcc.new/gcc/testsuite/gcc.target/arm/neon-vld1_dupQ.c	(revision 0)
@@ -0,0 +1,24 @@
+/* Test the `vld1q_s64' ARM Neon intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-options "-O0" } */
+/* { dg-add-options arm_neon } */
+
+#include "arm_neon.h"
+#include <stdlib.h>
+
+int main (void)
+{
+  int64x1_t input[2] = {(int64x1_t)0x0123456776543210LL,
+			(int64x1_t)0x89abcdeffedcba90LL};
+  int64x1_t output[2] = {0, 0};
+  int64x2_t var = vld1q_dup_s64(input);
+
+  vst1q_s64(output, var);
+  if (output[0] != (int64x1_t)0x0123456776543210LL)
+    abort();
+  if (output[1] != (int64x1_t)0x0123456776543210LL)
+    abort();
+  return 0;
+}
Index: gcc/config/arm/neon.md
===================================================================
--- gcc.orig/gcc/config/arm/neon.md	(revision 2659)
+++ gcc.new/gcc/config/arm/neon.md	(working copy)
@@ -4195,20 +4195,32 @@
 )
 
 (define_insn "neon_vld1_dup<mode>"
-  [(set (match_operand:VQX 0 "s_register_operand" "=w")
-        (unspec:VQX [(match_operand:<V_elem> 1 "neon_struct_operand" "Um")]
+  [(set (match_operand:VQ 0 "s_register_operand" "=w")
+        (unspec:VQ [(match_operand:<V_elem> 1 "neon_struct_operand" "Um")]
                     UNSPEC_VLD1_DUP))]
   "TARGET_NEON"
 {
-  if (GET_MODE_NUNITS (<MODE>mode) > 2)
     return "vld1.<V_sz_elem>\t{%e0[], %f0[]}, %A1";
-  else
-    return "vld1.<V_sz_elem>\t%h0, %A1";
 }
   [(set (attr "neon_type")
-      (if_then_else (gt (const_string "<V_mode_nunits>") (const_string "1"))
-                    (const_string "neon_vld2_2_regs_vld1_vld2_all_lanes")
-                    (const_string "neon_vld1_1_2_regs")))]
+      (const_string "neon_vld2_2_regs_vld1_vld2_all_lanes"))]
+)
+
+(define_insn_and_split "neon_vld1_dupv2di"
+   [(set (match_operand:V2DI 0 "s_register_operand" "=w")
+    (vec_duplicate:V2DI (match_operand:DI 1 "neon_struct_operand" "Um")))]
+   "TARGET_NEON"
+   "#"
+   "&& reload_completed"
+   [(const_int 0)]
+   {
+    rtx tmprtx = gen_lowpart (DImode, operands[0]);
+    emit_insn (gen_neon_vld1_dupdi (tmprtx, operands[1]));
+    emit_move_insn (gen_highpart (DImode, operands[0]), tmprtx );
+    DONE;
+    }
+  [(set_attr "length" "8")
+   (set (attr "neon_type") (const_string "neon_vld2_2_regs_vld1_vld2_all_lanes"))]
 )
 
 (define_expand "vec_store_lanes<mode><mode>"

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] ARM/NEON: vld1q_dup_s64 builtin
  2012-05-21  9:16           ` Christophe Lyon
@ 2012-05-25 14:48             ` Christophe Lyon
  2012-06-06  9:22               ` Ramana Radhakrishnan
  0 siblings, 1 reply; 13+ messages in thread
From: Christophe Lyon @ 2012-05-25 14:48 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 490 bytes --]

On 21.05.2012 11:16, Christophe Lyon wrote:
>> I tried applying your patch but ran into trouble with patch not liking
>> this . My suspicion is mailer munging white spaces in some form -
>> Could you send the patch as an attachment please rather than inline in
>> your mail ?
>>
>> regards,
>> Ramana
>>
> Here it is, as an attachment. Note however that this patch is against GCC-4.6.3.
>
> Thanks for testing.
>
> Christophe.
>
Hi,
I have attached the version for GCC trunk.

Christophe.


[-- Attachment #2: gcc-arm-vld1q_dup.patch --]
[-- Type: text/plain, Size: 2735 bytes --]

2012-05-25  Christophe Lyon  <christophe.lyon@st.com>

	* gcc/config/arm/neon.md (neon_vld1_dup): Restrict to VQ
	operands.
	(neon_vld1_dupv2di): New, fixes vld1q_dup_s64.
	* gcc/testsuite/gcc.target/arm/neon-vld1_dupQ.c: New test.

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 4568dea..0a4d00b 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -4397,20 +4397,32 @@
 )
 
 (define_insn "neon_vld1_dup<mode>"
-  [(set (match_operand:VQX 0 "s_register_operand" "=w")
-        (unspec:VQX [(match_operand:<V_elem> 1 "neon_struct_operand" "Um")]
+  [(set (match_operand:VQ 0 "s_register_operand" "=w")
+        (unspec:VQ [(match_operand:<V_elem> 1 "neon_struct_operand" "Um")]
                     UNSPEC_VLD1_DUP))]
   "TARGET_NEON"
 {
-  if (GET_MODE_NUNITS (<MODE>mode) > 2)
-    return "vld1.<V_sz_elem>\t{%e0[], %f0[]}, %A1";
-  else
-    return "vld1.<V_sz_elem>\t%h0, %A1";
+  return "vld1.<V_sz_elem>\t{%e0[], %f0[]}, %A1";
 }
   [(set (attr "neon_type")
-      (if_then_else (gt (const_string "<V_mode_nunits>") (const_string "1"))
-                    (const_string "neon_vld2_2_regs_vld1_vld2_all_lanes")
-                    (const_string "neon_vld1_1_2_regs")))]
+      (const_string "neon_vld2_2_regs_vld1_vld2_all_lanes"))]
+)
+
+(define_insn_and_split "neon_vld1_dupv2di"
+   [(set (match_operand:V2DI 0 "s_register_operand" "=w")
+    (vec_duplicate:V2DI (match_operand:DI 1 "neon_struct_operand" "Um")))]
+   "TARGET_NEON"
+   "#"
+   "&& reload_completed"
+   [(const_int 0)]
+   {
+    rtx tmprtx = gen_lowpart (DImode, operands[0]);
+    emit_insn (gen_neon_vld1_dupdi (tmprtx, operands[1]));
+    emit_move_insn (gen_highpart (DImode, operands[0]), tmprtx );
+    DONE;
+    }
+  [(set_attr "length" "8")
+   (set (attr "neon_type") (const_string "neon_vld2_2_regs_vld1_vld2_all_lanes"))]
 )
 
 (define_expand "vec_store_lanes<mode><mode>"
diff --git a/gcc/testsuite/gcc.target/arm/neon-vld1_dupQ.c b/gcc/testsuite/gcc.target/arm/neon-vld1_dupQ.c
new file mode 100644
index 0000000..b5793bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vld1_dupQ.c
@@ -0,0 +1,24 @@
+/* Test the `vld1q_s64' ARM Neon intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-options "-O0" } */
+/* { dg-add-options arm_neon } */
+
+#include "arm_neon.h"
+#include <stdlib.h>
+
+int main (void)
+{
+  int64x1_t input[2] = {(int64x1_t)0x0123456776543210LL,
+			(int64x1_t)0x89abcdeffedcba90LL};
+  int64x1_t output[2] = {0, 0};
+  int64x2_t var = vld1q_dup_s64(input);
+
+  vst1q_s64(output, var);
+  if (output[0] != (int64x1_t)0x0123456776543210LL)
+    abort();
+  if (output[1] != (int64x1_t)0x0123456776543210LL)
+    abort();
+  return 0;
+}

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] ARM/NEON: vld1q_dup_s64 builtin
  2012-05-25 14:48             ` Christophe Lyon
@ 2012-06-06  9:22               ` Ramana Radhakrishnan
  2012-06-20 13:49                 ` Christophe Lyon
  0 siblings, 1 reply; 13+ messages in thread
From: Ramana Radhakrishnan @ 2012-06-06  9:22 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc-patches

Hi Christophe,

Sorry it's taken me a while to get back on this patch - I've been traveling.


> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> index 4568dea..0a4d00b 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -4397,20 +4397,32 @@
>  )

>  (define_insn "neon_vld1_dup<mode>"
> -  [(set (match_operand:VQX 0 "s_register_operand" "=w")
> -        (unspec:VQX [(match_operand:<V_elem> 1 "neon_struct_operand" "Um")]
> +  [(set (match_operand:VQ 0 "s_register_operand" "=w")
> +        (unspec:VQ [(match_operand:<V_elem> 1 "neon_struct_operand" "Um")]
>                      UNSPEC_VLD1_DUP))]

Why do we still have UNSPEC:VQ here ? I probably wasn't clear enough
in my earlier mail. There's no reason for this to remain an unspec,
we might as well replace this with a vec_duplicate form as below.

Please do the same with the neon_vld1_dup that iterates over VDX as well.


>    "TARGET_NEON"
>  {
> -  if (GET_MODE_NUNITS (<MODE>mode) > 2)
> -    return "vld1.<V_sz_elem>\t{%e0[], %f0[]}, %A1";
> -  else
> -    return "vld1.<V_sz_elem>\t%h0, %A1";
> +  return "vld1.<V_sz_elem>\t{%e0[], %f0[]}, %A1";
>  }
>    [(set (attr "neon_type")

Use the shorter set_attr "neon_type" form . In that case you don't need
a const_string in this case.


> -      (if_then_else (gt (const_string "<V_mode_nunits>") (const_string "1"))
> -                    (const_string "neon_vld2_2_regs_vld1_vld2_all_lanes")
> -                    (const_string "neon_vld1_1_2_regs")))]
> +      (const_string "neon_vld2_2_regs_vld1_vld2_all_lanes"))]
> +)
> +
> +(define_insn_and_split "neon_vld1_dupv2di"
> +   [(set (match_operand:V2DI 0 "s_register_operand" "=w")
> +    (vec_duplicate:V2DI (match_operand:DI 1 "neon_struct_operand" "Um")))]
> +   "TARGET_NEON"
> +   "#"
> +   "&& reload_completed"
> +   [(const_int 0)]
> +   {
> +    rtx tmprtx = gen_lowpart (DImode, operands[0]);
> +    emit_insn (gen_neon_vld1_dupdi (tmprtx, operands[1]));
> +    emit_move_insn (gen_highpart (DImode, operands[0]), tmprtx );
> +    DONE;
> +    }
> +  [(set_attr "length" "8")
> +   (set (attr "neon_type") (const_string "neon_vld2_2_regs_vld1_vld2_all_lanes"))]

Same comment about set_attr vs set (attr

Ok with those changes.

Ramana

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] ARM/NEON: vld1q_dup_s64 builtin
  2012-06-06  9:22               ` Ramana Radhakrishnan
@ 2012-06-20 13:49                 ` Christophe Lyon
  2012-06-22 18:16                   ` Ramana Radhakrishnan
  0 siblings, 1 reply; 13+ messages in thread
From: Christophe Lyon @ 2012-06-20 13:49 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 135 bytes --]

On 06.06.2012 11:00, Ramana Radhakrishnan wrote:
> Ok with those changes. Ramana . 

Hi Ramana,

How about this version?

Christophe.


[-- Attachment #2: gcc-arm-vld1q_dup.patch --]
[-- Type: text/plain, Size: 3436 bytes --]

commit f57ce4b63ca1c30ee88e8c1a431d6e90ffbecb82
Author: Christophe Lyon <christophe.lyon@st.com>
Date:   Wed Jun 20 15:30:50 2012 +0200

    2012-06-20  Christophe Lyon  <christophe.lyon@st.com>
    
    	* gcc/config/arm/neon.md (UNSPEC_VLD1_DUP): Remove.
    	(neon_vld1_dup): Restrict to VQ operands.
    	(neon_vld1_dupv2di): New, fixes vld1q_dup_s64.
    	* gcc/testsuite/gcc.target/arm/neon-vld1_dupQ.c: New test.

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 4568dea..b3b925c 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -45,7 +45,6 @@
   UNSPEC_VHADD
   UNSPEC_VHSUB
   UNSPEC_VLD1
-  UNSPEC_VLD1_DUP
   UNSPEC_VLD1_LANE
   UNSPEC_VLD2
   UNSPEC_VLD2_DUP
@@ -4381,8 +4380,7 @@
 
 (define_insn "neon_vld1_dup<mode>"
   [(set (match_operand:VDX 0 "s_register_operand" "=w")
-        (unspec:VDX [(match_operand:<V_elem> 1 "neon_struct_operand" "Um")]
-                    UNSPEC_VLD1_DUP))]
+        (vec_duplicate:VDX (match_operand:<V_elem> 1 "neon_struct_operand" "Um")))]
   "TARGET_NEON"
 {
   if (GET_MODE_NUNITS (<MODE>mode) > 1)
@@ -4397,20 +4395,30 @@
 )
 
 (define_insn "neon_vld1_dup<mode>"
-  [(set (match_operand:VQX 0 "s_register_operand" "=w")
-        (unspec:VQX [(match_operand:<V_elem> 1 "neon_struct_operand" "Um")]
-                    UNSPEC_VLD1_DUP))]
+  [(set (match_operand:VQ 0 "s_register_operand" "=w")
+        (vec_duplicate:VQ (match_operand:<V_elem> 1 "neon_struct_operand" "Um")))]
   "TARGET_NEON"
 {
-  if (GET_MODE_NUNITS (<MODE>mode) > 2)
-    return "vld1.<V_sz_elem>\t{%e0[], %f0[]}, %A1";
-  else
-    return "vld1.<V_sz_elem>\t%h0, %A1";
+  return "vld1.<V_sz_elem>\t{%e0[], %f0[]}, %A1";
 }
-  [(set (attr "neon_type")
-      (if_then_else (gt (const_string "<V_mode_nunits>") (const_string "1"))
-                    (const_string "neon_vld2_2_regs_vld1_vld2_all_lanes")
-                    (const_string "neon_vld1_1_2_regs")))]
+  [(set_attr "neon_type" "neon_vld2_2_regs_vld1_vld2_all_lanes")]
+)
+
+(define_insn_and_split "neon_vld1_dupv2di"
+   [(set (match_operand:V2DI 0 "s_register_operand" "=w")
+    (vec_duplicate:V2DI (match_operand:DI 1 "neon_struct_operand" "Um")))]
+   "TARGET_NEON"
+   "#"
+   "&& reload_completed"
+   [(const_int 0)]
+   {
+    rtx tmprtx = gen_lowpart (DImode, operands[0]);
+    emit_insn (gen_neon_vld1_dupdi (tmprtx, operands[1]));
+    emit_move_insn (gen_highpart (DImode, operands[0]), tmprtx );
+    DONE;
+    }
+  [(set_attr "length" "8")
+   (set_attr "neon_type" "neon_vld2_2_regs_vld1_vld2_all_lanes")]
 )
 
 (define_expand "vec_store_lanes<mode><mode>"
diff --git a/gcc/testsuite/gcc.target/arm/neon-vld1_dupQ.c b/gcc/testsuite/gcc.target/arm/neon-vld1_dupQ.c
new file mode 100644
index 0000000..b5793bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vld1_dupQ.c
@@ -0,0 +1,24 @@
+/* Test the `vld1q_s64' ARM Neon intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-options "-O0" } */
+/* { dg-add-options arm_neon } */
+
+#include "arm_neon.h"
+#include <stdlib.h>
+
+int main (void)
+{
+  int64x1_t input[2] = {(int64x1_t)0x0123456776543210LL,
+			(int64x1_t)0x89abcdeffedcba90LL};
+  int64x1_t output[2] = {0, 0};
+  int64x2_t var = vld1q_dup_s64(input);
+
+  vst1q_s64(output, var);
+  if (output[0] != (int64x1_t)0x0123456776543210LL)
+    abort();
+  if (output[1] != (int64x1_t)0x0123456776543210LL)
+    abort();
+  return 0;
+}

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] ARM/NEON: vld1q_dup_s64 builtin
  2012-06-20 13:49                 ` Christophe Lyon
@ 2012-06-22 18:16                   ` Ramana Radhakrishnan
  2012-06-25 20:34                     ` Ramana Radhakrishnan
  0 siblings, 1 reply; 13+ messages in thread
From: Ramana Radhakrishnan @ 2012-06-22 18:16 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc-patches

On 20 June 2012 14:37, Christophe Lyon <christophe.lyon@st.com> wrote:
> On 06.06.2012 11:00, Ramana Radhakrishnan wrote:
>>
>> Ok with those changes. Ramana .
>
>
> Hi Ramana,
>
> How about this version?
>
> Christophe.
>

OK  -

This should also go into the release branches as it fixes wrong code
with an intrinsic .

Thanks,
Ramana

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] ARM/NEON: vld1q_dup_s64 builtin
  2012-06-22 18:16                   ` Ramana Radhakrishnan
@ 2012-06-25 20:34                     ` Ramana Radhakrishnan
  0 siblings, 0 replies; 13+ messages in thread
From: Ramana Radhakrishnan @ 2012-06-25 20:34 UTC (permalink / raw)
  To: gcc-patches; +Cc: Christophe Lyon

On 22 June 2012 18:58, Ramana Radhakrishnan
<ramana.radhakrishnan@linaro.org> wrote:
> On 20 June 2012 14:37, Christophe Lyon <christophe.lyon@st.com> wrote:
>> On 06.06.2012 11:00, Ramana Radhakrishnan wrote:
>>>
>>> Ok with those changes. Ramana .
>>
>>
>> Hi Ramana,
>>
>> How about this version?
>>
>> Christophe.
>>
>
> OK  -
>
> This should also go into the release branches as it fixes wrong code
> with an intrinsic .

I have now applied this to trunk for Christophe.

Ramana


>
> Thanks,
> Ramana

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2012-06-25 20:13 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-09 10:18 [PATCH] ARM/NEON: vld1q_dup_s64 builtin Christophe Lyon
2012-05-10 11:41 ` Ramana Radhakrishnan
2012-05-10 15:32   ` Christophe Lyon
2012-05-10 15:52     ` Julian Brown
2012-05-11 14:48     ` Ramana Radhakrishnan
2012-05-16 13:51       ` Christophe Lyon
2012-05-18 22:46         ` Ramana Radhakrishnan
2012-05-21  9:16           ` Christophe Lyon
2012-05-25 14:48             ` Christophe Lyon
2012-06-06  9:22               ` Ramana Radhakrishnan
2012-06-20 13:49                 ` Christophe Lyon
2012-06-22 18:16                   ` Ramana Radhakrishnan
2012-06-25 20:34                     ` Ramana Radhakrishnan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).