[AArch64] Fix vcvt_high_f64_f32 and vcvt_figh_f32

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [AArch64] Fix vcvt_high_f64_f32 and vcvt_figh_f32_f64 intrinsics.
@ 2015-09-09  8:53 James Greenhalgh
  2015-09-09 10:31 ` Christophe Lyon
  2015-09-09 11:43 ` Alan Lawrence
  0 siblings, 2 replies; 10+ messages in thread
From: James Greenhalgh @ 2015-09-09  8:53 UTC (permalink / raw)
  To: gcc-patches
  Cc: christophe.lyon, marcus.shawcroft, tejas.belagod, alan.lawrence

[-- Attachment #1: Type: text/plain, Size: 708 bytes --]


Hi,

This patch clears up some remaining confusion in the vector lane orderings
for the two intrinsics mentioned in the title.

Bootstrapped on aarch64-none-linux-gnu and regression tested for
aarch64_be-none-elf with no issues.

OK?

Thanks,
James

---
2015-09-09  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/aarch64/aarch64-simd.md (vec_unpacks_lo_v4sf): Rewrite
	as an expand.
	(vec_unpacks_hi_v4sf):  Likewise.
	(aarch64_float_extend_lo_v2df): Rename to...
	(aarch64_fcvtl_v4sf): This.
	(aarch64_fcvtl2_v4sf): New.
	(aarch64_float_truncate_hi_v4sf): Rewrite as an expand.
	(aarch64_float_truncate_hi_v4sf_le): New.
	(aarch64_float_truncate_hi_v4sf_be): Likewise.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-AArch64-Fix-vcvt_high_f64_f32-and-vcvt_figh_f32_f64-.patch --]
[-- Type: text/x-patch;  name=0001-AArch64-Fix-vcvt_high_f64_f32-and-vcvt_figh_f32_f64-.patch, Size: 4035 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 75fa0ab..c7ae956 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1691,39 +1691,65 @@
 
 ;; Float widening operations.
 
-(define_insn "vec_unpacks_lo_v4sf"
+(define_insn "aarch64_float_extend_lo_v2df"
   [(set (match_operand:V2DF 0 "register_operand" "=w")
 	(float_extend:V2DF
-	  (vec_select:V2SF
-	    (match_operand:V4SF 1 "register_operand" "w")
-	    (parallel [(const_int 0) (const_int 1)])
-	  )))]
+	  (match_operand:V2SF 1 "register_operand" "w")))]
   "TARGET_SIMD"
   "fcvtl\\t%0.2d, %1.2s"
   [(set_attr "type" "neon_fp_cvt_widen_s")]
 )
 
-(define_insn "aarch64_float_extend_lo_v2df"
+(define_insn "aarch64_fcvtl_v4sf"
   [(set (match_operand:V2DF 0 "register_operand" "=w")
 	(float_extend:V2DF
-	  (match_operand:V2SF 1 "register_operand" "w")))]
+	  (vec_select:V2SF
+	    (match_operand:V4SF 1 "register_operand" "w")
+	    (match_operand:V4SF 2 "vect_par_cnst_lo_half" ""))))]
   "TARGET_SIMD"
   "fcvtl\\t%0.2d, %1.2s"
   [(set_attr "type" "neon_fp_cvt_widen_s")]
 )
 
-(define_insn "vec_unpacks_hi_v4sf"
+(define_insn "aarch64_fcvtl2_v4sf"
   [(set (match_operand:V2DF 0 "register_operand" "=w")
 	(float_extend:V2DF
 	  (vec_select:V2SF
 	    (match_operand:V4SF 1 "register_operand" "w")
-	    (parallel [(const_int 2) (const_int 3)])
-	  )))]
+	    (match_operand:V4SF 2 "vect_par_cnst_hi_half" ""))))]
   "TARGET_SIMD"
   "fcvtl2\\t%0.2d, %1.4s"
   [(set_attr "type" "neon_fp_cvt_widen_s")]
 )
 
+(define_expand "vec_unpacks_lo_v4sf"
+  [(match_operand:V2DF 0 "register_operand" "=w")
+   (match_operand:V4SF 1 "register_operand" "w")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (V4SFmode, false);
+  rtx (*gen) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN
+			     ? gen_aarch64_fcvtl2_v4sf
+			     : gen_aarch64_fcvtl_v4sf;
+  emit_insn (gen (operands[0], operands[1], p));
+  DONE;
+}
+)
+
+(define_expand "vec_unpacks_hi_v4sf"
+  [(match_operand:V2DF 0 "register_operand" "=w")
+   (match_operand:V4SF 1 "register_operand" "w")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (V4SFmode, true);
+  rtx (*gen) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN
+			     ? gen_aarch64_fcvtl_v4sf
+			     : gen_aarch64_fcvtl2_v4sf;
+  emit_insn (gen (operands[0], operands[1], p));
+  DONE;
+}
+)
+
 ;; Float narrowing operations.
 
 (define_insn "aarch64_float_truncate_lo_v2sf"
@@ -1735,17 +1761,42 @@
   [(set_attr "type" "neon_fp_cvt_narrow_d_q")]
 )
 
-(define_insn "aarch64_float_truncate_hi_v4sf"
+(define_insn "aarch64_float_truncate_hi_v4sf_le"
   [(set (match_operand:V4SF 0 "register_operand" "=w")
     (vec_concat:V4SF
       (match_operand:V2SF 1 "register_operand" "0")
       (float_truncate:V2SF
 	(match_operand:V2DF 2 "register_operand" "w"))))]
-  "TARGET_SIMD"
+  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
   "fcvtn2\\t%0.4s, %2.2d"
   [(set_attr "type" "neon_fp_cvt_narrow_d_q")]
 )
 
+(define_insn "aarch64_float_truncate_hi_v4sf_be"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+    (vec_concat:V4SF
+      (float_truncate:V2SF
+	(match_operand:V2DF 2 "register_operand" "w"))
+      (match_operand:V2SF 1 "register_operand" "0")))]
+  "TARGET_SIMD && BYTES_BIG_ENDIAN"
+  "fcvtn2\\t%0.4s, %2.2d"
+  [(set_attr "type" "neon_fp_cvt_narrow_d_q")]
+)
+
+(define_expand "aarch64_float_truncate_hi_v4sf"
+  [(match_operand:V4SF 0 "register_operand" "=w")
+   (match_operand:V2SF 1 "register_operand" "0")
+   (match_operand:V2DF 2 "register_operand" "w")]
+  "TARGET_SIMD"
+{
+  rtx (*gen) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN
+			     ? gen_aarch64_float_truncate_hi_v4sf_be
+			     : gen_aarch64_float_truncate_hi_v4sf_le;
+  emit_insn (gen (operands[0], operands[1], operands[2]));
+  DONE;
+}
+)
+
 (define_expand "vec_pack_trunc_v2df"
   [(set (match_operand:V4SF 0 "register_operand")
       (vec_concat:V4SF

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [AArch64] Fix vcvt_high_f64_f32 and vcvt_figh_f32_f64 intrinsics.
  2015-09-09  8:53 [AArch64] Fix vcvt_high_f64_f32 and vcvt_figh_f32_f64 intrinsics James Greenhalgh
@ 2015-09-09 10:31 ` Christophe Lyon
  2015-09-10 14:16   ` James Greenhalgh
  2015-09-09 11:43 ` Alan Lawrence
  1 sibling, 1 reply; 10+ messages in thread
From: Christophe Lyon @ 2015-09-09 10:31 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: gcc-patches, Marcus Shawcroft, Tejas Belagod, Alan Lawrence

On 9 September 2015 at 10:31, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>
> Hi,
>
> This patch clears up some remaining confusion in the vector lane orderings
> for the two intrinsics mentioned in the title.
>
> Bootstrapped on aarch64-none-linux-gnu and regression tested for
> aarch64_be-none-elf with no issues.
>

Does this actually fix an existing testcase?


> OK?
>
> Thanks,
> James
>
> ---
> 2015-09-09  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * config/aarch64/aarch64-simd.md (vec_unpacks_lo_v4sf): Rewrite
>         as an expand.
>         (vec_unpacks_hi_v4sf):  Likewise.
>         (aarch64_float_extend_lo_v2df): Rename to...
>         (aarch64_fcvtl_v4sf): This.
>         (aarch64_fcvtl2_v4sf): New.
>         (aarch64_float_truncate_hi_v4sf): Rewrite as an expand.
>         (aarch64_float_truncate_hi_v4sf_le): New.
>         (aarch64_float_truncate_hi_v4sf_be): Likewise.
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [AArch64] Fix vcvt_high_f64_f32 and vcvt_figh_f32_f64 intrinsics.
  2015-09-09  8:53 [AArch64] Fix vcvt_high_f64_f32 and vcvt_figh_f32_f64 intrinsics James Greenhalgh
  2015-09-09 10:31 ` Christophe Lyon
@ 2015-09-09 11:43 ` Alan Lawrence
  2015-09-10 11:21   ` Alan Lawrence
  1 sibling, 1 reply; 10+ messages in thread
From: Alan Lawrence @ 2015-09-09 11:43 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches
  Cc: christophe.lyon, Marcus Shawcroft, Tejas Belagod

Hmmm, hang on. I'm not quite sure what the actual issue/bug is here, but is this 
the same issue as my patch 12 "with BE RTL fix"? 
(https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01482.html, explanation last at 
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02365.html) I pushed this as 
r227551 last night and since this reparameterizes the patterns I don't think 
your patch will apply to current HEAD.

If my patch is wrong...well, that may be, I haven't understood the issue yet. 
But it sounds like the first thing we need is a decent testcase? (Or is the 
confusion just in the RTL representation, so a testcase would require getting 
constant-folding to happen in RTL, which I tried but failed to make that happen 
myself?)

--Alan

On 09/09/15 09:31, James Greenhalgh wrote:
>
> Hi,
>
> This patch clears up some remaining confusion in the vector lane orderings
> for the two intrinsics mentioned in the title.
>
> Bootstrapped on aarch64-none-linux-gnu and regression tested for
> aarch64_be-none-elf with no issues.
>
> OK?
>
> Thanks,
> James
>
> ---
> 2015-09-09  James Greenhalgh  <james.greenhalgh@arm.com>
>
> 	* config/aarch64/aarch64-simd.md (vec_unpacks_lo_v4sf): Rewrite
> 	as an expand.
> 	(vec_unpacks_hi_v4sf):  Likewise.
> 	(aarch64_float_extend_lo_v2df): Rename to...
> 	(aarch64_fcvtl_v4sf): This.
> 	(aarch64_fcvtl2_v4sf): New.
> 	(aarch64_float_truncate_hi_v4sf): Rewrite as an expand.
> 	(aarch64_float_truncate_hi_v4sf_le): New.
> 	(aarch64_float_truncate_hi_v4sf_be): Likewise.
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [AArch64] Fix vcvt_high_f64_f32 and vcvt_figh_f32_f64 intrinsics.
  2015-09-09 11:43 ` Alan Lawrence
@ 2015-09-10 11:21   ` Alan Lawrence
  0 siblings, 0 replies; 10+ messages in thread
From: Alan Lawrence @ 2015-09-10 11:21 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches
  Cc: christophe.lyon, Marcus Shawcroft, Tejas Belagod

On 09/09/15 11:31, Alan Lawrence wrote:
> Hmmm, hang on. I'm not quite sure what the actual issue/bug is here, but is this
> the same issue as my patch 12 "with BE RTL fix"?
> (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01482.html, explanation last at
> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02365.html) I pushed this as
> r227551 last night and since this reparameterizes the patterns I don't think
> your patch will apply to current HEAD.
>
> If my patch is wrong...well, that may be, I haven't understood the issue yet.

In particular, we should expect the vec_unpacks standard pattern to have 
different behaviour (from a tree POV), as this is what I find searching for 
VEC_UNPACK in tree-vect-stmts.c:


bool
supportable_widening_operation
{
...
    switch (code)
      {
...
      CASE_CONVERT:
        c1 = VEC_UNPACK_LO_EXPR;
        c2 = VEC_UNPACK_HI_EXPR;
        break;

      case FLOAT_EXPR:
        c1 = VEC_UNPACK_FLOAT_LO_EXPR;
        c2 = VEC_UNPACK_FLOAT_HI_EXPR;
        break;

      case FIX_TRUNC_EXPR:
        /* ??? Not yet implemented due to missing VEC_UNPACK_FIX_TRUNC_HI_EXPR/
           VEC_UNPACK_FIX_TRUNC_LO_EXPR tree codes and optabs used for
           computing the operation.  */
        return false;

      default:
        gcc_unreachable ();
      }

    if (BYTES_BIG_ENDIAN && c1 != VEC_WIDEN_MULT_EVEN_EXPR)
      std::swap (c1, c2);



Yes, IIUC this goes against the principle of tree being the same regardless of 
underlying endianness.

--Alan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [AArch64] Fix vcvt_high_f64_f32 and vcvt_figh_f32_f64 intrinsics.
  2015-09-09 10:31 ` Christophe Lyon
@ 2015-09-10 14:16   ` James Greenhalgh
  2015-09-20 20:51     ` Christophe Lyon
  2015-09-21  9:59     ` Alan Lawrence
  0 siblings, 2 replies; 10+ messages in thread
From: James Greenhalgh @ 2015-09-10 14:16 UTC (permalink / raw)
  To: gcc-patches
  Cc: christophe.lyon, marcus.shawcroft, tejas.belagod, alan.lawrence

[-- Attachment #1: Type: text/plain, Size: 2102 bytes --]


On Wed, Sep 09, 2015 at 10:28:28AM +0100, Christophe Lyon wrote:
> On 9 September 2015 at 10:31, James Greenhalgh <james.greenhalgh@arm.com> wrote:
> >
> > Hi,
> >
> > This patch clears up some remaining confusion in the vector lane orderings
> > for the two intrinsics mentioned in the title.
> >
> > Bootstrapped on aarch64-none-linux-gnu and regression tested for
> > aarch64_be-none-elf with no issues.
> >
>
> Does this actually fix an existing testcase?

Yes, of course, sorry - that was a useless introduction to the patch!

First, I've updated the patch with a testcase, which fails for me on
aarch64_be-none-elf but not aarch64-*-*.

The issue is that the RTL folding routines will happily fold through
a vec_concat or a vec_select, which we have given the wrong operands
to when in BYTES_BIG_ENDIAN mode.  The fix is similar to that which we
have elsewhere in aarch64-simd.md, which is to split out the big
and little endian forms of the patterns which need vec_concat, and
to build a vec_par_cnst_*_half mask for the patterns which need
vec_select. This keeps us in the GCC-view of lane ordering.

There is test coverage that these patterns do the right thing for the
vectorizer (I know, because I initially typoed s/le/be and saw tests
gcc.dg/vect fall over), and the new testcase adds coverage for the
expansion path through intrinsics.

I've rebased on top of Alan's patch, which goes halfway to fixing the
issue, but which didn't fix the float_truncate patterns, which had an
incorrect vec_concat. That simplifies the patch considerably.

Rechecked on aarch64_be-none-elf and aarch64-none-linux-gnu with no
issues.

OK?

Thanks,
James

---
gcc/

2015-09-09  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/aarch64/aarch64-simd.md

	(aarch64_float_truncate_hi_v4sf): Rewrite as an expand.
	(aarch64_float_truncate_hi_v4sf_le): New.
	(aarch64_float_truncate_hi_v4sf_be): Likewise.

gcc/testsuite/

2015-09-09  James Greenhalgh  <james.greenhalgh@arm.com>

	* gcc.target/aarch64/advsimd-intrinsics/vcvt_high_1.c: New.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-AArch64-Fix-vcvt_high_f64_f32-and-vcvt_figh_f32_f64-.patch --]
[-- Type: text/x-patch;  name=0001-AArch64-Fix-vcvt_high_f64_f32-and-vcvt_figh_f32_f64-.patch, Size: 5402 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index a4eaeca..8be9b97 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1703,6 +1703,13 @@
   [(set_attr "type" "neon_fp_cvt_widen_s")]
 )
 
+;; ??? Note that the vectorizer usage of the vec_unpacks_[lo/hi] patterns
+;; is inconsistent with vector ordering elsewhere in the compiler, in that
+;; the meaning of HI and LO is always taken with a little-endian view of
+;; the vector.  Thus, while the patterns below look incorrect in that
+;; vec_unpacks_hi always extracts the high *architectural* lanes of a
+;; vector, their behaviour is as required.
+
 (define_expand "vec_unpacks_lo_<mode>"
   [(match_operand:<VWIDE> 0 "register_operand" "")
    (match_operand:VQ_HSF 1 "register_operand" "")]
@@ -1757,17 +1764,42 @@
   [(set_attr "type" "neon_fp_cvt_narrow_d_q")]
 )
 
-(define_insn "aarch64_float_truncate_hi_<Vdbl>"
+(define_insn "aarch64_float_truncate_hi_<Vdbl>_le"
   [(set (match_operand:<VDBL> 0 "register_operand" "=w")
     (vec_concat:<VDBL>
       (match_operand:VDF 1 "register_operand" "0")
       (float_truncate:VDF
 	(match_operand:<VWIDE> 2 "register_operand" "w"))))]
-  "TARGET_SIMD"
+  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
   "fcvtn2\\t%0.<Vdtype>, %2<Vmwtype>"
   [(set_attr "type" "neon_fp_cvt_narrow_d_q")]
 )
 
+(define_insn "aarch64_float_truncate_hi_<Vdbl>_be"
+  [(set (match_operand:<VDBL> 0 "register_operand" "=w")
+    (vec_concat:<VDBL>
+      (float_truncate:VDF
+	(match_operand:<VWIDE> 2 "register_operand" "w"))
+      (match_operand:VDF 1 "register_operand" "0")))]
+  "TARGET_SIMD && BYTES_BIG_ENDIAN"
+  "fcvtn2\\t%0.<Vdtype>, %2<Vmwtype>"
+  [(set_attr "type" "neon_fp_cvt_narrow_d_q")]
+)
+
+(define_expand "aarch64_float_truncate_hi_<Vdbl>"
+  [(match_operand:<VDBL> 0 "register_operand" "=w")
+   (match_operand:VDF 1 "register_operand" "0")
+   (match_operand:<VWIDE> 2 "register_operand" "w")]
+  "TARGET_SIMD"
+{
+  rtx (*gen) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN
+			     ? gen_aarch64_float_truncate_hi_<Vdbl>_be
+			     : gen_aarch64_float_truncate_hi_<Vdbl>_le;
+  emit_insn (gen (operands[0], operands[1], operands[2]));
+  DONE;
+}
+)
+
 (define_expand "vec_pack_trunc_v2df"
   [(set (match_operand:V4SF 0 "register_operand")
       (vec_concat:V4SF
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_high_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_high_1.c
new file mode 100644
index 0000000..492d6fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_high_1.c
@@ -0,0 +1,97 @@
+#include "arm_neon.h"
+#include <inttypes.h>
+
+void abort (void);
+
+void
+foo (void)
+{
+  /* Test vcvt_high_f32_f64.  */
+  float32x2_t arg1;
+  float64x2_t arg2;
+  float32x4_t result;
+  arg1 = vcreate_f32 (UINT64_C (0x3f0db5793f6e1892));
+  arg2 = vcombine_f64 (vcreate_f64 (UINT64_C (0x3fe8e49d23fb575d)),
+		       vcreate_f64 (UINT64_C (0x3fd921291b3df73e)));
+  //  Expect: "result" = 3ec909483f4724e93f0db5793f6e1892
+  result = vcvt_high_f32_f64 (arg1, arg2);
+  float32_t got;
+  float32_t exp;
+
+  /* Lane 0.  */
+  got = vgetq_lane_f32 (result, 0);
+  exp = ((float32_t) 0.9300624132156372);
+  if (((((exp / got) < ((float32_t) 0.999))
+	 || ((exp / got) > ((float32_t) 1.001)))
+     && (((exp - got) < ((float32_t) -1.0e-4))
+	 || ((exp - got) > ((float32_t) 1.0e-4)))))
+    abort ();
+
+  /* Lane 1.  */
+  got = vgetq_lane_f32 (result, 1);
+  exp = ((float32_t) 0.5535503029823303);
+  if (((((exp / got) < ((float32_t) 0.999))
+	  || ((exp / got) > ((float32_t) 1.001)))
+     && (((exp - got) < ((float32_t) -1.0e-4))
+	   || ((exp - got) > ((float32_t) 1.0e-4)))))
+    abort ();
+
+  /* Lane 2.  */
+  got = vgetq_lane_f32 (result, 2);
+  exp = ((float32_t) 0.7779069617051665);
+  if (((((exp / got) < ((float32_t) 0.999))
+	  || ((exp / got) > ((float32_t) 1.001)))
+      && (((exp - got) < ((float32_t) -1.0e-4))
+	  || ((exp - got) > ((float32_t) 1.0e-4)))))
+    abort ();
+
+  /* Lane 2.  */
+  got = vgetq_lane_f32 (result, 3);
+  exp = ((float32_t) 0.3926489606891329);
+  if (((((exp / got) < ((float32_t) 0.999))
+	  || ((exp / got) > ((float32_t) 1.001)))
+      && (((exp - got) < ((float32_t) -1.0e-4))
+	  || ((exp - got) > ((float32_t) 1.0e-4)))))
+    abort ();
+}
+
+void
+bar (void)
+{
+  /* Test vcvt_high_f64_f32.  */
+  float32x4_t arg1;
+  float64x2_t result;
+  arg1 = vcombine_f32 (vcreate_f32 (UINT64_C (0x3f7c5cf13f261f74)),
+		       vcreate_f32 (UINT64_C (0x3e3a7bc03f6ccc1d)));
+  //  Expect: "result" = 3fc74f78000000003fed9983a0000000
+  result = vcvt_high_f64_f32 (arg1);
+
+  float64_t got;
+  float64_t exp;
+
+  /* Lane 0.  */
+  got = vgetq_lane_f64 (result, 0);
+  exp = 0.9249895215034485;
+  if (((((exp / got) < 0.999)
+	 || ((exp / got) > 1.001))
+     && (((exp - got) < -1.0e-4)
+	 || ((exp - got) > 1.0e-4))))
+    abort ();
+
+  /* Lane 0.  */
+  got = vgetq_lane_f64 (result, 1);
+  exp = 0.1821126937866211;
+  if (((((exp / got) < 0.999)
+	  || ((exp / got) > 1.001))
+      && (((exp - got) < -1.0e-4)
+	  || ((exp - got) > 1.0e-4))))
+    abort ();
+}
+
+int
+main (int argc, char **argv)
+{
+  foo ();
+  bar ();
+  return 0;
+}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [AArch64] Fix vcvt_high_f64_f32 and vcvt_figh_f32_f64 intrinsics.
  2015-09-10 14:16   ` James Greenhalgh
@ 2015-09-20 20:51     ` Christophe Lyon
  2015-09-21  9:59     ` Alan Lawrence
  1 sibling, 0 replies; 10+ messages in thread
From: Christophe Lyon @ 2015-09-20 20:51 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: gcc-patches, Marcus Shawcroft, Tejas Belagod, Alan Lawrence

On 10 September 2015 at 16:02, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
>
> On Wed, Sep 09, 2015 at 10:28:28AM +0100, Christophe Lyon wrote:
>> On 9 September 2015 at 10:31, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>> >
>> > Hi,
>> >
>> > This patch clears up some remaining confusion in the vector lane orderings
>> > for the two intrinsics mentioned in the title.
>> >
>> > Bootstrapped on aarch64-none-linux-gnu and regression tested for
>> > aarch64_be-none-elf with no issues.
>> >
>>
>> Does this actually fix an existing testcase?
>
> Yes, of course, sorry - that was a useless introduction to the patch!
>
> First, I've updated the patch with a testcase, which fails for me on
> aarch64_be-none-elf but not aarch64-*-*.
>
> The issue is that the RTL folding routines will happily fold through
> a vec_concat or a vec_select, which we have given the wrong operands
> to when in BYTES_BIG_ENDIAN mode.  The fix is similar to that which we
> have elsewhere in aarch64-simd.md, which is to split out the big
> and little endian forms of the patterns which need vec_concat, and
> to build a vec_par_cnst_*_half mask for the patterns which need
> vec_select. This keeps us in the GCC-view of lane ordering.
>
> There is test coverage that these patterns do the right thing for the
> vectorizer (I know, because I initially typoed s/le/be and saw tests
> gcc.dg/vect fall over), and the new testcase adds coverage for the
> expansion path through intrinsics.
>
> I've rebased on top of Alan's patch, which goes halfway to fixing the
> issue, but which didn't fix the float_truncate patterns, which had an
> incorrect vec_concat. That simplifies the patch considerably.
>
> Rechecked on aarch64_be-none-elf and aarch64-none-linux-gnu with no
> issues.
>
> OK?

The testcase should be modified so that it is skipped on arm* targets.

Christophe.

>
> Thanks,
> James
>
> ---
> gcc/
>
> 2015-09-09  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * config/aarch64/aarch64-simd.md
>
>         (aarch64_float_truncate_hi_v4sf): Rewrite as an expand.
>         (aarch64_float_truncate_hi_v4sf_le): New.
>         (aarch64_float_truncate_hi_v4sf_be): Likewise.
>
> gcc/testsuite/
>
> 2015-09-09  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * gcc.target/aarch64/advsimd-intrinsics/vcvt_high_1.c: New.
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [AArch64] Fix vcvt_high_f64_f32 and vcvt_figh_f32_f64 intrinsics.
  2015-09-10 14:16   ` James Greenhalgh
  2015-09-20 20:51     ` Christophe Lyon
@ 2015-09-21  9:59     ` Alan Lawrence
  2015-09-21 14:43       ` James Greenhalgh
  1 sibling, 1 reply; 10+ messages in thread
From: Alan Lawrence @ 2015-09-21  9:59 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: gcc-patches, christophe.lyon, marcus.shawcroft, tejas.belagod

[Resending in plain text] This makes sense to me now, although I find
your comment slightly confusing:

[....] in that
+;; the meaning of HI and LO is always taken with a little-endian view of
+;; the vector

You mean vec_unpacks_{hi,lo} (which seems to go against the
*architectural* bit after this), or hi/lo in cases other than
vec_unpack (=> not "always"), or something else?

maybe s/always/usually/ or s/always/otherwise/ ?

Cheers, Alan

On 10 September 2015 at 15:02, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
>
> On Wed, Sep 09, 2015 at 10:28:28AM +0100, Christophe Lyon wrote:
>> On 9 September 2015 at 10:31, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>> >
>> > Hi,
>> >
>> > This patch clears up some remaining confusion in the vector lane orderings
>> > for the two intrinsics mentioned in the title.
>> >
>> > Bootstrapped on aarch64-none-linux-gnu and regression tested for
>> > aarch64_be-none-elf with no issues.
>> >
>>
>> Does this actually fix an existing testcase?
>
> Yes, of course, sorry - that was a useless introduction to the patch!
>
> First, I've updated the patch with a testcase, which fails for me on
> aarch64_be-none-elf but not aarch64-*-*.
>
> The issue is that the RTL folding routines will happily fold through
> a vec_concat or a vec_select, which we have given the wrong operands
> to when in BYTES_BIG_ENDIAN mode.  The fix is similar to that which we
> have elsewhere in aarch64-simd.md, which is to split out the big
> and little endian forms of the patterns which need vec_concat, and
> to build a vec_par_cnst_*_half mask for the patterns which need
> vec_select. This keeps us in the GCC-view of lane ordering.
>
> There is test coverage that these patterns do the right thing for the
> vectorizer (I know, because I initially typoed s/le/be and saw tests
> gcc.dg/vect fall over), and the new testcase adds coverage for the
> expansion path through intrinsics.
>
> I've rebased on top of Alan's patch, which goes halfway to fixing the
> issue, but which didn't fix the float_truncate patterns, which had an
> incorrect vec_concat. That simplifies the patch considerably.
>
> Rechecked on aarch64_be-none-elf and aarch64-none-linux-gnu with no
> issues.
>
> OK?
>
> Thanks,
> James
>
> ---
> gcc/
>
> 2015-09-09  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * config/aarch64/aarch64-simd.md
>
>         (aarch64_float_truncate_hi_v4sf): Rewrite as an expand.
>         (aarch64_float_truncate_hi_v4sf_le): New.
>         (aarch64_float_truncate_hi_v4sf_be): Likewise.
>
> gcc/testsuite/
>
> 2015-09-09  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * gcc.target/aarch64/advsimd-intrinsics/vcvt_high_1.c: New.
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [AArch64] Fix vcvt_high_f64_f32 and vcvt_figh_f32_f64 intrinsics.
  2015-09-21  9:59     ` Alan Lawrence
@ 2015-09-21 14:43       ` James Greenhalgh
  2015-09-21 15:12         ` Alan Lawrence
  2015-09-22 16:03         ` Marcus Shawcroft
  0 siblings, 2 replies; 10+ messages in thread
From: James Greenhalgh @ 2015-09-21 14:43 UTC (permalink / raw)
  To: gcc-patches; +Cc: alalaw01, marsha01, tbelagod, christophe.lyon

[-- Attachment #1: Type: text/plain, Size: 1502 bytes --]


On Mon, Sep 21, 2015 at 10:44:32AM +0100, Alan Lawrence wrote:
> [Resending in plain text] This makes sense to me now, although I find
> your comment slightly confusing:
>
> [....] in that
> +;; the meaning of HI and LO is always taken with a little-endian view of
> +;; the vector
>
> You mean vec_unpacks_{hi,lo} (which seems to go against the
> *architectural* bit after this), or hi/lo in cases other than
> vec_unpack (=> not "always"), or something else?
>
> maybe s/always/usually/ or s/always/otherwise/ ?
>

What I was aiming for is a description that our implementation of these
standard pattern names looks wrong, because "hi" always extracts the
architectural high lanes, in other big-endian patterns we make the
adjustment that higher numbered lanes map to the low architectural lanes.

I've tried to reword the comment to make it clearer, but I'm assuming some
familiarity with our overall big-endian vector model.

I've also updated the testcase to skip it if we are targetting AArch32,
which does not provide these intrinsics.

OK?

Thanks,
James

---
gcc/

2015-09-21  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/aarch64/aarch64-simd.md

	(aarch64_float_truncate_hi_v4sf): Rewrite as an expand.
	(aarch64_float_truncate_hi_v4sf_le): New.
	(aarch64_float_truncate_hi_v4sf_be): Likewise.

gcc/testsuite/

2015-09-21  James Greenhalgh  <james.greenhalgh@arm.com>

	* gcc.target/aarch64/advsimd-intrinsics/vcvt_high_1.c: New.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Re-AArch64-Fix-vcvt_high_f64_f32-and-vcvt_figh_f32_f.patch --]
[-- Type: text/x-patch;  name=0001-Re-AArch64-Fix-vcvt_high_f64_f32-and-vcvt_figh_f32_f.patch, Size: 5611 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index dbe5259..5ab2f2b 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1703,6 +1703,15 @@
   [(set_attr "type" "neon_fp_cvt_widen_s")]
 )
 
+;; ??? Note that the vectorizer usage of the vec_unpacks_[lo/hi] patterns
+;; is inconsistent with vector ordering elsewhere in the compiler, in that
+;; the meaning of HI and LO changes depending on the target endianness.
+;; While elsewhere we map the higher numbered elements of a vector to
+;; the lower architectural lanes of the vector, for these patterns we want
+;; to always treat "hi" as referring to the higher architectural lanes.
+;; Consequently, while the patterns below look inconsistent with our
+;; other big-endian patterns their behaviour is as required.
+
 (define_expand "vec_unpacks_lo_<mode>"
   [(match_operand:<VWIDE> 0 "register_operand" "")
    (match_operand:VQ_HSF 1 "register_operand" "")]
@@ -1757,17 +1766,42 @@
   [(set_attr "type" "neon_fp_cvt_narrow_d_q")]
 )
 
-(define_insn "aarch64_float_truncate_hi_<Vdbl>"
+(define_insn "aarch64_float_truncate_hi_<Vdbl>_le"
   [(set (match_operand:<VDBL> 0 "register_operand" "=w")
     (vec_concat:<VDBL>
       (match_operand:VDF 1 "register_operand" "0")
       (float_truncate:VDF
 	(match_operand:<VWIDE> 2 "register_operand" "w"))))]
-  "TARGET_SIMD"
+  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
   "fcvtn2\\t%0.<Vdtype>, %2<Vmwtype>"
   [(set_attr "type" "neon_fp_cvt_narrow_d_q")]
 )
 
+(define_insn "aarch64_float_truncate_hi_<Vdbl>_be"
+  [(set (match_operand:<VDBL> 0 "register_operand" "=w")
+    (vec_concat:<VDBL>
+      (float_truncate:VDF
+	(match_operand:<VWIDE> 2 "register_operand" "w"))
+      (match_operand:VDF 1 "register_operand" "0")))]
+  "TARGET_SIMD && BYTES_BIG_ENDIAN"
+  "fcvtn2\\t%0.<Vdtype>, %2<Vmwtype>"
+  [(set_attr "type" "neon_fp_cvt_narrow_d_q")]
+)
+
+(define_expand "aarch64_float_truncate_hi_<Vdbl>"
+  [(match_operand:<VDBL> 0 "register_operand" "=w")
+   (match_operand:VDF 1 "register_operand" "0")
+   (match_operand:<VWIDE> 2 "register_operand" "w")]
+  "TARGET_SIMD"
+{
+  rtx (*gen) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN
+			     ? gen_aarch64_float_truncate_hi_<Vdbl>_be
+			     : gen_aarch64_float_truncate_hi_<Vdbl>_le;
+  emit_insn (gen (operands[0], operands[1], operands[2]));
+  DONE;
+}
+)
+
 (define_expand "vec_pack_trunc_v2df"
   [(set (match_operand:V4SF 0 "register_operand")
       (vec_concat:V4SF
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_high_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_high_1.c
new file mode 100644
index 0000000..4691da3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_high_1.c
@@ -0,0 +1,99 @@
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include "arm_neon.h"
+#include <inttypes.h>
+
+void abort (void);
+
+void
+foo (void)
+{
+  /* Test vcvt_high_f32_f64.  */
+  float32x2_t arg1;
+  float64x2_t arg2;
+  float32x4_t result;
+  arg1 = vcreate_f32 (UINT64_C (0x3f0db5793f6e1892));
+  arg2 = vcombine_f64 (vcreate_f64 (UINT64_C (0x3fe8e49d23fb575d)),
+		       vcreate_f64 (UINT64_C (0x3fd921291b3df73e)));
+  //  Expect: "result" = 3ec909483f4724e93f0db5793f6e1892
+  result = vcvt_high_f32_f64 (arg1, arg2);
+  float32_t got;
+  float32_t exp;
+
+  /* Lane 0.  */
+  got = vgetq_lane_f32 (result, 0);
+  exp = ((float32_t) 0.9300624132156372);
+  if (((((exp / got) < ((float32_t) 0.999))
+	 || ((exp / got) > ((float32_t) 1.001)))
+     && (((exp - got) < ((float32_t) -1.0e-4))
+	 || ((exp - got) > ((float32_t) 1.0e-4)))))
+    abort ();
+
+  /* Lane 1.  */
+  got = vgetq_lane_f32 (result, 1);
+  exp = ((float32_t) 0.5535503029823303);
+  if (((((exp / got) < ((float32_t) 0.999))
+	  || ((exp / got) > ((float32_t) 1.001)))
+     && (((exp - got) < ((float32_t) -1.0e-4))
+	   || ((exp - got) > ((float32_t) 1.0e-4)))))
+    abort ();
+
+  /* Lane 2.  */
+  got = vgetq_lane_f32 (result, 2);
+  exp = ((float32_t) 0.7779069617051665);
+  if (((((exp / got) < ((float32_t) 0.999))
+	  || ((exp / got) > ((float32_t) 1.001)))
+      && (((exp - got) < ((float32_t) -1.0e-4))
+	  || ((exp - got) > ((float32_t) 1.0e-4)))))
+    abort ();
+
+  /* Lane 3.  */
+  got = vgetq_lane_f32 (result, 3);
+  exp = ((float32_t) 0.3926489606891329);
+  if (((((exp / got) < ((float32_t) 0.999))
+	  || ((exp / got) > ((float32_t) 1.001)))
+      && (((exp - got) < ((float32_t) -1.0e-4))
+	  || ((exp - got) > ((float32_t) 1.0e-4)))))
+    abort ();
+}
+
+void
+bar (void)
+{
+  /* Test vcvt_high_f64_f32.  */
+  float32x4_t arg1;
+  float64x2_t result;
+  arg1 = vcombine_f32 (vcreate_f32 (UINT64_C (0x3f7c5cf13f261f74)),
+		       vcreate_f32 (UINT64_C (0x3e3a7bc03f6ccc1d)));
+  //  Expect: "result" = 3fc74f78000000003fed9983a0000000
+  result = vcvt_high_f64_f32 (arg1);
+
+  float64_t got;
+  float64_t exp;
+
+  /* Lane 0.  */
+  got = vgetq_lane_f64 (result, 0);
+  exp = 0.9249895215034485;
+  if (((((exp / got) < 0.999)
+	 || ((exp / got) > 1.001))
+     && (((exp - got) < -1.0e-4)
+	 || ((exp - got) > 1.0e-4))))
+    abort ();
+
+  /* Lane 1.  */
+  got = vgetq_lane_f64 (result, 1);
+  exp = 0.1821126937866211;
+  if (((((exp / got) < 0.999)
+	  || ((exp / got) > 1.001))
+      && (((exp - got) < -1.0e-4)
+	  || ((exp - got) > 1.0e-4))))
+    abort ();
+}
+
+int
+main (int argc, char **argv)
+{
+  foo ();
+  bar ();
+  return 0;
+}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [AArch64] Fix vcvt_high_f64_f32 and vcvt_figh_f32_f64 intrinsics.
  2015-09-21 14:43       ` James Greenhalgh
@ 2015-09-21 15:12         ` Alan Lawrence
  2015-09-22 16:03         ` Marcus Shawcroft
  1 sibling, 0 replies; 10+ messages in thread
From: Alan Lawrence @ 2015-09-21 15:12 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches
  Cc: Marcus Shawcroft, Tejas Belagod, christophe.lyon

On 21/09/15 15:38, James Greenhalgh wrote:
>
> On Mon, Sep 21, 2015 at 10:44:32AM +0100, Alan Lawrence wrote:
>> [Resending in plain text] This makes sense to me now, although I find
>> your comment slightly confusing:
>>
>> [....] in that
>> +;; the meaning of HI and LO is always taken with a little-endian view of
>> +;; the vector
>>
>> You mean vec_unpacks_{hi,lo} (which seems to go against the
>> *architectural* bit after this), or hi/lo in cases other than
>> vec_unpack (=> not "always"), or something else?
>>
>> maybe s/always/usually/ or s/always/otherwise/ ?
>>
>
> What I was aiming for is a description that our implementation of these
> standard pattern names looks wrong, because "hi" always extracts the
> architectural high lanes, in other big-endian patterns we make the
> adjustment that higher numbered lanes map to the low architectural lanes.
>
> I've tried to reword the comment to make it clearer, but I'm assuming some
> familiarity with our overall big-endian vector model.

Looks good to me, thanks for the clarification.

Cheers, Alan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [AArch64] Fix vcvt_high_f64_f32 and vcvt_figh_f32_f64 intrinsics.
  2015-09-21 14:43       ` James Greenhalgh
  2015-09-21 15:12         ` Alan Lawrence
@ 2015-09-22 16:03         ` Marcus Shawcroft
  1 sibling, 0 replies; 10+ messages in thread
From: Marcus Shawcroft @ 2015-09-22 16:03 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: gcc-patches, alalaw01, marsha01, Tejas Belagod, christophe.lyon

On 21 September 2015 at 15:38, James Greenhalgh
<james.greenhalgh@arm.com> wrote:

> ---
> gcc/
>
> 2015-09-21  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * config/aarch64/aarch64-simd.md
>
>         (aarch64_float_truncate_hi_v4sf): Rewrite as an expand.
>         (aarch64_float_truncate_hi_v4sf_le): New.
>         (aarch64_float_truncate_hi_v4sf_be): Likewise.
>
> gcc/testsuite/
>
> 2015-09-21  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * gcc.target/aarch64/advsimd-intrinsics/vcvt_high_1.c: New.
>

+#include <inttypes.h>
+

I don't think this include is required, otherwise OK /Marcus

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-09-22 15:56 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-09  8:53 [AArch64] Fix vcvt_high_f64_f32 and vcvt_figh_f32_f64 intrinsics James Greenhalgh
2015-09-09 10:31 ` Christophe Lyon
2015-09-10 14:16   ` James Greenhalgh
2015-09-20 20:51     ` Christophe Lyon
2015-09-21  9:59     ` Alan Lawrence
2015-09-21 14:43       ` James Greenhalgh
2015-09-21 15:12         ` Alan Lawrence
2015-09-22 16:03         ` Marcus Shawcroft
2015-09-09 11:43 ` Alan Lawrence
2015-09-10 11:21   ` Alan Lawrence

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).