[PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation
@ 2015-10-19 13:58 Kyrill Tkachov
  2015-10-20 15:31 ` Marcus Shawcroft
  0 siblings, 1 reply; 7+ messages in thread
From: Kyrill Tkachov @ 2015-10-19 13:58 UTC (permalink / raw)
  To: GCC Patches; +Cc: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh

[-- Attachment #1: Type: text/plain, Size: 2650 bytes --]

Hi all,

The fcvtzs and fcvtzu instructions have a form where they convert to a fixed-point form with a specified number of
fractional bits. In practice this has the effect of multiplying the floating point argument by 2^<number of fractional bits>
and then converting the result to integer. We can exploit that behaviour during combine to eliminate a floating-point multiplication
by an FP immediate that is a power of 2 i.e. 4.0, 8.0, 16.0 etc.
For example for code:
int
sffoo1 (float a)
{
   return a * 4.0f;
}

we currently generate:
sffoo1:
         fmov    s1, 4.0e+0
         fmul    s0, s0, s1
         fcvtzs  w0, s0
         ret

with this patch we can generate:
sffoo1:
         fcvtzs  w0, s0, #2
         ret

We already pefrom the analogous combination for the arm target (see the *combine_vcvtf2i pattern in config/arm/vfp.md)
However, this patch also implements the fcvtzu form i.e. the unsigned_fix form as well as the vector forms.

However, not everything is rosy. The code:
int
foo (float a)
{
   return a * 32.0f;
}

will not trigger the optimisation because 32.0f is stored in the constant pool and due to a deficiency in
simplify-rtx.c the simplification doesn't get through. I have a patch to fix that as part 2/2.

Also, for code:
int
foo (float a)
{
   return a * 2.0f;
}

This gets folded early on as a + a and thus emits an fadd instruction followed by a fcvtzs.
Nothing we can do about that (in this patch at least).

I've seen this trigger once in 453.povray in SPEC2006 and one other time in 435.gromacs after
patch 2/2 is applied. I've heard this can also trigger in codec-like codebases and I did see it
trigger a few times in ffmpeg.

Bootstrapped and tested on aarch64.

Ok for trunk?

Thanks,
Kyrill

2015-10-19  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * config/aarch64/aarch64.md
  (*aarch64_fcvt<su_optab><GPF:mode><GPI:mode>2_mult): New pattern.
     * config/aarch64/aarch64-simd.md
  (*aarch64_fcvt<su_optab><VDQF:mode><fcvt_target>2_mult): Likewise.
     * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above patterns.
     (aarch64_fpconst_pow_of_2): New function.
     (aarch64_vec_fpconst_pow_of_2): Likewise.
     * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): Declare
     prototype.
     (aarch64_vec_fpconst_pow_of_2): Likewise.
     * config/aarch64/predicates.md (aarch64_fp_pow2): New predicate.
     (aarch64_fp_vec_pow2): Likewise.

2015-10-19  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/fmul_fcvt_1.c: New test.
     * gcc.target/aarch64/fmul_fcvt_2.c: Likewise.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: aarch64-fmul-fcvt.patch --]
[-- Type: text/x-patch; name=aarch64-fmul-fcvt.patch, Size: 12125 bytes --]

commit a13a5967a1f94744776d616ca84d5512b24bf546
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Thu Oct 8 15:17:47 2015 +0100

    [AArch64] Add fmul+fcvt optimisation

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index a8ac8d3..309dcfb 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -294,12 +294,14 @@ enum aarch64_symbol_type aarch64_classify_symbol (rtx, rtx);
 enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
 enum reg_class aarch64_regno_regclass (unsigned);
 int aarch64_asm_preferred_eh_data_format (int, int);
+int aarch64_fpconst_pow_of_2 (rtx);
 machine_mode aarch64_hard_regno_caller_save_mode (unsigned, unsigned,
 						       machine_mode);
 int aarch64_hard_regno_mode_ok (unsigned, machine_mode);
 int aarch64_hard_regno_nregs (unsigned, machine_mode);
 int aarch64_simd_attr_length_move (rtx_insn *);
 int aarch64_uxt_size (int, HOST_WIDE_INT);
+int aarch64_vec_fpconst_pow_of_2 (rtx);
 rtx aarch64_final_eh_return_addr (void);
 rtx aarch64_legitimize_reload_address (rtx *, machine_mode, int, int, int);
 const char *aarch64_output_move_struct (rtx *operands);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 6a2ab61..3d2c496 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1654,6 +1654,27 @@ (define_insn "l<fcvt_pattern><su_optab><VDQF:mode><fcvt_target>2"
   [(set_attr "type" "neon_fp_to_int_<Vetype><q>")]
 )
 
+(define_insn "*aarch64_fcvt<su_optab><VDQF:mode><fcvt_target>2_mult"
+  [(set (match_operand:<FCVT_TARGET> 0 "register_operand" "=w")
+	(FIXUORS:<FCVT_TARGET> (unspec:<FCVT_TARGET>
+			       [(mult:VDQF
+	 (match_operand:VDQF 1 "register_operand" "w")
+	 (match_operand:VDQF 2 "aarch64_fp_vec_pow2" ""))]
+			       UNSPEC_FRINTZ)))]
+  "TARGET_SIMD
+   && IN_RANGE (aarch64_vec_fpconst_pow_of_2 (operands[2]), 1,
+		GET_MODE_BITSIZE (GET_MODE_INNER (<VDQF:MODE>mode)))"
+  {
+    int fbits = aarch64_vec_fpconst_pow_of_2 (operands[2]);
+    char buf[64];
+    sprintf (buf, "fcvtz<su>\\t%%0.<Vtype>, %%1.<Vtype>, #%d", fbits);
+    output_asm_insn (buf, operands);
+    return "";
+  }
+  [(set_attr "type" "neon_fp_to_int_<Vetype><q>")]
+)
+
+
 (define_expand "<optab><VDQF:mode><fcvt_target>2"
   [(set (match_operand:<FCVT_TARGET> 0 "register_operand")
 	(FIXUORS:<FCVT_TARGET> (unspec:<FCVT_TARGET>
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 2ec76a5..9b76746 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6808,6 +6808,19 @@ cost_plus:
 	  else
 	    *cost += extra_cost->fp[GET_MODE (x) == DFmode].toint;
 	}
+
+      /* We can combine fmul by a power of 2 followed by a fcvt into a single
+	 fixed-point fcvt.  */
+      if (GET_CODE (x) == MULT
+	  && ((VECTOR_MODE_P (mode)
+	       && aarch64_vec_fpconst_pow_of_2 (XEXP (x, 1)) > 0)
+	      || aarch64_fpconst_pow_of_2 (XEXP (x, 1)) > 0))
+	{
+	  *cost += rtx_cost (XEXP (x, 0), VOIDmode,
+			     (enum rtx_code) code, 0, speed);
+	  return true;
+	}
+
       *cost += rtx_cost (x, VOIDmode, (enum rtx_code) code, 0, speed);
       return true;
 
@@ -13386,6 +13399,54 @@ aarch64_reorg (void)
 #undef TARGET_MACHINE_DEPENDENT_REORG
 #define TARGET_MACHINE_DEPENDENT_REORG aarch64_reorg
 
+
+/* If X is a positive CONST_DOUBLE with a value that is a power of 2
+   return the log2 of that value.  Otherwise return -1.  */
+
+int
+aarch64_fpconst_pow_of_2 (rtx x)
+{
+  const REAL_VALUE_TYPE *r;
+
+  if (!CONST_DOUBLE_P (x))
+    return -1;
+
+  r = CONST_DOUBLE_REAL_VALUE (x);
+
+  if (REAL_VALUE_NEGATIVE (*r)
+      || REAL_VALUE_ISNAN (*r)
+      || REAL_VALUE_ISINF (*r)
+      || !real_isinteger (r, DFmode))
+    return -1;
+
+  return exact_log2 (real_to_integer (r));
+}
+
+/* If X is a vector of equal CONST_DOUBLE values and that value is
+   Y, return the aarch64_fpconst_pow_of_2 of Y.  Otherwise return -1.  */
+
+int
+aarch64_vec_fpconst_pow_of_2 (rtx x)
+{
+  if (GET_CODE (x) != CONST_VECTOR)
+    return -1;
+
+  if (GET_MODE_CLASS (GET_MODE (x)) != MODE_VECTOR_FLOAT)
+    return -1;
+
+  int firstval = aarch64_fpconst_pow_of_2 (CONST_VECTOR_ELT (x, 0));
+  if (firstval <= 0)
+    return -1;
+
+  int count = CONST_VECTOR_NUNITS (x);
+  int i;
+  for (i = 1; i < count; i++)
+    if (aarch64_fpconst_pow_of_2 (CONST_VECTOR_ELT (x, i)) != firstval)
+      return -1;
+
+  return firstval;
+}
+
 /* Implement TARGET_PROMOTED_TYPE to promote __fp16 to float.  */
 static tree
 aarch64_promoted_type (const_tree t)
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index bb821a3..6a9d52c 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4287,6 +4287,25 @@ (define_insn "l<fcvt_pattern><su_optab><GPF:mode><GPI:mode>2"
   [(set_attr "type" "f_cvtf2i")]
 )
 
+(define_insn "*aarch64_fcvt<su_optab><GPF:mode><GPI:mode>2_mult"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(FIXUORS:GPI
+	  (mult:GPF
+	    (match_operand:GPF 1 "register_operand" "w")
+	    (match_operand:GPF 2 "aarch64_fp_pow2" "F"))))]
+  "TARGET_FLOAT
+   && IN_RANGE (aarch64_fpconst_pow_of_2 (operands[2]), 1,
+		GET_MODE_BITSIZE (<GPI:MODE>mode))"
+  {
+    int fbits = aarch64_fpconst_pow_of_2 (operands[2]);
+    char buf[64];
+    sprintf (buf, "fcvtz<su>\\t%%<GPI:w>0, %%<GPF:s>1, #%d", fbits);
+    output_asm_insn (buf, operands);
+    return "";
+  }
+  [(set_attr "type" "f_cvtf2i")]
+)
+
 ;; fma - no throw
 
 (define_insn "fma<mode>4"
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 8af4b81..1bcbf62 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -87,6 +87,13 @@ (define_predicate "aarch64_fp_compare_operand"
        (and (match_code "const_double")
 	    (match_test "aarch64_float_const_zero_rtx_p (op)"))))
 
+(define_predicate "aarch64_fp_pow2"
+  (and (match_code "const_double")
+	(match_test "aarch64_fpconst_pow_of_2 (op) > 0")))
+
+(define_predicate "aarch64_fp_vec_pow2"
+  (match_test "aarch64_vec_fpconst_pow_of_2 (op) > 0"))
+
 (define_predicate "aarch64_plus_immediate"
   (and (match_code "const_int")
        (ior (match_test "aarch64_uimm12_shift (INTVAL (op))")
diff --git a/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c b/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c
new file mode 100644
index 0000000..5af8290
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c
@@ -0,0 +1,129 @@
+/* { dg-do run } */
+/* { dg-options "-save-temps -O2 -fno-inline" } */
+
+#define FUNC_DEFS(__a)	\
+int			\
+sffoo##__a (float x)	\
+{			\
+  return x * __a##.0f;	\
+}			\
+			\
+unsigned int		\
+usffoo##__a (float x)	\
+{			\
+  return x * __a##.0f;	\
+}			\
+			\
+long			\
+lsffoo##__a (float x)	\
+{			\
+  return x * __a##.0f;	\
+}			\
+			\
+unsigned long		\
+ulsffoo##__a (float x)	\
+{			\
+  return x * __a##.0f;	\
+}
+
+#define FUNC_DEFD(__a)	\
+long			\
+dffoo##__a (double x)	\
+{			\
+  return x * __a##.0;	\
+}			\
+			\
+unsigned long		\
+udffoo##__a (double x)	\
+{			\
+  return x * __a##.0;	\
+}			\
+int			\
+sdffoo##__a (double x)	\
+{			\
+  return x * __a##.0;	\
+}			\
+			\
+unsigned int		\
+usdffoo##__a (double x)	\
+{			\
+  return x * __a##.0;	\
+}
+
+FUNC_DEFS (4)
+FUNC_DEFD (4)
+/* { dg-final { scan-assembler "fcvtzs\tw\[0-9\], s\[0-9\]*.*#2" } } */
+/* { dg-final { scan-assembler "fcvtzs\tx\[0-9\], s\[0-9\]*.*#2" } } */
+/* { dg-final { scan-assembler "fcvtzs\tx\[0-9\], d\[0-9\]*.*#2" } } */
+/* { dg-final { scan-assembler "fcvtzs\tw\[0-9\], d\[0-9\]*.*#2" } } */
+/* { dg-final { scan-assembler "fcvtzu\tw\[0-9\], s\[0-9\]*.*#2" } } */
+/* { dg-final { scan-assembler "fcvtzu\tx\[0-9\], s\[0-9\]*.*#2" } } */
+/* { dg-final { scan-assembler "fcvtzu\tx\[0-9\], d\[0-9\]*.*#2" } } */
+/* { dg-final { scan-assembler "fcvtzu\tw\[0-9\], d\[0-9\]*.*#2" } } */
+
+FUNC_DEFS (8)
+FUNC_DEFD (8)
+/* { dg-final { scan-assembler "fcvtzs\tw\[0-9\], s\[0-9\]*.*#3" } } */
+/* { dg-final { scan-assembler "fcvtzs\tx\[0-9\], s\[0-9\]*.*#3" } } */
+/* { dg-final { scan-assembler "fcvtzs\tx\[0-9\], d\[0-9\]*.*#3" } } */
+/* { dg-final { scan-assembler "fcvtzs\tw\[0-9\], d\[0-9\]*.*#3" } } */
+/* { dg-final { scan-assembler "fcvtzu\tw\[0-9\], s\[0-9\]*.*#3" } } */
+/* { dg-final { scan-assembler "fcvtzu\tx\[0-9\], s\[0-9\]*.*#3" } } */
+/* { dg-final { scan-assembler "fcvtzu\tx\[0-9\], d\[0-9\]*.*#3" } } */
+/* { dg-final { scan-assembler "fcvtzu\tw\[0-9\], d\[0-9\]*.*#3" } } */
+
+FUNC_DEFS (16)
+FUNC_DEFD (16)
+/* { dg-final { scan-assembler "fcvtzs\tw\[0-9\], s\[0-9\]*.*#4" } } */
+/* { dg-final { scan-assembler "fcvtzs\tx\[0-9\], s\[0-9\]*.*#4" } } */
+/* { dg-final { scan-assembler "fcvtzs\tx\[0-9\], d\[0-9\]*.*#4" } } */
+/* { dg-final { scan-assembler "fcvtzs\tw\[0-9\], d\[0-9\]*.*#4" } } */
+/* { dg-final { scan-assembler "fcvtzu\tw\[0-9\], s\[0-9\]*.*#4" } } */
+/* { dg-final { scan-assembler "fcvtzu\tx\[0-9\], s\[0-9\]*.*#4" } } */
+/* { dg-final { scan-assembler "fcvtzu\tx\[0-9\], d\[0-9\]*.*#4" } } */
+/* { dg-final { scan-assembler "fcvtzu\tw\[0-9\], d\[0-9\]*.*#4" } } */
+
+
+#define FUNC_TESTS(__a, __b)					\
+do								\
+  {								\
+    if (sffoo##__a (__b) != (int)(__b * __a))			\
+      __builtin_abort ();					\
+    if (usffoo##__a (__b) != (unsigned int)(__b * __a))	\
+      __builtin_abort ();					\
+    if (lsffoo##__a (__b) != (long)(__b * __a))		\
+      __builtin_abort ();					\
+    if (ulsffoo##__a (__b) != (unsigned long)(__b * __a))	\
+      __builtin_abort ();					\
+  } while (0)
+
+#define FUNC_TESTD(__a, __b)					\
+do								\
+  {								\
+    if (dffoo##__a (__b) != (long)(__b * __a))			\
+      __builtin_abort ();					\
+    if (udffoo##__a (__b) != (unsigned long)(__b * __a))	\
+      __builtin_abort ();					\
+    if (sdffoo##__a (__b) != (int)(__b * __a))			\
+      __builtin_abort ();					\
+    if (usdffoo##__a (__b) != (unsigned int)(__b * __a))	\
+      __builtin_abort ();					\
+  } while (0)
+
+int
+main (void)
+{
+  float i;
+
+  for (i = -0.001; i < 32.0; i += 1.0f)
+    {
+      FUNC_TESTS (4, i);
+      FUNC_TESTS (8, i);
+      FUNC_TESTS (16, i);
+
+      FUNC_TESTD (4, i);
+      FUNC_TESTD (8, i);
+      FUNC_TESTD (16, i);
+    }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_2.c b/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_2.c
new file mode 100644
index 0000000..39cf890
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_2.c
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-options "-save-temps -O2 -ftree-vectorize -fno-inline" } */
+
+#define N 1024
+
+#define FUNC_DEF(__a)		\
+void				\
+foo##__a (float *a, int *b)	\
+{				\
+  int i;			\
+  for (i = 0; i < N; i++)	\
+    b[i] = a[i] * __a##.0f;	\
+}
+
+FUNC_DEF (4)
+FUNC_DEF (8)
+FUNC_DEF (16)
+
+int ints[N];
+float floats[N];
+
+void
+reset_ints (int *arr)
+{
+  int i;
+
+  for (i = 0; i < N; i++)
+    arr[i] = 0;
+}
+
+void
+check_result (int *is, int n)
+{
+  int i;
+
+  for (i = 0; i < N; i++)
+    if (is[i] != i * n)
+      __builtin_abort ();
+}
+
+#define FUNC_CHECK(__a)		\
+do				\
+  {				\
+    reset_ints (ints);		\
+    foo##__a (floats, ints);	\
+    check_result (ints, __a);	\
+  } while (0)
+
+
+int
+main (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    floats[i] = (float) i;
+
+  FUNC_CHECK (4);
+  FUNC_CHECK (8);
+  FUNC_CHECK (16);
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-not "fmul\tv\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler "fcvtzs\tv\[0-9\].4s, v\[0-9\].4s*.*#2" } } */
+/* { dg-final { scan-assembler "fcvtzs\tv\[0-9\].4s, v\[0-9\].4s*.*#3" } } */
+/* { dg-final { scan-assembler "fcvtzs\tv\[0-9\].4s, v\[0-9\].4s*.*#4" } } */

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation
  2015-10-19 13:58 [PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation Kyrill Tkachov
@ 2015-10-20 15:31 ` Marcus Shawcroft
  2015-10-20 15:48   ` Kyrill Tkachov
  2015-10-20 16:30   ` Ramana Radhakrishnan
  0 siblings, 2 replies; 7+ messages in thread
From: Marcus Shawcroft @ 2015-10-20 15:31 UTC (permalink / raw)
  To: Kyrill Tkachov
  Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw, James Greenhalgh

On 19 October 2015 at 14:57, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:

> 2015-10-19  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     * config/aarch64/aarch64.md
>  (*aarch64_fcvt<su_optab><GPF:mode><GPI:mode>2_mult): New pattern.
>     * config/aarch64/aarch64-simd.md
>  (*aarch64_fcvt<su_optab><VDQF:mode><fcvt_target>2_mult): Likewise.
>     * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above patterns.
>     (aarch64_fpconst_pow_of_2): New function.
>     (aarch64_vec_fpconst_pow_of_2): Likewise.
>     * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): Declare
>     prototype.
>     (aarch64_vec_fpconst_pow_of_2): Likewise.
>     * config/aarch64/predicates.md (aarch64_fp_pow2): New predicate.
>     (aarch64_fp_vec_pow2): Likewise.
>
> 2015-10-19  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     * gcc.target/aarch64/fmul_fcvt_1.c: New test.
>     * gcc.target/aarch64/fmul_fcvt_2.c: Likewise.

+    char buf[64];
+    sprintf (buf, "fcvtz<su>\\t%%0.<Vtype>, %%1.<Vtype>, #%d", fbits);

Prefer snprintf please.

+  }
+  [(set_attr "type" "neon_fp_to_int_<Vetype><q>")]
+)
+
+

Superflous blank line here ?

+  *cost += rtx_cost (XEXP (x, 0), VOIDmode,
+     (enum rtx_code) code, 0, speed);

My understanding is the unnecessary use of enum  is now discouraged,
(rtx_code) is sufficient in this case.

+  int count = CONST_VECTOR_NUNITS (x);
+  int i;
+  for (i = 1; i < count; i++)

Push the int into the for initializer.
Push the rhs of the count assignment into the for condition and drop
the definition of count.

+/* { dg-final { scan-assembler "fcvtzs\tw\[0-9\], s\[0-9\]*.*#2" } } */

I'd prefer scan-assembler-times or do you have a particular reason to
avoid it in these tests?

Cheers
/Marcus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation
  2015-10-20 15:31 ` Marcus Shawcroft
@ 2015-10-20 15:48   ` Kyrill Tkachov
  2015-10-20 15:58     ` Marcus Shawcroft
  2015-10-20 16:30   ` Ramana Radhakrishnan
  1 sibling, 1 reply; 7+ messages in thread
From: Kyrill Tkachov @ 2015-10-20 15:48 UTC (permalink / raw)
  To: Marcus Shawcroft
  Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw, James Greenhalgh

[-- Attachment #1: Type: text/plain, Size: 2876 bytes --]


On 20/10/15 16:26, Marcus Shawcroft wrote:
> On 19 October 2015 at 14:57, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>
>> 2015-10-19  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>      * config/aarch64/aarch64.md
>>   (*aarch64_fcvt<su_optab><GPF:mode><GPI:mode>2_mult): New pattern.
>>      * config/aarch64/aarch64-simd.md
>>   (*aarch64_fcvt<su_optab><VDQF:mode><fcvt_target>2_mult): Likewise.
>>      * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above patterns.
>>      (aarch64_fpconst_pow_of_2): New function.
>>      (aarch64_vec_fpconst_pow_of_2): Likewise.
>>      * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): Declare
>>      prototype.
>>      (aarch64_vec_fpconst_pow_of_2): Likewise.
>>      * config/aarch64/predicates.md (aarch64_fp_pow2): New predicate.
>>      (aarch64_fp_vec_pow2): Likewise.
>>
>> 2015-10-19  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>      * gcc.target/aarch64/fmul_fcvt_1.c: New test.
>>      * gcc.target/aarch64/fmul_fcvt_2.c: Likewise.
> +    char buf[64];
> +    sprintf (buf, "fcvtz<su>\\t%%0.<Vtype>, %%1.<Vtype>, #%d", fbits);
>
> Prefer snprintf please.
>
> +  }
> +  [(set_attr "type" "neon_fp_to_int_<Vetype><q>")]
> +)
> +
> +
>
> Superflous blank line here ?
>
> +  *cost += rtx_cost (XEXP (x, 0), VOIDmode,
> +     (enum rtx_code) code, 0, speed);
>
> My understanding is the unnecessary use of enum  is now discouraged,
> (rtx_code) is sufficient in this case.
>
> +  int count = CONST_VECTOR_NUNITS (x);
> +  int i;
> +  for (i = 1; i < count; i++)
>
> Push the int into the for initializer.
> Push the rhs of the count assignment into the for condition and drop
> the definition of count.

Ok, done.

>
> +/* { dg-final { scan-assembler "fcvtzs\tw\[0-9\], s\[0-9\]*.*#2" } } */
>
> I'd prefer scan-assembler-times or do you have a particular reason to
> avoid it in these tests?

No reason.
Here's the patch updated as per your feedback.

How's this?

Thanks,
Kyrill

2015-10-20  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * config/aarch64/aarch64.md
(*aarch64_fcvt<su_optab><GPF:mode><GPI:mode>2_mult): New pattern.
     * config/aarch64/aarch64-simd.md
(*aarch64_fcvt<su_optab><VDQF:mode><fcvt_target>2_mult): Likewise.
     * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above patterns.
     (aarch64_fpconst_pow_of_2): New function.
     (aarch64_vec_fpconst_pow_of_2): Likewise.
     * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): Declare
     prototype.
     (aarch64_vec_fpconst_pow_of_2): Likewise.
     * config/aarch64/predicates.md (aarch64_fp_pow2): New predicate.
     (aarch64_fp_vec_pow2): Likewise.

2015-10-20  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/fmul_fcvt_1.c: New test.
     * gcc.target/aarch64/fmul_fcvt_2.c: Likewise.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: aarch64-fmul-fcvt.patch --]
[-- Type: text/x-patch; name=aarch64-fmul-fcvt.patch, Size: 12313 bytes --]

commit dc55192dd5f54f69d659ea6cc703c5fc2dc9b88b
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Thu Oct 8 15:17:47 2015 +0100

    [AArch64] Add fmul+fcvt optimisation

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index a8ac8d3..309dcfb 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -294,12 +294,14 @@ enum aarch64_symbol_type aarch64_classify_symbol (rtx, rtx);
 enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
 enum reg_class aarch64_regno_regclass (unsigned);
 int aarch64_asm_preferred_eh_data_format (int, int);
+int aarch64_fpconst_pow_of_2 (rtx);
 machine_mode aarch64_hard_regno_caller_save_mode (unsigned, unsigned,
 						       machine_mode);
 int aarch64_hard_regno_mode_ok (unsigned, machine_mode);
 int aarch64_hard_regno_nregs (unsigned, machine_mode);
 int aarch64_simd_attr_length_move (rtx_insn *);
 int aarch64_uxt_size (int, HOST_WIDE_INT);
+int aarch64_vec_fpconst_pow_of_2 (rtx);
 rtx aarch64_final_eh_return_addr (void);
 rtx aarch64_legitimize_reload_address (rtx *, machine_mode, int, int, int);
 const char *aarch64_output_move_struct (rtx *operands);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 167277e..cf1ff6d 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1654,6 +1654,26 @@ (define_insn "l<fcvt_pattern><su_optab><VDQF:mode><fcvt_target>2"
   [(set_attr "type" "neon_fp_to_int_<Vetype><q>")]
 )
 
+(define_insn "*aarch64_fcvt<su_optab><VDQF:mode><fcvt_target>2_mult"
+  [(set (match_operand:<FCVT_TARGET> 0 "register_operand" "=w")
+	(FIXUORS:<FCVT_TARGET> (unspec:<FCVT_TARGET>
+			       [(mult:VDQF
+	 (match_operand:VDQF 1 "register_operand" "w")
+	 (match_operand:VDQF 2 "aarch64_fp_vec_pow2" ""))]
+			       UNSPEC_FRINTZ)))]
+  "TARGET_SIMD
+   && IN_RANGE (aarch64_vec_fpconst_pow_of_2 (operands[2]), 1,
+		GET_MODE_BITSIZE (GET_MODE_INNER (<VDQF:MODE>mode)))"
+  {
+    int fbits = aarch64_vec_fpconst_pow_of_2 (operands[2]);
+    char buf[64];
+    snprintf (buf, 64, "fcvtz<su>\\t%%0.<Vtype>, %%1.<Vtype>, #%d", fbits);
+    output_asm_insn (buf, operands);
+    return "";
+  }
+  [(set_attr "type" "neon_fp_to_int_<Vetype><q>")]
+)
+
 (define_expand "<optab><VDQF:mode><fcvt_target>2"
   [(set (match_operand:<FCVT_TARGET> 0 "register_operand")
 	(FIXUORS:<FCVT_TARGET> (unspec:<FCVT_TARGET>
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e304d40..17e59e1 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6791,6 +6791,19 @@ cost_plus:
 	  else
 	    *cost += extra_cost->fp[GET_MODE (x) == DFmode].toint;
 	}
+
+      /* We can combine fmul by a power of 2 followed by a fcvt into a single
+	 fixed-point fcvt.  */
+      if (GET_CODE (x) == MULT
+	  && ((VECTOR_MODE_P (mode)
+	       && aarch64_vec_fpconst_pow_of_2 (XEXP (x, 1)) > 0)
+	      || aarch64_fpconst_pow_of_2 (XEXP (x, 1)) > 0))
+	{
+	  *cost += rtx_cost (XEXP (x, 0), VOIDmode, (rtx_code) code,
+			     0, speed);
+	  return true;
+	}
+
       *cost += rtx_cost (x, VOIDmode, (enum rtx_code) code, 0, speed);
       return true;
 
@@ -13357,6 +13370,52 @@ aarch64_reorg (void)
 #undef TARGET_MACHINE_DEPENDENT_REORG
 #define TARGET_MACHINE_DEPENDENT_REORG aarch64_reorg
 
+
+/* If X is a positive CONST_DOUBLE with a value that is a power of 2
+   return the log2 of that value.  Otherwise return -1.  */
+
+int
+aarch64_fpconst_pow_of_2 (rtx x)
+{
+  const REAL_VALUE_TYPE *r;
+
+  if (!CONST_DOUBLE_P (x))
+    return -1;
+
+  r = CONST_DOUBLE_REAL_VALUE (x);
+
+  if (REAL_VALUE_NEGATIVE (*r)
+      || REAL_VALUE_ISNAN (*r)
+      || REAL_VALUE_ISINF (*r)
+      || !real_isinteger (r, DFmode))
+    return -1;
+
+  return exact_log2 (real_to_integer (r));
+}
+
+/* If X is a vector of equal CONST_DOUBLE values and that value is
+   Y, return the aarch64_fpconst_pow_of_2 of Y.  Otherwise return -1.  */
+
+int
+aarch64_vec_fpconst_pow_of_2 (rtx x)
+{
+  if (GET_CODE (x) != CONST_VECTOR)
+    return -1;
+
+  if (GET_MODE_CLASS (GET_MODE (x)) != MODE_VECTOR_FLOAT)
+    return -1;
+
+  int firstval = aarch64_fpconst_pow_of_2 (CONST_VECTOR_ELT (x, 0));
+  if (firstval <= 0)
+    return -1;
+
+  for (int i = 1; i < CONST_VECTOR_NUNITS (x); i++)
+    if (aarch64_fpconst_pow_of_2 (CONST_VECTOR_ELT (x, i)) != firstval)
+      return -1;
+
+  return firstval;
+}
+
 /* Implement TARGET_PROMOTED_TYPE to promote __fp16 to float.  */
 static tree
 aarch64_promoted_type (const_tree t)
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a3ec371..35f9877 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4287,6 +4287,25 @@ (define_insn "l<fcvt_pattern><su_optab><GPF:mode><GPI:mode>2"
   [(set_attr "type" "f_cvtf2i")]
 )
 
+(define_insn "*aarch64_fcvt<su_optab><GPF:mode><GPI:mode>2_mult"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(FIXUORS:GPI
+	  (mult:GPF
+	    (match_operand:GPF 1 "register_operand" "w")
+	    (match_operand:GPF 2 "aarch64_fp_pow2" "F"))))]
+  "TARGET_FLOAT
+   && IN_RANGE (aarch64_fpconst_pow_of_2 (operands[2]), 1,
+		GET_MODE_BITSIZE (<GPI:MODE>mode))"
+  {
+    int fbits = aarch64_fpconst_pow_of_2 (operands[2]);
+    char buf[64];
+    snprintf (buf, 64, "fcvtz<su>\\t%%<GPI:w>0, %%<GPF:s>1, #%d", fbits);
+    output_asm_insn (buf, operands);
+    return "";
+  }
+  [(set_attr "type" "f_cvtf2i")]
+)
+
 ;; fma - no throw
 
 (define_insn "fma<mode>4"
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 8af4b81..1bcbf62 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -87,6 +87,13 @@ (define_predicate "aarch64_fp_compare_operand"
        (and (match_code "const_double")
 	    (match_test "aarch64_float_const_zero_rtx_p (op)"))))
 
+(define_predicate "aarch64_fp_pow2"
+  (and (match_code "const_double")
+	(match_test "aarch64_fpconst_pow_of_2 (op) > 0")))
+
+(define_predicate "aarch64_fp_vec_pow2"
+  (match_test "aarch64_vec_fpconst_pow_of_2 (op) > 0"))
+
 (define_predicate "aarch64_plus_immediate"
   (and (match_code "const_int")
        (ior (match_test "aarch64_uimm12_shift (INTVAL (op))")
diff --git a/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c b/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c
new file mode 100644
index 0000000..4e3ace7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c
@@ -0,0 +1,129 @@
+/* { dg-do run } */
+/* { dg-options "-save-temps -O2 -fno-inline" } */
+
+#define FUNC_DEFS(__a)	\
+int			\
+sffoo##__a (float x)	\
+{			\
+  return x * __a##.0f;	\
+}			\
+			\
+unsigned int		\
+usffoo##__a (float x)	\
+{			\
+  return x * __a##.0f;	\
+}			\
+			\
+long			\
+lsffoo##__a (float x)	\
+{			\
+  return x * __a##.0f;	\
+}			\
+			\
+unsigned long		\
+ulsffoo##__a (float x)	\
+{			\
+  return x * __a##.0f;	\
+}
+
+#define FUNC_DEFD(__a)	\
+long			\
+dffoo##__a (double x)	\
+{			\
+  return x * __a##.0;	\
+}			\
+			\
+unsigned long		\
+udffoo##__a (double x)	\
+{			\
+  return x * __a##.0;	\
+}			\
+int			\
+sdffoo##__a (double x)	\
+{			\
+  return x * __a##.0;	\
+}			\
+			\
+unsigned int		\
+usdffoo##__a (double x)	\
+{			\
+  return x * __a##.0;	\
+}
+
+FUNC_DEFS (4)
+FUNC_DEFD (4)
+/* { dg-final { scan-assembler-times "fcvtzs\tw\[0-9\], s\[0-9\]*.*#2" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tx\[0-9\], s\[0-9\]*.*#2" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tx\[0-9\], d\[0-9\]*.*#2" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tw\[0-9\], d\[0-9\]*.*#2" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tw\[0-9\], s\[0-9\]*.*#2" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tx\[0-9\], s\[0-9\]*.*#2" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tx\[0-9\], d\[0-9\]*.*#2" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tw\[0-9\], d\[0-9\]*.*#2" 1 } } */
+
+FUNC_DEFS (8)
+FUNC_DEFD (8)
+/* { dg-final { scan-assembler-times "fcvtzs\tw\[0-9\], s\[0-9\]*.*#3" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tx\[0-9\], s\[0-9\]*.*#3" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tx\[0-9\], d\[0-9\]*.*#3" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tw\[0-9\], d\[0-9\]*.*#3" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tw\[0-9\], s\[0-9\]*.*#3" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tx\[0-9\], s\[0-9\]*.*#3" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tx\[0-9\], d\[0-9\]*.*#3" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tw\[0-9\], d\[0-9\]*.*#3" 1 } } */
+
+FUNC_DEFS (16)
+FUNC_DEFD (16)
+/* { dg-final { scan-assembler-times "fcvtzs\tw\[0-9\], s\[0-9\]*.*#4" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tx\[0-9\], s\[0-9\]*.*#4" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tx\[0-9\], d\[0-9\]*.*#4" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tw\[0-9\], d\[0-9\]*.*#4" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tw\[0-9\], s\[0-9\]*.*#4" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tx\[0-9\], s\[0-9\]*.*#4" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tx\[0-9\], d\[0-9\]*.*#4" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tw\[0-9\], d\[0-9\]*.*#4" 1 } } */
+
+
+#define FUNC_TESTS(__a, __b)					\
+do								\
+  {								\
+    if (sffoo##__a (__b) != (int)(__b * __a))			\
+      __builtin_abort ();					\
+    if (usffoo##__a (__b) != (unsigned int)(__b * __a))	\
+      __builtin_abort ();					\
+    if (lsffoo##__a (__b) != (long)(__b * __a))		\
+      __builtin_abort ();					\
+    if (ulsffoo##__a (__b) != (unsigned long)(__b * __a))	\
+      __builtin_abort ();					\
+  } while (0)
+
+#define FUNC_TESTD(__a, __b)					\
+do								\
+  {								\
+    if (dffoo##__a (__b) != (long)(__b * __a))			\
+      __builtin_abort ();					\
+    if (udffoo##__a (__b) != (unsigned long)(__b * __a))	\
+      __builtin_abort ();					\
+    if (sdffoo##__a (__b) != (int)(__b * __a))			\
+      __builtin_abort ();					\
+    if (usdffoo##__a (__b) != (unsigned int)(__b * __a))	\
+      __builtin_abort ();					\
+  } while (0)
+
+int
+main (void)
+{
+  float i;
+
+  for (i = -0.001; i < 32.0; i += 1.0f)
+    {
+      FUNC_TESTS (4, i);
+      FUNC_TESTS (8, i);
+      FUNC_TESTS (16, i);
+
+      FUNC_TESTD (4, i);
+      FUNC_TESTD (8, i);
+      FUNC_TESTD (16, i);
+    }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_2.c b/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_2.c
new file mode 100644
index 0000000..d8a9335
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_2.c
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-options "-save-temps -O2 -ftree-vectorize -fno-inline" } */
+
+#define N 1024
+
+#define FUNC_DEF(__a)		\
+void				\
+foo##__a (float *a, int *b)	\
+{				\
+  int i;			\
+  for (i = 0; i < N; i++)	\
+    b[i] = a[i] * __a##.0f;	\
+}
+
+FUNC_DEF (4)
+FUNC_DEF (8)
+FUNC_DEF (16)
+
+int ints[N];
+float floats[N];
+
+void
+reset_ints (int *arr)
+{
+  int i;
+
+  for (i = 0; i < N; i++)
+    arr[i] = 0;
+}
+
+void
+check_result (int *is, int n)
+{
+  int i;
+
+  for (i = 0; i < N; i++)
+    if (is[i] != i * n)
+      __builtin_abort ();
+}
+
+#define FUNC_CHECK(__a)		\
+do				\
+  {				\
+    reset_ints (ints);		\
+    foo##__a (floats, ints);	\
+    check_result (ints, __a);	\
+  } while (0)
+
+
+int
+main (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    floats[i] = (float) i;
+
+  FUNC_CHECK (4);
+  FUNC_CHECK (8);
+  FUNC_CHECK (16);
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-not "fmul\tv\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tv\[0-9\].4s, v\[0-9\].4s*.*#2" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tv\[0-9\].4s, v\[0-9\].4s*.*#3" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tv\[0-9\].4s, v\[0-9\].4s*.*#4" 1 } } */

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation
  2015-10-20 15:48   ` Kyrill Tkachov
@ 2015-10-20 15:58     ` Marcus Shawcroft
  0 siblings, 0 replies; 7+ messages in thread
From: Marcus Shawcroft @ 2015-10-20 15:58 UTC (permalink / raw)
  To: Kyrill Tkachov
  Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw, James Greenhalgh

On 20 October 2015 at 16:47, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:

> Here's the patch updated as per your feedback.
>
> How's this?
>
> Thanks,
> Kyrill
>
> 2015-10-20  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     * config/aarch64/aarch64.md
> (*aarch64_fcvt<su_optab><GPF:mode><GPI:mode>2_mult): New pattern.
>     * config/aarch64/aarch64-simd.md
> (*aarch64_fcvt<su_optab><VDQF:mode><fcvt_target>2_mult): Likewise.
>     * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above patterns.
>     (aarch64_fpconst_pow_of_2): New function.
>     (aarch64_vec_fpconst_pow_of_2): Likewise.
>     * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): Declare
>     prototype.
>     (aarch64_vec_fpconst_pow_of_2): Likewise.
>     * config/aarch64/predicates.md (aarch64_fp_pow2): New predicate.
>     (aarch64_fp_vec_pow2): Likewise.
>
> 2015-10-20  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>
>     * gcc.target/aarch64/fmul_fcvt_1.c: New test.
>     * gcc.target/aarch64/fmul_fcvt_2.c: Likewise.
>

OK /Marcus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation
  2015-10-20 15:31 ` Marcus Shawcroft
  2015-10-20 15:48   ` Kyrill Tkachov
@ 2015-10-20 16:30   ` Ramana Radhakrishnan
  2015-10-20 16:34     ` Kyrill Tkachov
  1 sibling, 1 reply; 7+ messages in thread
From: Ramana Radhakrishnan @ 2015-10-20 16:30 UTC (permalink / raw)
  To: Marcus Shawcroft
  Cc: Kyrill Tkachov, GCC Patches, Marcus Shawcroft, Richard Earnshaw,
	James Greenhalgh

On Tue, Oct 20, 2015 at 4:26 PM, Marcus Shawcroft
<marcus.shawcroft@gmail.com> wrote:
> On 19 October 2015 at 14:57, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>
>> 2015-10-19  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>     * config/aarch64/aarch64.md
>>  (*aarch64_fcvt<su_optab><GPF:mode><GPI:mode>2_mult): New pattern.
>>     * config/aarch64/aarch64-simd.md
>>  (*aarch64_fcvt<su_optab><VDQF:mode><fcvt_target>2_mult): Likewise.
>>     * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above patterns.
>>     (aarch64_fpconst_pow_of_2): New function.
>>     (aarch64_vec_fpconst_pow_of_2): Likewise.
>>     * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): Declare
>>     prototype.
>>     (aarch64_vec_fpconst_pow_of_2): Likewise.
>>     * config/aarch64/predicates.md (aarch64_fp_pow2): New predicate.
>>     (aarch64_fp_vec_pow2): Likewise.
>>
>> 2015-10-19  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>     * gcc.target/aarch64/fmul_fcvt_1.c: New test.
>>     * gcc.target/aarch64/fmul_fcvt_2.c: Likewise.
>
> +    char buf[64];
> +    sprintf (buf, "fcvtz<su>\\t%%0.<Vtype>, %%1.<Vtype>, #%d", fbits);
>
> Prefer snprintf please

Should we also update our coding standards for this ? i.e. use the non
`n' versions of the string functions only if you have a very good
reason. Even more so in the run time support libraries !

Maybe we can go to the extreme of poisoning sprintf too, but that's
probably an extreme...

regards
Ramana


>
> +  }
> +  [(set_attr "type" "neon_fp_to_int_<Vetype><q>")]
> +)
> +
> +
>
> Superflous blank line here ?
>
> +  *cost += rtx_cost (XEXP (x, 0), VOIDmode,
> +     (enum rtx_code) code, 0, speed);
>
> My understanding is the unnecessary use of enum  is now discouraged,
> (rtx_code) is sufficient in this case.
>
> +  int count = CONST_VECTOR_NUNITS (x);
> +  int i;
> +  for (i = 1; i < count; i++)
>
> Push the int into the for initializer.
> Push the rhs of the count assignment into the for condition and drop
> the definition of count.
>
> +/* { dg-final { scan-assembler "fcvtzs\tw\[0-9\], s\[0-9\]*.*#2" } } */
>
> I'd prefer scan-assembler-times or do you have a particular reason to
> avoid it in these tests?
>
> Cheers
> /Marcus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation
  2015-10-20 16:30   ` Ramana Radhakrishnan
@ 2015-10-20 16:34     ` Kyrill Tkachov
  2015-10-20 17:14       ` Marcus Shawcroft
  0 siblings, 1 reply; 7+ messages in thread
From: Kyrill Tkachov @ 2015-10-20 16:34 UTC (permalink / raw)
  To: Ramana Radhakrishnan, Marcus Shawcroft
  Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw, James Greenhalgh


On 20/10/15 17:28, Ramana Radhakrishnan wrote:
> On Tue, Oct 20, 2015 at 4:26 PM, Marcus Shawcroft
> <marcus.shawcroft@gmail.com> wrote:
>> On 19 October 2015 at 14:57, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>>
>>> 2015-10-19  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>
>>>      * config/aarch64/aarch64.md
>>>   (*aarch64_fcvt<su_optab><GPF:mode><GPI:mode>2_mult): New pattern.
>>>      * config/aarch64/aarch64-simd.md
>>>   (*aarch64_fcvt<su_optab><VDQF:mode><fcvt_target>2_mult): Likewise.
>>>      * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above patterns.
>>>      (aarch64_fpconst_pow_of_2): New function.
>>>      (aarch64_vec_fpconst_pow_of_2): Likewise.
>>>      * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): Declare
>>>      prototype.
>>>      (aarch64_vec_fpconst_pow_of_2): Likewise.
>>>      * config/aarch64/predicates.md (aarch64_fp_pow2): New predicate.
>>>      (aarch64_fp_vec_pow2): Likewise.
>>>
>>> 2015-10-19  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>
>>>      * gcc.target/aarch64/fmul_fcvt_1.c: New test.
>>>      * gcc.target/aarch64/fmul_fcvt_2.c: Likewise.
>> +    char buf[64];
>> +    sprintf (buf, "fcvtz<su>\\t%%0.<Vtype>, %%1.<Vtype>, #%d", fbits);
>>
>> Prefer snprintf please
> Should we also update our coding standards for this ? i.e. use the non
> `n' versions of the string functions only if you have a very good
> reason. Even more so in the run time support libraries !

My understanding is one should *really* use snprintf if any part of the
string comes from the user. So in this case it probably doesn't make a
difference. However, it's not hard to use the 'n' version and thinking
about the maximum length is probably a good practice to get into.

Kyrill

>
> Maybe we can go to the extreme of poisoning sprintf too, but that's
> probably an extreme...
>
> regards
> Ramana
>
>
>> +  }
>> +  [(set_attr "type" "neon_fp_to_int_<Vetype><q>")]
>> +)
>> +
>> +
>>
>> Superflous blank line here ?
>>
>> +  *cost += rtx_cost (XEXP (x, 0), VOIDmode,
>> +     (enum rtx_code) code, 0, speed);
>>
>> My understanding is the unnecessary use of enum  is now discouraged,
>> (rtx_code) is sufficient in this case.
>>
>> +  int count = CONST_VECTOR_NUNITS (x);
>> +  int i;
>> +  for (i = 1; i < count; i++)
>>
>> Push the int into the for initializer.
>> Push the rhs of the count assignment into the for condition and drop
>> the definition of count.
>>
>> +/* { dg-final { scan-assembler "fcvtzs\tw\[0-9\], s\[0-9\]*.*#2" } } */
>>
>> I'd prefer scan-assembler-times or do you have a particular reason to
>> avoid it in these tests?
>>
>> Cheers
>> /Marcus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation
  2015-10-20 16:34     ` Kyrill Tkachov
@ 2015-10-20 17:14       ` Marcus Shawcroft
  0 siblings, 0 replies; 7+ messages in thread
From: Marcus Shawcroft @ 2015-10-20 17:14 UTC (permalink / raw)
  To: Kyrill Tkachov
  Cc: Ramana Radhakrishnan, GCC Patches, Marcus Shawcroft,
	Richard Earnshaw, James Greenhalgh

On 20 October 2015 at 17:31, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>
> On 20/10/15 17:28, Ramana Radhakrishnan wrote:
>>
>> On Tue, Oct 20, 2015 at 4:26 PM, Marcus Shawcroft
>> <marcus.shawcroft@gmail.com> wrote:
>>>
>>> On 19 October 2015 at 14:57, Kyrill Tkachov <kyrylo.tkachov@arm.com>
>>> wrote:
>>>
>>>> 2015-10-19  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>      * config/aarch64/aarch64.md
>>>>   (*aarch64_fcvt<su_optab><GPF:mode><GPI:mode>2_mult): New pattern.
>>>>      * config/aarch64/aarch64-simd.md
>>>>   (*aarch64_fcvt<su_optab><VDQF:mode><fcvt_target>2_mult): Likewise.
>>>>      * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above
>>>> patterns.
>>>>      (aarch64_fpconst_pow_of_2): New function.
>>>>      (aarch64_vec_fpconst_pow_of_2): Likewise.
>>>>      * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2):
>>>> Declare
>>>>      prototype.
>>>>      (aarch64_vec_fpconst_pow_of_2): Likewise.
>>>>      * config/aarch64/predicates.md (aarch64_fp_pow2): New predicate.
>>>>      (aarch64_fp_vec_pow2): Likewise.
>>>>
>>>> 2015-10-19  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>      * gcc.target/aarch64/fmul_fcvt_1.c: New test.
>>>>      * gcc.target/aarch64/fmul_fcvt_2.c: Likewise.
>>>
>>> +    char buf[64];
>>> +    sprintf (buf, "fcvtz<su>\\t%%0.<Vtype>, %%1.<Vtype>, #%d", fbits);
>>>
>>> Prefer snprintf please
>>
>> Should we also update our coding standards for this ? i.e. use the non
>> `n' versions of the string functions only if you have a very good
>> reason. Even more so in the run time support libraries !
>
>
> My understanding is one should *really* use snprintf if any part of the
> string comes from the user. So in this case it probably doesn't make a
> difference. However, it's not hard to use the 'n' version and thinking
> about the maximum length is probably a good practice to get into.

Using snprintf here means that if/when someone fettles around with the
pattern in the future they won't silently over run the buffer that
they forgot to resize for their new longer pattern. Although in the
case the likely hood of that happening seems rather small it is good
practice to simply avoid the none n version.

Cheers
/Marcus

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-10-20 17:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-19 13:58 [PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation Kyrill Tkachov
2015-10-20 15:31 ` Marcus Shawcroft
2015-10-20 15:48   ` Kyrill Tkachov
2015-10-20 15:58     ` Marcus Shawcroft
2015-10-20 16:30   ` Ramana Radhakrishnan
2015-10-20 16:34     ` Kyrill Tkachov
2015-10-20 17:14       ` Marcus Shawcroft

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).