public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] arm: Add new vector mode macros
@ 2020-09-08 13:44 Richard Sandiford
  2020-09-16 10:15 ` Richard Sandiford
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Sandiford @ 2020-09-08 13:44 UTC (permalink / raw)
  To: gcc-patches
  Cc: nickc, richard.earnshaw, ramana.radhakrishnan, kyrylo.tkachov,
	dennis.zhang

[ This is related to Dennis's subtraction patch
  https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553339.html
  and the discussion about how the patterns were written.  I wanted
  to see whether there was a way that we could simplify the current
  addition handling that might perhaps make it easier to add other
  MVE operations in future.  It seemed like one of those situations
  in which the most productive thing would be to try it and see,
  rather than just describe it in words.

  One of the questions Ramana had in the thread above was: why does
  MVE not need the flag_unsafe_math_optimizations flag?  AIUI the reason
  is that MVE honours the FPSCR.FZ flag while SF Advanced SIMD always
  flushes to zero.  (HF Advanced SIMD honours FPSCR.FZ16 and so also
  doesn't need flag_unsafe_math_optimizations.) ]

The AArch32 port now has three vector extensions: iwMMXt, Neon
and MVE.  We already have some named expanders that are shared
by all three, and soon we'll need more.

One way of handling this would be to use define_mode_iterators
that specify the condition for each mode.  For example,

  (V16QI "TARGET_NEON || TARGET_HAVE_MVE")
  (V8QI "TARGET_NEON || TARGET_REALLY_IWMXXT")
  ...
  (V2SF "TARGET_NEON && flag_unsafe_math_optimizations")

etc.  However, we'll need several mode iterators, and it would
be repetitive to specify the mode condition every time.

This patch therefore introduces per-mode macros that say whether
we can perform general arithmetic on the mode.  Initially there are
two sets of macros:

ARM_HAVE_NEON_<MODE>_ARITH
  true if Neon can handle general arithmetic on <MODE>

ARM_HAVE_<MODE>_ARITH
  true if any vector extension can handle general arithmetic on <MODE>

The macro definitions themselves are undeniably ugly, but hopefully
they're justified by the simplifications they allow.

The patch converts the addition patterns to use this scheme.

Previously there were three copies of the V8HF and V4HF addition
patterns for Neon:

(1) *add<VDQ:mode>3_neon, which provided plus:VnHF even without
    TARGET_NEON_FP16INST.  This was probably harmless since all the
    named patterns had an appropriate guard, but it is possible that
    something could have tried to generate the plus directly, such as
    by using a REG_EQUAL note to generate a new pattern.

(2) addv8hf3_neon and addv4hf3, which had the correct
    TARGET_NEON_FP16INST target condition, but unnecessarily required
    flag_unsafe_math_optimizations.  Unlike VnSF operations, VnHF
    operations do not force flush to zero.

(3) add<VH:mode>3_fp16, which provided plus:VnHF with the
    correct conditions (TARGET_NEON_FP16INST, with no
    flag_unsafe_math_optimizations test).

The patch in essence renames add<VH:mode>3_fp16 to *add<VH:mode>3_neon
(part of *add<VDQ:mode>3_neon) and removes the other two patterns.

WDYT?  Does this look like a way forward?

Tested on arm-linux-gnueabihf and armeb-eabi.

Thanks,
Richard


gcc/
	* config/arm/arm.h (ARM_HAVE_NEON_V8QI_ARITH, ARM_HAVE_NEON_V4HI_ARITH)
	(ARM_HAVE_NEON_V2SI_ARITH, ARM_HAVE_NEON_V16QI_ARITH): New macros.
	(ARM_HAVE_NEON_V8HI_ARITH, ARM_HAVE_NEON_V4SI_ARITH): Likewise.
	(ARM_HAVE_NEON_V2DI_ARITH, ARM_HAVE_NEON_V4HF_ARITH): Likewise.
	(ARM_HAVE_NEON_V8HF_ARITH, ARM_HAVE_NEON_V2SF_ARITH): Likewise.
	(ARM_HAVE_NEON_V4SF_ARITH, ARM_HAVE_V8QI_ARITH, ARM_HAVE_V4HI_ARITH)
	(ARM_HAVE_V2SI_ARITH, ARM_HAVE_V16QI_ARITH, ARM_HAVE_V8HI_ARITH)
	(ARM_HAVE_V4SI_ARITH, ARM_HAVE_V2DI_ARITH, ARM_HAVE_V4HF_ARITH)
	(ARM_HAVE_V2SF_ARITH, ARM_HAVE_V8HF_ARITH, ARM_HAVE_V4SF_ARITH):
	Likewise.
	* config/arm/iterators.md (VNIM, VNINOTM): Delete.
	* config/arm/vec-common.md (add<VNIM:mode>3, addv8hf3)
	(add<VNINOTM:mode>3): Replace with...
	(add<VDQ:mode>3): ...this new expander.
	* config/arm/neon.md (*add<VDQ:mode>3_neon): Use the new
	ARM_HAVE_NEON_<MODE>_ARITH macros as the C condition.
	(addv8hf3_neon, addv4hf3, add<VFH:mode>3_fp16): Delete in
	favor of the above.
	(neon_vadd<VH:mode>): Use gen_add<mode>3 instead of
	gen_add<mode>3_fp16.
---
 gcc/config/arm/arm.h         | 41 +++++++++++++++++++++++++++++++
 gcc/config/arm/iterators.md  |  8 ------
 gcc/config/arm/neon.md       | 47 ++----------------------------------
 gcc/config/arm/vec-common.md | 42 ++++----------------------------
 4 files changed, 48 insertions(+), 90 deletions(-)

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 3887c51eebe..3284ae29d7c 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1106,6 +1106,47 @@ extern const int arm_arch_cde_coproc_bits[];
 #define VALID_MVE_STRUCT_MODE(MODE) \
   ((MODE) == TImode || (MODE) == OImode || (MODE) == XImode)
 
+/* The conditions under which vector modes are supported for general
+   arithmetic using Neon.  */
+
+#define ARM_HAVE_NEON_V8QI_ARITH TARGET_NEON
+#define ARM_HAVE_NEON_V4HI_ARITH TARGET_NEON
+#define ARM_HAVE_NEON_V2SI_ARITH TARGET_NEON
+
+#define ARM_HAVE_NEON_V16QI_ARITH TARGET_NEON
+#define ARM_HAVE_NEON_V8HI_ARITH TARGET_NEON
+#define ARM_HAVE_NEON_V4SI_ARITH TARGET_NEON
+#define ARM_HAVE_NEON_V2DI_ARITH TARGET_NEON
+
+/* HF operations have their own flush-to-zero control (FPSCR.FZ16).  */
+#define ARM_HAVE_NEON_V4HF_ARITH TARGET_NEON_FP16INST
+#define ARM_HAVE_NEON_V8HF_ARITH TARGET_NEON_FP16INST
+
+/* SF operations always flush to zero, regardless of FPSCR.FZ, so we can
+   only use them for general arithmetic when -funsafe-math-optimizations
+   is in effect.  */
+#define ARM_HAVE_NEON_V2SF_ARITH \
+  (TARGET_NEON && flag_unsafe_math_optimizations)
+#define ARM_HAVE_NEON_V4SF_ARITH ARM_HAVE_NEON_V2SF_ARITH
+
+/* The conditions under which vector modes are supported for general
+   arithmetic by any vector extension.  */
+
+#define ARM_HAVE_V8QI_ARITH (ARM_HAVE_NEON_V8QI_ARITH || TARGET_REALLY_IWMMXT)
+#define ARM_HAVE_V4HI_ARITH (ARM_HAVE_NEON_V4HI_ARITH || TARGET_REALLY_IWMMXT)
+#define ARM_HAVE_V2SI_ARITH (ARM_HAVE_NEON_V2SI_ARITH || TARGET_REALLY_IWMMXT)
+
+#define ARM_HAVE_V16QI_ARITH (ARM_HAVE_NEON_V16QI_ARITH || TARGET_HAVE_MVE)
+#define ARM_HAVE_V8HI_ARITH (ARM_HAVE_NEON_V8HI_ARITH || TARGET_HAVE_MVE)
+#define ARM_HAVE_V4SI_ARITH (ARM_HAVE_NEON_V4SI_ARITH || TARGET_HAVE_MVE)
+#define ARM_HAVE_V2DI_ARITH ARM_HAVE_NEON_V2DI_ARITH
+
+#define ARM_HAVE_V4HF_ARITH ARM_HAVE_NEON_V4HF_ARITH
+#define ARM_HAVE_V2SF_ARITH ARM_HAVE_NEON_V2SF_ARITH
+
+#define ARM_HAVE_V8HF_ARITH (ARM_HAVE_NEON_V8HF_ARITH || TARGET_HAVE_MVE_FLOAT)
+#define ARM_HAVE_V4SF_ARITH (ARM_HAVE_NEON_V4SF_ARITH || TARGET_HAVE_MVE_FLOAT)
+
 /* The register numbers in sequence, for passing to arm_gen_load_multiple.  */
 extern int arm_regs_in_sequence[];
 
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 0bc9eba0722..c70e3bc2731 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -66,14 +66,6 @@ (define_mode_iterator VSHFT [V4HI V2SI DI])
 ;; Integer and float modes supported by Neon and IWMMXT.
 (define_mode_iterator VALL [V2DI V2SI V4HI V8QI V2SF V4SI V8HI V16QI V4SF])
 
-;; Integer and float modes supported by Neon, IWMMXT and MVE, used by
-;; arithmetic epxand patterns.
-(define_mode_iterator VNIM [V16QI V8HI V4SI V4SF])
-
-;; Integer and float modes supported by Neon and IWMMXT but not MVE, used by
-;; arithmetic epxand patterns.
-(define_mode_iterator VNINOTM [V2SI V4HI V8QI V2SF V2DI])
-
 ;; Integer and float modes supported by Neon, IWMMXT and MVE.
 (define_mode_iterator VNIM1 [V16QI V8HI V4SI V4SF V2DI])
 
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 3e7b51d8ab6..96bf277f501 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -501,7 +501,7 @@ (define_insn "*add<mode>3_neon"
   [(set (match_operand:VDQ 0 "s_register_operand" "=w")
         (plus:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
 		  (match_operand:VDQ 2 "s_register_operand" "w")))]
-  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+  "ARM_HAVE_NEON_<MODE>_ARITH"
   "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set (attr "type")
       (if_then_else (match_test "<Is_float_mode>")
@@ -509,49 +509,6 @@ (define_insn "*add<mode>3_neon"
                     (const_string "neon_add<q>")))]
 )
 
-;; As with SFmode, full support for HFmode vector arithmetic is only available
-;; when flag-unsafe-math-optimizations is enabled.
-
-;; Add pattern with modes V8HF and V4HF is split into separate patterns to add
-;; support for standard pattern addv8hf3 in MVE.  Following pattern is called
-;; from "addv8hf3" standard pattern inside vec-common.md file.
-
-(define_insn "addv8hf3_neon"
-  [(set
-    (match_operand:V8HF 0 "s_register_operand" "=w")
-    (plus:V8HF
-     (match_operand:V8HF 1 "s_register_operand" "w")
-     (match_operand:V8HF 2 "s_register_operand" "w")))]
- "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
- "vadd.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
- [(set_attr "type" "neon_fp_addsub_s_q")]
-)
-
-(define_insn "addv4hf3"
-  [(set
-    (match_operand:V4HF 0 "s_register_operand" "=w")
-    (plus:V4HF
-     (match_operand:V4HF 1 "s_register_operand" "w")
-     (match_operand:V4HF 2 "s_register_operand" "w")))]
- "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
- "vadd.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
- [(set_attr "type" "neon_fp_addsub_s_q")]
-)
-
-(define_insn "add<mode>3_fp16"
-  [(set
-    (match_operand:VH 0 "s_register_operand" "=w")
-    (plus:VH
-     (match_operand:VH 1 "s_register_operand" "w")
-     (match_operand:VH 2 "s_register_operand" "w")))]
- "TARGET_NEON_FP16INST"
- "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
- [(set (attr "type")
-   (if_then_else (match_test "<Is_float_mode>")
-    (const_string "neon_fp_addsub_s<q>")
-    (const_string "neon_add<q>")))]
-)
-
 (define_insn "*sub<mode>3_neon"
   [(set (match_operand:VDQ 0 "s_register_operand" "=w")
         (minus:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
@@ -1837,7 +1794,7 @@ (define_expand "neon_vadd<mode>"
    (match_operand:VH 2 "s_register_operand")]
   "TARGET_NEON_FP16INST"
 {
-  emit_insn (gen_add<mode>3_fp16 (operands[0], operands[1], operands[2]));
+  emit_insn (gen_add<mode>3 (operands[0], operands[1], operands[2]));
   DONE;
 })
 
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index b7e3619caf4..c3c86c46355 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -81,43 +81,11 @@ (define_expand "movv8hf"
 ;; patterns separately for Neon, IWMMXT and MVE.
 
 (define_expand "add<mode>3"
-  [(set (match_operand:VNIM 0 "s_register_operand")
-	(plus:VNIM (match_operand:VNIM 1 "s_register_operand")
-		   (match_operand:VNIM 2 "s_register_operand")))]
-  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
-		    || flag_unsafe_math_optimizations))
-   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))
-   || (TARGET_HAVE_MVE && VALID_MVE_SI_MODE(<MODE>mode))
-   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE(<MODE>mode))"
-{
-})
-
-;; Vector arithmetic.  Expanders are blank, then unnamed insns implement
-;; patterns separately for Neon and MVE.
-
-(define_expand "addv8hf3"
-  [(set (match_operand:V8HF 0 "s_register_operand")
-	(plus:V8HF (match_operand:V8HF 1 "s_register_operand")
-		   (match_operand:V8HF 2 "s_register_operand")))]
-  "(TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE(V8HFmode))
-   || (TARGET_NEON_FP16INST && flag_unsafe_math_optimizations)"
-{
-  if (TARGET_NEON_FP16INST && flag_unsafe_math_optimizations)
-    emit_insn (gen_addv8hf3_neon (operands[0], operands[1], operands[2]));
-})
-
-;; Vector arithmetic.  Expanders are blank, then unnamed insns implement
-;; patterns separately for Neon and IWMMXT.
-
-(define_expand "add<mode>3"
-  [(set (match_operand:VNINOTM 0 "s_register_operand")
-	(plus:VNINOTM (match_operand:VNINOTM 1 "s_register_operand")
-		      (match_operand:VNINOTM 2 "s_register_operand")))]
-  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
-		    || flag_unsafe_math_optimizations))
-   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
-{
-})
+  [(set (match_operand:VDQ 0 "s_register_operand")
+	(plus:VDQ (match_operand:VDQ 1 "s_register_operand")
+		  (match_operand:VDQ 2 "s_register_operand")))]
+  "ARM_HAVE_<MODE>_ARITH"
+)
 
 ;; Vector arithmetic. Expanders are blank, then unnamed insns implement
 ;; patterns separately for IWMMXT and Neon.
-- 
2.17.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] arm: Add new vector mode macros
  2020-09-08 13:44 [PATCH] arm: Add new vector mode macros Richard Sandiford
@ 2020-09-16 10:15 ` Richard Sandiford
  2020-09-16 10:49   ` Kyrylo Tkachov
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Sandiford @ 2020-09-16 10:15 UTC (permalink / raw)
  To: gcc-patches
  Cc: nickc, richard.earnshaw, ramana.radhakrishnan, kyrylo.tkachov,
	dennis.zhang

Ping

Richard Sandiford <richard.sandiford@arm.com> writes:
> [ This is related to Dennis's subtraction patch
>   https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553339.html
>   and the discussion about how the patterns were written.  I wanted
>   to see whether there was a way that we could simplify the current
>   addition handling that might perhaps make it easier to add other
>   MVE operations in future.  It seemed like one of those situations
>   in which the most productive thing would be to try it and see,
>   rather than just describe it in words.
>
>   One of the questions Ramana had in the thread above was: why does
>   MVE not need the flag_unsafe_math_optimizations flag?  AIUI the reason
>   is that MVE honours the FPSCR.FZ flag while SF Advanced SIMD always
>   flushes to zero.  (HF Advanced SIMD honours FPSCR.FZ16 and so also
>   doesn't need flag_unsafe_math_optimizations.) ]
>
> The AArch32 port now has three vector extensions: iwMMXt, Neon
> and MVE.  We already have some named expanders that are shared
> by all three, and soon we'll need more.
>
> One way of handling this would be to use define_mode_iterators
> that specify the condition for each mode.  For example,
>
>   (V16QI "TARGET_NEON || TARGET_HAVE_MVE")
>   (V8QI "TARGET_NEON || TARGET_REALLY_IWMXXT")
>   ...
>   (V2SF "TARGET_NEON && flag_unsafe_math_optimizations")
>
> etc.  However, we'll need several mode iterators, and it would
> be repetitive to specify the mode condition every time.
>
> This patch therefore introduces per-mode macros that say whether
> we can perform general arithmetic on the mode.  Initially there are
> two sets of macros:
>
> ARM_HAVE_NEON_<MODE>_ARITH
>   true if Neon can handle general arithmetic on <MODE>
>
> ARM_HAVE_<MODE>_ARITH
>   true if any vector extension can handle general arithmetic on <MODE>
>
> The macro definitions themselves are undeniably ugly, but hopefully
> they're justified by the simplifications they allow.
>
> The patch converts the addition patterns to use this scheme.
>
> Previously there were three copies of the V8HF and V4HF addition
> patterns for Neon:
>
> (1) *add<VDQ:mode>3_neon, which provided plus:VnHF even without
>     TARGET_NEON_FP16INST.  This was probably harmless since all the
>     named patterns had an appropriate guard, but it is possible that
>     something could have tried to generate the plus directly, such as
>     by using a REG_EQUAL note to generate a new pattern.
>
> (2) addv8hf3_neon and addv4hf3, which had the correct
>     TARGET_NEON_FP16INST target condition, but unnecessarily required
>     flag_unsafe_math_optimizations.  Unlike VnSF operations, VnHF
>     operations do not force flush to zero.
>
> (3) add<VH:mode>3_fp16, which provided plus:VnHF with the
>     correct conditions (TARGET_NEON_FP16INST, with no
>     flag_unsafe_math_optimizations test).
>
> The patch in essence renames add<VH:mode>3_fp16 to *add<VH:mode>3_neon
> (part of *add<VDQ:mode>3_neon) and removes the other two patterns.
>
> WDYT?  Does this look like a way forward?
>
> Tested on arm-linux-gnueabihf and armeb-eabi.
>
> Thanks,
> Richard
>
>
> gcc/
> 	* config/arm/arm.h (ARM_HAVE_NEON_V8QI_ARITH, ARM_HAVE_NEON_V4HI_ARITH)
> 	(ARM_HAVE_NEON_V2SI_ARITH, ARM_HAVE_NEON_V16QI_ARITH): New macros.
> 	(ARM_HAVE_NEON_V8HI_ARITH, ARM_HAVE_NEON_V4SI_ARITH): Likewise.
> 	(ARM_HAVE_NEON_V2DI_ARITH, ARM_HAVE_NEON_V4HF_ARITH): Likewise.
> 	(ARM_HAVE_NEON_V8HF_ARITH, ARM_HAVE_NEON_V2SF_ARITH): Likewise.
> 	(ARM_HAVE_NEON_V4SF_ARITH, ARM_HAVE_V8QI_ARITH, ARM_HAVE_V4HI_ARITH)
> 	(ARM_HAVE_V2SI_ARITH, ARM_HAVE_V16QI_ARITH, ARM_HAVE_V8HI_ARITH)
> 	(ARM_HAVE_V4SI_ARITH, ARM_HAVE_V2DI_ARITH, ARM_HAVE_V4HF_ARITH)
> 	(ARM_HAVE_V2SF_ARITH, ARM_HAVE_V8HF_ARITH, ARM_HAVE_V4SF_ARITH):
> 	Likewise.
> 	* config/arm/iterators.md (VNIM, VNINOTM): Delete.
> 	* config/arm/vec-common.md (add<VNIM:mode>3, addv8hf3)
> 	(add<VNINOTM:mode>3): Replace with...
> 	(add<VDQ:mode>3): ...this new expander.
> 	* config/arm/neon.md (*add<VDQ:mode>3_neon): Use the new
> 	ARM_HAVE_NEON_<MODE>_ARITH macros as the C condition.
> 	(addv8hf3_neon, addv4hf3, add<VFH:mode>3_fp16): Delete in
> 	favor of the above.
> 	(neon_vadd<VH:mode>): Use gen_add<mode>3 instead of
> 	gen_add<mode>3_fp16.
> ---
>  gcc/config/arm/arm.h         | 41 +++++++++++++++++++++++++++++++
>  gcc/config/arm/iterators.md  |  8 ------
>  gcc/config/arm/neon.md       | 47 ++----------------------------------
>  gcc/config/arm/vec-common.md | 42 ++++----------------------------
>  4 files changed, 48 insertions(+), 90 deletions(-)
>
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index 3887c51eebe..3284ae29d7c 100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -1106,6 +1106,47 @@ extern const int arm_arch_cde_coproc_bits[];
>  #define VALID_MVE_STRUCT_MODE(MODE) \
>    ((MODE) == TImode || (MODE) == OImode || (MODE) == XImode)
>  
> +/* The conditions under which vector modes are supported for general
> +   arithmetic using Neon.  */
> +
> +#define ARM_HAVE_NEON_V8QI_ARITH TARGET_NEON
> +#define ARM_HAVE_NEON_V4HI_ARITH TARGET_NEON
> +#define ARM_HAVE_NEON_V2SI_ARITH TARGET_NEON
> +
> +#define ARM_HAVE_NEON_V16QI_ARITH TARGET_NEON
> +#define ARM_HAVE_NEON_V8HI_ARITH TARGET_NEON
> +#define ARM_HAVE_NEON_V4SI_ARITH TARGET_NEON
> +#define ARM_HAVE_NEON_V2DI_ARITH TARGET_NEON
> +
> +/* HF operations have their own flush-to-zero control (FPSCR.FZ16).  */
> +#define ARM_HAVE_NEON_V4HF_ARITH TARGET_NEON_FP16INST
> +#define ARM_HAVE_NEON_V8HF_ARITH TARGET_NEON_FP16INST
> +
> +/* SF operations always flush to zero, regardless of FPSCR.FZ, so we can
> +   only use them for general arithmetic when -funsafe-math-optimizations
> +   is in effect.  */
> +#define ARM_HAVE_NEON_V2SF_ARITH \
> +  (TARGET_NEON && flag_unsafe_math_optimizations)
> +#define ARM_HAVE_NEON_V4SF_ARITH ARM_HAVE_NEON_V2SF_ARITH
> +
> +/* The conditions under which vector modes are supported for general
> +   arithmetic by any vector extension.  */
> +
> +#define ARM_HAVE_V8QI_ARITH (ARM_HAVE_NEON_V8QI_ARITH || TARGET_REALLY_IWMMXT)
> +#define ARM_HAVE_V4HI_ARITH (ARM_HAVE_NEON_V4HI_ARITH || TARGET_REALLY_IWMMXT)
> +#define ARM_HAVE_V2SI_ARITH (ARM_HAVE_NEON_V2SI_ARITH || TARGET_REALLY_IWMMXT)
> +
> +#define ARM_HAVE_V16QI_ARITH (ARM_HAVE_NEON_V16QI_ARITH || TARGET_HAVE_MVE)
> +#define ARM_HAVE_V8HI_ARITH (ARM_HAVE_NEON_V8HI_ARITH || TARGET_HAVE_MVE)
> +#define ARM_HAVE_V4SI_ARITH (ARM_HAVE_NEON_V4SI_ARITH || TARGET_HAVE_MVE)
> +#define ARM_HAVE_V2DI_ARITH ARM_HAVE_NEON_V2DI_ARITH
> +
> +#define ARM_HAVE_V4HF_ARITH ARM_HAVE_NEON_V4HF_ARITH
> +#define ARM_HAVE_V2SF_ARITH ARM_HAVE_NEON_V2SF_ARITH
> +
> +#define ARM_HAVE_V8HF_ARITH (ARM_HAVE_NEON_V8HF_ARITH || TARGET_HAVE_MVE_FLOAT)
> +#define ARM_HAVE_V4SF_ARITH (ARM_HAVE_NEON_V4SF_ARITH || TARGET_HAVE_MVE_FLOAT)
> +
>  /* The register numbers in sequence, for passing to arm_gen_load_multiple.  */
>  extern int arm_regs_in_sequence[];
>  
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 0bc9eba0722..c70e3bc2731 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -66,14 +66,6 @@ (define_mode_iterator VSHFT [V4HI V2SI DI])
>  ;; Integer and float modes supported by Neon and IWMMXT.
>  (define_mode_iterator VALL [V2DI V2SI V4HI V8QI V2SF V4SI V8HI V16QI V4SF])
>  
> -;; Integer and float modes supported by Neon, IWMMXT and MVE, used by
> -;; arithmetic epxand patterns.
> -(define_mode_iterator VNIM [V16QI V8HI V4SI V4SF])
> -
> -;; Integer and float modes supported by Neon and IWMMXT but not MVE, used by
> -;; arithmetic epxand patterns.
> -(define_mode_iterator VNINOTM [V2SI V4HI V8QI V2SF V2DI])
> -
>  ;; Integer and float modes supported by Neon, IWMMXT and MVE.
>  (define_mode_iterator VNIM1 [V16QI V8HI V4SI V4SF V2DI])
>  
> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> index 3e7b51d8ab6..96bf277f501 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -501,7 +501,7 @@ (define_insn "*add<mode>3_neon"
>    [(set (match_operand:VDQ 0 "s_register_operand" "=w")
>          (plus:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
>  		  (match_operand:VDQ 2 "s_register_operand" "w")))]
> -  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
> +  "ARM_HAVE_NEON_<MODE>_ARITH"
>    "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
>    [(set (attr "type")
>        (if_then_else (match_test "<Is_float_mode>")
> @@ -509,49 +509,6 @@ (define_insn "*add<mode>3_neon"
>                      (const_string "neon_add<q>")))]
>  )
>  
> -;; As with SFmode, full support for HFmode vector arithmetic is only available
> -;; when flag-unsafe-math-optimizations is enabled.
> -
> -;; Add pattern with modes V8HF and V4HF is split into separate patterns to add
> -;; support for standard pattern addv8hf3 in MVE.  Following pattern is called
> -;; from "addv8hf3" standard pattern inside vec-common.md file.
> -
> -(define_insn "addv8hf3_neon"
> -  [(set
> -    (match_operand:V8HF 0 "s_register_operand" "=w")
> -    (plus:V8HF
> -     (match_operand:V8HF 1 "s_register_operand" "w")
> -     (match_operand:V8HF 2 "s_register_operand" "w")))]
> - "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
> - "vadd.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
> - [(set_attr "type" "neon_fp_addsub_s_q")]
> -)
> -
> -(define_insn "addv4hf3"
> -  [(set
> -    (match_operand:V4HF 0 "s_register_operand" "=w")
> -    (plus:V4HF
> -     (match_operand:V4HF 1 "s_register_operand" "w")
> -     (match_operand:V4HF 2 "s_register_operand" "w")))]
> - "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
> - "vadd.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
> - [(set_attr "type" "neon_fp_addsub_s_q")]
> -)
> -
> -(define_insn "add<mode>3_fp16"
> -  [(set
> -    (match_operand:VH 0 "s_register_operand" "=w")
> -    (plus:VH
> -     (match_operand:VH 1 "s_register_operand" "w")
> -     (match_operand:VH 2 "s_register_operand" "w")))]
> - "TARGET_NEON_FP16INST"
> - "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
> - [(set (attr "type")
> -   (if_then_else (match_test "<Is_float_mode>")
> -    (const_string "neon_fp_addsub_s<q>")
> -    (const_string "neon_add<q>")))]
> -)
> -
>  (define_insn "*sub<mode>3_neon"
>    [(set (match_operand:VDQ 0 "s_register_operand" "=w")
>          (minus:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
> @@ -1837,7 +1794,7 @@ (define_expand "neon_vadd<mode>"
>     (match_operand:VH 2 "s_register_operand")]
>    "TARGET_NEON_FP16INST"
>  {
> -  emit_insn (gen_add<mode>3_fp16 (operands[0], operands[1], operands[2]));
> +  emit_insn (gen_add<mode>3 (operands[0], operands[1], operands[2]));
>    DONE;
>  })
>  
> diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> index b7e3619caf4..c3c86c46355 100644
> --- a/gcc/config/arm/vec-common.md
> +++ b/gcc/config/arm/vec-common.md
> @@ -81,43 +81,11 @@ (define_expand "movv8hf"
>  ;; patterns separately for Neon, IWMMXT and MVE.
>  
>  (define_expand "add<mode>3"
> -  [(set (match_operand:VNIM 0 "s_register_operand")
> -	(plus:VNIM (match_operand:VNIM 1 "s_register_operand")
> -		   (match_operand:VNIM 2 "s_register_operand")))]
> -  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
> -		    || flag_unsafe_math_optimizations))
> -   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))
> -   || (TARGET_HAVE_MVE && VALID_MVE_SI_MODE(<MODE>mode))
> -   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE(<MODE>mode))"
> -{
> -})
> -
> -;; Vector arithmetic.  Expanders are blank, then unnamed insns implement
> -;; patterns separately for Neon and MVE.
> -
> -(define_expand "addv8hf3"
> -  [(set (match_operand:V8HF 0 "s_register_operand")
> -	(plus:V8HF (match_operand:V8HF 1 "s_register_operand")
> -		   (match_operand:V8HF 2 "s_register_operand")))]
> -  "(TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE(V8HFmode))
> -   || (TARGET_NEON_FP16INST && flag_unsafe_math_optimizations)"
> -{
> -  if (TARGET_NEON_FP16INST && flag_unsafe_math_optimizations)
> -    emit_insn (gen_addv8hf3_neon (operands[0], operands[1], operands[2]));
> -})
> -
> -;; Vector arithmetic.  Expanders are blank, then unnamed insns implement
> -;; patterns separately for Neon and IWMMXT.
> -
> -(define_expand "add<mode>3"
> -  [(set (match_operand:VNINOTM 0 "s_register_operand")
> -	(plus:VNINOTM (match_operand:VNINOTM 1 "s_register_operand")
> -		      (match_operand:VNINOTM 2 "s_register_operand")))]
> -  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
> -		    || flag_unsafe_math_optimizations))
> -   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
> -{
> -})
> +  [(set (match_operand:VDQ 0 "s_register_operand")
> +	(plus:VDQ (match_operand:VDQ 1 "s_register_operand")
> +		  (match_operand:VDQ 2 "s_register_operand")))]
> +  "ARM_HAVE_<MODE>_ARITH"
> +)
>  
>  ;; Vector arithmetic. Expanders are blank, then unnamed insns implement
>  ;; patterns separately for IWMMXT and Neon.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH] arm: Add new vector mode macros
  2020-09-16 10:15 ` Richard Sandiford
@ 2020-09-16 10:49   ` Kyrylo Tkachov
  2020-09-22 16:20     ` Richard Sandiford
  0 siblings, 1 reply; 9+ messages in thread
From: Kyrylo Tkachov @ 2020-09-16 10:49 UTC (permalink / raw)
  To: Richard Sandiford, gcc-patches
  Cc: nickc, Richard Earnshaw, Ramana Radhakrishnan, Dennis Zhang

Hi Richard,

> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: 16 September 2020 11:15
> To: gcc-patches@gcc.gnu.org
> Cc: nickc@redhat.com; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>; Kyrylo
> Tkachov <Kyrylo.Tkachov@arm.com>; Dennis Zhang
> <Dennis.Zhang@arm.com>
> Subject: Re: [PATCH] arm: Add new vector mode macros
> 
> Ping
> 
> Richard Sandiford <richard.sandiford@arm.com> writes:
> > [ This is related to Dennis's subtraction patch
> >   https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553339.html
> >   and the discussion about how the patterns were written.  I wanted
> >   to see whether there was a way that we could simplify the current
> >   addition handling that might perhaps make it easier to add other
> >   MVE operations in future.  It seemed like one of those situations
> >   in which the most productive thing would be to try it and see,
> >   rather than just describe it in words.
> >
> >   One of the questions Ramana had in the thread above was: why does
> >   MVE not need the flag_unsafe_math_optimizations flag?  AIUI the reason
> >   is that MVE honours the FPSCR.FZ flag while SF Advanced SIMD always
> >   flushes to zero.  (HF Advanced SIMD honours FPSCR.FZ16 and so also
> >   doesn't need flag_unsafe_math_optimizations.) ]
> >
> > The AArch32 port now has three vector extensions: iwMMXt, Neon
> > and MVE.  We already have some named expanders that are shared
> > by all three, and soon we'll need more.
> >
> > One way of handling this would be to use define_mode_iterators
> > that specify the condition for each mode.  For example,
> >
> >   (V16QI "TARGET_NEON || TARGET_HAVE_MVE")
> >   (V8QI "TARGET_NEON || TARGET_REALLY_IWMXXT")
> >   ...
> >   (V2SF "TARGET_NEON && flag_unsafe_math_optimizations")
> >
> > etc.  However, we'll need several mode iterators, and it would
> > be repetitive to specify the mode condition every time.
> >
> > This patch therefore introduces per-mode macros that say whether
> > we can perform general arithmetic on the mode.  Initially there are
> > two sets of macros:
> >
> > ARM_HAVE_NEON_<MODE>_ARITH
> >   true if Neon can handle general arithmetic on <MODE>
> >
> > ARM_HAVE_<MODE>_ARITH
> >   true if any vector extension can handle general arithmetic on <MODE>
> >
> > The macro definitions themselves are undeniably ugly, but hopefully
> > they're justified by the simplifications they allow.
> >
> > The patch converts the addition patterns to use this scheme.
> >
> > Previously there were three copies of the V8HF and V4HF addition
> > patterns for Neon:
> >
> > (1) *add<VDQ:mode>3_neon, which provided plus:VnHF even without
> >     TARGET_NEON_FP16INST.  This was probably harmless since all the
> >     named patterns had an appropriate guard, but it is possible that
> >     something could have tried to generate the plus directly, such as
> >     by using a REG_EQUAL note to generate a new pattern.
> >
> > (2) addv8hf3_neon and addv4hf3, which had the correct
> >     TARGET_NEON_FP16INST target condition, but unnecessarily required
> >     flag_unsafe_math_optimizations.  Unlike VnSF operations, VnHF
> >     operations do not force flush to zero.
> >
> > (3) add<VH:mode>3_fp16, which provided plus:VnHF with the
> >     correct conditions (TARGET_NEON_FP16INST, with no
> >     flag_unsafe_math_optimizations test).
> >
> > The patch in essence renames add<VH:mode>3_fp16 to
> *add<VH:mode>3_neon
> > (part of *add<VDQ:mode>3_neon) and removes the other two patterns.
> >
> > WDYT?  Does this look like a way forward?

This looks like a productive way forward to me.
Okay if the other maintainer don't object by the end of the week.

Thanks,
Kyrill

> >
> > Tested on arm-linux-gnueabihf and armeb-eabi.
> >
> > Thanks,
> > Richard
> >
> >
> > gcc/
> > 	* config/arm/arm.h (ARM_HAVE_NEON_V8QI_ARITH,
> ARM_HAVE_NEON_V4HI_ARITH)
> > 	(ARM_HAVE_NEON_V2SI_ARITH, ARM_HAVE_NEON_V16QI_ARITH):
> New macros.
> > 	(ARM_HAVE_NEON_V8HI_ARITH, ARM_HAVE_NEON_V4SI_ARITH):
> Likewise.
> > 	(ARM_HAVE_NEON_V2DI_ARITH, ARM_HAVE_NEON_V4HF_ARITH):
> Likewise.
> > 	(ARM_HAVE_NEON_V8HF_ARITH, ARM_HAVE_NEON_V2SF_ARITH):
> Likewise.
> > 	(ARM_HAVE_NEON_V4SF_ARITH, ARM_HAVE_V8QI_ARITH,
> ARM_HAVE_V4HI_ARITH)
> > 	(ARM_HAVE_V2SI_ARITH, ARM_HAVE_V16QI_ARITH,
> ARM_HAVE_V8HI_ARITH)
> > 	(ARM_HAVE_V4SI_ARITH, ARM_HAVE_V2DI_ARITH,
> ARM_HAVE_V4HF_ARITH)
> > 	(ARM_HAVE_V2SF_ARITH, ARM_HAVE_V8HF_ARITH,
> ARM_HAVE_V4SF_ARITH):
> > 	Likewise.
> > 	* config/arm/iterators.md (VNIM, VNINOTM): Delete.
> > 	* config/arm/vec-common.md (add<VNIM:mode>3, addv8hf3)
> > 	(add<VNINOTM:mode>3): Replace with...
> > 	(add<VDQ:mode>3): ...this new expander.
> > 	* config/arm/neon.md (*add<VDQ:mode>3_neon): Use the new
> > 	ARM_HAVE_NEON_<MODE>_ARITH macros as the C condition.
> > 	(addv8hf3_neon, addv4hf3, add<VFH:mode>3_fp16): Delete in
> > 	favor of the above.
> > 	(neon_vadd<VH:mode>): Use gen_add<mode>3 instead of
> > 	gen_add<mode>3_fp16.
> > ---
> >  gcc/config/arm/arm.h         | 41 +++++++++++++++++++++++++++++++
> >  gcc/config/arm/iterators.md  |  8 ------
> >  gcc/config/arm/neon.md       | 47 ++----------------------------------
> >  gcc/config/arm/vec-common.md | 42 ++++----------------------------
> >  4 files changed, 48 insertions(+), 90 deletions(-)
> >
> > diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> > index 3887c51eebe..3284ae29d7c 100644
> > --- a/gcc/config/arm/arm.h
> > +++ b/gcc/config/arm/arm.h
> > @@ -1106,6 +1106,47 @@ extern const int arm_arch_cde_coproc_bits[];
> >  #define VALID_MVE_STRUCT_MODE(MODE) \
> >    ((MODE) == TImode || (MODE) == OImode || (MODE) == XImode)
> >
> > +/* The conditions under which vector modes are supported for general
> > +   arithmetic using Neon.  */
> > +
> > +#define ARM_HAVE_NEON_V8QI_ARITH TARGET_NEON
> > +#define ARM_HAVE_NEON_V4HI_ARITH TARGET_NEON
> > +#define ARM_HAVE_NEON_V2SI_ARITH TARGET_NEON
> > +
> > +#define ARM_HAVE_NEON_V16QI_ARITH TARGET_NEON
> > +#define ARM_HAVE_NEON_V8HI_ARITH TARGET_NEON
> > +#define ARM_HAVE_NEON_V4SI_ARITH TARGET_NEON
> > +#define ARM_HAVE_NEON_V2DI_ARITH TARGET_NEON
> > +
> > +/* HF operations have their own flush-to-zero control (FPSCR.FZ16).  */
> > +#define ARM_HAVE_NEON_V4HF_ARITH TARGET_NEON_FP16INST
> > +#define ARM_HAVE_NEON_V8HF_ARITH TARGET_NEON_FP16INST
> > +
> > +/* SF operations always flush to zero, regardless of FPSCR.FZ, so we can
> > +   only use them for general arithmetic when -funsafe-math-optimizations
> > +   is in effect.  */
> > +#define ARM_HAVE_NEON_V2SF_ARITH \
> > +  (TARGET_NEON && flag_unsafe_math_optimizations)
> > +#define ARM_HAVE_NEON_V4SF_ARITH ARM_HAVE_NEON_V2SF_ARITH
> > +
> > +/* The conditions under which vector modes are supported for general
> > +   arithmetic by any vector extension.  */
> > +
> > +#define ARM_HAVE_V8QI_ARITH (ARM_HAVE_NEON_V8QI_ARITH ||
> TARGET_REALLY_IWMMXT)
> > +#define ARM_HAVE_V4HI_ARITH (ARM_HAVE_NEON_V4HI_ARITH ||
> TARGET_REALLY_IWMMXT)
> > +#define ARM_HAVE_V2SI_ARITH (ARM_HAVE_NEON_V2SI_ARITH ||
> TARGET_REALLY_IWMMXT)
> > +
> > +#define ARM_HAVE_V16QI_ARITH (ARM_HAVE_NEON_V16QI_ARITH ||
> TARGET_HAVE_MVE)
> > +#define ARM_HAVE_V8HI_ARITH (ARM_HAVE_NEON_V8HI_ARITH ||
> TARGET_HAVE_MVE)
> > +#define ARM_HAVE_V4SI_ARITH (ARM_HAVE_NEON_V4SI_ARITH ||
> TARGET_HAVE_MVE)
> > +#define ARM_HAVE_V2DI_ARITH ARM_HAVE_NEON_V2DI_ARITH
> > +
> > +#define ARM_HAVE_V4HF_ARITH ARM_HAVE_NEON_V4HF_ARITH
> > +#define ARM_HAVE_V2SF_ARITH ARM_HAVE_NEON_V2SF_ARITH
> > +
> > +#define ARM_HAVE_V8HF_ARITH (ARM_HAVE_NEON_V8HF_ARITH ||
> TARGET_HAVE_MVE_FLOAT)
> > +#define ARM_HAVE_V4SF_ARITH (ARM_HAVE_NEON_V4SF_ARITH ||
> TARGET_HAVE_MVE_FLOAT)
> > +
> >  /* The register numbers in sequence, for passing to
> arm_gen_load_multiple.  */
> >  extern int arm_regs_in_sequence[];
> >
> > diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> > index 0bc9eba0722..c70e3bc2731 100644
> > --- a/gcc/config/arm/iterators.md
> > +++ b/gcc/config/arm/iterators.md
> > @@ -66,14 +66,6 @@ (define_mode_iterator VSHFT [V4HI V2SI DI])
> >  ;; Integer and float modes supported by Neon and IWMMXT.
> >  (define_mode_iterator VALL [V2DI V2SI V4HI V8QI V2SF V4SI V8HI V16QI
> V4SF])
> >
> > -;; Integer and float modes supported by Neon, IWMMXT and MVE, used by
> > -;; arithmetic epxand patterns.
> > -(define_mode_iterator VNIM [V16QI V8HI V4SI V4SF])
> > -
> > -;; Integer and float modes supported by Neon and IWMMXT but not MVE,
> used by
> > -;; arithmetic epxand patterns.
> > -(define_mode_iterator VNINOTM [V2SI V4HI V8QI V2SF V2DI])
> > -
> >  ;; Integer and float modes supported by Neon, IWMMXT and MVE.
> >  (define_mode_iterator VNIM1 [V16QI V8HI V4SI V4SF V2DI])
> >
> > diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> > index 3e7b51d8ab6..96bf277f501 100644
> > --- a/gcc/config/arm/neon.md
> > +++ b/gcc/config/arm/neon.md
> > @@ -501,7 +501,7 @@ (define_insn "*add<mode>3_neon"
> >    [(set (match_operand:VDQ 0 "s_register_operand" "=w")
> >          (plus:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
> >  		  (match_operand:VDQ 2 "s_register_operand" "w")))]
> > -  "TARGET_NEON && (!<Is_float_mode> ||
> flag_unsafe_math_optimizations)"
> > +  "ARM_HAVE_NEON_<MODE>_ARITH"
> >    "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
> >    [(set (attr "type")
> >        (if_then_else (match_test "<Is_float_mode>")
> > @@ -509,49 +509,6 @@ (define_insn "*add<mode>3_neon"
> >                      (const_string "neon_add<q>")))]
> >  )
> >
> > -;; As with SFmode, full support for HFmode vector arithmetic is only
> available
> > -;; when flag-unsafe-math-optimizations is enabled.
> > -
> > -;; Add pattern with modes V8HF and V4HF is split into separate patterns to
> add
> > -;; support for standard pattern addv8hf3 in MVE.  Following pattern is
> called
> > -;; from "addv8hf3" standard pattern inside vec-common.md file.
> > -
> > -(define_insn "addv8hf3_neon"
> > -  [(set
> > -    (match_operand:V8HF 0 "s_register_operand" "=w")
> > -    (plus:V8HF
> > -     (match_operand:V8HF 1 "s_register_operand" "w")
> > -     (match_operand:V8HF 2 "s_register_operand" "w")))]
> > - "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
> > - "vadd.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
> > - [(set_attr "type" "neon_fp_addsub_s_q")]
> > -)
> > -
> > -(define_insn "addv4hf3"
> > -  [(set
> > -    (match_operand:V4HF 0 "s_register_operand" "=w")
> > -    (plus:V4HF
> > -     (match_operand:V4HF 1 "s_register_operand" "w")
> > -     (match_operand:V4HF 2 "s_register_operand" "w")))]
> > - "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
> > - "vadd.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
> > - [(set_attr "type" "neon_fp_addsub_s_q")]
> > -)
> > -
> > -(define_insn "add<mode>3_fp16"
> > -  [(set
> > -    (match_operand:VH 0 "s_register_operand" "=w")
> > -    (plus:VH
> > -     (match_operand:VH 1 "s_register_operand" "w")
> > -     (match_operand:VH 2 "s_register_operand" "w")))]
> > - "TARGET_NEON_FP16INST"
> > - "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
> > - [(set (attr "type")
> > -   (if_then_else (match_test "<Is_float_mode>")
> > -    (const_string "neon_fp_addsub_s<q>")
> > -    (const_string "neon_add<q>")))]
> > -)
> > -
> >  (define_insn "*sub<mode>3_neon"
> >    [(set (match_operand:VDQ 0 "s_register_operand" "=w")
> >          (minus:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
> > @@ -1837,7 +1794,7 @@ (define_expand "neon_vadd<mode>"
> >     (match_operand:VH 2 "s_register_operand")]
> >    "TARGET_NEON_FP16INST"
> >  {
> > -  emit_insn (gen_add<mode>3_fp16 (operands[0], operands[1],
> operands[2]));
> > +  emit_insn (gen_add<mode>3 (operands[0], operands[1], operands[2]));
> >    DONE;
> >  })
> >
> > diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-
> common.md
> > index b7e3619caf4..c3c86c46355 100644
> > --- a/gcc/config/arm/vec-common.md
> > +++ b/gcc/config/arm/vec-common.md
> > @@ -81,43 +81,11 @@ (define_expand "movv8hf"
> >  ;; patterns separately for Neon, IWMMXT and MVE.
> >
> >  (define_expand "add<mode>3"
> > -  [(set (match_operand:VNIM 0 "s_register_operand")
> > -	(plus:VNIM (match_operand:VNIM 1 "s_register_operand")
> > -		   (match_operand:VNIM 2 "s_register_operand")))]
> > -  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode !=
> V4SFmode)
> > -		    || flag_unsafe_math_optimizations))
> > -   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE
> (<MODE>mode))
> > -   || (TARGET_HAVE_MVE && VALID_MVE_SI_MODE(<MODE>mode))
> > -   || (TARGET_HAVE_MVE_FLOAT &&
> VALID_MVE_SF_MODE(<MODE>mode))"
> > -{
> > -})
> > -
> > -;; Vector arithmetic.  Expanders are blank, then unnamed insns implement
> > -;; patterns separately for Neon and MVE.
> > -
> > -(define_expand "addv8hf3"
> > -  [(set (match_operand:V8HF 0 "s_register_operand")
> > -	(plus:V8HF (match_operand:V8HF 1 "s_register_operand")
> > -		   (match_operand:V8HF 2 "s_register_operand")))]
> > -  "(TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE(V8HFmode))
> > -   || (TARGET_NEON_FP16INST && flag_unsafe_math_optimizations)"
> > -{
> > -  if (TARGET_NEON_FP16INST && flag_unsafe_math_optimizations)
> > -    emit_insn (gen_addv8hf3_neon (operands[0], operands[1],
> operands[2]));
> > -})
> > -
> > -;; Vector arithmetic.  Expanders are blank, then unnamed insns implement
> > -;; patterns separately for Neon and IWMMXT.
> > -
> > -(define_expand "add<mode>3"
> > -  [(set (match_operand:VNINOTM 0 "s_register_operand")
> > -	(plus:VNINOTM (match_operand:VNINOTM 1 "s_register_operand")
> > -		      (match_operand:VNINOTM 2 "s_register_operand")))]
> > -  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode !=
> V4SFmode)
> > -		    || flag_unsafe_math_optimizations))
> > -   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE
> (<MODE>mode))"
> > -{
> > -})
> > +  [(set (match_operand:VDQ 0 "s_register_operand")
> > +	(plus:VDQ (match_operand:VDQ 1 "s_register_operand")
> > +		  (match_operand:VDQ 2 "s_register_operand")))]
> > +  "ARM_HAVE_<MODE>_ARITH"
> > +)
> >
> >  ;; Vector arithmetic. Expanders are blank, then unnamed insns implement
> >  ;; patterns separately for IWMMXT and Neon.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] arm: Add new vector mode macros
  2020-09-16 10:49   ` Kyrylo Tkachov
@ 2020-09-22 16:20     ` Richard Sandiford
  2020-09-29 10:26       ` Ping: " Richard Sandiford
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Sandiford @ 2020-09-22 16:20 UTC (permalink / raw)
  To: Kyrylo Tkachov
  Cc: gcc-patches, nickc, Richard Earnshaw, Ramana Radhakrishnan, Dennis Zhang

Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> writes:
> Hi Richard,
>
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: 16 September 2020 11:15
>> To: gcc-patches@gcc.gnu.org
>> Cc: nickc@redhat.com; Richard Earnshaw <Richard.Earnshaw@arm.com>;
>> Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>; Kyrylo
>> Tkachov <Kyrylo.Tkachov@arm.com>; Dennis Zhang
>> <Dennis.Zhang@arm.com>
>> Subject: Re: [PATCH] arm: Add new vector mode macros
>> 
>> Ping
>> 
>> Richard Sandiford <richard.sandiford@arm.com> writes:
>> > [ This is related to Dennis's subtraction patch
>> >   https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553339.html
>> >   and the discussion about how the patterns were written.  I wanted
>> >   to see whether there was a way that we could simplify the current
>> >   addition handling that might perhaps make it easier to add other
>> >   MVE operations in future.  It seemed like one of those situations
>> >   in which the most productive thing would be to try it and see,
>> >   rather than just describe it in words.
>> >
>> >   One of the questions Ramana had in the thread above was: why does
>> >   MVE not need the flag_unsafe_math_optimizations flag?  AIUI the reason
>> >   is that MVE honours the FPSCR.FZ flag while SF Advanced SIMD always
>> >   flushes to zero.  (HF Advanced SIMD honours FPSCR.FZ16 and so also
>> >   doesn't need flag_unsafe_math_optimizations.) ]
>> >
>> > The AArch32 port now has three vector extensions: iwMMXt, Neon
>> > and MVE.  We already have some named expanders that are shared
>> > by all three, and soon we'll need more.
>> >
>> > One way of handling this would be to use define_mode_iterators
>> > that specify the condition for each mode.  For example,
>> >
>> >   (V16QI "TARGET_NEON || TARGET_HAVE_MVE")
>> >   (V8QI "TARGET_NEON || TARGET_REALLY_IWMXXT")
>> >   ...
>> >   (V2SF "TARGET_NEON && flag_unsafe_math_optimizations")
>> >
>> > etc.  However, we'll need several mode iterators, and it would
>> > be repetitive to specify the mode condition every time.
>> >
>> > This patch therefore introduces per-mode macros that say whether
>> > we can perform general arithmetic on the mode.  Initially there are
>> > two sets of macros:
>> >
>> > ARM_HAVE_NEON_<MODE>_ARITH
>> >   true if Neon can handle general arithmetic on <MODE>
>> >
>> > ARM_HAVE_<MODE>_ARITH
>> >   true if any vector extension can handle general arithmetic on <MODE>
>> >
>> > The macro definitions themselves are undeniably ugly, but hopefully
>> > they're justified by the simplifications they allow.
>> >
>> > The patch converts the addition patterns to use this scheme.
>> >
>> > Previously there were three copies of the V8HF and V4HF addition
>> > patterns for Neon:
>> >
>> > (1) *add<VDQ:mode>3_neon, which provided plus:VnHF even without
>> >     TARGET_NEON_FP16INST.  This was probably harmless since all the
>> >     named patterns had an appropriate guard, but it is possible that
>> >     something could have tried to generate the plus directly, such as
>> >     by using a REG_EQUAL note to generate a new pattern.
>> >
>> > (2) addv8hf3_neon and addv4hf3, which had the correct
>> >     TARGET_NEON_FP16INST target condition, but unnecessarily required
>> >     flag_unsafe_math_optimizations.  Unlike VnSF operations, VnHF
>> >     operations do not force flush to zero.
>> >
>> > (3) add<VH:mode>3_fp16, which provided plus:VnHF with the
>> >     correct conditions (TARGET_NEON_FP16INST, with no
>> >     flag_unsafe_math_optimizations test).
>> >
>> > The patch in essence renames add<VH:mode>3_fp16 to
>> *add<VH:mode>3_neon
>> > (part of *add<VDQ:mode>3_neon) and removes the other two patterns.
>> >
>> > WDYT?  Does this look like a way forward?
>
> This looks like a productive way forward to me.
> Okay if the other maintainer don't object by the end of the week.

Thanks.  Dennis pointed out off-list that it regressed
armv8_2-fp16-arith-2.c, which was expecting FP16 vectorisation
to be rejected for -fno-fast-math.  As mentioned above, that shouldn't
be necessary given that FP16 arithmetic (unlike FP32 arithmetic) has a
flush-to-zero control.

This version therefore updates the test to expect the same output
as the -ffast-math version.

Tested on arm-linux-gnueabi (hopefully for real this time -- I must
have messed up the testing last time).  OK for trunk?

FWIW, the non-testsuite part is the same as before.

Richard


gcc/
	* config/arm/arm.h (ARM_HAVE_NEON_V8QI_ARITH, ARM_HAVE_NEON_V4HI_ARITH)
	(ARM_HAVE_NEON_V2SI_ARITH, ARM_HAVE_NEON_V16QI_ARITH): New macros.
	(ARM_HAVE_NEON_V8HI_ARITH, ARM_HAVE_NEON_V4SI_ARITH): Likewise.
	(ARM_HAVE_NEON_V2DI_ARITH, ARM_HAVE_NEON_V4HF_ARITH): Likewise.
	(ARM_HAVE_NEON_V8HF_ARITH, ARM_HAVE_NEON_V2SF_ARITH): Likewise.
	(ARM_HAVE_NEON_V4SF_ARITH, ARM_HAVE_V8QI_ARITH, ARM_HAVE_V4HI_ARITH)
	(ARM_HAVE_V2SI_ARITH, ARM_HAVE_V16QI_ARITH, ARM_HAVE_V8HI_ARITH)
	(ARM_HAVE_V4SI_ARITH, ARM_HAVE_V2DI_ARITH, ARM_HAVE_V4HF_ARITH)
	(ARM_HAVE_V2SF_ARITH, ARM_HAVE_V8HF_ARITH, ARM_HAVE_V4SF_ARITH):
	Likewise.
	* config/arm/iterators.md (VNIM, VNINOTM): Delete.
	* config/arm/vec-common.md (add<VNIM:mode>3, addv8hf3)
	(add<VNINOTM:mode>3): Replace with...
	(add<VDQ:mode>3): ...this new expander.
	* config/arm/neon.md (*add<VDQ:mode>3_neon): Use the new
	ARM_HAVE_NEON_<MODE>_ARITH macros as the C condition.
	(addv8hf3_neon, addv4hf3, add<VFH:mode>3_fp16): Delete in
	favor of the above.
	(neon_vadd<VH:mode>): Use gen_add<mode>3 instead of
	gen_add<mode>3_fp16.

gcc/testsuite/
	* gcc.target/arm/armv8_2-fp16-arith-2.c: Expect FP16 vectorization
	even without -ffast-math.
---
 gcc/config/arm/arm.h                          | 41 ++++++++++++++++
 gcc/config/arm/iterators.md                   |  8 ----
 gcc/config/arm/neon.md                        | 47 +------------------
 gcc/config/arm/vec-common.md                  | 42 ++---------------
 .../gcc.target/arm/armv8_2-fp16-arith-2.c     | 20 +++++---
 5 files changed, 61 insertions(+), 97 deletions(-)

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index f4d3676c5bc..4a63d33c70d 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1110,6 +1110,47 @@ extern const int arm_arch_cde_coproc_bits[];
 #define VALID_MVE_STRUCT_MODE(MODE) \
   ((MODE) == TImode || (MODE) == OImode || (MODE) == XImode)
 
+/* The conditions under which vector modes are supported for general
+   arithmetic using Neon.  */
+
+#define ARM_HAVE_NEON_V8QI_ARITH TARGET_NEON
+#define ARM_HAVE_NEON_V4HI_ARITH TARGET_NEON
+#define ARM_HAVE_NEON_V2SI_ARITH TARGET_NEON
+
+#define ARM_HAVE_NEON_V16QI_ARITH TARGET_NEON
+#define ARM_HAVE_NEON_V8HI_ARITH TARGET_NEON
+#define ARM_HAVE_NEON_V4SI_ARITH TARGET_NEON
+#define ARM_HAVE_NEON_V2DI_ARITH TARGET_NEON
+
+/* HF operations have their own flush-to-zero control (FPSCR.FZ16).  */
+#define ARM_HAVE_NEON_V4HF_ARITH TARGET_NEON_FP16INST
+#define ARM_HAVE_NEON_V8HF_ARITH TARGET_NEON_FP16INST
+
+/* SF operations always flush to zero, regardless of FPSCR.FZ, so we can
+   only use them for general arithmetic when -funsafe-math-optimizations
+   is in effect.  */
+#define ARM_HAVE_NEON_V2SF_ARITH \
+  (TARGET_NEON && flag_unsafe_math_optimizations)
+#define ARM_HAVE_NEON_V4SF_ARITH ARM_HAVE_NEON_V2SF_ARITH
+
+/* The conditions under which vector modes are supported for general
+   arithmetic by any vector extension.  */
+
+#define ARM_HAVE_V8QI_ARITH (ARM_HAVE_NEON_V8QI_ARITH || TARGET_REALLY_IWMMXT)
+#define ARM_HAVE_V4HI_ARITH (ARM_HAVE_NEON_V4HI_ARITH || TARGET_REALLY_IWMMXT)
+#define ARM_HAVE_V2SI_ARITH (ARM_HAVE_NEON_V2SI_ARITH || TARGET_REALLY_IWMMXT)
+
+#define ARM_HAVE_V16QI_ARITH (ARM_HAVE_NEON_V16QI_ARITH || TARGET_HAVE_MVE)
+#define ARM_HAVE_V8HI_ARITH (ARM_HAVE_NEON_V8HI_ARITH || TARGET_HAVE_MVE)
+#define ARM_HAVE_V4SI_ARITH (ARM_HAVE_NEON_V4SI_ARITH || TARGET_HAVE_MVE)
+#define ARM_HAVE_V2DI_ARITH ARM_HAVE_NEON_V2DI_ARITH
+
+#define ARM_HAVE_V4HF_ARITH ARM_HAVE_NEON_V4HF_ARITH
+#define ARM_HAVE_V2SF_ARITH ARM_HAVE_NEON_V2SF_ARITH
+
+#define ARM_HAVE_V8HF_ARITH (ARM_HAVE_NEON_V8HF_ARITH || TARGET_HAVE_MVE_FLOAT)
+#define ARM_HAVE_V4SF_ARITH (ARM_HAVE_NEON_V4SF_ARITH || TARGET_HAVE_MVE_FLOAT)
+
 /* The register numbers in sequence, for passing to arm_gen_load_multiple.  */
 extern int arm_regs_in_sequence[];
 
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 0bc9eba0722..c70e3bc2731 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -66,14 +66,6 @@ (define_mode_iterator VSHFT [V4HI V2SI DI])
 ;; Integer and float modes supported by Neon and IWMMXT.
 (define_mode_iterator VALL [V2DI V2SI V4HI V8QI V2SF V4SI V8HI V16QI V4SF])
 
-;; Integer and float modes supported by Neon, IWMMXT and MVE, used by
-;; arithmetic epxand patterns.
-(define_mode_iterator VNIM [V16QI V8HI V4SI V4SF])
-
-;; Integer and float modes supported by Neon and IWMMXT but not MVE, used by
-;; arithmetic epxand patterns.
-(define_mode_iterator VNINOTM [V2SI V4HI V8QI V2SF V2DI])
-
 ;; Integer and float modes supported by Neon, IWMMXT and MVE.
 (define_mode_iterator VNIM1 [V16QI V8HI V4SI V4SF V2DI])
 
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 3e7b51d8ab6..96bf277f501 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -501,7 +501,7 @@ (define_insn "*add<mode>3_neon"
   [(set (match_operand:VDQ 0 "s_register_operand" "=w")
         (plus:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
 		  (match_operand:VDQ 2 "s_register_operand" "w")))]
-  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+  "ARM_HAVE_NEON_<MODE>_ARITH"
   "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set (attr "type")
       (if_then_else (match_test "<Is_float_mode>")
@@ -509,49 +509,6 @@ (define_insn "*add<mode>3_neon"
                     (const_string "neon_add<q>")))]
 )
 
-;; As with SFmode, full support for HFmode vector arithmetic is only available
-;; when flag-unsafe-math-optimizations is enabled.
-
-;; Add pattern with modes V8HF and V4HF is split into separate patterns to add
-;; support for standard pattern addv8hf3 in MVE.  Following pattern is called
-;; from "addv8hf3" standard pattern inside vec-common.md file.
-
-(define_insn "addv8hf3_neon"
-  [(set
-    (match_operand:V8HF 0 "s_register_operand" "=w")
-    (plus:V8HF
-     (match_operand:V8HF 1 "s_register_operand" "w")
-     (match_operand:V8HF 2 "s_register_operand" "w")))]
- "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
- "vadd.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
- [(set_attr "type" "neon_fp_addsub_s_q")]
-)
-
-(define_insn "addv4hf3"
-  [(set
-    (match_operand:V4HF 0 "s_register_operand" "=w")
-    (plus:V4HF
-     (match_operand:V4HF 1 "s_register_operand" "w")
-     (match_operand:V4HF 2 "s_register_operand" "w")))]
- "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
- "vadd.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
- [(set_attr "type" "neon_fp_addsub_s_q")]
-)
-
-(define_insn "add<mode>3_fp16"
-  [(set
-    (match_operand:VH 0 "s_register_operand" "=w")
-    (plus:VH
-     (match_operand:VH 1 "s_register_operand" "w")
-     (match_operand:VH 2 "s_register_operand" "w")))]
- "TARGET_NEON_FP16INST"
- "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
- [(set (attr "type")
-   (if_then_else (match_test "<Is_float_mode>")
-    (const_string "neon_fp_addsub_s<q>")
-    (const_string "neon_add<q>")))]
-)
-
 (define_insn "*sub<mode>3_neon"
   [(set (match_operand:VDQ 0 "s_register_operand" "=w")
         (minus:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
@@ -1837,7 +1794,7 @@ (define_expand "neon_vadd<mode>"
    (match_operand:VH 2 "s_register_operand")]
   "TARGET_NEON_FP16INST"
 {
-  emit_insn (gen_add<mode>3_fp16 (operands[0], operands[1], operands[2]));
+  emit_insn (gen_add<mode>3 (operands[0], operands[1], operands[2]));
   DONE;
 })
 
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index b7e3619caf4..c3c86c46355 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -81,43 +81,11 @@ (define_expand "movv8hf"
 ;; patterns separately for Neon, IWMMXT and MVE.
 
 (define_expand "add<mode>3"
-  [(set (match_operand:VNIM 0 "s_register_operand")
-	(plus:VNIM (match_operand:VNIM 1 "s_register_operand")
-		   (match_operand:VNIM 2 "s_register_operand")))]
-  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
-		    || flag_unsafe_math_optimizations))
-   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))
-   || (TARGET_HAVE_MVE && VALID_MVE_SI_MODE(<MODE>mode))
-   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE(<MODE>mode))"
-{
-})
-
-;; Vector arithmetic.  Expanders are blank, then unnamed insns implement
-;; patterns separately for Neon and MVE.
-
-(define_expand "addv8hf3"
-  [(set (match_operand:V8HF 0 "s_register_operand")
-	(plus:V8HF (match_operand:V8HF 1 "s_register_operand")
-		   (match_operand:V8HF 2 "s_register_operand")))]
-  "(TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE(V8HFmode))
-   || (TARGET_NEON_FP16INST && flag_unsafe_math_optimizations)"
-{
-  if (TARGET_NEON_FP16INST && flag_unsafe_math_optimizations)
-    emit_insn (gen_addv8hf3_neon (operands[0], operands[1], operands[2]));
-})
-
-;; Vector arithmetic.  Expanders are blank, then unnamed insns implement
-;; patterns separately for Neon and IWMMXT.
-
-(define_expand "add<mode>3"
-  [(set (match_operand:VNINOTM 0 "s_register_operand")
-	(plus:VNINOTM (match_operand:VNINOTM 1 "s_register_operand")
-		      (match_operand:VNINOTM 2 "s_register_operand")))]
-  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
-		    || flag_unsafe_math_optimizations))
-   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
-{
-})
+  [(set (match_operand:VDQ 0 "s_register_operand")
+	(plus:VDQ (match_operand:VDQ 1 "s_register_operand")
+		  (match_operand:VDQ 2 "s_register_operand")))]
+  "ARM_HAVE_<MODE>_ARITH"
+)
 
 ;; Vector arithmetic. Expanders are blank, then unnamed insns implement
 ;; patterns separately for IWMMXT and Neon.
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-2.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-2.c
index 24d0528540d..81bad225a1f 100644
--- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-2.c
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-2.c
@@ -89,17 +89,23 @@ TEST_CMP (greaterthanqual, >=, int16x8_t, float16x8_t)
 /* { dg-final { scan-assembler-times {vneg\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
 /* { dg-final { scan-assembler-times {vneg\.f16\td[0-9]+, d[0-9]+} 1 } }  */
 /* { dg-final { scan-assembler-times {vneg\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vabs\.f16\ts[0-9]+, s[0-9]+} 2 } }  */
+
+/* { dg-final { scan-assembler-times {vadd\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vadd\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vadd\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+/* { dg-final { scan-assembler-times {vsub\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vsub\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vsub\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+/* { dg-final { scan-assembler-times {vmul\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vmul\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vmul\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
 
-/* { dg-final { scan-assembler-times {vadd\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } }  */
-/* { dg-final { scan-assembler-times {vsub\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } }  */
-/* { dg-final { scan-assembler-times {vmul\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } }  */
 /* { dg-final { scan-assembler-times {vdiv\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } }  */
 /* { dg-final { scan-assembler-times {vcmp\.f32\ts[0-9]+, s[0-9]+} 26 } }  */
-
 /* { dg-final { scan-assembler-times {vcmpe\.f32\ts[0-9]+, s[0-9]+} 52 } }  */
-/* { dg-final { scan-assembler-times {vcmpe\.f32\ts[0-9]+, #0} 2 } }  */
-
-/* { dg-final { scan-assembler-not {vabs\.f16} } }  */
 
 /* { dg-final { scan-assembler-not {vadd\.f32} } }  */
 /* { dg-final { scan-assembler-not {vsub\.f32} } }  */

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Ping: [PATCH] arm: Add new vector mode macros
  2020-09-22 16:20     ` Richard Sandiford
@ 2020-09-29 10:26       ` Richard Sandiford
  2020-09-29 10:38         ` Kyrylo Tkachov
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Sandiford @ 2020-09-29 10:26 UTC (permalink / raw)
  To: Kyrylo Tkachov
  Cc: gcc-patches, nickc, Richard Earnshaw, Ramana Radhakrishnan, Dennis Zhang

Ping

Richard Sandiford <richard.sandiford@arm.com> writes:
> Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> writes:
>> This looks like a productive way forward to me.
>> Okay if the other maintainer don't object by the end of the week.
>
> Thanks.  Dennis pointed out off-list that it regressed
> armv8_2-fp16-arith-2.c, which was expecting FP16 vectorisation
> to be rejected for -fno-fast-math.  As mentioned above, that shouldn't
> be necessary given that FP16 arithmetic (unlike FP32 arithmetic) has a
> flush-to-zero control.
>
> This version therefore updates the test to expect the same output
> as the -ffast-math version.
>
> Tested on arm-linux-gnueabi (hopefully for real this time -- I must
> have messed up the testing last time).  OK for trunk?
>
> FWIW, the non-testsuite part is the same as before.
>
> Richard
>
>
> gcc/
> 	* config/arm/arm.h (ARM_HAVE_NEON_V8QI_ARITH, ARM_HAVE_NEON_V4HI_ARITH)
> 	(ARM_HAVE_NEON_V2SI_ARITH, ARM_HAVE_NEON_V16QI_ARITH): New macros.
> 	(ARM_HAVE_NEON_V8HI_ARITH, ARM_HAVE_NEON_V4SI_ARITH): Likewise.
> 	(ARM_HAVE_NEON_V2DI_ARITH, ARM_HAVE_NEON_V4HF_ARITH): Likewise.
> 	(ARM_HAVE_NEON_V8HF_ARITH, ARM_HAVE_NEON_V2SF_ARITH): Likewise.
> 	(ARM_HAVE_NEON_V4SF_ARITH, ARM_HAVE_V8QI_ARITH, ARM_HAVE_V4HI_ARITH)
> 	(ARM_HAVE_V2SI_ARITH, ARM_HAVE_V16QI_ARITH, ARM_HAVE_V8HI_ARITH)
> 	(ARM_HAVE_V4SI_ARITH, ARM_HAVE_V2DI_ARITH, ARM_HAVE_V4HF_ARITH)
> 	(ARM_HAVE_V2SF_ARITH, ARM_HAVE_V8HF_ARITH, ARM_HAVE_V4SF_ARITH):
> 	Likewise.
> 	* config/arm/iterators.md (VNIM, VNINOTM): Delete.
> 	* config/arm/vec-common.md (add<VNIM:mode>3, addv8hf3)
> 	(add<VNINOTM:mode>3): Replace with...
> 	(add<VDQ:mode>3): ...this new expander.
> 	* config/arm/neon.md (*add<VDQ:mode>3_neon): Use the new
> 	ARM_HAVE_NEON_<MODE>_ARITH macros as the C condition.
> 	(addv8hf3_neon, addv4hf3, add<VFH:mode>3_fp16): Delete in
> 	favor of the above.
> 	(neon_vadd<VH:mode>): Use gen_add<mode>3 instead of
> 	gen_add<mode>3_fp16.
>
> gcc/testsuite/
> 	* gcc.target/arm/armv8_2-fp16-arith-2.c: Expect FP16 vectorization
> 	even without -ffast-math.
> ---
>  gcc/config/arm/arm.h                          | 41 ++++++++++++++++
>  gcc/config/arm/iterators.md                   |  8 ----
>  gcc/config/arm/neon.md                        | 47 +------------------
>  gcc/config/arm/vec-common.md                  | 42 ++---------------
>  .../gcc.target/arm/armv8_2-fp16-arith-2.c     | 20 +++++---
>  5 files changed, 61 insertions(+), 97 deletions(-)
>
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index f4d3676c5bc..4a63d33c70d 100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -1110,6 +1110,47 @@ extern const int arm_arch_cde_coproc_bits[];
>  #define VALID_MVE_STRUCT_MODE(MODE) \
>    ((MODE) == TImode || (MODE) == OImode || (MODE) == XImode)
>  
> +/* The conditions under which vector modes are supported for general
> +   arithmetic using Neon.  */
> +
> +#define ARM_HAVE_NEON_V8QI_ARITH TARGET_NEON
> +#define ARM_HAVE_NEON_V4HI_ARITH TARGET_NEON
> +#define ARM_HAVE_NEON_V2SI_ARITH TARGET_NEON
> +
> +#define ARM_HAVE_NEON_V16QI_ARITH TARGET_NEON
> +#define ARM_HAVE_NEON_V8HI_ARITH TARGET_NEON
> +#define ARM_HAVE_NEON_V4SI_ARITH TARGET_NEON
> +#define ARM_HAVE_NEON_V2DI_ARITH TARGET_NEON
> +
> +/* HF operations have their own flush-to-zero control (FPSCR.FZ16).  */
> +#define ARM_HAVE_NEON_V4HF_ARITH TARGET_NEON_FP16INST
> +#define ARM_HAVE_NEON_V8HF_ARITH TARGET_NEON_FP16INST
> +
> +/* SF operations always flush to zero, regardless of FPSCR.FZ, so we can
> +   only use them for general arithmetic when -funsafe-math-optimizations
> +   is in effect.  */
> +#define ARM_HAVE_NEON_V2SF_ARITH \
> +  (TARGET_NEON && flag_unsafe_math_optimizations)
> +#define ARM_HAVE_NEON_V4SF_ARITH ARM_HAVE_NEON_V2SF_ARITH
> +
> +/* The conditions under which vector modes are supported for general
> +   arithmetic by any vector extension.  */
> +
> +#define ARM_HAVE_V8QI_ARITH (ARM_HAVE_NEON_V8QI_ARITH || TARGET_REALLY_IWMMXT)
> +#define ARM_HAVE_V4HI_ARITH (ARM_HAVE_NEON_V4HI_ARITH || TARGET_REALLY_IWMMXT)
> +#define ARM_HAVE_V2SI_ARITH (ARM_HAVE_NEON_V2SI_ARITH || TARGET_REALLY_IWMMXT)
> +
> +#define ARM_HAVE_V16QI_ARITH (ARM_HAVE_NEON_V16QI_ARITH || TARGET_HAVE_MVE)
> +#define ARM_HAVE_V8HI_ARITH (ARM_HAVE_NEON_V8HI_ARITH || TARGET_HAVE_MVE)
> +#define ARM_HAVE_V4SI_ARITH (ARM_HAVE_NEON_V4SI_ARITH || TARGET_HAVE_MVE)
> +#define ARM_HAVE_V2DI_ARITH ARM_HAVE_NEON_V2DI_ARITH
> +
> +#define ARM_HAVE_V4HF_ARITH ARM_HAVE_NEON_V4HF_ARITH
> +#define ARM_HAVE_V2SF_ARITH ARM_HAVE_NEON_V2SF_ARITH
> +
> +#define ARM_HAVE_V8HF_ARITH (ARM_HAVE_NEON_V8HF_ARITH || TARGET_HAVE_MVE_FLOAT)
> +#define ARM_HAVE_V4SF_ARITH (ARM_HAVE_NEON_V4SF_ARITH || TARGET_HAVE_MVE_FLOAT)
> +
>  /* The register numbers in sequence, for passing to arm_gen_load_multiple.  */
>  extern int arm_regs_in_sequence[];
>  
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 0bc9eba0722..c70e3bc2731 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -66,14 +66,6 @@ (define_mode_iterator VSHFT [V4HI V2SI DI])
>  ;; Integer and float modes supported by Neon and IWMMXT.
>  (define_mode_iterator VALL [V2DI V2SI V4HI V8QI V2SF V4SI V8HI V16QI V4SF])
>  
> -;; Integer and float modes supported by Neon, IWMMXT and MVE, used by
> -;; arithmetic epxand patterns.
> -(define_mode_iterator VNIM [V16QI V8HI V4SI V4SF])
> -
> -;; Integer and float modes supported by Neon and IWMMXT but not MVE, used by
> -;; arithmetic epxand patterns.
> -(define_mode_iterator VNINOTM [V2SI V4HI V8QI V2SF V2DI])
> -
>  ;; Integer and float modes supported by Neon, IWMMXT and MVE.
>  (define_mode_iterator VNIM1 [V16QI V8HI V4SI V4SF V2DI])
>  
> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> index 3e7b51d8ab6..96bf277f501 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -501,7 +501,7 @@ (define_insn "*add<mode>3_neon"
>    [(set (match_operand:VDQ 0 "s_register_operand" "=w")
>          (plus:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
>  		  (match_operand:VDQ 2 "s_register_operand" "w")))]
> -  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
> +  "ARM_HAVE_NEON_<MODE>_ARITH"
>    "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
>    [(set (attr "type")
>        (if_then_else (match_test "<Is_float_mode>")
> @@ -509,49 +509,6 @@ (define_insn "*add<mode>3_neon"
>                      (const_string "neon_add<q>")))]
>  )
>  
> -;; As with SFmode, full support for HFmode vector arithmetic is only available
> -;; when flag-unsafe-math-optimizations is enabled.
> -
> -;; Add pattern with modes V8HF and V4HF is split into separate patterns to add
> -;; support for standard pattern addv8hf3 in MVE.  Following pattern is called
> -;; from "addv8hf3" standard pattern inside vec-common.md file.
> -
> -(define_insn "addv8hf3_neon"
> -  [(set
> -    (match_operand:V8HF 0 "s_register_operand" "=w")
> -    (plus:V8HF
> -     (match_operand:V8HF 1 "s_register_operand" "w")
> -     (match_operand:V8HF 2 "s_register_operand" "w")))]
> - "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
> - "vadd.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
> - [(set_attr "type" "neon_fp_addsub_s_q")]
> -)
> -
> -(define_insn "addv4hf3"
> -  [(set
> -    (match_operand:V4HF 0 "s_register_operand" "=w")
> -    (plus:V4HF
> -     (match_operand:V4HF 1 "s_register_operand" "w")
> -     (match_operand:V4HF 2 "s_register_operand" "w")))]
> - "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
> - "vadd.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
> - [(set_attr "type" "neon_fp_addsub_s_q")]
> -)
> -
> -(define_insn "add<mode>3_fp16"
> -  [(set
> -    (match_operand:VH 0 "s_register_operand" "=w")
> -    (plus:VH
> -     (match_operand:VH 1 "s_register_operand" "w")
> -     (match_operand:VH 2 "s_register_operand" "w")))]
> - "TARGET_NEON_FP16INST"
> - "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
> - [(set (attr "type")
> -   (if_then_else (match_test "<Is_float_mode>")
> -    (const_string "neon_fp_addsub_s<q>")
> -    (const_string "neon_add<q>")))]
> -)
> -
>  (define_insn "*sub<mode>3_neon"
>    [(set (match_operand:VDQ 0 "s_register_operand" "=w")
>          (minus:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
> @@ -1837,7 +1794,7 @@ (define_expand "neon_vadd<mode>"
>     (match_operand:VH 2 "s_register_operand")]
>    "TARGET_NEON_FP16INST"
>  {
> -  emit_insn (gen_add<mode>3_fp16 (operands[0], operands[1], operands[2]));
> +  emit_insn (gen_add<mode>3 (operands[0], operands[1], operands[2]));
>    DONE;
>  })
>  
> diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> index b7e3619caf4..c3c86c46355 100644
> --- a/gcc/config/arm/vec-common.md
> +++ b/gcc/config/arm/vec-common.md
> @@ -81,43 +81,11 @@ (define_expand "movv8hf"
>  ;; patterns separately for Neon, IWMMXT and MVE.
>  
>  (define_expand "add<mode>3"
> -  [(set (match_operand:VNIM 0 "s_register_operand")
> -	(plus:VNIM (match_operand:VNIM 1 "s_register_operand")
> -		   (match_operand:VNIM 2 "s_register_operand")))]
> -  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
> -		    || flag_unsafe_math_optimizations))
> -   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))
> -   || (TARGET_HAVE_MVE && VALID_MVE_SI_MODE(<MODE>mode))
> -   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE(<MODE>mode))"
> -{
> -})
> -
> -;; Vector arithmetic.  Expanders are blank, then unnamed insns implement
> -;; patterns separately for Neon and MVE.
> -
> -(define_expand "addv8hf3"
> -  [(set (match_operand:V8HF 0 "s_register_operand")
> -	(plus:V8HF (match_operand:V8HF 1 "s_register_operand")
> -		   (match_operand:V8HF 2 "s_register_operand")))]
> -  "(TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE(V8HFmode))
> -   || (TARGET_NEON_FP16INST && flag_unsafe_math_optimizations)"
> -{
> -  if (TARGET_NEON_FP16INST && flag_unsafe_math_optimizations)
> -    emit_insn (gen_addv8hf3_neon (operands[0], operands[1], operands[2]));
> -})
> -
> -;; Vector arithmetic.  Expanders are blank, then unnamed insns implement
> -;; patterns separately for Neon and IWMMXT.
> -
> -(define_expand "add<mode>3"
> -  [(set (match_operand:VNINOTM 0 "s_register_operand")
> -	(plus:VNINOTM (match_operand:VNINOTM 1 "s_register_operand")
> -		      (match_operand:VNINOTM 2 "s_register_operand")))]
> -  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
> -		    || flag_unsafe_math_optimizations))
> -   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
> -{
> -})
> +  [(set (match_operand:VDQ 0 "s_register_operand")
> +	(plus:VDQ (match_operand:VDQ 1 "s_register_operand")
> +		  (match_operand:VDQ 2 "s_register_operand")))]
> +  "ARM_HAVE_<MODE>_ARITH"
> +)
>  
>  ;; Vector arithmetic. Expanders are blank, then unnamed insns implement
>  ;; patterns separately for IWMMXT and Neon.
> diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-2.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-2.c
> index 24d0528540d..81bad225a1f 100644
> --- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-2.c
> +++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-2.c
> @@ -89,17 +89,23 @@ TEST_CMP (greaterthanqual, >=, int16x8_t, float16x8_t)
>  /* { dg-final { scan-assembler-times {vneg\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
>  /* { dg-final { scan-assembler-times {vneg\.f16\td[0-9]+, d[0-9]+} 1 } }  */
>  /* { dg-final { scan-assembler-times {vneg\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
> +/* { dg-final { scan-assembler-times {vabs\.f16\ts[0-9]+, s[0-9]+} 2 } }  */
> +
> +/* { dg-final { scan-assembler-times {vadd\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
> +/* { dg-final { scan-assembler-times {vadd\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }  */
> +/* { dg-final { scan-assembler-times {vadd\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
> +
> +/* { dg-final { scan-assembler-times {vsub\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
> +/* { dg-final { scan-assembler-times {vsub\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }  */
> +/* { dg-final { scan-assembler-times {vsub\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
> +
> +/* { dg-final { scan-assembler-times {vmul\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
> +/* { dg-final { scan-assembler-times {vmul\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }  */
> +/* { dg-final { scan-assembler-times {vmul\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
>  
> -/* { dg-final { scan-assembler-times {vadd\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } }  */
> -/* { dg-final { scan-assembler-times {vsub\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } }  */
> -/* { dg-final { scan-assembler-times {vmul\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } }  */
>  /* { dg-final { scan-assembler-times {vdiv\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } }  */
>  /* { dg-final { scan-assembler-times {vcmp\.f32\ts[0-9]+, s[0-9]+} 26 } }  */
> -
>  /* { dg-final { scan-assembler-times {vcmpe\.f32\ts[0-9]+, s[0-9]+} 52 } }  */
> -/* { dg-final { scan-assembler-times {vcmpe\.f32\ts[0-9]+, #0} 2 } }  */
> -
> -/* { dg-final { scan-assembler-not {vabs\.f16} } }  */
>  
>  /* { dg-final { scan-assembler-not {vadd\.f32} } }  */
>  /* { dg-final { scan-assembler-not {vsub\.f32} } }  */

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: Ping: [PATCH] arm: Add new vector mode macros
  2020-09-29 10:26       ` Ping: " Richard Sandiford
@ 2020-09-29 10:38         ` Kyrylo Tkachov
  2020-09-30 12:51           ` Christophe Lyon
  0 siblings, 1 reply; 9+ messages in thread
From: Kyrylo Tkachov @ 2020-09-29 10:38 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: gcc-patches, nickc, Richard Earnshaw, Ramana Radhakrishnan, Dennis Zhang



> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: 29 September 2020 11:27
> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nickc@redhat.com; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Ramana Radhakrishnan
> <Ramana.Radhakrishnan@arm.com>; Dennis Zhang
> <Dennis.Zhang@arm.com>
> Subject: Ping: [PATCH] arm: Add new vector mode macros
> 
> Ping
> 
> Richard Sandiford <richard.sandiford@arm.com> writes:
> > Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> writes:
> >> This looks like a productive way forward to me.
> >> Okay if the other maintainer don't object by the end of the week.
> >
> > Thanks.  Dennis pointed out off-list that it regressed
> > armv8_2-fp16-arith-2.c, which was expecting FP16 vectorisation
> > to be rejected for -fno-fast-math.  As mentioned above, that shouldn't
> > be necessary given that FP16 arithmetic (unlike FP32 arithmetic) has a
> > flush-to-zero control.
> >
> > This version therefore updates the test to expect the same output
> > as the -ffast-math version.
> >
> > Tested on arm-linux-gnueabi (hopefully for real this time -- I must
> > have messed up the testing last time).  OK for trunk?
> >

Ok.
Thanks,
Kyrill

> > FWIW, the non-testsuite part is the same as before.
> >
> > Richard
> >
> >
> > gcc/
> > 	* config/arm/arm.h (ARM_HAVE_NEON_V8QI_ARITH,
> ARM_HAVE_NEON_V4HI_ARITH)
> > 	(ARM_HAVE_NEON_V2SI_ARITH, ARM_HAVE_NEON_V16QI_ARITH):
> New macros.
> > 	(ARM_HAVE_NEON_V8HI_ARITH, ARM_HAVE_NEON_V4SI_ARITH):
> Likewise.
> > 	(ARM_HAVE_NEON_V2DI_ARITH, ARM_HAVE_NEON_V4HF_ARITH):
> Likewise.
> > 	(ARM_HAVE_NEON_V8HF_ARITH, ARM_HAVE_NEON_V2SF_ARITH):
> Likewise.
> > 	(ARM_HAVE_NEON_V4SF_ARITH, ARM_HAVE_V8QI_ARITH,
> ARM_HAVE_V4HI_ARITH)
> > 	(ARM_HAVE_V2SI_ARITH, ARM_HAVE_V16QI_ARITH,
> ARM_HAVE_V8HI_ARITH)
> > 	(ARM_HAVE_V4SI_ARITH, ARM_HAVE_V2DI_ARITH,
> ARM_HAVE_V4HF_ARITH)
> > 	(ARM_HAVE_V2SF_ARITH, ARM_HAVE_V8HF_ARITH,
> ARM_HAVE_V4SF_ARITH):
> > 	Likewise.
> > 	* config/arm/iterators.md (VNIM, VNINOTM): Delete.
> > 	* config/arm/vec-common.md (add<VNIM:mode>3, addv8hf3)
> > 	(add<VNINOTM:mode>3): Replace with...
> > 	(add<VDQ:mode>3): ...this new expander.
> > 	* config/arm/neon.md (*add<VDQ:mode>3_neon): Use the new
> > 	ARM_HAVE_NEON_<MODE>_ARITH macros as the C condition.
> > 	(addv8hf3_neon, addv4hf3, add<VFH:mode>3_fp16): Delete in
> > 	favor of the above.
> > 	(neon_vadd<VH:mode>): Use gen_add<mode>3 instead of
> > 	gen_add<mode>3_fp16.
> >
> > gcc/testsuite/
> > 	* gcc.target/arm/armv8_2-fp16-arith-2.c: Expect FP16 vectorization
> > 	even without -ffast-math.
> > ---
> >  gcc/config/arm/arm.h                          | 41 ++++++++++++++++
> >  gcc/config/arm/iterators.md                   |  8 ----
> >  gcc/config/arm/neon.md                        | 47 +------------------
> >  gcc/config/arm/vec-common.md                  | 42 ++---------------
> >  .../gcc.target/arm/armv8_2-fp16-arith-2.c     | 20 +++++---
> >  5 files changed, 61 insertions(+), 97 deletions(-)
> >
> > diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> > index f4d3676c5bc..4a63d33c70d 100644
> > --- a/gcc/config/arm/arm.h
> > +++ b/gcc/config/arm/arm.h
> > @@ -1110,6 +1110,47 @@ extern const int arm_arch_cde_coproc_bits[];
> >  #define VALID_MVE_STRUCT_MODE(MODE) \
> >    ((MODE) == TImode || (MODE) == OImode || (MODE) == XImode)
> >
> > +/* The conditions under which vector modes are supported for general
> > +   arithmetic using Neon.  */
> > +
> > +#define ARM_HAVE_NEON_V8QI_ARITH TARGET_NEON
> > +#define ARM_HAVE_NEON_V4HI_ARITH TARGET_NEON
> > +#define ARM_HAVE_NEON_V2SI_ARITH TARGET_NEON
> > +
> > +#define ARM_HAVE_NEON_V16QI_ARITH TARGET_NEON
> > +#define ARM_HAVE_NEON_V8HI_ARITH TARGET_NEON
> > +#define ARM_HAVE_NEON_V4SI_ARITH TARGET_NEON
> > +#define ARM_HAVE_NEON_V2DI_ARITH TARGET_NEON
> > +
> > +/* HF operations have their own flush-to-zero control (FPSCR.FZ16).  */
> > +#define ARM_HAVE_NEON_V4HF_ARITH TARGET_NEON_FP16INST
> > +#define ARM_HAVE_NEON_V8HF_ARITH TARGET_NEON_FP16INST
> > +
> > +/* SF operations always flush to zero, regardless of FPSCR.FZ, so we can
> > +   only use them for general arithmetic when -funsafe-math-optimizations
> > +   is in effect.  */
> > +#define ARM_HAVE_NEON_V2SF_ARITH \
> > +  (TARGET_NEON && flag_unsafe_math_optimizations)
> > +#define ARM_HAVE_NEON_V4SF_ARITH ARM_HAVE_NEON_V2SF_ARITH
> > +
> > +/* The conditions under which vector modes are supported for general
> > +   arithmetic by any vector extension.  */
> > +
> > +#define ARM_HAVE_V8QI_ARITH (ARM_HAVE_NEON_V8QI_ARITH ||
> TARGET_REALLY_IWMMXT)
> > +#define ARM_HAVE_V4HI_ARITH (ARM_HAVE_NEON_V4HI_ARITH ||
> TARGET_REALLY_IWMMXT)
> > +#define ARM_HAVE_V2SI_ARITH (ARM_HAVE_NEON_V2SI_ARITH ||
> TARGET_REALLY_IWMMXT)
> > +
> > +#define ARM_HAVE_V16QI_ARITH (ARM_HAVE_NEON_V16QI_ARITH ||
> TARGET_HAVE_MVE)
> > +#define ARM_HAVE_V8HI_ARITH (ARM_HAVE_NEON_V8HI_ARITH ||
> TARGET_HAVE_MVE)
> > +#define ARM_HAVE_V4SI_ARITH (ARM_HAVE_NEON_V4SI_ARITH ||
> TARGET_HAVE_MVE)
> > +#define ARM_HAVE_V2DI_ARITH ARM_HAVE_NEON_V2DI_ARITH
> > +
> > +#define ARM_HAVE_V4HF_ARITH ARM_HAVE_NEON_V4HF_ARITH
> > +#define ARM_HAVE_V2SF_ARITH ARM_HAVE_NEON_V2SF_ARITH
> > +
> > +#define ARM_HAVE_V8HF_ARITH (ARM_HAVE_NEON_V8HF_ARITH ||
> TARGET_HAVE_MVE_FLOAT)
> > +#define ARM_HAVE_V4SF_ARITH (ARM_HAVE_NEON_V4SF_ARITH ||
> TARGET_HAVE_MVE_FLOAT)
> > +
> >  /* The register numbers in sequence, for passing to
> arm_gen_load_multiple.  */
> >  extern int arm_regs_in_sequence[];
> >
> > diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> > index 0bc9eba0722..c70e3bc2731 100644
> > --- a/gcc/config/arm/iterators.md
> > +++ b/gcc/config/arm/iterators.md
> > @@ -66,14 +66,6 @@ (define_mode_iterator VSHFT [V4HI V2SI DI])
> >  ;; Integer and float modes supported by Neon and IWMMXT.
> >  (define_mode_iterator VALL [V2DI V2SI V4HI V8QI V2SF V4SI V8HI V16QI
> V4SF])
> >
> > -;; Integer and float modes supported by Neon, IWMMXT and MVE, used by
> > -;; arithmetic epxand patterns.
> > -(define_mode_iterator VNIM [V16QI V8HI V4SI V4SF])
> > -
> > -;; Integer and float modes supported by Neon and IWMMXT but not MVE,
> used by
> > -;; arithmetic epxand patterns.
> > -(define_mode_iterator VNINOTM [V2SI V4HI V8QI V2SF V2DI])
> > -
> >  ;; Integer and float modes supported by Neon, IWMMXT and MVE.
> >  (define_mode_iterator VNIM1 [V16QI V8HI V4SI V4SF V2DI])
> >
> > diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> > index 3e7b51d8ab6..96bf277f501 100644
> > --- a/gcc/config/arm/neon.md
> > +++ b/gcc/config/arm/neon.md
> > @@ -501,7 +501,7 @@ (define_insn "*add<mode>3_neon"
> >    [(set (match_operand:VDQ 0 "s_register_operand" "=w")
> >          (plus:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
> >  		  (match_operand:VDQ 2 "s_register_operand" "w")))]
> > -  "TARGET_NEON && (!<Is_float_mode> ||
> flag_unsafe_math_optimizations)"
> > +  "ARM_HAVE_NEON_<MODE>_ARITH"
> >    "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
> >    [(set (attr "type")
> >        (if_then_else (match_test "<Is_float_mode>")
> > @@ -509,49 +509,6 @@ (define_insn "*add<mode>3_neon"
> >                      (const_string "neon_add<q>")))]
> >  )
> >
> > -;; As with SFmode, full support for HFmode vector arithmetic is only
> available
> > -;; when flag-unsafe-math-optimizations is enabled.
> > -
> > -;; Add pattern with modes V8HF and V4HF is split into separate patterns to
> add
> > -;; support for standard pattern addv8hf3 in MVE.  Following pattern is
> called
> > -;; from "addv8hf3" standard pattern inside vec-common.md file.
> > -
> > -(define_insn "addv8hf3_neon"
> > -  [(set
> > -    (match_operand:V8HF 0 "s_register_operand" "=w")
> > -    (plus:V8HF
> > -     (match_operand:V8HF 1 "s_register_operand" "w")
> > -     (match_operand:V8HF 2 "s_register_operand" "w")))]
> > - "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
> > - "vadd.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
> > - [(set_attr "type" "neon_fp_addsub_s_q")]
> > -)
> > -
> > -(define_insn "addv4hf3"
> > -  [(set
> > -    (match_operand:V4HF 0 "s_register_operand" "=w")
> > -    (plus:V4HF
> > -     (match_operand:V4HF 1 "s_register_operand" "w")
> > -     (match_operand:V4HF 2 "s_register_operand" "w")))]
> > - "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
> > - "vadd.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
> > - [(set_attr "type" "neon_fp_addsub_s_q")]
> > -)
> > -
> > -(define_insn "add<mode>3_fp16"
> > -  [(set
> > -    (match_operand:VH 0 "s_register_operand" "=w")
> > -    (plus:VH
> > -     (match_operand:VH 1 "s_register_operand" "w")
> > -     (match_operand:VH 2 "s_register_operand" "w")))]
> > - "TARGET_NEON_FP16INST"
> > - "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
> > - [(set (attr "type")
> > -   (if_then_else (match_test "<Is_float_mode>")
> > -    (const_string "neon_fp_addsub_s<q>")
> > -    (const_string "neon_add<q>")))]
> > -)
> > -
> >  (define_insn "*sub<mode>3_neon"
> >    [(set (match_operand:VDQ 0 "s_register_operand" "=w")
> >          (minus:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
> > @@ -1837,7 +1794,7 @@ (define_expand "neon_vadd<mode>"
> >     (match_operand:VH 2 "s_register_operand")]
> >    "TARGET_NEON_FP16INST"
> >  {
> > -  emit_insn (gen_add<mode>3_fp16 (operands[0], operands[1],
> operands[2]));
> > +  emit_insn (gen_add<mode>3 (operands[0], operands[1], operands[2]));
> >    DONE;
> >  })
> >
> > diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-
> common.md
> > index b7e3619caf4..c3c86c46355 100644
> > --- a/gcc/config/arm/vec-common.md
> > +++ b/gcc/config/arm/vec-common.md
> > @@ -81,43 +81,11 @@ (define_expand "movv8hf"
> >  ;; patterns separately for Neon, IWMMXT and MVE.
> >
> >  (define_expand "add<mode>3"
> > -  [(set (match_operand:VNIM 0 "s_register_operand")
> > -	(plus:VNIM (match_operand:VNIM 1 "s_register_operand")
> > -		   (match_operand:VNIM 2 "s_register_operand")))]
> > -  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode !=
> V4SFmode)
> > -		    || flag_unsafe_math_optimizations))
> > -   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE
> (<MODE>mode))
> > -   || (TARGET_HAVE_MVE && VALID_MVE_SI_MODE(<MODE>mode))
> > -   || (TARGET_HAVE_MVE_FLOAT &&
> VALID_MVE_SF_MODE(<MODE>mode))"
> > -{
> > -})
> > -
> > -;; Vector arithmetic.  Expanders are blank, then unnamed insns implement
> > -;; patterns separately for Neon and MVE.
> > -
> > -(define_expand "addv8hf3"
> > -  [(set (match_operand:V8HF 0 "s_register_operand")
> > -	(plus:V8HF (match_operand:V8HF 1 "s_register_operand")
> > -		   (match_operand:V8HF 2 "s_register_operand")))]
> > -  "(TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE(V8HFmode))
> > -   || (TARGET_NEON_FP16INST && flag_unsafe_math_optimizations)"
> > -{
> > -  if (TARGET_NEON_FP16INST && flag_unsafe_math_optimizations)
> > -    emit_insn (gen_addv8hf3_neon (operands[0], operands[1],
> operands[2]));
> > -})
> > -
> > -;; Vector arithmetic.  Expanders are blank, then unnamed insns implement
> > -;; patterns separately for Neon and IWMMXT.
> > -
> > -(define_expand "add<mode>3"
> > -  [(set (match_operand:VNINOTM 0 "s_register_operand")
> > -	(plus:VNINOTM (match_operand:VNINOTM 1 "s_register_operand")
> > -		      (match_operand:VNINOTM 2 "s_register_operand")))]
> > -  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode !=
> V4SFmode)
> > -		    || flag_unsafe_math_optimizations))
> > -   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE
> (<MODE>mode))"
> > -{
> > -})
> > +  [(set (match_operand:VDQ 0 "s_register_operand")
> > +	(plus:VDQ (match_operand:VDQ 1 "s_register_operand")
> > +		  (match_operand:VDQ 2 "s_register_operand")))]
> > +  "ARM_HAVE_<MODE>_ARITH"
> > +)
> >
> >  ;; Vector arithmetic. Expanders are blank, then unnamed insns implement
> >  ;; patterns separately for IWMMXT and Neon.
> > diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-2.c
> b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-2.c
> > index 24d0528540d..81bad225a1f 100644
> > --- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-2.c
> > +++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-2.c
> > @@ -89,17 +89,23 @@ TEST_CMP (greaterthanqual, >=, int16x8_t,
> float16x8_t)
> >  /* { dg-final { scan-assembler-times {vneg\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
> >  /* { dg-final { scan-assembler-times {vneg\.f16\td[0-9]+, d[0-9]+} 1 } }  */
> >  /* { dg-final { scan-assembler-times {vneg\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
> > +/* { dg-final { scan-assembler-times {vabs\.f16\ts[0-9]+, s[0-9]+} 2 } }  */
> > +
> > +/* { dg-final { scan-assembler-times {vadd\.f16\ts[0-9]+, s[0-9]+, s[0-9]+}
> 1 } }  */
> > +/* { dg-final { scan-assembler-times {vadd\.f16\td[0-9]+, d[0-9]+, d[0-9]+}
> 1 } }  */
> > +/* { dg-final { scan-assembler-times {vadd\.f16\tq[0-9]+, q[0-9]+, q[0-9]+}
> 1 } }  */
> > +
> > +/* { dg-final { scan-assembler-times {vsub\.f16\ts[0-9]+, s[0-9]+, s[0-9]+}
> 1 } }  */
> > +/* { dg-final { scan-assembler-times {vsub\.f16\td[0-9]+, d[0-9]+, d[0-9]+}
> 1 } }  */
> > +/* { dg-final { scan-assembler-times {vsub\.f16\tq[0-9]+, q[0-9]+, q[0-9]+}
> 1 } }  */
> > +
> > +/* { dg-final { scan-assembler-times {vmul\.f16\ts[0-9]+, s[0-9]+, s[0-9]+}
> 1 } }  */
> > +/* { dg-final { scan-assembler-times {vmul\.f16\td[0-9]+, d[0-9]+, d[0-9]+}
> 1 } }  */
> > +/* { dg-final { scan-assembler-times {vmul\.f16\tq[0-9]+, q[0-9]+, q[0-9]+}
> 1 } }  */
> >
> > -/* { dg-final { scan-assembler-times {vadd\.f16\ts[0-9]+, s[0-9]+, s[0-9]+}
> 13 } }  */
> > -/* { dg-final { scan-assembler-times {vsub\.f16\ts[0-9]+, s[0-9]+, s[0-9]+}
> 13 } }  */
> > -/* { dg-final { scan-assembler-times {vmul\.f16\ts[0-9]+, s[0-9]+, s[0-9]+}
> 13 } }  */
> >  /* { dg-final { scan-assembler-times {vdiv\.f16\ts[0-9]+, s[0-9]+, s[0-9]+}
> 13 } }  */
> >  /* { dg-final { scan-assembler-times {vcmp\.f32\ts[0-9]+, s[0-9]+} 26 } }  */
> > -
> >  /* { dg-final { scan-assembler-times {vcmpe\.f32\ts[0-9]+, s[0-9]+} 52 } }
> */
> > -/* { dg-final { scan-assembler-times {vcmpe\.f32\ts[0-9]+, #0} 2 } }  */
> > -
> > -/* { dg-final { scan-assembler-not {vabs\.f16} } }  */
> >
> >  /* { dg-final { scan-assembler-not {vadd\.f32} } }  */
> >  /* { dg-final { scan-assembler-not {vsub\.f32} } }  */

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Ping: [PATCH] arm: Add new vector mode macros
  2020-09-29 10:38         ` Kyrylo Tkachov
@ 2020-09-30 12:51           ` Christophe Lyon
  2020-10-02 10:39             ` [PATCH] arm: Make more use of the new " Richard Sandiford
  0 siblings, 1 reply; 9+ messages in thread
From: Christophe Lyon @ 2020-09-30 12:51 UTC (permalink / raw)
  To: Kyrylo Tkachov
  Cc: Richard Sandiford, Richard Earnshaw, Dennis Zhang, gcc-patches,
	Ramana Radhakrishnan

On Tue, 29 Sep 2020 at 12:38, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Richard Sandiford <richard.sandiford@arm.com>
> > Sent: 29 September 2020 11:27
> > To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nickc@redhat.com; Richard Earnshaw
> > <Richard.Earnshaw@arm.com>; Ramana Radhakrishnan
> > <Ramana.Radhakrishnan@arm.com>; Dennis Zhang
> > <Dennis.Zhang@arm.com>
> > Subject: Ping: [PATCH] arm: Add new vector mode macros
> >
> > Ping
> >
> > Richard Sandiford <richard.sandiford@arm.com> writes:
> > > Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> writes:
> > >> This looks like a productive way forward to me.
> > >> Okay if the other maintainer don't object by the end of the week.
> > >
> > > Thanks.  Dennis pointed out off-list that it regressed
> > > armv8_2-fp16-arith-2.c, which was expecting FP16 vectorisation
> > > to be rejected for -fno-fast-math.  As mentioned above, that shouldn't
> > > be necessary given that FP16 arithmetic (unlike FP32 arithmetic) has a
> > > flush-to-zero control.
> > >
> > > This version therefore updates the test to expect the same output
> > > as the -ffast-math version.
> > >
> > > Tested on arm-linux-gnueabi (hopefully for real this time -- I must
> > > have messed up the testing last time).  OK for trunk?
> > >
>
> Ok.
> Thanks,
> Kyrill
>

Hi Richard,

I didn't notice the initial regression you mention above, but after
this commit (r11-3522),
I see:
FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times
vabs\\.f16\\ts[0-9]+, s[0-9]+ 2
FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times
vmul\\.f16\\td[0-9]+, d[0-9]+, d[0-9]+ 1
FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times
vmul\\.f16\\tq[0-9]+, q[0-9]+, q[0-9]+ 1
FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times
vmul\\.f16\\ts[0-9]+, s[0-9]+, s[0-9]+ 1
FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times
vsub\\.f16\\td[0-9]+, d[0-9]+, d[0-9]+ 1
FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times
vsub\\.f16\\tq[0-9]+, q[0-9]+, q[0-9]+ 1
FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times
vsub\\.f16\\ts[0-9]+, s[0-9]+, s[0-9]+ 1

Looks like we are running validations differently?

Christophe

> > > FWIW, the non-testsuite part is the same as before.
> > >
> > > Richard
> > >
> > >
> > > gcc/
> > >     * config/arm/arm.h (ARM_HAVE_NEON_V8QI_ARITH,
> > ARM_HAVE_NEON_V4HI_ARITH)
> > >     (ARM_HAVE_NEON_V2SI_ARITH, ARM_HAVE_NEON_V16QI_ARITH):
> > New macros.
> > >     (ARM_HAVE_NEON_V8HI_ARITH, ARM_HAVE_NEON_V4SI_ARITH):
> > Likewise.
> > >     (ARM_HAVE_NEON_V2DI_ARITH, ARM_HAVE_NEON_V4HF_ARITH):
> > Likewise.
> > >     (ARM_HAVE_NEON_V8HF_ARITH, ARM_HAVE_NEON_V2SF_ARITH):
> > Likewise.
> > >     (ARM_HAVE_NEON_V4SF_ARITH, ARM_HAVE_V8QI_ARITH,
> > ARM_HAVE_V4HI_ARITH)
> > >     (ARM_HAVE_V2SI_ARITH, ARM_HAVE_V16QI_ARITH,
> > ARM_HAVE_V8HI_ARITH)
> > >     (ARM_HAVE_V4SI_ARITH, ARM_HAVE_V2DI_ARITH,
> > ARM_HAVE_V4HF_ARITH)
> > >     (ARM_HAVE_V2SF_ARITH, ARM_HAVE_V8HF_ARITH,
> > ARM_HAVE_V4SF_ARITH):
> > >     Likewise.
> > >     * config/arm/iterators.md (VNIM, VNINOTM): Delete.
> > >     * config/arm/vec-common.md (add<VNIM:mode>3, addv8hf3)
> > >     (add<VNINOTM:mode>3): Replace with...
> > >     (add<VDQ:mode>3): ...this new expander.
> > >     * config/arm/neon.md (*add<VDQ:mode>3_neon): Use the new
> > >     ARM_HAVE_NEON_<MODE>_ARITH macros as the C condition.
> > >     (addv8hf3_neon, addv4hf3, add<VFH:mode>3_fp16): Delete in
> > >     favor of the above.
> > >     (neon_vadd<VH:mode>): Use gen_add<mode>3 instead of
> > >     gen_add<mode>3_fp16.
> > >
> > > gcc/testsuite/
> > >     * gcc.target/arm/armv8_2-fp16-arith-2.c: Expect FP16 vectorization
> > >     even without -ffast-math.
> > > ---
> > >  gcc/config/arm/arm.h                          | 41 ++++++++++++++++
> > >  gcc/config/arm/iterators.md                   |  8 ----
> > >  gcc/config/arm/neon.md                        | 47 +------------------
> > >  gcc/config/arm/vec-common.md                  | 42 ++---------------
> > >  .../gcc.target/arm/armv8_2-fp16-arith-2.c     | 20 +++++---
> > >  5 files changed, 61 insertions(+), 97 deletions(-)
> > >
> > > diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> > > index f4d3676c5bc..4a63d33c70d 100644
> > > --- a/gcc/config/arm/arm.h
> > > +++ b/gcc/config/arm/arm.h
> > > @@ -1110,6 +1110,47 @@ extern const int arm_arch_cde_coproc_bits[];
> > >  #define VALID_MVE_STRUCT_MODE(MODE) \
> > >    ((MODE) == TImode || (MODE) == OImode || (MODE) == XImode)
> > >
> > > +/* The conditions under which vector modes are supported for general
> > > +   arithmetic using Neon.  */
> > > +
> > > +#define ARM_HAVE_NEON_V8QI_ARITH TARGET_NEON
> > > +#define ARM_HAVE_NEON_V4HI_ARITH TARGET_NEON
> > > +#define ARM_HAVE_NEON_V2SI_ARITH TARGET_NEON
> > > +
> > > +#define ARM_HAVE_NEON_V16QI_ARITH TARGET_NEON
> > > +#define ARM_HAVE_NEON_V8HI_ARITH TARGET_NEON
> > > +#define ARM_HAVE_NEON_V4SI_ARITH TARGET_NEON
> > > +#define ARM_HAVE_NEON_V2DI_ARITH TARGET_NEON
> > > +
> > > +/* HF operations have their own flush-to-zero control (FPSCR.FZ16).  */
> > > +#define ARM_HAVE_NEON_V4HF_ARITH TARGET_NEON_FP16INST
> > > +#define ARM_HAVE_NEON_V8HF_ARITH TARGET_NEON_FP16INST
> > > +
> > > +/* SF operations always flush to zero, regardless of FPSCR.FZ, so we can
> > > +   only use them for general arithmetic when -funsafe-math-optimizations
> > > +   is in effect.  */
> > > +#define ARM_HAVE_NEON_V2SF_ARITH \
> > > +  (TARGET_NEON && flag_unsafe_math_optimizations)
> > > +#define ARM_HAVE_NEON_V4SF_ARITH ARM_HAVE_NEON_V2SF_ARITH
> > > +
> > > +/* The conditions under which vector modes are supported for general
> > > +   arithmetic by any vector extension.  */
> > > +
> > > +#define ARM_HAVE_V8QI_ARITH (ARM_HAVE_NEON_V8QI_ARITH ||
> > TARGET_REALLY_IWMMXT)
> > > +#define ARM_HAVE_V4HI_ARITH (ARM_HAVE_NEON_V4HI_ARITH ||
> > TARGET_REALLY_IWMMXT)
> > > +#define ARM_HAVE_V2SI_ARITH (ARM_HAVE_NEON_V2SI_ARITH ||
> > TARGET_REALLY_IWMMXT)
> > > +
> > > +#define ARM_HAVE_V16QI_ARITH (ARM_HAVE_NEON_V16QI_ARITH ||
> > TARGET_HAVE_MVE)
> > > +#define ARM_HAVE_V8HI_ARITH (ARM_HAVE_NEON_V8HI_ARITH ||
> > TARGET_HAVE_MVE)
> > > +#define ARM_HAVE_V4SI_ARITH (ARM_HAVE_NEON_V4SI_ARITH ||
> > TARGET_HAVE_MVE)
> > > +#define ARM_HAVE_V2DI_ARITH ARM_HAVE_NEON_V2DI_ARITH
> > > +
> > > +#define ARM_HAVE_V4HF_ARITH ARM_HAVE_NEON_V4HF_ARITH
> > > +#define ARM_HAVE_V2SF_ARITH ARM_HAVE_NEON_V2SF_ARITH
> > > +
> > > +#define ARM_HAVE_V8HF_ARITH (ARM_HAVE_NEON_V8HF_ARITH ||
> > TARGET_HAVE_MVE_FLOAT)
> > > +#define ARM_HAVE_V4SF_ARITH (ARM_HAVE_NEON_V4SF_ARITH ||
> > TARGET_HAVE_MVE_FLOAT)
> > > +
> > >  /* The register numbers in sequence, for passing to
> > arm_gen_load_multiple.  */
> > >  extern int arm_regs_in_sequence[];
> > >
> > > diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> > > index 0bc9eba0722..c70e3bc2731 100644
> > > --- a/gcc/config/arm/iterators.md
> > > +++ b/gcc/config/arm/iterators.md
> > > @@ -66,14 +66,6 @@ (define_mode_iterator VSHFT [V4HI V2SI DI])
> > >  ;; Integer and float modes supported by Neon and IWMMXT.
> > >  (define_mode_iterator VALL [V2DI V2SI V4HI V8QI V2SF V4SI V8HI V16QI
> > V4SF])
> > >
> > > -;; Integer and float modes supported by Neon, IWMMXT and MVE, used by
> > > -;; arithmetic epxand patterns.
> > > -(define_mode_iterator VNIM [V16QI V8HI V4SI V4SF])
> > > -
> > > -;; Integer and float modes supported by Neon and IWMMXT but not MVE,
> > used by
> > > -;; arithmetic epxand patterns.
> > > -(define_mode_iterator VNINOTM [V2SI V4HI V8QI V2SF V2DI])
> > > -
> > >  ;; Integer and float modes supported by Neon, IWMMXT and MVE.
> > >  (define_mode_iterator VNIM1 [V16QI V8HI V4SI V4SF V2DI])
> > >
> > > diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> > > index 3e7b51d8ab6..96bf277f501 100644
> > > --- a/gcc/config/arm/neon.md
> > > +++ b/gcc/config/arm/neon.md
> > > @@ -501,7 +501,7 @@ (define_insn "*add<mode>3_neon"
> > >    [(set (match_operand:VDQ 0 "s_register_operand" "=w")
> > >          (plus:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
> > >               (match_operand:VDQ 2 "s_register_operand" "w")))]
> > > -  "TARGET_NEON && (!<Is_float_mode> ||
> > flag_unsafe_math_optimizations)"
> > > +  "ARM_HAVE_NEON_<MODE>_ARITH"
> > >    "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
> > >    [(set (attr "type")
> > >        (if_then_else (match_test "<Is_float_mode>")
> > > @@ -509,49 +509,6 @@ (define_insn "*add<mode>3_neon"
> > >                      (const_string "neon_add<q>")))]
> > >  )
> > >
> > > -;; As with SFmode, full support for HFmode vector arithmetic is only
> > available
> > > -;; when flag-unsafe-math-optimizations is enabled.
> > > -
> > > -;; Add pattern with modes V8HF and V4HF is split into separate patterns to
> > add
> > > -;; support for standard pattern addv8hf3 in MVE.  Following pattern is
> > called
> > > -;; from "addv8hf3" standard pattern inside vec-common.md file.
> > > -
> > > -(define_insn "addv8hf3_neon"
> > > -  [(set
> > > -    (match_operand:V8HF 0 "s_register_operand" "=w")
> > > -    (plus:V8HF
> > > -     (match_operand:V8HF 1 "s_register_operand" "w")
> > > -     (match_operand:V8HF 2 "s_register_operand" "w")))]
> > > - "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
> > > - "vadd.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
> > > - [(set_attr "type" "neon_fp_addsub_s_q")]
> > > -)
> > > -
> > > -(define_insn "addv4hf3"
> > > -  [(set
> > > -    (match_operand:V4HF 0 "s_register_operand" "=w")
> > > -    (plus:V4HF
> > > -     (match_operand:V4HF 1 "s_register_operand" "w")
> > > -     (match_operand:V4HF 2 "s_register_operand" "w")))]
> > > - "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
> > > - "vadd.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
> > > - [(set_attr "type" "neon_fp_addsub_s_q")]
> > > -)
> > > -
> > > -(define_insn "add<mode>3_fp16"
> > > -  [(set
> > > -    (match_operand:VH 0 "s_register_operand" "=w")
> > > -    (plus:VH
> > > -     (match_operand:VH 1 "s_register_operand" "w")
> > > -     (match_operand:VH 2 "s_register_operand" "w")))]
> > > - "TARGET_NEON_FP16INST"
> > > - "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
> > > - [(set (attr "type")
> > > -   (if_then_else (match_test "<Is_float_mode>")
> > > -    (const_string "neon_fp_addsub_s<q>")
> > > -    (const_string "neon_add<q>")))]
> > > -)
> > > -
> > >  (define_insn "*sub<mode>3_neon"
> > >    [(set (match_operand:VDQ 0 "s_register_operand" "=w")
> > >          (minus:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
> > > @@ -1837,7 +1794,7 @@ (define_expand "neon_vadd<mode>"
> > >     (match_operand:VH 2 "s_register_operand")]
> > >    "TARGET_NEON_FP16INST"
> > >  {
> > > -  emit_insn (gen_add<mode>3_fp16 (operands[0], operands[1],
> > operands[2]));
> > > +  emit_insn (gen_add<mode>3 (operands[0], operands[1], operands[2]));
> > >    DONE;
> > >  })
> > >
> > > diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-
> > common.md
> > > index b7e3619caf4..c3c86c46355 100644
> > > --- a/gcc/config/arm/vec-common.md
> > > +++ b/gcc/config/arm/vec-common.md
> > > @@ -81,43 +81,11 @@ (define_expand "movv8hf"
> > >  ;; patterns separately for Neon, IWMMXT and MVE.
> > >
> > >  (define_expand "add<mode>3"
> > > -  [(set (match_operand:VNIM 0 "s_register_operand")
> > > -   (plus:VNIM (match_operand:VNIM 1 "s_register_operand")
> > > -              (match_operand:VNIM 2 "s_register_operand")))]
> > > -  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode !=
> > V4SFmode)
> > > -               || flag_unsafe_math_optimizations))
> > > -   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE
> > (<MODE>mode))
> > > -   || (TARGET_HAVE_MVE && VALID_MVE_SI_MODE(<MODE>mode))
> > > -   || (TARGET_HAVE_MVE_FLOAT &&
> > VALID_MVE_SF_MODE(<MODE>mode))"
> > > -{
> > > -})
> > > -
> > > -;; Vector arithmetic.  Expanders are blank, then unnamed insns implement
> > > -;; patterns separately for Neon and MVE.
> > > -
> > > -(define_expand "addv8hf3"
> > > -  [(set (match_operand:V8HF 0 "s_register_operand")
> > > -   (plus:V8HF (match_operand:V8HF 1 "s_register_operand")
> > > -              (match_operand:V8HF 2 "s_register_operand")))]
> > > -  "(TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE(V8HFmode))
> > > -   || (TARGET_NEON_FP16INST && flag_unsafe_math_optimizations)"
> > > -{
> > > -  if (TARGET_NEON_FP16INST && flag_unsafe_math_optimizations)
> > > -    emit_insn (gen_addv8hf3_neon (operands[0], operands[1],
> > operands[2]));
> > > -})
> > > -
> > > -;; Vector arithmetic.  Expanders are blank, then unnamed insns implement
> > > -;; patterns separately for Neon and IWMMXT.
> > > -
> > > -(define_expand "add<mode>3"
> > > -  [(set (match_operand:VNINOTM 0 "s_register_operand")
> > > -   (plus:VNINOTM (match_operand:VNINOTM 1 "s_register_operand")
> > > -                 (match_operand:VNINOTM 2 "s_register_operand")))]
> > > -  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode !=
> > V4SFmode)
> > > -               || flag_unsafe_math_optimizations))
> > > -   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE
> > (<MODE>mode))"
> > > -{
> > > -})
> > > +  [(set (match_operand:VDQ 0 "s_register_operand")
> > > +   (plus:VDQ (match_operand:VDQ 1 "s_register_operand")
> > > +             (match_operand:VDQ 2 "s_register_operand")))]
> > > +  "ARM_HAVE_<MODE>_ARITH"
> > > +)
> > >
> > >  ;; Vector arithmetic. Expanders are blank, then unnamed insns implement
> > >  ;; patterns separately for IWMMXT and Neon.
> > > diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-2.c
> > b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-2.c
> > > index 24d0528540d..81bad225a1f 100644
> > > --- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-2.c
> > > +++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-2.c
> > > @@ -89,17 +89,23 @@ TEST_CMP (greaterthanqual, >=, int16x8_t,
> > float16x8_t)
> > >  /* { dg-final { scan-assembler-times {vneg\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
> > >  /* { dg-final { scan-assembler-times {vneg\.f16\td[0-9]+, d[0-9]+} 1 } }  */
> > >  /* { dg-final { scan-assembler-times {vneg\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
> > > +/* { dg-final { scan-assembler-times {vabs\.f16\ts[0-9]+, s[0-9]+} 2 } }  */
> > > +
> > > +/* { dg-final { scan-assembler-times {vadd\.f16\ts[0-9]+, s[0-9]+, s[0-9]+}
> > 1 } }  */
> > > +/* { dg-final { scan-assembler-times {vadd\.f16\td[0-9]+, d[0-9]+, d[0-9]+}
> > 1 } }  */
> > > +/* { dg-final { scan-assembler-times {vadd\.f16\tq[0-9]+, q[0-9]+, q[0-9]+}
> > 1 } }  */
> > > +
> > > +/* { dg-final { scan-assembler-times {vsub\.f16\ts[0-9]+, s[0-9]+, s[0-9]+}
> > 1 } }  */
> > > +/* { dg-final { scan-assembler-times {vsub\.f16\td[0-9]+, d[0-9]+, d[0-9]+}
> > 1 } }  */
> > > +/* { dg-final { scan-assembler-times {vsub\.f16\tq[0-9]+, q[0-9]+, q[0-9]+}
> > 1 } }  */
> > > +
> > > +/* { dg-final { scan-assembler-times {vmul\.f16\ts[0-9]+, s[0-9]+, s[0-9]+}
> > 1 } }  */
> > > +/* { dg-final { scan-assembler-times {vmul\.f16\td[0-9]+, d[0-9]+, d[0-9]+}
> > 1 } }  */
> > > +/* { dg-final { scan-assembler-times {vmul\.f16\tq[0-9]+, q[0-9]+, q[0-9]+}
> > 1 } }  */
> > >
> > > -/* { dg-final { scan-assembler-times {vadd\.f16\ts[0-9]+, s[0-9]+, s[0-9]+}
> > 13 } }  */
> > > -/* { dg-final { scan-assembler-times {vsub\.f16\ts[0-9]+, s[0-9]+, s[0-9]+}
> > 13 } }  */
> > > -/* { dg-final { scan-assembler-times {vmul\.f16\ts[0-9]+, s[0-9]+, s[0-9]+}
> > 13 } }  */
> > >  /* { dg-final { scan-assembler-times {vdiv\.f16\ts[0-9]+, s[0-9]+, s[0-9]+}
> > 13 } }  */
> > >  /* { dg-final { scan-assembler-times {vcmp\.f32\ts[0-9]+, s[0-9]+} 26 } }  */
> > > -
> > >  /* { dg-final { scan-assembler-times {vcmpe\.f32\ts[0-9]+, s[0-9]+} 52 } }
> > */
> > > -/* { dg-final { scan-assembler-times {vcmpe\.f32\ts[0-9]+, #0} 2 } }  */
> > > -
> > > -/* { dg-final { scan-assembler-not {vabs\.f16} } }  */
> > >
> > >  /* { dg-final { scan-assembler-not {vadd\.f32} } }  */
> > >  /* { dg-final { scan-assembler-not {vsub\.f32} } }  */

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] arm: Make more use of the new mode macros
  2020-09-30 12:51           ` Christophe Lyon
@ 2020-10-02 10:39             ` Richard Sandiford
  2020-10-02 10:40               ` Kyrylo Tkachov
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Sandiford @ 2020-10-02 10:39 UTC (permalink / raw)
  To: Christophe Lyon
  Cc: Kyrylo Tkachov, Richard Earnshaw, Dennis Zhang, gcc-patches,
	Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 2987 bytes --]

Christophe Lyon <christophe.lyon@linaro.org> writes:
> On Tue, 29 Sep 2020 at 12:38, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
>>
>>
>>
>> > -----Original Message-----
>> > From: Richard Sandiford <richard.sandiford@arm.com>
>> > Sent: 29 September 2020 11:27
>> > To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
>> > Cc: gcc-patches@gcc.gnu.org; nickc@redhat.com; Richard Earnshaw
>> > <Richard.Earnshaw@arm.com>; Ramana Radhakrishnan
>> > <Ramana.Radhakrishnan@arm.com>; Dennis Zhang
>> > <Dennis.Zhang@arm.com>
>> > Subject: Ping: [PATCH] arm: Add new vector mode macros
>> >
>> > Ping
>> >
>> > Richard Sandiford <richard.sandiford@arm.com> writes:
>> > > Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> writes:
>> > >> This looks like a productive way forward to me.
>> > >> Okay if the other maintainer don't object by the end of the week.
>> > >
>> > > Thanks.  Dennis pointed out off-list that it regressed
>> > > armv8_2-fp16-arith-2.c, which was expecting FP16 vectorisation
>> > > to be rejected for -fno-fast-math.  As mentioned above, that shouldn't
>> > > be necessary given that FP16 arithmetic (unlike FP32 arithmetic) has a
>> > > flush-to-zero control.
>> > >
>> > > This version therefore updates the test to expect the same output
>> > > as the -ffast-math version.
>> > >
>> > > Tested on arm-linux-gnueabi (hopefully for real this time -- I must
>> > > have messed up the testing last time).  OK for trunk?
>> > >
>>
>> Ok.
>> Thanks,
>> Kyrill
>>
>
> Hi Richard,
>
> I didn't notice the initial regression you mention above, but after

(FWIW, the patch wasn't committed in the original form.  Dennis noticed
the problem when applying it locally.)

> this commit (r11-3522),
> I see:
> FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times
> vabs\\.f16\\ts[0-9]+, s[0-9]+ 2
> FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times
> vmul\\.f16\\td[0-9]+, d[0-9]+, d[0-9]+ 1
> FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times
> vmul\\.f16\\tq[0-9]+, q[0-9]+, q[0-9]+ 1
> FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times
> vmul\\.f16\\ts[0-9]+, s[0-9]+, s[0-9]+ 1
> FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times
> vsub\\.f16\\td[0-9]+, d[0-9]+, d[0-9]+ 1
> FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times
> vsub\\.f16\\tq[0-9]+, q[0-9]+, q[0-9]+ 1
> FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times
> vsub\\.f16\\ts[0-9]+, s[0-9]+, s[0-9]+ 1
>
> Looks like we are running validations differently?

Hmm, seems like these tests are unsupported on an arm-linux-gnueabihf
bootstrap (even though Advanced SIMD is enabled by default) since the
tests specifically want to be able to compile softfp code. :-(

This patch also uses the macros for patterns that are currently specific
to neon.md.  Tested on arm-linux-gnueabihf (FWIW), arm-eabi with MVE,
and armeb-eabi with various combinations.  I see the above failures
are fixed for the latter two.  OK to install?

Richard


[-- Attachment #2: 0001-arm-Make-more-use-of-the-new-mode-macros.patch --]
[-- Type: text/plain, Size: 13512 bytes --]

From 0100f29846e2d775d6cd1f1cef7614d7b67f5fdd Mon Sep 17 00:00:00 2001
From: Richard Sandiford <richard.sandiford@arm.com>
Date: Wed, 30 Sep 2020 20:16:19 +0100
Subject: [PATCH] arm: Make more use of the new mode macros

As Christophe pointed out, my r11-3522 patch didn't in fact fix
all of the armv8_2-fp16-arith-2.c failures introduced by allowing
FP16 vectorisation without -funsafe-math-optimizations.  I must have
only tested the final patch on my usual arm-linux-gnueabihf bootstrap,
which it turns out treats the test as unsupported.

The focus of the original patch was to use mode macros for
patterns that are shared between Advanced SIMD, iwMMXt and MVE.
This patch uses the mode macros for general neon.md patterns too.

gcc/
	* config/arm/neon.md (*sub<VDQ:mode>3_neon): Use the new mode macros
	for the insn condition.
	(sub<VH:mode>3, *mul<VDQW:mode>3_neon): Likewise.
	(mul<VDQW:mode>3add<VDQW:mode>_neon): Likewise.
	(mul<VH:mode>3add<VH:mode>_neon): Likewise.
	(mul<VDQW:mode>3neg<VDQW:mode>add<VDQW:mode>_neon): Likewise.
	(fma<VCVTF:mode>4, fma<VH:mode>4, *fmsub<VCVTF:mode>4): Likewise.
	(quad_halves_<code>v4sf, reduc_plus_scal_<VD:mode>): Likewise.
	(reduc_plus_scal_<VQ:mode>, reduc_smin_scal_<VD:mode>): Likewise.
	(reduc_smin_scal_<VQ:mode>, reduc_smax_scal_<VD:mode>): Likewise.
	(reduc_smax_scal_<VQ:mode>, mul<VH:mode>3): Likewise.
	(neon_vabd<VF:mode>_2, neon_vabd<VF:mode>_3): Likewise.
	(fma<VH:mode>4_intrinsic): Delete.
	(neon_vadd<VCVTF:mode>): Use the new mode macros to decide which
	form of instruction to generate.
	(neon_vmla<VDQW:mode>, neon_vmls<VDQW:mode>): Likewise.
	(neon_vsub<VCVTF:mode>): Likewise.
	(neon_vfma<VH:mode>): Generate the main fma<mode>4 form instead
	of using fma<mode>4_intrinsic.

gcc/testsuite/
	* gcc.target/arm/armv8_2-fp16-arith-2.c (float16_t): Use _Float16_t
	rather than __fp16.
	(float16x4_t, float16x4_t): Likewise.
	(fp16_abs): Use __builtin_fabsf16.
---
 gcc/config/arm/neon.md                        | 64 ++++++++-----------
 .../gcc.target/arm/armv8_2-fp16-arith-2.c     |  8 +--
 2 files changed, 29 insertions(+), 43 deletions(-)

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 58832cbf484..85e424e6cf4 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -513,7 +513,7 @@ (define_insn "*sub<mode>3_neon"
   [(set (match_operand:VDQ 0 "s_register_operand" "=w")
         (minus:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
                    (match_operand:VDQ 2 "s_register_operand" "w")))]
-  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+  "ARM_HAVE_NEON_<MODE>_ARITH"
   "vsub.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set (attr "type")
       (if_then_else (match_test "<Is_float_mode>")
@@ -527,7 +527,7 @@ (define_insn "sub<mode>3"
    (minus:VH
     (match_operand:VH 1 "s_register_operand" "w")
     (match_operand:VH 2 "s_register_operand" "w")))]
- "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
+ "ARM_HAVE_NEON_<MODE>_ARITH"
  "vsub.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
  [(set_attr "type" "neon_sub<q>")]
 )
@@ -547,7 +547,7 @@ (define_insn "*mul<mode>3_neon"
   [(set (match_operand:VDQW 0 "s_register_operand" "=w")
         (mult:VDQW (match_operand:VDQW 1 "s_register_operand" "w")
                    (match_operand:VDQW 2 "s_register_operand" "w")))]
-  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+  "ARM_HAVE_NEON_<MODE>_ARITH"
   "vmul.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set (attr "type")
       (if_then_else (match_test "<Is_float_mode>")
@@ -592,7 +592,7 @@ (define_insn "mul<mode>3add<mode>_neon"
         (plus:VDQW (mult:VDQW (match_operand:VDQW 2 "s_register_operand" "w")
                             (match_operand:VDQW 3 "s_register_operand" "w"))
 		  (match_operand:VDQW 1 "s_register_operand" "0")))]
-  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+  "ARM_HAVE_NEON_<MODE>_ARITH"
   "vmla.<V_if_elem>\t%<V_reg>0, %<V_reg>2, %<V_reg>3"
   [(set (attr "type")
       (if_then_else (match_test "<Is_float_mode>")
@@ -605,7 +605,7 @@ (define_insn "mul<mode>3add<mode>_neon"
 	(plus:VH (mult:VH (match_operand:VH 2 "s_register_operand" "w")
 			  (match_operand:VH 3 "s_register_operand" "w"))
 		  (match_operand:VH 1 "s_register_operand" "0")))]
-  "TARGET_NEON_FP16INST && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+  "ARM_HAVE_NEON_<MODE>_ARITH"
   "vmla.f16\t%<V_reg>0, %<V_reg>2, %<V_reg>3"
   [(set_attr "type" "neon_fp_mla_s<q>")]
 )
@@ -615,7 +615,7 @@ (define_insn "mul<mode>3neg<mode>add<mode>_neon"
         (minus:VDQW (match_operand:VDQW 1 "s_register_operand" "0")
                     (mult:VDQW (match_operand:VDQW 2 "s_register_operand" "w")
                                (match_operand:VDQW 3 "s_register_operand" "w"))))]
-  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+  "ARM_HAVE_NEON_<MODE>_ARITH"
   "vmls.<V_if_elem>\t%<V_reg>0, %<V_reg>2, %<V_reg>3"
   [(set (attr "type")
       (if_then_else (match_test "<Is_float_mode>")
@@ -633,7 +633,7 @@ (define_insn "fma<VCVTF:mode>4"
         (fma:VCVTF (match_operand:VCVTF 1 "register_operand" "w")
 		 (match_operand:VCVTF 2 "register_operand" "w")
 		 (match_operand:VCVTF 3 "register_operand" "0")))]
-  "TARGET_NEON && TARGET_FMA && flag_unsafe_math_optimizations"
+  "ARM_HAVE_NEON_<MODE>_ARITH && TARGET_FMA"
   "vfma.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set_attr "type" "neon_fp_mla_s<q>")]
 )
@@ -654,18 +654,7 @@ (define_insn "fma<VH:mode>4"
     (match_operand:VH 1 "register_operand" "w")
     (match_operand:VH 2 "register_operand" "w")
     (match_operand:VH 3 "register_operand" "0")))]
- "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
- "vfma.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
- [(set_attr "type" "neon_fp_mla_s<q>")]
-)
-
-(define_insn "fma<VH:mode>4_intrinsic"
- [(set (match_operand:VH 0 "register_operand" "=w")
-   (fma:VH
-    (match_operand:VH 1 "register_operand" "w")
-    (match_operand:VH 2 "register_operand" "w")
-    (match_operand:VH 3 "register_operand" "0")))]
- "TARGET_NEON_FP16INST"
+ "ARM_HAVE_NEON_<MODE>_ARITH"
  "vfma.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
  [(set_attr "type" "neon_fp_mla_s<q>")]
 )
@@ -675,7 +664,7 @@ (define_insn "*fmsub<VCVTF:mode>4"
         (fma:VCVTF (neg:VCVTF (match_operand:VCVTF 1 "register_operand" "w"))
 		   (match_operand:VCVTF 2 "register_operand" "w")
 		   (match_operand:VCVTF 3 "register_operand" "0")))]
-  "TARGET_NEON && TARGET_FMA && flag_unsafe_math_optimizations"
+  "ARM_HAVE_NEON_<MODE>_ARITH && TARGET_FMA"
   "vfms.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set_attr "type" "neon_fp_mla_s<q>")]
 )
@@ -1195,7 +1184,7 @@ (define_insn "quad_halves_<code>v4sf"
                            (parallel [(const_int 0) (const_int 1)]))
           (vec_select:V2SF (match_dup 1)
                            (parallel [(const_int 2) (const_int 3)]))))]
-  "TARGET_NEON && flag_unsafe_math_optimizations"
+  "ARM_HAVE_NEON_V4SF_ARITH"
   "<VQH_mnem>.f32\t%P0, %e1, %f1"
   [(set_attr "vqh_mnem" "<VQH_mnem>")
    (set_attr "type" "neon_fp_reduc_<VQH_type>_s_q")]
@@ -1262,7 +1251,7 @@ (define_expand "move_lo_quad_<mode>"
 (define_expand "reduc_plus_scal_<mode>"
   [(match_operand:<V_elem> 0 "nonimmediate_operand")
    (match_operand:VD 1 "s_register_operand")]
-  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+  "ARM_HAVE_NEON_<MODE>_ARITH"
 {
   rtx vec = gen_reg_rtx (<MODE>mode);
   neon_pairwise_reduce (vec, operands[1], <MODE>mode,
@@ -1275,8 +1264,7 @@ (define_expand "reduc_plus_scal_<mode>"
 (define_expand "reduc_plus_scal_<mode>"
   [(match_operand:<V_elem> 0 "nonimmediate_operand")
    (match_operand:VQ 1 "s_register_operand")]
-  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)
-   && !BYTES_BIG_ENDIAN"
+  "ARM_HAVE_NEON_<MODE>_ARITH && !BYTES_BIG_ENDIAN"
 {
   rtx step1 = gen_reg_rtx (<V_HALF>mode);
 
@@ -1311,7 +1299,7 @@ (define_insn "arm_reduc_plus_internal_v2di"
 (define_expand "reduc_smin_scal_<mode>"
   [(match_operand:<V_elem> 0 "nonimmediate_operand")
    (match_operand:VD 1 "s_register_operand")]
-  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+  "ARM_HAVE_NEON_<MODE>_ARITH"
 {
   rtx vec = gen_reg_rtx (<MODE>mode);
 
@@ -1325,8 +1313,7 @@ (define_expand "reduc_smin_scal_<mode>"
 (define_expand "reduc_smin_scal_<mode>"
   [(match_operand:<V_elem> 0 "nonimmediate_operand")
    (match_operand:VQ 1 "s_register_operand")]
-  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)
-   && !BYTES_BIG_ENDIAN"
+  "ARM_HAVE_NEON_<MODE>_ARITH && !BYTES_BIG_ENDIAN"
 {
   rtx step1 = gen_reg_rtx (<V_HALF>mode);
 
@@ -1339,7 +1326,7 @@ (define_expand "reduc_smin_scal_<mode>"
 (define_expand "reduc_smax_scal_<mode>"
   [(match_operand:<V_elem> 0 "nonimmediate_operand")
    (match_operand:VD 1 "s_register_operand")]
-  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+  "ARM_HAVE_NEON_<MODE>_ARITH"
 {
   rtx vec = gen_reg_rtx (<MODE>mode);
   neon_pairwise_reduce (vec, operands[1], <MODE>mode,
@@ -1352,8 +1339,7 @@ (define_expand "reduc_smax_scal_<mode>"
 (define_expand "reduc_smax_scal_<mode>"
   [(match_operand:<V_elem> 0 "nonimmediate_operand")
    (match_operand:VQ 1 "s_register_operand")]
-  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)
-   && !BYTES_BIG_ENDIAN"
+  "ARM_HAVE_NEON_<MODE>_ARITH && !BYTES_BIG_ENDIAN"
 {
   rtx step1 = gen_reg_rtx (<V_HALF>mode);
 
@@ -1627,7 +1613,7 @@ (define_expand "neon_vadd<mode>"
    (match_operand:VCVTF 2 "s_register_operand")]
   "TARGET_NEON"
 {
-  if (!<Is_float_mode> || flag_unsafe_math_optimizations)
+  if (ARM_HAVE_NEON_<MODE>_ARITH)
     emit_insn (gen_add<mode>3 (operands[0], operands[1], operands[2]));
   else
     emit_insn (gen_neon_vadd<mode>_unspec (operands[0], operands[1],
@@ -1752,7 +1738,7 @@ (define_insn "mul<mode>3"
    (mult:VH
     (match_operand:VH 1 "s_register_operand" "w")
     (match_operand:VH 2 "s_register_operand" "w")))]
-  "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
+  "ARM_HAVE_NEON_<MODE>_ARITH"
   "vmul.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
  [(set_attr "type" "neon_mul_<VH_elem_ch><q>")]
 )
@@ -1775,7 +1761,7 @@ (define_expand "neon_vmla<mode>"
    (match_operand:VDQW 3 "s_register_operand")]
   "TARGET_NEON"
 {
-  if (!<Is_float_mode> || flag_unsafe_math_optimizations)
+  if (ARM_HAVE_NEON_<MODE>_ARITH)
     emit_insn (gen_mul<mode>3add<mode>_neon (operands[0], operands[1],
 				             operands[2], operands[3]));
   else
@@ -1803,8 +1789,8 @@ (define_expand "neon_vfma<VH:mode>"
    (match_operand:VH 3 "s_register_operand")]
   "TARGET_NEON_FP16INST"
 {
-  emit_insn (gen_fma<mode>4_intrinsic (operands[0], operands[2], operands[3],
-				       operands[1]));
+  emit_insn (gen_fma<mode>4 (operands[0], operands[2], operands[3],
+			     operands[1]));
   DONE;
 })
 
@@ -2266,7 +2252,7 @@ (define_expand "neon_vmls<mode>"
    (match_operand:VDQW 3 "s_register_operand")]
   "TARGET_NEON"
 {
-  if (!<Is_float_mode> || flag_unsafe_math_optimizations)
+  if (ARM_HAVE_NEON_<MODE>_ARITH)
     emit_insn (gen_mul<mode>3neg<mode>add<mode>_neon (operands[0],
 		 operands[1], operands[2], operands[3]));
   else
@@ -2373,7 +2359,7 @@ (define_expand "neon_vsub<mode>"
    (match_operand:VCVTF 2 "s_register_operand")]
   "TARGET_NEON"
 {
-  if (!<Is_float_mode> || flag_unsafe_math_optimizations)
+  if (ARM_HAVE_NEON_<MODE>_ARITH)
     emit_insn (gen_sub<mode>3 (operands[0], operands[1], operands[2]));
   else
     emit_insn (gen_neon_vsub<mode>_unspec (operands[0], operands[1],
@@ -6462,7 +6448,7 @@ (define_insn "neon_vabd<mode>_2"
  [(set (match_operand:VF 0 "s_register_operand" "=w")
        (abs:VF (minus:VF (match_operand:VF 1 "s_register_operand" "w")
 			 (match_operand:VF 2 "s_register_operand" "w"))))]
- "TARGET_NEON && flag_unsafe_math_optimizations"
+ "ARM_HAVE_NEON_<MODE>_ARITH"
  "vabd.<V_s_elem> %<V_reg>0, %<V_reg>1, %<V_reg>2"
  [(set_attr "type" "neon_fp_abd_s<q>")]
 )
@@ -6472,7 +6458,7 @@ (define_insn "neon_vabd<mode>_3"
        (abs:VF (unspec:VF [(match_operand:VF 1 "s_register_operand" "w")
 			    (match_operand:VF 2 "s_register_operand" "w")]
 		UNSPEC_VSUB)))]
- "TARGET_NEON && flag_unsafe_math_optimizations"
+ "ARM_HAVE_NEON_<MODE>_ARITH"
  "vabd.<V_if_elem> %<V_reg>0, %<V_reg>1, %<V_reg>2"
  [(set_attr "type" "neon_fp_abd_s<q>")]
 )
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-2.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-2.c
index 81bad225a1f..f94109c4396 100644
--- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-2.c
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-2.c
@@ -6,9 +6,9 @@
 /* Test instructions generated for half-precision arithmetic without
    unsafe-math-optimizations.  */
 
-typedef __fp16 float16_t;
-typedef __simd64_float16_t float16x4_t;
-typedef __simd128_float16_t float16x8_t;
+typedef _Float16 float16_t;
+typedef _Float16 float16x4_t __attribute__ ((vector_size (8)));
+typedef _Float16 float16x8_t __attribute__ ((vector_size (16)));
 
 typedef short int16x4_t __attribute__ ((vector_size (8)));
 typedef short int int16x8_t  __attribute__ ((vector_size (16)));
@@ -16,7 +16,7 @@ typedef short int int16x8_t  __attribute__ ((vector_size (16)));
 float16_t
 fp16_abs (float16_t a)
 {
-  return (a < 0) ? -a : a;
+  return __builtin_fabsf16 (a);
 }
 
 #define TEST_UNOP(NAME, OPERATOR, TY)		\
-- 
2.17.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH] arm: Make more use of the new mode macros
  2020-10-02 10:39             ` [PATCH] arm: Make more use of the new " Richard Sandiford
@ 2020-10-02 10:40               ` Kyrylo Tkachov
  0 siblings, 0 replies; 9+ messages in thread
From: Kyrylo Tkachov @ 2020-10-02 10:40 UTC (permalink / raw)
  To: Richard Sandiford, Christophe Lyon
  Cc: Richard Earnshaw, Dennis Zhang, gcc-patches, Ramana Radhakrishnan



> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: 02 October 2020 11:39
> To: Christophe Lyon <christophe.lyon@linaro.org>
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Dennis Zhang <Dennis.Zhang@arm.com>;
> gcc-patches@gcc.gnu.org; Ramana Radhakrishnan
> <Ramana.Radhakrishnan@arm.com>
> Subject: [PATCH] arm: Make more use of the new mode macros
> 
> Christophe Lyon <christophe.lyon@linaro.org> writes:
> > On Tue, 29 Sep 2020 at 12:38, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> wrote:
> >>
> >>
> >>
> >> > -----Original Message-----
> >> > From: Richard Sandiford <richard.sandiford@arm.com>
> >> > Sent: 29 September 2020 11:27
> >> > To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> >> > Cc: gcc-patches@gcc.gnu.org; nickc@redhat.com; Richard Earnshaw
> >> > <Richard.Earnshaw@arm.com>; Ramana Radhakrishnan
> >> > <Ramana.Radhakrishnan@arm.com>; Dennis Zhang
> >> > <Dennis.Zhang@arm.com>
> >> > Subject: Ping: [PATCH] arm: Add new vector mode macros
> >> >
> >> > Ping
> >> >
> >> > Richard Sandiford <richard.sandiford@arm.com> writes:
> >> > > Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> writes:
> >> > >> This looks like a productive way forward to me.
> >> > >> Okay if the other maintainer don't object by the end of the week.
> >> > >
> >> > > Thanks.  Dennis pointed out off-list that it regressed
> >> > > armv8_2-fp16-arith-2.c, which was expecting FP16 vectorisation
> >> > > to be rejected for -fno-fast-math.  As mentioned above, that shouldn't
> >> > > be necessary given that FP16 arithmetic (unlike FP32 arithmetic) has a
> >> > > flush-to-zero control.
> >> > >
> >> > > This version therefore updates the test to expect the same output
> >> > > as the -ffast-math version.
> >> > >
> >> > > Tested on arm-linux-gnueabi (hopefully for real this time -- I must
> >> > > have messed up the testing last time).  OK for trunk?
> >> > >
> >>
> >> Ok.
> >> Thanks,
> >> Kyrill
> >>
> >
> > Hi Richard,
> >
> > I didn't notice the initial regression you mention above, but after
> 
> (FWIW, the patch wasn't committed in the original form.  Dennis noticed
> the problem when applying it locally.)
> 
> > this commit (r11-3522),
> > I see:
> > FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times
> > vabs\\.f16\\ts[0-9]+, s[0-9]+ 2
> > FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times
> > vmul\\.f16\\td[0-9]+, d[0-9]+, d[0-9]+ 1
> > FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times
> > vmul\\.f16\\tq[0-9]+, q[0-9]+, q[0-9]+ 1
> > FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times
> > vmul\\.f16\\ts[0-9]+, s[0-9]+, s[0-9]+ 1
> > FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times
> > vsub\\.f16\\td[0-9]+, d[0-9]+, d[0-9]+ 1
> > FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times
> > vsub\\.f16\\tq[0-9]+, q[0-9]+, q[0-9]+ 1
> > FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times
> > vsub\\.f16\\ts[0-9]+, s[0-9]+, s[0-9]+ 1
> >
> > Looks like we are running validations differently?
> 
> Hmm, seems like these tests are unsupported on an arm-linux-gnueabihf
> bootstrap (even though Advanced SIMD is enabled by default) since the
> tests specifically want to be able to compile softfp code. :-(
> 
> This patch also uses the macros for patterns that are currently specific
> to neon.md.  Tested on arm-linux-gnueabihf (FWIW), arm-eabi with MVE,
> and armeb-eabi with various combinations.  I see the above failures
> are fixed for the latter two.  OK to install?

Ok.
Thanks,
Kyrill

> 
> Richard


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-10-02 10:40 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-08 13:44 [PATCH] arm: Add new vector mode macros Richard Sandiford
2020-09-16 10:15 ` Richard Sandiford
2020-09-16 10:49   ` Kyrylo Tkachov
2020-09-22 16:20     ` Richard Sandiford
2020-09-29 10:26       ` Ping: " Richard Sandiford
2020-09-29 10:38         ` Kyrylo Tkachov
2020-09-30 12:51           ` Christophe Lyon
2020-10-02 10:39             ` [PATCH] arm: Make more use of the new " Richard Sandiford
2020-10-02 10:40               ` Kyrylo Tkachov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).