* [Patch 3/3][Arm] Add support for IEEE-conformant versions of scalar fmin* and fmax*
From: David Sherwood @ 2015-12-08 12:19 UTC
  To: GCC Patches


Hi,

Here is the last patch of the fmin/fmax change. It adds the arm backend
patterns that implement the new fmax/fmin optabs, covering both the scalar
(VFP) and vector (Neon) forms.
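
To illustrate the sort of code these patterns cover (just an example of
calls to the C99 fmax/fmin family, not part of the patch; function names
here are made up for the illustration):

#include <math.h>

/* Scalar case: with this patch, fmax should be able to use the new VFP
   vmaxnm.f64 pattern on an ARMv8 VFP target (illustrative only).  */
double
scalar_fmax (double x, double y)
{
  return fmax (x, y);
}

/* Vectorisable case: with -O2 -ftree-vectorize on an ARMv8 Neon target,
   this loop should be able to use the new Neon vminnm.f32 pattern.  */
void
vector_fmin (float *__restrict__ r, const float *__restrict__ a,
	     const float *__restrict__ b, int n)
{
  int i;
  for (i = 0; i < n; i++)
    r[i] = fminf (a[i], b[i]);
}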

Tested:

arm-none-eabi: no regressions

Good to go?
David Sherwood.

ChangeLog:

2015-12-08  David Sherwood  <david.sherwood@arm.com>

    gcc/
	* config/arm/iterators.md (VMAXMINFNM): New int iterator.
	(fmaxmin, fmaxmin_op): New int attributes.
	* config/arm/unspecs.md (UNSPEC_VMAXNM, UNSPEC_VMINNM): New unspecs.
	* config/arm/neon.md (<fmaxmin><mode>3): New pattern.
	* config/arm/vfp.md (<fmaxmin><mode>3): Likewise.
    gcc/testsuite/
	* gcc.target/arm/fmaxmin.c: New test.

> -----Original Message-----
> From: Richard Biener [mailto:richard.guenther@gmail.com]
> Sent: 25 November 2015 12:39
> To: David Sherwood
> Cc: GCC Patches; Richard Sandiford
> Subject: Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
> 
> On Mon, Nov 23, 2015 at 10:21 AM, David Sherwood <david.sherwood@arm.com> wrote:
> > Hi,
> >
> > This is part 1 of a reworked version of a patch I originally submitted in
> > August, rebased after Richard Sandiford's recent work on the internal
> > functions. This first patch adds the internal function definitions and optabs
> > that provide support for IEEE fmax()/fmin() functions.
> >
> > Later patches will add the appropriate aarch64/aarch32 vector instructions.
> 
> Ok.
> 
> Thanks,
> Richard.
> 
> > Tested:
> >
> > x86_64-linux: no regressions
> > aarch64-none-elf: no regressions
> > arm-none-eabi: no regressions
> >
> > Regards,
> > David Sherwood.
> >
> > ChangeLog:
> >
> > 2015-11-19  David Sherwood  <david.sherwood@arm.com>
> >
> >     gcc/
> >         * optabs.def: Add new optabs fmax_optab/fmin_optab.
> >         * internal-fn.def: Add new fmax/fmin internal functions.
> >         * config/aarch64/aarch64.md: New pattern.
> >         * config/aarch64/aarch64-simd.md: Likewise.
> >         * config/aarch64/iterators.md: New unspecs, iterators.
> >         * config/arm/iterators.md: New iterators.
> >         * config/arm/unspecs.md: New unspecs.
> >         * config/arm/neon.md: New pattern.
> >         * config/arm/vfp.md: Likewise.
> >         * doc/md.texi: Add fmin and fmax patterns.
> >     gcc/testsuite
> >         * gcc.target/aarch64/fmaxmin.c: New test.
> >         * gcc.target/arm/fmaxmin.c: New test.
> >
> >
> >> -----Original Message-----
> >> From: Richard Biener [mailto:richard.guenther@gmail.com]
> >> Sent: 19 August 2015 13:35
> >> To: Richard Biener; David Sherwood; GCC Patches; Richard Sandiford
> >> Subject: Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
> >>
> >> On Wed, Aug 19, 2015 at 2:11 PM, Richard Sandiford
> >> <richard.sandiford@arm.com> wrote:
> >> > Richard Biener <richard.guenther@gmail.com> writes:
> >> >> On Wed, Aug 19, 2015 at 11:54 AM, Richard Sandiford
> >> >> <richard.sandiford@arm.com> wrote:
> >> >>> Richard Biener <richard.guenther@gmail.com> writes:
> >> >>>> On Tue, Aug 18, 2015 at 4:15 PM, Richard Sandiford
> >> >>>> <richard.sandiford@arm.com> wrote:
> >> >>>>> Richard Biener <richard.guenther@gmail.com> writes:
> >> >>>>>> On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood
> >> >>>>>> <david.sherwood@arm.com> wrote:
> >> >>>>>>>> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood
> >> >>>>>>>> <david.sherwood@arm.com> wrote:
> >> >>>>>>>> > Hi Richard,
> >> >>>>>>>> >
> >> >>>>>>>> > Thanks for the reply. I'd chosen to add new expressions as this
> >> >>>>>>>> > seemed more
> >> >>>>>>>> > consistent with the existing MAX_EXPR and MIN_EXPR tree codes. In
> >> >>>>>>>> > addition it
> >> >>>>>>>> > would seem to provide more opportunities for optimisation than a
> >> >>>>>>>> > target-specific
> >> >>>>>>>> > builtin implementation would. I accept that optimisation
> >> >>>>>>>> > opportunities will
> >> >>>>>>>> > be more limited for strict math compilation, but that it was still
> >> >>>>>>>> > worth having
> >> >>>>>>>> > them. Also, if we did map it to builtins then the scalar
> >> >>>>>>>> > version would go
> >> >>>>>>>> > through the optabs and the vector version would go through the
> >> >>>>>>>> > target's builtin
> >> >>>>>>>> > expansion, which doesn't seem very consistent.
> >> >>>>>>>>
> >> >>>>>>>> On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and thus
> >> >>>>>>>> you can't vectorize anyway?  (strict IEEE behavior is about NaNs,
> >> >>>>>>>> correct?)
> >> >>>>>>> I thought for this particular case associativity wasn't an issue?
> >> >>>>>>> We're not doing any
> >> >>>>>>> reductions here, just simply performing max/min operations on each
> >> >>>>>>> pair of elements
> >> >>>>>>> in the vectors. I thought for IEEE-compliant behaviour we just need to
> >> >>>>>>> ensure that for
> >> >>>>>>> each pair of elements if one element is a NaN we return the other one.
> >> >>>>>>
> >> >>>>>> Hmm, true.  Ok, my comment still stands - I don't see that using a
> >> >>>>>> tree code is the best thing to do here.  You can add fmin/max optabs
> >> >>>>>> and special expansion of BUILT_IN_FMIN/MAX and you can use a target
> >> >>>>>> builtin for the vectorized variant.
> >> >>>>>>
> >> >>>>>> The reason I am pushing against a new tree code is that we'd have an
> >> >>>>>> awful lot of similar codes when pushing other flag related IL
> >> >>>>>> specialities to actual IL constructs.  And we still need to find a
> >> >>>>>> consistent way to do that.
> >> >>>>>
> >> >>>>> In this case though the new code is really the "native" min/max operation
> >> >>>>> for fp, rather than some weird flag-dependent behaviour.  Maybe it's
> >> >>>>> a bit unfortunate that the non-strict min/max fp operation got mapped
> >> >>>>> to the generic MIN_EXPR and MAX_EXPR when the non-strict version is really
> >> >>>>> the flag-related modification.  The STRICT_* prefix is forced by that and
> >> >>>>> might make it seem like more of a special case than it really is.
> >> >>>>
> >> >>>> In some sense.  But the "strict" version already has a builtin (just no
> >> >>>> special expander in builtins.c).  We usually don't add 1:1 tree codes
> >> >>>> for existing builtins (why have builtins at all then?).
> >> >>>
> >> >>> We still need the builtin to match the C function (and to allow direct
> >> >>> calls to __builtin_fmin, etc., which are occasionally useful).
> >> >>>
> >> >>>>> If you're still not convinced, how about an internal function instead
> >> >>>>> of a built-in function, so that we can continue to use optabs for all
> >> >>>>> cases?  I'd really like to avoid forcing such a generic concept down to
> >> >>>>> target-specific builtins with target-specific expansion code, especially
> >> >>>>> when the same concept is exposed by target-independent code for scalars.
> >> >>>>
> >> >>>> The target builtin is for the vectorized variant - not all targets might have
> >> >>>> that and we'd need to query the target about this.  So using an IFN would
> >> >>>> mean adding a target hook for that query.
> >> >>>
> >> >>> No, the idea is that if we have a tree code or an internal function, the
> >> >>> decision about whether we have target support is based on a query of the
> >> >>> optabs (just like it is for scalar, and for other vectorisable tree codes).
> >> >>> No new hooks are needed.
> >> >>>
> >> >>> The patch checked for target support that way.
> >> >>
> >> >> Fair enough.  Still this means we should have tree codes for all builtins
> >> >> that eventually are vectorized?  So why don't we have SIN_EXPR,
> >> >> POW_EXPR (ok, I did argue and have patches for that in the past),
> >> >> RINT_EXPR, SQRT_EXPR, etc?
> >> >
> >> > Yeah, it doesn't sound so bad to me :-)  The choice of what's a function
> >> > in C and what's inherent is pretty arbitrary.  E.g. % on doubles could
> >> > have implemented fmod() or remainder().  Casts from double to int could
> >> > have used the current rounding mode, but instead they truncate and
> >> > conversions using the rounding mode need to go through something like
> >> > (l)lrint().  Like you say, pow() could have been an operator (and is in
> >> > many languages), but instead it's a function.
> >> >
> >> >> This patch starts to go down that route which is why I ask for the
> >> >> whole picture to be considered and hinted at the alternative implementation
> >> >> which follows existing practice.  Add an expander in builtins.c, add an optab,
> >> >> and eventual support to vectorized_function.
> >> >>
> >> >> See for example ix86_builtin_vectorized_function which handles
> >> >> sqrt, floor, ceil, etc. and even FMA (we only fold FMA to FMA_EXPR
> >> >> if the target supports it for the scalar mode, so not sure if there is
> >> >> any x86 ISA where it has vectorized FMA but not scalar FMA).
> >> >
> >> > Yeah.  TBH I'm really against doing that unless (a) there's good reason
> >> > to believe that the concept really is specific to one target and
> >> > wouldn't be implemented on others or (b) there really is a function
> >> > rather than an instruction underneath (usually the case for sin, etc.).
> >> > But (b) could also be handled by the optab support library mechanism.
> >> >
> >> > Reasons against using target-specific builtins for operations that
> >> > have direct support in the ISA:
> >> >
> >> > 1. Like you say, in practice vector ops only tend to be supported if the
> >> >    associated scalar op is also supported.  Sticking to this approach
> >> >    means that vector ops follow a different path from scalar ops whereas
> >> >    (for example) division follows the same path for both.  It just seems
> >> >    confusing to have some floating-point optabs that support both scalar
> >> >    and vector operands and others that only support scalar operands.
> >> >
> >> > 2. Once converted to a target-specific function, the target-independent
> >> >    code has no idea what the function does or how expensive it is.
> >> >    We might start out with just one hook to convert a scalar operation
> >> >    to a target-dependent built-in function, but I bet over time we'll
> >> >    grow other hooks to query properties about the function, such as
> >> >    costs.
> >> >
> >> > 3. builtin_vectorized_function returns a decl rather than a call.
> >> >    If the target's vector API doesn't already have a built-in for the
> >> >    operation we need, with the exact types and arguments that we expect,
> >> >    the target needs to define one, presumably marked so that it isn't
> >> >    callable by input code.
> >> >
> >> >    E.g. on targets where FP conversion instructions allow an explicit
> >> >    rounding mode to be specified as an operand, it's reasonable for a
> >> >    target's vector API to expose that operand as a constant argument to
> >> >    the API function.  There'd then be one API function for all vector-
> >> >    float-to-vector-float integer rounding operations, rather than one
> >> >    for vector rint(), one for vector ceil(), etc.  (I'm thinking of
> >> >    System z instructions here, although I don't know offhand what the
> >> >    vector API is there.)  IMO it doesn't make sense to force the target
> >> >    to define "fake" built-in functions for all those possibilities
> >> >    purely for the sake of the target hook.  It's a lot of extra code,
> >> >    and it's extra code that would be duplicated on any target that needs
> >> >    to do this.
> >> >
> >> > IMO optabs are the best way for the target to tell the target-independent
> >> > code what it can do.  If it supports sqrt on df it defines sqrtdf and
> >> > if it supports vector sqrt on v2df it defines sqrtv2df.  These patterns
> >> > will often be a single define_expand or define_insn template -- the
> >> > vectorness often comes "for free" in terms of writing the pattern.
> >> >
> >> >>>> > TBH though I'm not sure why an internal_fn value (or a target-specific
> >> >>>> > builtin enum value) is worse than a tree-code value, unless the limit
> >> >>>> > of the tree_code bitfield is in sight (maybe it is).
> >> >>>>
> >> >>>> I think tree_code is 64bits now.
> >> >>>
> >> >>> Even better :-)
> >> >>
> >> >> Yes.
> >> >>
> >> >> I'm not against adding a corresponding tree code for all math builtin functions,
> >> >> we just have to decide whether this is the way to go (and of course support
> >> >> expanding those back to libcalls to libc/m rather than libgcc).  There are
> >> >> also constraints on what kind of STRICT_FMIN_EXPR the compiler may
> >> >> generate as the target may not be able to expand the long double variant
> >> >> directly but needs a libcall but libm might not be linked or may not
> >> >> have support
> >> >> for it.  That would be a new thing compared to libgcc providing a fallback
> >> >> for all other tree codes.
> >> >
> >> > True, but that doesn't seem too bad.  The constraints would be the same
> >> > if we're operating on built-in functions rather than codes.  I suppose
> >> > built-in functions make this more explicit, but at the end of the day
> >> > it's a costing decision.  We should no more be converting a cheap
> >> > operation into an expensive libgcc function than converting a cheap
> >> > operation into an expensive libm function, even if the libgcc conversion
> >> > links.
> >> >
> >> > There's certainly precedent for introducing calls to things that libgcc
> >> > doesn't define.  E.g. we already introduce calls to memcpy in things
> >> > like loop distribution, even though we don't provide a fallback memcpy
> >> > in libgcc.
> >>
> >> As an additional point for many math functions we have to support errno
> >> which means, like, BUILT_IN_SQRT can be rewritten to SQRT_EXPR
> >> only if -fno-math-errno is in effect.  But then code has to handle
> >> both variants for things like constant folding and expression combining.
> >> That's very unfortunate and something we want to avoid (one reason
> >> the POW_EXPR thing didn't fly when I tried).  STRICT_FMIN/MAX_EXPR
> >> is an example where this doesn't apply, of course (but I detest the name,
> >> just use FMIN/FMAX_EXPR?).  Still you'd need to handle both,
> >> FMIN_EXPR and BUILT_IN_FMIN, in code doing analysis/transform.
> >>
> >> Richard.
> >>
> >>
> >> > Thanks,
> >> > Richard
> >> >
> >
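
To make the NaN semantics discussed in the quoted thread concrete, here is
a rough scalar reference for fmax (an illustration only, not GCC code or
part of the patch; signed zeros are not treated specially here): if exactly
one operand is a NaN the other operand is returned, and a NaN results only
when both operands are NaN.

#include <math.h>

static double
fmax_reference (double a, double b)
{
  if (isnan (a))
    return b;		/* A NaN operand loses to a number.  */
  if (isnan (b))
    return a;
  return a > b ? a : b;
}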


[-- Attachment #2: fmaxmin3.patch --]
[-- Type: application/octet-stream, Size: 5091 bytes --]

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 6a54125..5b227d3 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -308,6 +308,8 @@
 
 (define_int_iterator VMAXMINF [UNSPEC_VMAX UNSPEC_VMIN])
 
+(define_int_iterator VMAXMINFNM [UNSPEC_VMAXNM UNSPEC_VMINNM])
+
 (define_int_iterator VPADDL [UNSPEC_VPADDL_S UNSPEC_VPADDL_U])
 
 (define_int_iterator VPADAL [UNSPEC_VPADAL_S UNSPEC_VPADAL_U])
@@ -743,6 +745,13 @@
   (UNSPEC_VPMIN "min") (UNSPEC_VPMIN_U "min")
 ])
 
+(define_int_attr fmaxmin [
+  (UNSPEC_VMAXNM "fmax") (UNSPEC_VMINNM "fmin")])
+
+(define_int_attr fmaxmin_op [
+  (UNSPEC_VMAXNM "vmaxnm") (UNSPEC_VMINNM "vminnm")
+])
+
 (define_int_attr shift_op [
   (UNSPEC_VSHL_S "shl") (UNSPEC_VSHL_U "shl")
   (UNSPEC_VRSHL_S "rshl") (UNSPEC_VRSHL_U "rshl")
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 62fb6da..0c1e721 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2354,6 +2354,17 @@
   [(set_attr "type" "neon_fp_minmax_s<q>")]
 )
 
+;; Auto-vectorized forms for the IEEE-754 fmax()/fmin() functions
+(define_insn "<fmaxmin><mode>3"
+  [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
+	(unspec:VCVTF [(match_operand:VCVTF 1 "s_register_operand" "w")
+		       (match_operand:VCVTF 2 "s_register_operand" "w")]
+		       VMAXMINFNM))]
+  "TARGET_NEON && TARGET_FPU_ARMV8"
+  "<fmaxmin_op>.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  [(set_attr "type" "neon_fp_minmax_s<q>")]
+)
+
 (define_expand "neon_vpadd<mode>"
   [(match_operand:VD 0 "s_register_operand" "=w")
    (match_operand:VD 1 "s_register_operand" "w")
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 67acafd..3068fc7 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -226,8 +226,10 @@
   UNSPEC_VLD4_LANE
   UNSPEC_VMAX
   UNSPEC_VMAX_U
+  UNSPEC_VMAXNM
   UNSPEC_VMIN
   UNSPEC_VMIN_U
+  UNSPEC_VMINNM
   UNSPEC_VMLA
   UNSPEC_VMLA_LANE
   UNSPEC_VMLAL_S
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index baeac62..3c89fe9 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -1366,6 +1366,18 @@
    (set_attr "conds" "unconditional")]
 )
 
+;; Scalar forms for the IEEE-754 fmax()/fmin() functions
+(define_insn "<fmaxmin><mode>3"
+  [(set (match_operand:SDF 0 "s_register_operand" "=<F_constraint>")
+	(unspec:SDF [(match_operand:SDF 1 "s_register_operand" "<F_constraint>")
+		     (match_operand:SDF 2 "s_register_operand" "<F_constraint>")]
+		     VMAXMINFNM))]
+  "TARGET_HARD_FLOAT && TARGET_VFP5 <vfp_double_cond>"
+  "<fmaxmin_op>.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  [(set_attr "type" "f_minmax<vfp_type>")
+   (set_attr "conds" "unconditional")]
+)
+
 ;; Write Floating-point Status and Control Register.
 (define_insn "set_fpscr"
   [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")] VUNSPEC_SET_FPSCR)]
diff --git a/gcc/testsuite/gcc.target/arm/fmaxmin.c b/gcc/testsuite/gcc.target/arm/fmaxmin.c
new file mode 100644
index 0000000..f55ac5f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/fmaxmin.c
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-options "-O2 -ftree-vectorize -fno-inline -march=armv8-a -save-temps" } */
+/* { dg-add-options arm_v8_neon } */
+
+extern void abort (void);
+double fmax (double, double);
+float fmaxf (float, float);
+double fmin (double, double);
+float fminf (float, float);
+
+#define isnan __builtin_isnan
+#define isinf __builtin_isinf
+
+#define NAN __builtin_nan ("")
+#define INFINITY __builtin_inf ()
+
+#define DEF_MAXMIN(TYPE,FUN)\
+void test_##FUN (TYPE *__restrict__ r, TYPE *__restrict__ a,\
+		 TYPE *__restrict__ b)\
+{\
+  int i;\
+  for (i = 0; i < 4; i++)\
+    r[i] = FUN (a[i], b[i]);\
+}\
+
+DEF_MAXMIN (float, fmaxf)
+DEF_MAXMIN (double, fmax)
+
+DEF_MAXMIN (float, fminf)
+DEF_MAXMIN (double, fmin)
+
+int main ()
+{
+  float a_f[4] = { 4, NAN, -3, INFINITY };
+  float b_f[4] = { 1,   7,NAN, 0 };
+  float r_f[4];
+  double a_d[4] = { 4, NAN,  -3,  INFINITY };
+  double b_d[4] = { 1,   7, NAN,  0 };
+  double r_d[4];
+
+  test_fmaxf (r_f, a_f, b_f);
+  if (r_f[0] != 4 || isnan (r_f[1]) || isnan (r_f[2]) || !isinf (r_f[3]))
+    abort ();
+
+  test_fminf (r_f, a_f, b_f);
+  if (r_f[0] != 1 || isnan (r_f[1]) || isnan (r_f[2]) || isinf (r_f[3]))
+    abort ();
+
+  test_fmax (r_d, a_d, b_d);
+  if (r_d[0] != 4 || isnan (r_d[1]) || isnan (r_d[2]) || !isinf (r_d[3]))
+    abort ();
+
+  test_fmin (r_d, a_d, b_d);
+  if (r_d[0] != 1 || isnan (r_d[1]) || isnan (r_d[2]) || isinf (r_d[3]))
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "vmaxnm.f32\tq\[0-9\]+, q\[0-9\]+, q\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vminnm.f32\tq\[0-9\]+, q\[0-9\]+, q\[0-9\]+" 1 } } */
+
+/* NOTE: There are no double precision vector versions of vmaxnm/vminnm.  */
+/* { dg-final { scan-assembler-times "vmaxnm.f64\td\[0-9\]+, d\[0-9\]+, d\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vminnm.f64\td\[0-9\]+, d\[0-9\]+, d\[0-9\]+" 1 } } */
+
