From mboxrd@z Thu Jan  1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
From: "David Sherwood"
To: "GCC Patches"
Cc: "'Richard Biener'", "Richard Sandiford"
Subject: RE: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
Date: Mon, 23 Nov 2015 09:21:00 -0000
Message-ID: <000001d125d0$4f09e990$ed1dbcb0$@arm.com>
References: <000001d0d5b0$5da4dbb0$18ee9310$@arm.com> <000001d0d8cf$2fb42770$8f1c7650$@arm.com> <000001d0d9a6$1efdc350$5cf949f0$@arm.com> <87fv3gbs36.fsf@e105548-lin.cambridge.arm.com> <8737zfbo2j.fsf@e105548-lin.cambridge.arm.com> <87y4h7a35q.fsf@e105548-lin.cambridge.arm.com>
X-SW-Source: 2015-11/txt/msg02655.txt.bz2
Hi,

This is part 1 of a reworked version of a patch I originally submitted in
August, rebased after Richard Sandiford's recent work on the internal
functions. This first patch adds the internal function definitions and
optabs that provide support for IEEE fmax()/fmin() functions. Later
patches will add the appropriate aarch64/aarch32 vector instructions.

Tested:

x86_64-linux: no regressions
aarch64-none-elf: no regressions
arm-none-eabi: no regressions

Regards,
David Sherwood.

ChangeLog:

2015-11-19  David Sherwood

gcc/
	* optabs.def: Add new optabs fmax_optab/fmin_optab.
	* internal-fn.def: Add new fmax/fmin internal functions.
	* config/aarch64/aarch64.md: New pattern.
	* config/aarch64/aarch64-simd.md: Likewise.
	* config/aarch64/iterators.md: New unspecs, iterators.
	* config/arm/iterators.md: New iterators.
	* config/arm/unspecs.md: New unspecs.
	* config/arm/neon.md: New pattern.
	* config/arm/vfp.md: Likewise.
	* doc/md.texi: Add fmin and fmax patterns.

gcc/testsuite/
	* gcc.target/aarch64/fmaxmin.c: New test.
	* gcc.target/arm/fmaxmin.c: New test.
> -----Original Message-----
> From: Richard Biener [mailto:richard.guenther@gmail.com]
> Sent: 19 August 2015 13:35
> To: Richard Biener; David Sherwood; GCC Patches; Richard Sandiford
> Subject: Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
>
> On Wed, Aug 19, 2015 at 2:11 PM, Richard Sandiford wrote:
> > Richard Biener writes:
> >> On Wed, Aug 19, 2015 at 11:54 AM, Richard Sandiford wrote:
> >>> Richard Biener writes:
> >>>> On Tue, Aug 18, 2015 at 4:15 PM, Richard Sandiford wrote:
> >>>>> Richard Biener writes:
> >>>>>> On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood wrote:
> >>>>>>>> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood wrote:
> >>>>>>>> > Hi Richard,
> >>>>>>>> >
> >>>>>>>> > Thanks for the reply. I'd chosen to add new expressions as this
> >>>>>>>> > seemed more consistent with the existing MAX_EXPR and MIN_EXPR
> >>>>>>>> > tree codes.  In addition it would seem to provide more
> >>>>>>>> > opportunities for optimisation than a target-specific builtin
> >>>>>>>> > implementation would. I accept that optimisation opportunities
> >>>>>>>> > will be more limited for strict math compilation, but that it
> >>>>>>>> > was still worth having them. Also, if we did map it to builtins
> >>>>>>>> > then the scalar version would go through the optabs and the
> >>>>>>>> > vector version would go through the target's builtin expansion,
> >>>>>>>> > which doesn't seem very consistent.
> >>>>>>>>
> >>>>>>>> On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and
> >>>>>>>> thus you can't vectorize anyway? (strict IEEE behavior is about
> >>>>>>>> NaNs, correct?)
> >>>>>>> I thought for this particular case associativity wasn't an issue?
> >>>>>>> We're not doing any reductions here, just simply performing
> >>>>>>> max/min operations on each pair of elements in the vectors.
> >>>>>>> I thought for IEEE-compliant behaviour we just need to ensure that
> >>>>>>> for each pair of elements if one element is a NaN we return the
> >>>>>>> other one.
> >>>>>>
> >>>>>> Hmm, true. Ok, my comment still stands - I don't see that using a
> >>>>>> tree code is the best thing to do here. You can add fmin/max optabs
> >>>>>> and special expansion of BUILT_IN_FMIN/MAX and you can use a target
> >>>>>> builtin for the vectorized variant.
> >>>>>>
> >>>>>> The reason I am pushing against a new tree code is that we'd have an
> >>>>>> awful lot of similar codes when pushing other flag related IL
> >>>>>> specialities to actual IL constructs. And we still need to find a
> >>>>>> consistent way to do that.
> >>>>>
> >>>>> In this case though the new code is really the "native" min/max
> >>>>> operation for fp, rather than some weird flag-dependent behaviour.
> >>>>> Maybe it's a bit unfortunate that the non-strict min/max fp operation
> >>>>> got mapped to the generic MIN_EXPR and MAX_EXPR when the non-strict
> >>>>> version is really the flag-related modification. The STRICT_* prefix
> >>>>> is forced by that and might make it seem like more of a special case
> >>>>> than it really is.
> >>>>
> >>>> In some sense. But the "strict" version already has a builtin (just no
> >>>> special expander in builtins.c). We usually don't add 1:1 tree codes
> >>>> for existing builtins (why have builtins at all then?).
> >>>
> >>> We still need the builtin to match the C function (and to allow direct
> >>> calls to __builtin_fmin, etc., which are occasionally useful).
> >>>
> >>>>> If you're still not convinced, how about an internal function instead
> >>>>> of a built-in function, so that we can continue to use optabs for all
> >>>>> cases? I'd really like to avoid forcing such a generic concept down to
> >>>>> target-specific builtins with target-specific expansion code,
> >>>>> especially when the same concept is exposed by target-independent
> >>>>> code for scalars.
> >>>>
> >>>> The target builtin is for the vectorized variant - not all targets
> >>>> might have that and we'd need to query the target about this. So
> >>>> using an IFN would mean adding a target hook for that query.
> >>>
> >>> No, the idea is that if we have a tree code or an internal function,
> >>> the decision about whether we have target support is based on a query
> >>> of the optabs (just like it is for scalar, and for other vectorisable
> >>> tree codes).  No new hooks are needed.
> >>>
> >>> The patch checked for target support that way.
> >>
> >> Fair enough. Still this means we should have tree codes for all
> >> builtins that eventually are vectorized? So why don't we have SIN_EXPR,
> >> POW_EXPR (ok, I did argue and have patches for that in the past),
> >> RINT_EXPR, SQRT_EXPR, etc?
> >
> > Yeah, it doesn't sound so bad to me :-) The choice of what's a function
> > in C and what's inherent is pretty arbitrary. E.g. % on doubles could
> > have implemented fmod() or remainder(). Casts from double to int could
> > have used the current rounding mode, but instead they truncate, and
> > conversions using the rounding mode need to go through something like
> > (l)lrint(). Like you say, pow() could have been an operator (and is in
> > many languages), but instead it's a function.
> >
> >> This patch starts to go down that route, which is why I ask for the
> >> whole picture to be considered and hinted at the alternative
> >> implementation which follows existing practice. Add an expander in
> >> builtins.c, add an optab, and eventual support to vectorized_function.
> >>
> >> See for example ix86_builtin_vectorized_function which handles
> >> sqrt, floor, ceil, etc. and even FMA (we only fold FMA to FMA_EXPR
> >> if the target supports it for the scalar mode, so not sure if there is
> >> any x86 ISA where it has vectorized FMA but not scalar FMA).
> >
> > Yeah.
> > TBH I'm really against doing that unless (a) there's good reason
> > to believe that the concept really is specific to one target and
> > wouldn't be implemented on others or (b) there really is a function
> > rather than an instruction underneath (usually the case for sin, etc.).
> > But (b) could also be handled by the optab support library mechanism.
> >
> > Reasons against using target-specific builtins for operations that
> > have direct support in the ISA:
> >
> > 1. Like you say, in practice vector ops only tend to be supported if
> >    the associated scalar op is also supported. Sticking to this
> >    approach means that vector ops follow a different path from scalar
> >    ops whereas (for example) division follows the same path for both.
> >    It just seems confusing to have some floating-point optabs that
> >    support both scalar and vector operands and others that only
> >    support scalar operands.
> >
> > 2. Once converted to a target-specific function, the target-independent
> >    code has no idea what the function does or how expensive it is.
> >    We might start out with just one hook to convert a scalar operation
> >    to a target-dependent built-in function, but I bet over time we'll
> >    grow other hooks to query properties about the function, such as
> >    costs.
> >
> > 3. builtin_vectorized_function returns a decl rather than a call.
> >    If the target's vector API doesn't already have a built-in for the
> >    operation we need, with the exact types and arguments that we
> >    expect, the target needs to define one, presumably marked so that
> >    it isn't callable by input code.
> >
> >    E.g. on targets where FP conversion instructions allow an explicit
> >    rounding mode to be specified as an operand, it's reasonable for a
> >    target's vector API to expose that operand as a constant argument
> >    to the API function.
> >    There'd then be one API function for all vector-float-to-vector-float
> >    integer rounding operations, rather than one for vector rint(), one
> >    for vector ceil(), etc. (I'm thinking of System z instructions here,
> >    although I don't know offhand what the vector API is there.) IMO it
> >    doesn't make sense to force the target to define "fake" built-in
> >    functions for all those possibilities purely for the sake of the
> >    target hook. It's a lot of extra code, and it's extra code that
> >    would be duplicated on any target that needs to do this.
> >
> > IMO optabs are the best way for the target to tell the target-independent
> > code what it can do. If it supports sqrt on df it defines sqrtdf and
> > if it supports vector sqrt on v2df it defines sqrtv2df. These patterns
> > will often be a single define_expand or define_insn template -- the
> > vectorness often comes "for free" in terms of writing the pattern.
> >
> >>>> > TBH though I'm not sure why an internal_fn value (or a
> >>>> > target-specific builtin enum value) is worse than a tree-code
> >>>> > value, unless the limit of the tree_code bitfield is in sight
> >>>> > (maybe it is).
> >>>>
> >>>> I think tree_code is 64bits now.
> >>>
> >>> Even better :-)
> >>
> >> Yes.
> >>
> >> I'm not against adding a corresponding tree code for all math builtin
> >> functions, we just have to decide whether this is the way to go (and
> >> of course support expanding those back to libcalls to libc/m rather
> >> than libgcc).  There are also constraints on what kind of
> >> STRICT_FMIN_EXPR the compiler may generate, as the target may not be
> >> able to expand the long double variant directly but needs a libcall,
> >> but libm might not be linked or may not have support for it. That
> >> would be a new thing compared to libgcc providing a fallback for all
> >> other tree codes.
> >
> > True, but that doesn't seem too bad.
> > The constraints would be the same
> > if we're operating on built-in functions rather than codes. I suppose
> > built-in functions make this more explicit, but at the end of the day
> > it's a costing decision. We should no more be converting a cheap
> > operation into an expensive libgcc function than converting a cheap
> > operation into an expensive libm function, even if the libgcc
> > conversion links.
> >
> > There's certainly precedent for introducing calls to things that libgcc
> > doesn't define. E.g. we already introduce calls to memcpy in things
> > like loop distribution, even though we don't provide a fallback memcpy
> > in libgcc.
>
> As an additional point, for many math functions we have to support errno,
> which means that BUILT_IN_SQRT can be rewritten to SQRT_EXPR
> only if -fno-math-errno is in effect. But then code has to handle
> both variants for things like constant folding and expression combining.
> That's very unfortunate and something we want to avoid (one reason
> the POW_EXPR thing didn't fly when I tried). STRICT_FMIN/MAX_EXPR
> is an example where this doesn't apply, of course (but I detest the name,
> just use FMIN/FMAX_EXPR?). Still you'd need to handle both,
> FMIN_EXPR and BUILT_IN_FMIN, in code doing analysis/transform.
>
> Richard.
>
> > Thanks,
> > Richard

[Attachment: fmaxmin_v1.patch, decoded from base64]

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 8b2deaa..e5bc59e 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4957,6 +4957,15 @@ Signed minimum and maximum operations.  When used with floating point,
 if both operands are zeros, or if either operand is @code{NaN}, then
 it is unspecified which of the two operands is returned as the result.
 
+@cindex @code{fmin@var{m}3} instruction pattern
+@cindex @code{fmax@var{m}3} instruction pattern
+@item @samp{fmin@var{m}3}, @samp{fmax@var{m}3}
+IEEE-conformant minimum and maximum operations.  If one operand is a quiet
+@code{NaN}, then the other operand is returned.  If both operands are quiet
+@code{NaN}, then a quiet @code{NaN} is returned.  In the case when gcc supports
+signalling @code{NaN} (-fsignaling-nans) an invalid floating point exception is
+raised and a quiet @code{NaN} is returned.
+
 @cindex @code{reduc_smin_@var{m}} instruction pattern
 @cindex @code{reduc_smax_@var{m}} instruction pattern
 @item @samp{reduc_smin_@var{m}}, @samp{reduc_smax_@var{m}}
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 825dba1..dee9332 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -125,6 +125,8 @@ DEF_INTERNAL_FLT_FN (FMOD, ECF_CONST, fmod, binary)
 DEF_INTERNAL_FLT_FN (POW, ECF_CONST, pow, binary)
 DEF_INTERNAL_FLT_FN (REMAINDER, ECF_CONST, remainder, binary)
 DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary)
+DEF_INTERNAL_FLT_FN (FMIN, ECF_CONST, fmin, binary)
+DEF_INTERNAL_FLT_FN (FMAX, ECF_CONST, fmax, binary)
 
 /* FP scales.  */
 DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 0ca2333..fed757b 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -251,6 +251,10 @@ OPTAB_D (sin_optab, "sin$a2")
 OPTAB_D (sincos_optab, "sincos$a3")
 OPTAB_D (tan_optab, "tan$a2")
 
+/* C99 implementations of fmax/fmin.  */
+OPTAB_D (fmax_optab, "fmax$a3")
+OPTAB_D (fmin_optab, "fmin$a3")
+
 /* Vector reduction to a scalar.  */
 OPTAB_D (reduc_smax_scal_optab, "reduc_smax_scal_$a")
 OPTAB_D (reduc_smin_scal_optab, "reduc_smin_scal_$a")