From mboxrd@z Thu Jan  1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
From: "David Sherwood"
To: "GCC Patches"
Cc: "'Richard Biener'", "Richard Sandiford"
Subject: RE: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
Date: Mon, 23 Nov 2015 09:21:00 -0000
Message-ID: <000001d125d0$4f09e990$ed1dbcb0$@arm.com>
References: <000001d0d5b0$5da4dbb0$18ee9310$@arm.com> <000001d0d8cf$2fb42770$8f1c7650$@arm.com> <000001d0d9a6$1efdc350$5cf949f0$@arm.com> <87fv3gbs36.fsf@e105548-lin.cambridge.arm.com> <8737zfbo2j.fsf@e105548-lin.cambridge.arm.com> <87y4h7a35q.fsf@e105548-lin.cambridge.arm.com>
X-SW-Source: 2015-11/txt/msg02655.txt.bz2
Hi,

This is part 1 of a reworked version of a patch I originally submitted in
August, rebased after Richard Sandiford's recent work on the internal
functions. This first patch adds the internal function definitions and
optabs that provide support for IEEE fmax()/fmin() functions. Later
patches will add the appropriate aarch64/aarch32 vector instructions.

Tested:

x86_64-linux: no regressions
aarch64-none-elf: no regressions
arm-none-eabi: no regressions

Regards,
David Sherwood.

ChangeLog:

2015-11-19  David Sherwood

gcc/
	* optabs.def: Add new optabs fmax_optab/fmin_optab.
	* internal-fn.def: Add new fmax/fmin internal functions.
	* config/aarch64/aarch64.md: New pattern.
	* config/aarch64/aarch64-simd.md: Likewise.
	* config/aarch64/iterators.md: New unspecs, iterators.
	* config/arm/iterators.md: New iterators.
	* config/arm/unspecs.md: New unspecs.
	* config/arm/neon.md: New pattern.
	* config/arm/vfp.md: Likewise.
	* doc/md.texi: Add fmin and fmax patterns.

gcc/testsuite/
	* gcc.target/aarch64/fmaxmin.c: New test.
	* gcc.target/arm/fmaxmin.c: New test.
> -----Original Message-----
> From: Richard Biener [mailto:richard.guenther@gmail.com]
> Sent: 19 August 2015 13:35
> To: Richard Biener; David Sherwood; GCC Patches; Richard Sandiford
> Subject: Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
>
> On Wed, Aug 19, 2015 at 2:11 PM, Richard Sandiford wrote:
> > Richard Biener writes:
> >> On Wed, Aug 19, 2015 at 11:54 AM, Richard Sandiford wrote:
> >>> Richard Biener writes:
> >>>> On Tue, Aug 18, 2015 at 4:15 PM, Richard Sandiford wrote:
> >>>>> Richard Biener writes:
> >>>>>> On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood wrote:
> >>>>>>>> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood wrote:
> >>>>>>>> > Hi Richard,
> >>>>>>>> >
> >>>>>>>> > Thanks for the reply. I'd chosen to add new expressions as this
> >>>>>>>> > seemed more consistent with the existing MAX_EXPR and MIN_EXPR
> >>>>>>>> > tree codes.  In addition it would seem to provide more
> >>>>>>>> > opportunities for optimisation than a target-specific builtin
> >>>>>>>> > implementation would. I accept that optimisation opportunities
> >>>>>>>> > will be more limited for strict math compilation, but that it
> >>>>>>>> > was still worth having them. Also, if we did map it to builtins
> >>>>>>>> > then the scalar version would go through the optabs and the
> >>>>>>>> > vector version would go through the target's builtin expansion,
> >>>>>>>> > which doesn't seem very consistent.
> >>>>>>>>
> >>>>>>>> On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and
> >>>>>>>> thus you can't vectorize anyway? (strict IEEE behavior is about
> >>>>>>>> NaNs, correct?)
> >>>>>>> I thought for this particular case associativity wasn't an issue?
> >>>>>>> We're not doing any reductions here, just simply performing
> >>>>>>> max/min operations on each pair of elements in the vectors.
> >>>>>>> I thought for IEEE-compliant behaviour we just need to ensure that
> >>>>>>> for each pair of elements if one element is a NaN we return the
> >>>>>>> other one.
> >>>>>>
> >>>>>> Hmm, true. Ok, my comment still stands - I don't see that using a
> >>>>>> tree code is the best thing to do here. You can add fmin/max optabs
> >>>>>> and special expansion of BUILT_IN_FMIN/MAX and you can use a target
> >>>>>> builtin for the vectorized variant.
> >>>>>>
> >>>>>> The reason I am pushing against a new tree code is that we'd have an
> >>>>>> awful lot of similar codes when pushing other flag related IL
> >>>>>> specialities to actual IL constructs. And we still need to find a
> >>>>>> consistent way to do that.
> >>>>>
> >>>>> In this case though the new code is really the "native" min/max
> >>>>> operation for fp, rather than some weird flag-dependent behaviour.
> >>>>> Maybe it's a bit unfortunate that the non-strict min/max fp operation
> >>>>> got mapped to the generic MIN_EXPR and MAX_EXPR when the non-strict
> >>>>> version is really the flag-related modification. The STRICT_* prefix
> >>>>> is forced by that and might make it seem like more of a special case
> >>>>> than it really is.
> >>>>
> >>>> In some sense. But the "strict" version already has a builtin (just no
> >>>> special expander in builtins.c). We usually don't add 1:1 tree codes
> >>>> for existing builtins (why have builtins at all then?).
> >>>
> >>> We still need the builtin to match the C function (and to allow direct
> >>> calls to __builtin_fmin, etc., which are occasionally useful).
> >>>
> >>>>> If you're still not convinced, how about an internal function instead
> >>>>> of a built-in function, so that we can continue to use optabs for all
> >>>>> cases? I'd really like to avoid forcing such a generic concept down to
> >>>>> target-specific builtins with target-specific expansion code,
> >>>>> especially when the same concept is exposed by target-independent
> >>>>> code for scalars.
> >>>>
> >>>> The target builtin is for the vectorized variant - not all targets
> >>>> might have that and we'd need to query the target about this. So
> >>>> using an IFN would mean adding a target hook for that query.
> >>>
> >>> No, the idea is that if we have a tree code or an internal function,
> >>> the decision about whether we have target support is based on a query
> >>> of the optabs (just like it is for scalar, and for other vectorisable
> >>> tree codes).  No new hooks are needed.
> >>>
> >>> The patch checked for target support that way.
> >>
> >> Fair enough. Still this means we should have tree codes for all
> >> builtins that eventually are vectorized? So why don't we have SIN_EXPR,
> >> POW_EXPR (ok, I did argue and have patches for that in the past),
> >> RINT_EXPR, SQRT_EXPR, etc?
> >
> > Yeah, it doesn't sound so bad to me :-) The choice of what's a function
> > in C and what's inherent is pretty arbitrary. E.g. % on doubles could
> > have implemented fmod() or remainder(). Casts from double to int could
> > have used the current rounding mode, but instead they truncate, and
> > conversions using the rounding mode need to go through something like
> > (l)lrint(). Like you say, pow() could have been an operator (and is in
> > many languages), but instead it's a function.
> >
> >> This patch starts to go down that route, which is why I ask for the
> >> whole picture to be considered and hinted at the alternative
> >> implementation which follows existing practice. Add an expander in
> >> builtins.c, add an optab, and eventual support to vectorized_function.
> >>
> >> See for example ix86_builtin_vectorized_function which handles
> >> sqrt, floor, ceil, etc. and even FMA (we only fold FMA to FMA_EXPR
> >> if the target supports it for the scalar mode, so not sure if there is
> >> any x86 ISA where it has vectorized FMA but not scalar FMA).
> >
> > Yeah.
> > TBH I'm really against doing that unless (a) there's good reason
> > to believe that the concept really is specific to one target and
> > wouldn't be implemented on others or (b) there really is a function
> > rather than an instruction underneath (usually the case for sin, etc.).
> > But (b) could also be handled by the optab support library mechanism.
> >
> > Reasons against using target-specific builtins for operations that
> > have direct support in the ISA:
> >
> > 1. Like you say, in practice vector ops only tend to be supported if
> >    the associated scalar op is also supported. Sticking to this
> >    approach means that vector ops follow a different path from scalar
> >    ops whereas (for example) division follows the same path for both.
> >    It just seems confusing to have some floating-point optabs that
> >    support both scalar and vector operands and others that only
> >    support scalar operands.
> >
> > 2. Once converted to a target-specific function, the target-independent
> >    code has no idea what the function does or how expensive it is.
> >    We might start out with just one hook to convert a scalar operation
> >    to a target-dependent built-in function, but I bet over time we'll
> >    grow other hooks to query properties about the function, such as
> >    costs.
> >
> > 3. builtin_vectorized_function returns a decl rather than a call.
> >    If the target's vector API doesn't already have a built-in for the
> >    operation we need, with the exact types and arguments that we
> >    expect, the target needs to define one, presumably marked so that
> >    it isn't callable by input code.
> >
> >    E.g. on targets where FP conversion instructions allow an explicit
> >    rounding mode to be specified as an operand, it's reasonable for a
> >    target's vector API to expose that operand as a constant argument
> >    to the API function.
> >    There'd then be one API function for all vector-float-to-vector-float
> >    integer rounding operations, rather than one for vector rint(), one
> >    for vector ceil(), etc. (I'm thinking of System z instructions here,
> >    although I don't know offhand what the vector API is there.) IMO it
> >    doesn't make sense to force the target to define "fake" built-in
> >    functions for all those possibilities purely for the sake of the
> >    target hook. It's a lot of extra code, and it's extra code that
> >    would be duplicated on any target that needs to do this.
> >
> > IMO optabs are the best way for the target to tell the target-independent
> > code what it can do. If it supports sqrt on df it defines sqrtdf and
> > if it supports vector sqrt on v2df it defines sqrtv2df. These patterns
> > will often be a single define_expand or define_insn template -- the
> > vectorness often comes "for free" in terms of writing the pattern.
> >
> >>>> > TBH though I'm not sure why an internal_fn value (or a
> >>>> > target-specific builtin enum value) is worse than a tree-code
> >>>> > value, unless the limit of the tree_code bitfield is in sight
> >>>> > (maybe it is).
> >>>>
> >>>> I think tree_code is 64bits now.
> >>>
> >>> Even better :-)
> >>
> >> Yes.
> >>
> >> I'm not against adding a corresponding tree code for all math builtin
> >> functions, we just have to decide whether this is the way to go (and
> >> of course support expanding those back to libcalls to libc/m rather
> >> than libgcc).  There are also constraints on what kind of
> >> STRICT_FMIN_EXPR the compiler may generate, as the target may not be
> >> able to expand the long double variant directly but needs a libcall,
> >> but libm might not be linked or may not have support for it. That
> >> would be a new thing compared to libgcc providing a fallback for all
> >> other tree codes.
> >
> > True, but that doesn't seem too bad.
> > The constraints would be the same
> > if we're operating on built-in functions rather than codes. I suppose
> > built-in functions make this more explicit, but at the end of the day
> > it's a costing decision. We should no more be converting a cheap
> > operation into an expensive libgcc function than converting a cheap
> > operation into an expensive libm function, even if the libgcc
> > conversion links.
> >
> > There's certainly precedent for introducing calls to things that libgcc
> > doesn't define. E.g. we already introduce calls to memcpy in things
> > like loop distribution, even though we don't provide a fallback memcpy
> > in libgcc.
>
> As an additional point, for many math functions we have to support errno,
> which means that BUILT_IN_SQRT can be rewritten to SQRT_EXPR
> only if -fno-math-errno is in effect. But then code has to handle
> both variants for things like constant folding and expression combining.
> That's very unfortunate and something we want to avoid (one reason
> the POW_EXPR thing didn't fly when I tried). STRICT_FMIN/MAX_EXPR
> is an example where this doesn't apply, of course (but I detest the name,
> just use FMIN/FMAX_EXPR?). Still you'd need to handle both,
> FMIN_EXPR and BUILT_IN_FMIN, in code doing analysis/transform.
>
> Richard.
>
> > Thanks,
> > Richard

[Attachment: fmaxmin_v1.patch, decoded from base64]

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 8b2deaa..e5bc59e 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4957,6 +4957,15 @@ Signed minimum and maximum operations.  When used with floating point,
 if both operands are zeros, or if either operand is @code{NaN}, then
 it is unspecified which of the two operands is returned as the result.
 
+@cindex @code{fmin@var{m}3} instruction pattern
+@cindex @code{fmax@var{m}3} instruction pattern
+@item @samp{fmin@var{m}3}, @samp{fmax@var{m}3}
+IEEE-conformant minimum and maximum operations.  If one operand is a quiet
+@code{NaN}, then the other operand is returned.  If both operands are quiet
+@code{NaN}, then a quiet @code{NaN} is returned.  In the case when gcc supports
+signalling @code{NaN} (-fsignaling-nans) an invalid floating point exception is
+raised and a quiet @code{NaN} is returned.
+
 @cindex @code{reduc_smin_@var{m}} instruction pattern
 @cindex @code{reduc_smax_@var{m}} instruction pattern
 @item @samp{reduc_smin_@var{m}}, @samp{reduc_smax_@var{m}}
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 825dba1..dee9332 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -125,6 +125,8 @@ DEF_INTERNAL_FLT_FN (FMOD, ECF_CONST, fmod, binary)
 DEF_INTERNAL_FLT_FN (POW, ECF_CONST, pow, binary)
 DEF_INTERNAL_FLT_FN (REMAINDER, ECF_CONST, remainder, binary)
 DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary)
+DEF_INTERNAL_FLT_FN (FMIN, ECF_CONST, fmin, binary)
+DEF_INTERNAL_FLT_FN (FMAX, ECF_CONST, fmax, binary)
 
 /* FP scales.  */
 DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 0ca2333..fed757b 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -251,6 +251,10 @@ OPTAB_D (sin_optab, "sin$a2")
 OPTAB_D (sincos_optab, "sincos$a3")
 OPTAB_D (tan_optab, "tan$a2")
 
+/* C99 implementations of fmax/fmin.  */
+OPTAB_D (fmax_optab, "fmax$a3")
+OPTAB_D (fmin_optab, "fmin$a3")
+
 /* Vector reduction to a scalar.  */
 OPTAB_D (reduc_smax_scal_optab, "reduc_smax_scal_$a")
 OPTAB_D (reduc_smin_scal_optab, "reduc_smin_scal_$a")