From: Richard Biener
To: David Sherwood
Cc: GCC Patches, Richard Sandiford
Date: Wed, 25 Nov 2015 12:39:00 -0000
Subject: Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*

On Mon, Nov 23, 2015 at 10:21 AM, David Sherwood wrote:
> Hi,
>
> This is part 1 of a reworked version of a patch I originally submitted in
> August, rebased after Richard Sandiford's recent work on the internal
> functions. This first patch adds the internal function definitions and
> optabs that provide support for IEEE fmax()/fmin() functions.
>
> Later patches will add the appropriate aarch64/aarch32 vector instructions.

Ok.

Thanks,
Richard.

> Tested:
>
> x86_64-linux: no regressions
> aarch64-none-elf: no regressions
> arm-none-eabi: no regressions
>
> Regards,
> David Sherwood.
>
> ChangeLog:
>
> 2015-11-19  David Sherwood
>
> gcc/
>     * optabs.def: Add new optabs fmax_optab/fmin_optab.
>     * internal-fn.def: Add new fmax/fmin internal functions.
>     * config/aarch64/aarch64.md: New pattern.
>     * config/aarch64/aarch64-simd.md: Likewise.
>     * config/aarch64/iterators.md: New unspecs, iterators.
>     * config/arm/iterators.md: New iterators.
>     * config/arm/unspecs.md: New unspecs.
>     * config/arm/neon.md: New pattern.
>     * config/arm/vfp.md: Likewise.
>     * doc/md.texi: Add fmin and fmax patterns.
> gcc/testsuite
>     * gcc.target/aarch64/fmaxmin.c: New test.
>     * gcc.target/arm/fmaxmin.c: New test.
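For readers unfamiliar with those files: going by the existing conventions in
optabs.def and internal-fn.def, the new entries would look roughly like the
sketch below. This is illustrative only; the exact macro names and arguments
used in the patch may differ.

    /* optabs.def: named patterns fmax<mode>3 / fmin<mode>3 with IEEE
       (fmaxnm/fminnm-style) semantics -- sketch only.  */
    OPTAB_D (fmax_optab, "fmax$a3")
    OPTAB_D (fmin_optab, "fmin$a3")

    /* internal-fn.def: internal functions mapping the fmax/fmin
       built-ins onto the optabs above -- sketch only.  */
    DEF_INTERNAL_FLT_FN (FMAX, ECF_CONST, fmax, binary)
    DEF_INTERNAL_FLT_FN (FMIN, ECF_CONST, fmin, binary)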
>
>
>> -----Original Message-----
>> From: Richard Biener [mailto:richard.guenther@gmail.com]
>> Sent: 19 August 2015 13:35
>> To: Richard Biener; David Sherwood; GCC Patches; Richard Sandiford
>> Subject: Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
>>
>> On Wed, Aug 19, 2015 at 2:11 PM, Richard Sandiford wrote:
>> > Richard Biener writes:
>> >> On Wed, Aug 19, 2015 at 11:54 AM, Richard Sandiford wrote:
>> >>> Richard Biener writes:
>> >>>> On Tue, Aug 18, 2015 at 4:15 PM, Richard Sandiford wrote:
>> >>>>> Richard Biener writes:
>> >>>>>> On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood wrote:
>> >>>>>>>> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood wrote:
>> >>>>>>>> > Hi Richard,
>> >>>>>>>> >
>> >>>>>>>> > Thanks for the reply. I'd chosen to add new expressions as this seemed
>> >>>>>>>> > more consistent with the existing MAX_EXPR and MIN_EXPR tree codes.
>> >>>>>>>> > In addition it would seem to provide more opportunities for
>> >>>>>>>> > optimisation than a target-specific builtin implementation would.
>> >>>>>>>> > I accept that optimisation opportunities will be more limited for
>> >>>>>>>> > strict math compilation, but that it was still worth having them.
>> >>>>>>>> > Also, if we did map it to builtins then the scalar version would go
>> >>>>>>>> > through the optabs and the vector version would go through the
>> >>>>>>>> > target's builtin expansion, which doesn't seem very consistent.
>> >>>>>>>>
>> >>>>>>>> On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and thus
>> >>>>>>>> you can't vectorize anyway? (strict IEEE behavior is about NaNs,
>> >>>>>>>> correct?)
>> >>>>>>> I thought for this particular case associativity wasn't an issue?
>> >>>>>>> We're not doing any reductions here, just simply performing max/min
>> >>>>>>> operations on each pair of elements in the vectors. I thought for
>> >>>>>>> IEEE-compliant behaviour we just need to ensure that for each pair of
>> >>>>>>> elements, if one element is a NaN we return the other one.
>> >>>>>>
>> >>>>>> Hmm, true. Ok, my comment still stands - I don't see that using a
>> >>>>>> tree code is the best thing to do here. You can add fmin/max optabs
>> >>>>>> and special expansion of BUILT_IN_FMIN/MAX and you can use a target
>> >>>>>> builtin for the vectorized variant.
>> >>>>>>
>> >>>>>> The reason I am pushing against a new tree code is that we'd have an
>> >>>>>> awful lot of similar codes when pushing other flag-related IL
>> >>>>>> specialities to actual IL constructs. And we still need to find a
>> >>>>>> consistent way to do that.
>> >>>>>
>> >>>>> In this case though the new code is really the "native" min/max operation
>> >>>>> for fp, rather than some weird flag-dependent behaviour. Maybe it's
>> >>>>> a bit unfortunate that the non-strict min/max fp operation got mapped
>> >>>>> to the generic MIN_EXPR and MAX_EXPR, when the non-strict version is
>> >>>>> really the flag-related modification. The STRICT_* prefix is forced by
>> >>>>> that and might make it seem like more of a special case than it really is.
>> >>>>
>> >>>> In some sense. But the "strict" version already has a builtin (just no
>> >>>> special expander in builtins.c). We usually don't add 1:1 tree codes
>> >>>> for existing builtins (why have builtins at all then?).
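(A concrete illustration of that element-wise NaN behaviour, not part of the
patch: C99 fmin()/fmax() already specify for scalars that if one operand is a
NaN the other operand is returned, and each pair of operands is handled
independently, so no reassociation is involved. Link with -lm if the calls are
not expanded inline.)

    #include <math.h>
    #include <stdio.h>

    int
    main (void)
    {
      double qnan = __builtin_nan ("");
      /* If one argument is a NaN, the other is returned.  */
      printf ("%f\n", fmax (qnan, 1.0));   /* prints 1.000000 */
      printf ("%f\n", fmin (2.0, qnan));   /* prints 2.000000 */
      return 0;
    }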
>> >>>
>> >>> We still need the builtin to match the C function (and to allow direct
>> >>> calls to __builtin_fmin, etc., which are occasionally useful).
>> >>>
>> >>>>> If you're still not convinced, how about an internal function instead
>> >>>>> of a built-in function, so that we can continue to use optabs for all
>> >>>>> cases? I'd really like to avoid forcing such a generic concept down to
>> >>>>> target-specific builtins with target-specific expansion code, especially
>> >>>>> when the same concept is exposed by target-independent code for scalars.
>> >>>>
>> >>>> The target builtin is for the vectorized variant - not all targets might have
>> >>>> that and we'd need to query the target about this. So using an IFN would
>> >>>> mean adding a target hook for that query.
>> >>>
>> >>> No, the idea is that if we have a tree code or an internal function, the
>> >>> decision about whether we have target support is based on a query of the
>> >>> optabs (just like it is for scalar, and for other vectorisable tree codes).
>> >>> No new hooks are needed.
>> >>>
>> >>> The patch checked for target support that way.
>> >>
>> >> Fair enough. Still, this means we should have tree codes for all builtins
>> >> that eventually are vectorized? So why don't we have SIN_EXPR,
>> >> POW_EXPR (ok, I did argue and have patches for that in the past),
>> >> RINT_EXPR, SQRT_EXPR, etc.?
>> >
>> > Yeah, it doesn't sound so bad to me :-) The choice of what's a function
>> > in C and what's inherent is pretty arbitrary. E.g. % on doubles could
>> > have implemented fmod() or remainder(). Casts from double to int could
>> > have used the current rounding mode, but instead they truncate, and
>> > conversions using the rounding mode need to go through something like
>> > (l)lrint(). Like you say, pow() could have been an operator (and is in
>> > many languages), but instead it's a function.
>> >
>> >> This patch starts to go down that route, which is why I ask for the
>> >> whole picture to be considered and hinted at the alternative implementation
>> >> which follows existing practice: add an expander in builtins.c, add an optab,
>> >> and eventual support to vectorized_function.
>> >>
>> >> See for example ix86_builtin_vectorized_function, which handles
>> >> sqrt, floor, ceil, etc. and even FMA (we only fold FMA to FMA_EXPR
>> >> if the target supports it for the scalar mode, so I am not sure there is
>> >> any x86 ISA where it has vectorized FMA but not scalar FMA).
>> >
>> > Yeah. TBH I'm really against doing that unless (a) there's good reason
>> > to believe that the concept really is specific to one target and
>> > wouldn't be implemented on others, or (b) there really is a function
>> > rather than an instruction underneath (usually the case for sin, etc.).
>> > But (b) could also be handled by the optab support library mechanism.
>> >
>> > Reasons against using target-specific builtins for operations that
>> > have direct support in the ISA:
>> >
>> > 1. Like you say, in practice vector ops only tend to be supported if the
>> >    associated scalar op is also supported. Sticking to this approach
>> >    means that vector ops follow a different path from scalar ops, whereas
>> >    (for example) division follows the same path for both. It just seems
>> >    confusing to have some floating-point optabs that support both scalar
>> >    and vector operands and others that only support scalar operands.
>> >
>> > 2. Once converted to a target-specific function, the target-independent
>> >    code has no idea what the function does or how expensive it is.
>> >    We might start out with just one hook to convert a scalar operation
>> >    to a target-dependent built-in function, but I bet over time we'll
>> >    grow other hooks to query properties about the function, such as
>> >    costs.
>> >
>> > 3. builtin_vectorized_function returns a decl rather than a call.
>> >    If the target's vector API doesn't already have a built-in for the
>> >    operation we need, with the exact types and arguments that we expect,
>> >    the target needs to define one, presumably marked so that it isn't
>> >    callable by input code.
>> >
>> >    E.g. on targets where FP conversion instructions allow an explicit
>> >    rounding mode to be specified as an operand, it's reasonable for a
>> >    target's vector API to expose that operand as a constant argument to
>> >    the API function. There'd then be one API function for all vector-
>> >    float-to-vector-float integer rounding operations, rather than one
>> >    for vector rint(), one for vector ceil(), etc. (I'm thinking of
>> >    System z instructions here, although I don't know offhand what the
>> >    vector API is there.) IMO it doesn't make sense to force the target
>> >    to define "fake" built-in functions for all those possibilities
>> >    purely for the sake of the target hook. It's a lot of extra code,
>> >    and it's extra code that would be duplicated on any target that needs
>> >    to do this.
>> >
>> > IMO optabs are the best way for the target to tell the target-independent
>> > code what it can do. If it supports sqrt on df it defines sqrtdf, and
>> > if it supports vector sqrt on v2df it defines sqrtv2df. These patterns
>> > will often be a single define_expand or define_insn template -- the
>> > vectorness often comes "for free" in terms of writing the pattern.
>> >
>> >>>> > TBH though I'm not sure why an internal_fn value (or a target-specific
>> >>>> > builtin enum value) is worse than a tree-code value, unless the limit
>> >>>> > of the tree_code bitfield is in sight (maybe it is).
>> >>>>
>> >>>> I think tree_code is 64 bits now.
>> >>>
>> >>> Even better :-)
>> >>
>> >> Yes.
>> >>
>> >> I'm not against adding a corresponding tree code for all math builtin functions;
>> >> we just have to decide whether this is the way to go (and of course support
>> >> expanding those back to libcalls to libc/libm rather than libgcc). There are
>> >> also constraints on what kind of STRICT_FMIN_EXPR the compiler may
>> >> generate, as the target may not be able to expand the long double variant
>> >> directly and needs a libcall, but libm might not be linked or may not have
>> >> support for it. That would be a new thing compared to libgcc providing a
>> >> fallback for all other tree codes.
>> >
>> > True, but that doesn't seem too bad. The constraints would be the same
>> > if we're operating on built-in functions rather than codes. I suppose
>> > built-in functions make this more explicit, but at the end of the day
>> > it's a costing decision. We should no more be converting a cheap
>> > operation into an expensive libgcc function than converting a cheap
>> > operation into an expensive libm function, even if the libgcc conversion
>> > links.
>> >
>> > There's certainly precedent for introducing calls to things that libgcc
>> > doesn't define. E.g. we already introduce calls to memcpy in things
>> > like loop distribution, even though we don't provide a fallback memcpy
>> > in libgcc.
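(To put the optab-query point in code: a rough sketch, not taken from the
patch, of how target support would be checked through the optabs. The helper
name is made up; optab_handler and CODE_FOR_nothing are the existing GCC
facilities, declared via optabs.h inside the compiler.)

    /* Sketch only: assumes the fmax_optab the patch adds.  The same query
       covers scalar and vector modes, so no new target hook is needed.  */
    static bool
    target_supports_ieee_fmax_p (machine_mode mode)
    {
      return optab_handler (fmax_optab, mode) != CODE_FOR_nothing;
    }

    /* E.g. call it with DFmode for the scalar case and V2DFmode for the
       vectorized one.  */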
>>
>> As an additional point, for many math functions we have to support errno,
>> which means, like, BUILT_IN_SQRT can be rewritten to SQRT_EXPR
>> only if -fno-math-errno is in effect. But then code has to handle
>> both variants for things like constant folding and expression combining.
>> That's very unfortunate and something we want to avoid (one reason
>> the POW_EXPR thing didn't fly when I tried). STRICT_FMIN/MAX_EXPR
>> is an example where this doesn't apply, of course (but I detest the name,
>> just use FMIN/FMAX_EXPR?). Still, you'd need to handle both
>> FMIN_EXPR and BUILT_IN_FMIN in code doing analysis/transform.
>>
>> Richard.
>>
>>
>> > Thanks,
>> > Richard
>> >
>
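(To make the errno point concrete: when math_errhandling includes MATH_ERRNO,
as on glibc-style C libraries, sqrt() must set errno on a domain error, so the
call below cannot simply become a bare SQRT_EXPR under the default
-fmath-errno. fmin()/fmax() have no domain or range errors, so they avoid this
problem. Purely illustrative; link with -lm.)

    #include <errno.h>
    #include <math.h>
    #include <stdio.h>

    int
    main (void)
    {
      errno = 0;
      double r = sqrt (-1.0);   /* domain error: NaN result, errno set to EDOM */
      printf ("r=%f, errno==EDOM: %d\n", r, errno == EDOM);
      return 0;
    }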