From: Richard Sandiford
To: Richard Biener
Cc: David Sherwood, GCC Patches
Subject: Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
Date: Wed, 19 Aug 2015 12:23:00 -0000
Message-ID: <87y4h7a35q.fsf@e105548-lin.cambridge.arm.com>
In-Reply-To: (Richard Biener's message of "Wed, 19 Aug 2015 11:17:55 +0100")
References: <000001d0d5b0$5da4dbb0$18ee9310$@arm.com> <000001d0d8cf$2fb42770$8f1c7650$@arm.com> <000001d0d9a6$1efdc350$5cf949f0$@arm.com> <87fv3gbs36.fsf@e105548-lin.cambridge.arm.com> <8737zfbo2j.fsf@e105548-lin.cambridge.arm.com>

Richard Biener writes:
> On Wed, Aug 19, 2015 at 11:54 AM, Richard Sandiford wrote:
>> Richard Biener writes:
>>> On Tue, Aug 18, 2015 at 4:15 PM, Richard Sandiford wrote:
>>>> Richard Biener writes:
>>>>> On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood wrote:
>>>>>>> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood wrote:
>>>>>>> > Hi Richard,
>>>>>>> >
>>>>>>> > Thanks for the reply.  I'd chosen to add new expressions as this
>>>>>>> > seemed more consistent with the existing MAX_EXPR and MIN_EXPR tree
>>>>>>> > codes.  In addition it would seem to provide more opportunities for
>>>>>>> > optimisation than a target-specific builtin implementation would.
>>>>>>> > I accept that optimisation opportunities will be more limited for
>>>>>>> > strict math compilation, but that it was still worth having them.
>>>>>>> > Also, if we did map it to builtins then the scalar version would go
>>>>>>> > through the optabs and the vector version would go through the
>>>>>>> > target's builtin expansion, which doesn't seem very consistent.
>>>>>>>
>>>>>>> On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and thus
>>>>>>> you can't vectorize anyway?  (strict IEEE behavior is about NaNs,
>>>>>>> correct?)
>>>>>> I thought for this particular case associativity wasn't an issue?
>>>>>> We're not doing any reductions here, just simply performing max/min
>>>>>> operations on each pair of elements in the vectors.  I thought for
>>>>>> IEEE-compliant behaviour we just need to ensure that for each pair of
>>>>>> elements if one element is a NaN we return the other one.
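To make those semantics concrete: what is being described is just an
elementwise, NaN-aware min/max, i.e. the kind of loop below (plain C99, not
code from the patch; the function name is made up for illustration).  Each
pair of elements is handled independently, so no reassociation is involved:

  #include <math.h>

  /* Illustration only: elementwise IEEE-style min.  C99 fmin() already has
     the behaviour described above: if exactly one operand is a NaN, the
     other operand is returned.  */
  void
  elementwise_fmin (double *out, const double *a, const double *b, int n)
  {
    for (int i = 0; i < n; i++)
      out[i] = fmin (a[i], b[i]);
  }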
>>>>>
>>>>> Hmm, true.  Ok, my comment still stands - I don't see that using a
>>>>> tree code is the best thing to do here.  You can add fmin/max optabs
>>>>> and special expansion of BUILT_IN_FMIN/MAX and you can use a target
>>>>> builtin for the vectorized variant.
>>>>>
>>>>> The reason I am pushing against a new tree code is that we'd have an
>>>>> awful lot of similar codes when pushing other flag related IL
>>>>> specialities to actual IL constructs.  And we still need to find a
>>>>> consistent way to do that.
>>>>
>>>> In this case though the new code is really the "native" min/max
>>>> operation for fp, rather than some weird flag-dependent behaviour.
>>>> Maybe it's a bit unfortunate that the non-strict min/max fp operation
>>>> got mapped to the generic MIN_EXPR and MAX_EXPR when the non-strict
>>>> version is really the flag-related modification.  The STRICT_* prefix
>>>> is forced by that and might make it seem like more of a special case
>>>> than it really is.
>>>
>>> In some sense.  But the "strict" version already has a builtin (just no
>>> special expander in builtins.c).  We usually don't add 1:1 tree codes
>>> for existing builtins (why have builtins at all then?).
>>
>> We still need the builtin to match the C function (and to allow direct
>> calls to __builtin_fmin, etc., which are occasionally useful).
>>
>>>> If you're still not convinced, how about an internal function instead
>>>> of a built-in function, so that we can continue to use optabs for all
>>>> cases?  I'd really like to avoid forcing such a generic concept down to
>>>> target-specific builtins with target-specific expansion code, especially
>>>> when the same concept is exposed by target-independent code for scalars.
>>>
>>> The target builtin is for the vectorized variant - not all targets might
>>> have that and we'd need to query the target about this.  So using an IFN
>>> would mean adding a target hook for that query.
>>
>> No, the idea is that if we have a tree code or an internal function, the
>> decision about whether we have target support is based on a query of the
>> optabs (just like it is for scalar, and for other vectorisable tree codes).
>> No new hooks are needed.
>>
>> The patch checked for target support that way.
>
> Fair enough.  Still this means we should have tree codes for all builtins
> that eventually are vectorized?  So why don't we have SIN_EXPR,
> POW_EXPR (ok, I did argue and have patches for that in the past),
> RINT_EXPR, SQRT_EXPR, etc?

Yeah, it doesn't sound so bad to me :-)  The choice of what's a function
in C and what's inherent is pretty arbitrary.  E.g. % on doubles could
have implemented fmod() or remainder().  Casts from double to int could
have used the current rounding mode, but instead they truncate, and
conversions using the rounding mode need to go through something like
(l)lrint().  Like you say, pow() could have been an operator (and is in
many languages), but instead it's a function.
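A small standalone illustration of that arbitrariness (ordinary C, nothing
GCC-specific; the particular values are chosen only to make the differences
visible):

  #include <math.h>
  #include <stdio.h>

  int
  main (void)
  {
    double x = 2.7;

    /* The language chose truncation for the cast; honouring the current
       rounding mode needs an explicit lrint() call.  */
    printf ("(long) x  = %ld\n", (long) x);    /* 2: truncates */
    printf ("lrint (x) = %ld\n", lrint (x));   /* 3: round-to-nearest */

    /* Likewise '%' could in principle have meant either of these for
       floating point, but both ended up as library functions.  */
    printf ("fmod      = %g\n", fmod (7.5, 2.0));       /* 1.5 */
    printf ("remainder = %g\n", remainder (7.5, 2.0));  /* -0.5 */
    return 0;
  }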
> This patch starts to go down that route which is why I ask for the
> whole picture to be considered and hinted at the alternative implementation
> which follows existing practice.  Add an expander in builtins.c, add an
> optab, and eventual support to vectorized_function.
>
> See for example ix86_builtin_vectorized_function which handles
> sqrt, floor, ceil, etc. and even FMA (we only fold FMA to FMA_EXPR
> if the target supports it for the scalar mode, so not sure if there is
> any x86 ISA where it has vectorized FMA but not scalar FMA).

Yeah.  TBH I'm really against doing that unless (a) there's good reason to
believe that the concept really is specific to one target and wouldn't be
implemented on others or (b) there really is a function rather than an
instruction underneath (usually the case for sin, etc.).  But (b) could also
be handled by the optab support library mechanism.

Reasons against using target-specific builtins for operations that have
direct support in the ISA:

1. Like you say, in practice vector ops only tend to be supported if the
associated scalar op is also supported.  Sticking to this approach means
that vector ops follow a different path from scalar ops whereas (for
example) division follows the same path for both.  It just seems confusing
to have some floating-point optabs that support both scalar and vector
operands and others that only support scalar operands.

2. Once converted to a target-specific function, the target-independent
code has no idea what the function does or how expensive it is.  We might
start out with just one hook to convert a scalar operation to a
target-dependent built-in function, but I bet over time we'll grow other
hooks to query properties about the function, such as costs.

3. builtin_vectorized_function returns a decl rather than a call.  If the
target's vector API doesn't already have a built-in for the operation we
need, with the exact types and arguments that we expect, the target needs
to define one, presumably marked so that it isn't callable by input code.
E.g. on targets where FP conversion instructions allow an explicit rounding
mode to be specified as an operand, it's reasonable for a target's vector
API to expose that operand as a constant argument to the API function.
There'd then be one API function for all vector-float-to-vector-float
integer rounding operations, rather than one for vector rint(), one for
vector ceil(), etc.  (I'm thinking of System z instructions here, although
I don't know offhand what the vector API is there.)

IMO it doesn't make sense to force the target to define "fake" built-in
functions for all those possibilities purely for the sake of the target
hook.  It's a lot of extra code, and it's extra code that would be
duplicated on any target that needs to do this.
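To make point 3 concrete, here is a purely hypothetical sketch of such a
vector API.  The vector type, the enum and every function name below are
invented for illustration; no real target's API is being described, and the
portable fallback body just stands in for a single hardware instruction
taking a rounding-mode operand:

  #include <math.h>

  typedef double v2df __attribute__ ((vector_size (16)));

  enum vec_round
  {
    VEC_ROUND_NEAREST, VEC_ROUND_UP, VEC_ROUND_DOWN, VEC_ROUND_TRUNC
  };

  /* One API entry point covering every float-to-float integer-rounding
     operation, with the rounding mode passed as a constant operand.  */
  static inline v2df
  vec_round_to_int (v2df x, enum vec_round mode)
  {
    v2df r;
    for (int i = 0; i < 2; i++)
      switch (mode)
        {
        case VEC_ROUND_UP:    r[i] = ceil (x[i]); break;
        case VEC_ROUND_DOWN:  r[i] = floor (x[i]); break;
        case VEC_ROUND_TRUNC: r[i] = trunc (x[i]); break;
        default:              r[i] = rint (x[i]); break;
        }
    return r;
  }

  /* A decl-returning hook instead wants one "builtin" per operation,
     which the target would have to fabricate:  */
  static inline v2df
  vec_ceil (v2df x) { return vec_round_to_int (x, VEC_ROUND_UP); }

  static inline v2df
  vec_rint (v2df x) { return vec_round_to_int (x, VEC_ROUND_NEAREST); }

With a single entry point like that, there is no pre-existing "vector ceil"
decl for the hook to return, so the target would have to define one per
operation purely to satisfy it.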
IMO optabs are the best way for the target to tell the target-independent
code what it can do.  If it supports sqrt on df it defines sqrtdf and if it
supports vector sqrt on v2df it defines sqrtv2df.  These patterns will often
be a single define_expand or define_insn template -- the vectorness often
comes "for free" in terms of writing the pattern.

>>> > TBH though I'm not sure why an internal_fn value (or a target-specific
>>> > builtin enum value) is worse than a tree-code value, unless the limit
>>> > of the tree_code bitfield is in sight (maybe it is).
>>>
>>> I think tree_code is 64bits now.
>>
>> Even better :-)
>
> Yes.
>
> I'm not against adding a corresponding tree code for all math builtin
> functions, we just have to decide whether this is the way to go (and of
> course support expanding those back to libcalls to libc/m rather than
> libgcc).  There are also constraints on what kind of STRICT_FMIN_EXPR the
> compiler may generate as the target may not be able to expand the long
> double variant directly but needs a libcall but libm might not be linked
> or may not have support for it.  That would be a new thing compared to
> libgcc providing a fallback for all other tree codes.

True, but that doesn't seem too bad.  The constraints would be the same if
we're operating on built-in functions rather than codes.  I suppose built-in
functions make this more explicit, but at the end of the day it's a costing
decision.  We should no more be converting a cheap operation into an
expensive libgcc function than converting a cheap operation into an
expensive libm function, even if the libgcc conversion links.

There's certainly precedent for introducing calls to things that libgcc
doesn't define.  E.g. we already introduce calls to memcpy in things like
loop distribution, even though we don't provide a fallback memcpy in libgcc.

Thanks,
Richard