From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-460713-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 126049 invoked by alias); 22 Aug 2017 13:52:01 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 126034 invoked by uid 89); 22 Aug 2017 13:52:01 -0000
MIME-Version: 1.0
Received: by 10.80.180.249 with HTTP; Tue, 22 Aug 2017 06:51:56 -0700 (PDT)
In-Reply-To: <CAJk11WCnHkf8U1r5fJapHo=c0p83XgD-hZO5YOsWdYReipgxNw@mail.gmail.com>
References: <CAJk11WCfMLenz=hume0aTTpkpJ6GM0bB3xsFL+dSvnA9zVPs+Q@mail.gmail.com> <CAFiYyc07+kScuyPZSe6oeVA4PROiLg4bb=zz95BaV_oXeGjtVw@mail.gmail.com> <CAJk11WDZMdj_fUsBQTsXShZEA4eTb86cGRs4O3zgxq7uWrA1GQ@mail.gmail.com> <alpine.DEB.2.20.1708141227130.29224@digraph.polyomino.org.uk> <CAJk11WCnHkf8U1r5fJapHo=c0p83XgD-hZO5YOsWdYReipgxNw@mail.gmail.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Tue, 22 Aug 2017 14:20:00 -0000
Message-ID: <CAFiYyc02cMCvF-GMxAFyNFfL-g3eeUawWH287ki97K0w0m9nTQ@mail.gmail.com>
Subject: Re: [PATCH] -fftz-math: assume that denorms _must_ be flushed to zero optimizations
To: =?UTF-8?B?UGVra2EgSsOkw6Rza2Vsw6RpbmVu?= <pekka@parmance.com>
Cc: Joseph Myers <joseph@codesourcery.com>, GCC Patches <gcc-patches@gcc.gnu.org>, 	=?UTF-8?Q?Henry_Linjam=C3=A4ki?= <henry.linjamaki@parmance.com>, 	Martin Jambor <mjambor@suse.cz>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
X-SW-Source: 2017-08/txt/msg01276.txt.bz2

On Tue, Aug 22, 2017 at 3:28 PM, Pekka Jääskeläinen <pekka@parmance.com> wrote:
> Hi Richard and Joseph,
>
> Replies for both inline:
>
> I wrote:
>>> Both the inputs and outputs must be flushed to zero in HSAIL’s
>>> ‘ftz’ semantics.
>>> FTZ operations were previously always “explicit” in the BRIG FE
>>> output, like you propose here; there were builtin calls injected for
>>> all inputs and the output of ‘ftz’-marked float HSAIL instructions.
>>> This is still provided as a fallback for targets which do not
>>> support a CPU mode flag.
>
> On Mon, Aug 14, 2017 at 1:17 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> I see.  But how does making them implicit fix cases in the conformance
>> testsuite?  That is, isn't the error in the runtime implementation of
>> __hsail_ftz_*?  I'd have used a "simple" [...]
>
> There are two parts in the story here:
>
> 1) Making FTZ/DAZ “the default”, meaning no builtin calls or similar
> are used to flush the operands/results; instead we rely on the runtime
> switching on the FTZ/DAZ CPU flags before executing this code. This is
> purely a performance optimization because those FTZ/DAZ builtin calls
> (three per HSAIL instruction) ruin the performance for multiple
> reasons. We implemented this optimization already in our staging
> branch of the BRIG FE.
>
> 2) Ensuring GCC does not perform certain compile-time optimizations
> with the assumption that FTZ/DAZ is optional, but instead assumes that
> ftz must happen for correctness. The proposed patch addresses this
> part on the compiler side by disabling the currently known
> optimizations that affect values which should be flushed at runtime
> when “ftz denorm math” is desired.
>
>>> The problem with a special FTZ ‘operation’ of some kind in the
>>> generic output is that the basic optimizations get confused by a new
>>> operation and we’d need to add knowledge of the ‘FTZ’ operation to a
>>> bunch of existing optimizer code, which seems unnecessary to support
>>> this case as the optimizations typically apply also for the ‘FTZ
>>> semantics’ when the FTZ/DAZ flag is on.
>>
>> Apart from the exceptions you needed to guard ... do you have an
>> example of a transform that is confused by explicit FTZ and that would
>> be valid if that FTZ were implicit?  An explicit FTZ should be much
>> safer.  I think the builtins should also be CONST and not only PURE.
>
> Explicit builtin calls ruin many optimizations starting from a simple
> common subexpression elimination if they don’t understand what the
> builtin returns for any given operand.

Calls to const functions are CSEd just fine (if they are passed the same
argument, that is).

int __attribute__((const)) foo (int i);

int main()
{
  return foo(1) + foo(1);
}

results in 2 * foo (1).

Note that I expected FTZ to be a tree code and not a builtin.  The target
can then choose to simply elide all FTZ.  Constant folding can then
also correctly handle FTZ in the places where it is relevant.
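
As a minimal sketch of the explicit form discussed here, assuming a
software flush is acceptable: ftz_f32 and ftz_add_f32 below are
hypothetical names for illustration, not the __hsail_ftz_* builtins and
not a GCC tree code, and preserving the sign of the flushed zero is an
assumption of the sketch.  Because the helper is declared const, two
calls with the same argument are candidates for CSE, as with foo above.

```c
#include <float.h>

/* Hypothetical explicit flush-to-zero helper (illustration only).
   Subnormal inputs become a zero of the same sign; normals, zeros,
   infinities and NaNs pass through unchanged (all comparisons against
   a NaN are false, so a NaN falls through to the final return).  */
static inline float __attribute__((const))
ftz_f32 (float x)
{
  if (x != 0.0f && x > -FLT_MIN && x < FLT_MIN)
    return x < 0.0f ? -0.0f : 0.0f;
  return x;
}

/* An 'ftz'-marked add written with explicit flushes: one flush per
   input and one for the result, i.e. three per instruction.  */
static inline float
ftz_add_f32 (float a, float b)
{
  return ftz_f32 (ftz_f32 (a) + ftz_f32 (b));
}
```

With gradual underflow, 1.5f * FLT_MIN + -FLT_MIN would give the
subnormal FLT_MIN / 2; ftz_add_f32 flushes that result to zero.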

> Thus, inlining the builtin function’s code would be needed first and
> there would be a lot of code inlined due to the abundance of ftz calls
> required and you cannot eliminate it all (as at compile time you don’t
> know if the operand is a denorm or not). Another approach would be to
> introduce special cases to the optimizations affected so they
> understand the FTZ builtin and might be able to remove the useless
> ones. This potentially touches _a lot_ of code. And in the end, if the
> CPU could flush denorms efficiently using hardware (typically it’s
> faster to do FTZ in HW than gradual underflow so this is likely the
> case), any builtin call to do it that cannot be optimized away presents
> additional, possibly major, runtime overhead.

Understood.
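
For reference, a minimal sketch of the hardware route mentioned in the
quoted text, assuming an x86 target that does float math in SSE
registers: the FTZ and DAZ bits of MXCSR are set through the
_MM_SET_FLUSH_ZERO_MODE and _MM_SET_DENORMALS_ZERO_MODE macros from
<xmmintrin.h> and <pmmintrin.h>.  The function names
enable_hw_ftz_daz and half_of_flt_min are hypothetical, and this is not
how the BRIG FE runtime is known to do it.

```c
#include <float.h>

#if defined(__x86_64__) || defined(__SSE_MATH__)
#include <xmmintrin.h>   /* _MM_SET_FLUSH_ZERO_MODE */
#include <pmmintrin.h>   /* _MM_SET_DENORMALS_ZERO_MODE */
#define HAVE_MXCSR_FTZ 1
#else
#define HAVE_MXCSR_FTZ 0
#endif

/* Hypothetical runtime hook.  Returns 1 if the FTZ/DAZ mode bits were
   set, 0 if this build knows of no such mode flag (in which case a
   software fallback would be needed).  */
static int
enable_hw_ftz_daz (void)
{
#if HAVE_MXCSR_FTZ
  _MM_SET_FLUSH_ZERO_MODE (_MM_FLUSH_ZERO_ON);          /* flush results */
  _MM_SET_DENORMALS_ZERO_MODE (_MM_DENORMALS_ZERO_ON);  /* flush inputs */
  return 1;
#else
  return 0;
#endif
}

/* Halve the smallest normal float at run time.  The volatile keeps
   the compiler from folding the multiply at compile time, where the
   CPU mode flag would have no effect.  */
static float
half_of_flt_min (void)
{
  volatile float tiny = FLT_MIN;
  return tiny * 0.5f;
}
```

After enable_hw_ftz_daz () returns 1, half_of_flt_min () is expected to
return zero rather than a subnormal.  A compile-time fold of the same
expression would not see the flag, which is the kind of mismatch the
patch is concerned with.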

> We tested if a simple common subexpression elimination case works with
> the ftz builtins and it didn’t. CONST didn’t help here.
>
> However, I understand your concern that there might be optimizations
> that still break the FTZ semantics if there are no explicit builtin
> calls, but we are prepared to fix them case by case if/when they
> appear. The attached updated patch fixes a few additional cases we
> noticed, e.g. it disables several constant folding cases.
>
> On Mon, Aug 14, 2017 at 2:30 PM, Joseph Myers <joseph@codesourcery.com> wrote:
>> Presumably this means that constant folding needs to know about those
>> semantics, both for operations with a subnormal floating-point argument
>> (whether or not the output is floating point, or floating point in the
>> same format), and those with such a result?
>> Can assignments copy subnormals without converting them to zero?  Should
>> comparisons flush input subnormals to zero before comparing?  Should
>> conversions e.g. from float to double convert a float subnormal input to
>> zero?
>
> I can answer yes to all of these questions.
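
A minimal sketch of what "yes" means for comparisons and conversions,
written as software emulation.  The helper names flush_f32, ftz_eq_f32
and ftz_f32_to_f64 are hypothetical and only illustrate the semantics
stated in the quoted answer; they are not part of the patch.

```c
#include <float.h>

/* Hypothetical helper: replace a subnormal float by a zero of the
   same sign and leave every other value unchanged.  */
static inline float
flush_f32 (float x)
{
  if (x != 0.0f && x > -FLT_MIN && x < FLT_MIN)
    return x < 0.0f ? -0.0f : 0.0f;
  return x;
}

/* Comparison under ftz semantics: inputs are flushed first, so two
   distinct subnormals compare equal because both are treated as zero.  */
static inline int
ftz_eq_f32 (float a, float b)
{
  return flush_f32 (a) == flush_f32 (b);
}

/* float -> double conversion under ftz semantics: a subnormal float
   input converts to zero, even though double could represent its
   value exactly as a normal number.  */
static inline double
ftz_f32_to_f64 (float x)
{
  return (double) flush_f32 (x);
}
```

For example, ftz_eq_f32 (FLT_MIN / 2, FLT_MIN / 4) is true, and
ftz_f32_to_f64 (FLT_MIN / 2) is 0.0 while a plain (double) cast of the
same value is nonzero.  Constant folding would have to reproduce both
results.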

I think the flag approach isn't good here.  If we had a mode that
doesn't have denormals we could represent that, but here it is the
language frontend that requires certain semantics, and thus it should
impose those as IL details.  These days I'd rather not introduce global
flags for semantic details of the IL, as we are trying to get rid of
the ones that already exist.

Richard.

> BR,
> Pekka