From: Richard Biener
Date: Tue, 22 Aug 2017 14:20:00 -0000
Subject: Re: [PATCH] -fftz-math: assume that
 denorms _must_ be flushed to zero optimizations
To: Pekka Jääskeläinen
Cc: Joseph Myers, GCC Patches, Henry Linjamäki, Martin Jambor

On Tue, Aug 22, 2017 at 3:28 PM, Pekka Jääskeläinen wrote:
> Hi Richard and Joseph,
>
> Replies for both inline:
>
> I wrote:
>>> Both the inputs and outputs must be flushed to zero in the HSAIL’s
>>> ‘ftz’ semantics.
>>> FTZ operations were previously always “explicit” in the BRIG FE
>>> output, like you propose here; there were builtin calls injected for
>>> all inputs and the output of ‘ftz’-marked float HSAIL instructions.
>>> This is still provided as a fallback for targets which do not
>>> support a CPU mode flag.
>
> On Mon, Aug 14, 2017 at 1:17 PM, Richard Biener wrote:
>> I see.  But how does making them implicit fix cases in the conformance
>> testsuite?  That is, isn't the error in the runtime implementation of
>> __hsail_ftz_*?  I'd have used a "simple" [...]
>
> There are two parts in the story here:
>
> 1) Making the FTZ/DAZ “the default”, meaning no builtin calls or
> similar are used to flush the operands/results, but relying on that
> the runtime flips on the FTZ/DAZ CPU flags before executing this code.
> This is purely a performance optimization because those FTZ/DAZ
> builtin calls (three per HSAIL instruction) ruin the performance for
> multiple reasons. We implemented this optimization already in our
> staging branch of the BRIG FE.
>
> 2) Ensuring GCC does not perform certain compile-time optimizations
> with the assumption that FTZ/DAZ is optional, but make it assume that
> ftz should happen for correctness.
> The proposed patch addresses this part for the compiler side by
> disabling the currently known optimizations which should be flushed
> at runtime when “ftz denorm math” is desired.
>
>>> The problem with a special FTZ ‘operation’ of some kind in the generic
>>> output is that the basic optimizations get confused by a new operation
>>> and we’d need to add knowledge of the ‘FTZ’ operation to a bunch of
>>> existing optimizer code, which seems unnecessary to support this case
>>> as the optimizations typically apply also for the ‘FTZ semantics’ when
>>> the FTZ/DAZ flag is on.
>>
>> Apart from the exceptions you needed to guard ... do you have an example of
>> a transform that is confused by explicit FTZ and that would be valid if that
>> FTZ were implicit?  An explicit FTZ should be much safer.  I think the
>> builtins should also be CONST and not only PURE.
>
> Explicit builtin calls ruin many optimizations starting from a simple
> common subexpression elimination if they don’t understand what the
> builtin returns for any given operand.

Calls to const functions are CSEd just fine (if they are passed the same
argument, that is).

  int __attribute__((const)) foo (int i);
  int main()
  {
    return foo(1) + foo(1);
  }

results in 2 * foo (1).

Note that I expected FTZ to be a tree code and not a builtin.  The target
can then choose to simply elide all FTZ.  Constant folding can then also
correctly handle FTZ in the places where it is relevant.

> Thus, inlining the builtin function’s code would be needed first and
> there would be a lot of code inlined due to the abundance of ftz calls
> required and you cannot eliminate it all (as at compile time you don’t
> know if the operand is a denorm or not).
> Another approach would be to introduce special cases to the
> optimizations affected so they understand the FTZ builtin and might be
> able to remove the useless ones. This potentially touches _a lot_ of
> code. And in the end, if the CPU could flush denorms efficiently using
> hardware (typically it’s faster to do FTZ in HW than gradual underflow
> so this is likely the case), any builtin call to do it that cannot be
> optimized away presents additional, possibly major, runtime overhead.

Understood.

> We tested if a simple common subexpression elimination case works with
> the ftz builtins and it didn’t. CONST didn’t help here.
>
> However, I understand your concern that there might be optimizations
> that still break the FTZ semantics if there are no explicit builtin
> calls, but we are prepared to fix them case by case if/when they
> appear. The attached updated patch fixes a few additional cases we
> noticed, e.g. it disables several constant folding cases.
>
> On Mon, Aug 14, 2017 at 2:30 PM, Joseph Myers wrote:
>> Presumably this means that constant folding needs to know about those
>> semantics, both for operations with a subnormal floating-point argument
>> (whether or not the output is floating point, or floating point in the
>> same format), and those with such a result?
>> Can assignments copy subnormals without converting them to zero?  Should
>> comparisons flush input subnormals to zero before comparing?  Should
>> conversions e.g. from float to double convert a float subnormal input to
>> zero?
>
> I can answer yes to all of these questions.

I think the flag approach isn't good here.  If we had a mode that doesn't
have denormals we could represent that, but it's the language frontend that
requires a certain semantic and thus it should impose that as IL details.
These days I'd rather not introduce global flags for semantic details of
the IL, as we are trying to get rid of the ones that already exist.

Richard.
> BR,
> Pekka
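
For illustration, the explicit flushing discussed in this thread (both
inputs and the output of an 'ftz'-marked operation, i.e. three flushes per
operation) could look roughly like the sketch below.  The names ftz_f32,
add_ftz_f32 and ftz_selfcheck are hypothetical and are not the actual
__hsail_ftz_* runtime builtins; the sketch assumes IEEE binary32 float, a
GCC-compatible compiler for the const attribute, and the default FP
environment (FTZ/DAZ CPU flags off, no -ffast-math).

```c
#include <assert.h>
#include <float.h>

/* Hypothetical stand-in for an explicit FTZ operation; NOT the real
   __hsail_ftz_* runtime builtin.  A subnormal input becomes a zero of
   the same sign, anything else (including NaN and infinities) is
   returned unchanged.  The result depends only on the argument, so the
   function is declared const.  */
float __attribute__((const))
ftz_f32 (float x)
{
  if (x != 0.0f && x > -FLT_MIN && x < FLT_MIN)
    return x < 0.0f ? -0.0f : 0.0f;
  return x;
}

/* An 'ftz'-marked add with explicit flushing of both inputs and the
   output, i.e. three flush operations per arithmetic operation.  */
float
add_ftz_f32 (float a, float b)
{
  return ftz_f32 (ftz_f32 (a) + ftz_f32 (b));
}

/* Self-check, assuming the default FP environment (FTZ/DAZ flags off).  */
void
ftz_selfcheck (void)
{
  float half_min = FLT_MIN / 2.0f;  /* subnormal in binary32 */

  assert (ftz_f32 (half_min) == 0.0f);    /* subnormal input is flushed */
  assert (ftz_f32 (FLT_MIN) == FLT_MIN);  /* smallest normal is kept */
  /* Inputs are flushed: without flushing the sum would be FLT_MIN.  */
  assert (add_ftz_f32 (half_min, half_min) == 0.0f);
  /* Output is flushed: the exact difference is the subnormal FLT_MIN / 2.  */
  assert (add_ftz_f32 (1.5f * FLT_MIN, -FLT_MIN) == 0.0f);
}
```

Calling ftz_selfcheck () from a small driver exercises the sketch.  With the
implicit approach described in the thread, add_ftz_f32 would reduce to a
plain a + b and the flushing would be left to the FTZ/DAZ CPU flags that the
runtime is expected to set.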