From: Uros Bizjak
Date: Tue, 8 Aug 2023 13:03:34 +0200
Subject: Re: [RFC PATCH] i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math [PR110832]
To: Richard Biener
Cc: "gcc-patches@gcc.gnu.org", Jan Hubicka, Hongtao Liu

On Tue, Aug 8, 2023 at 12:08 PM Richard Biener wrote:

> > > > > > Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF
> > > > > > named patterns in order to avoid generation of partial vector V4SFmode
> > > > > > trapping instructions.
> > > > > >
> > > > > > The new option is enabled by default, because even with sanitization,
> > > > > > a small but consistent speed up of 2 to 3% with Polyhedron capacita
> > > > > > benchmark can be achieved vs. scalar code.
> > > > > >
> > > > > > Using -fno-trapping-math improves Polyhedron capacita runtime 8 to 9%
> > > > > > vs. scalar code. This is what clang does by default, as it defaults
> > > > > > to -fno-trapping-math.
> > > > >
> > > > > I like the new option, note you lack invoke.texi documentation where
> > > > > I'd also elaborate a bit on the interaction with -fno-trapping-math
> > > > > and the possible performance impact when NaNs or denormals leak
> > > > > into the upper halves and cross-reference -mdaz-ftz.
> > > >
> > > > The attached doc patch is invoke.texi entry for -mmmxfp-with-sse
> > > > option. It is written in a way to also cover half-float vectors. WDYT?
> > >
> > > "generate trapping floating-point operations"
> > >
> > > I'd say "generate floating-point operations that might affect the
> > > set of floating point status flags", the word "trapping" is IMHO
> > > misleading.
> > > Not sure if "set of floating point status flags" is the correct term,
> > > but it's what the C standard seems to refer to when talking about
> > > things you get with fegetexceptflag. feraiseexcept refers to
> > > "floating-point exceptions". Unfortunately the -fno-trapping-math
> > > documentation is similarly confusing (and maybe even wrong, I read
> > > it to conform to 'non-stop' IEEE arithmetic).
> >
> > Thanks for suggesting the right terminology. I think that:
> >
> > +@opindex mpartial-vector-math
> > +@item -mpartial-vector-math
> > +This option enables GCC to generate floating-point operations that might
> > +affect the set of floating point status flags on partial vectors, where
> > +vector elements reside in the low part of the 128-bit SSE register. Unless
> > +@option{-fno-trapping-math} is specified, the compiler guarantees correct
> > +behavior by sanitizing all input operands to have zeroes in the unused
> > +upper part of the vector register. Note that by using built-in functions
> > +or inline assembly with partial vector arguments, NaNs, denormal or invalid
> > +values can leak into the upper part of the vector, causing possible
> > +performance issues when @option{-fno-trapping-math} is in effect. These
> > +issues can be mitigated by manually sanitizing the upper part of the partial
> > +vector argument register or by using @option{-mdaz-ftz} to set
> > +denormals-are-zero (DAZ) flag in the MXCSR register.
> >
> > now explains in adequate detail what the option does. IMO, the
> > "floating-point operations that might affect the set of floating point
> > status flags" correctly identifies affected operations, so an example,
> > as suggested below, is not necessary.
> >
> > > I'd maybe give an example of a FP operation that's _not_ affected
> > > by the flag (copysign?).
> >
> > Please note that I have renamed the option to "-mpartial-vector-math"
> > with a short target-specific description:
>
> Ah yes, that's a less confusing name but then it might suggest
> that -mno-partial-vector-math would disable all of that, including
> integer ops, not only the patterns possibly affecting the exception
> flags? Note I don't have a better suggestion and this is clearly
> better than the one mentioning mmx.

You are right, I think I'll rename the option to -mpartial-vector-fp-math.

Thanks,
Uros.