From: Richard Biener
To: Uros Bizjak
Cc: "gcc-patches@gcc.gnu.org", Jan Hubicka, Hongtao Liu
Date: Tue, 8 Aug 2023 10:08:22 +0000 (UTC)
Subject: Re: [RFC PATCH] i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math [PR110832]

On Tue, 8 Aug 2023, Uros Bizjak wrote:

> On Tue, Aug 8, 2023 at 10:07 AM Richard Biener wrote:
> >
> > On Mon, 7 Aug 2023, Uros Bizjak wrote:
> >
> > > On Mon, Jul 31, 2023 at 11:40 AM Richard Biener wrote:
> > > >
> > > > On Sun, 30 Jul 2023, Uros Bizjak wrote:
> > > >
> > > > > Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF
> > > > > named patterns in order to avoid generation of partial vector V4SFmode
> > > > > trapping instructions.
> > > > >
> > > > > The new option is enabled by default, because even with sanitization,
> > > > > a small but consistent speed up of 2 to 3% with Polyhedron capacita
> > > > > benchmark can be achieved vs. scalar code.
> > > > >
> > > > > Using -fno-trapping-math improves Polyhedron capacita runtime 8 to 9%
> > > > > vs. scalar code. This is what clang does by default, as it defaults
> > > > > to -fno-trapping-math.
> > > >
> > > > I like the new option, note you lack invoke.texi documentation where
> > > > I'd also elaborate a bit on the interaction with -fno-trapping-math
> > > > and the possible performance impact when NaNs or denormals leak
> > > > into the upper halves and cross-reference -mdaz-ftz.
> > > >
> > > The attached doc patch is invoke.texi entry for -mmmxfp-with-sse
> > > option. It is written in a way to also cover half-float vectors. WDYT?
> >
> > "generate trapping floating-point operations"
> >
> > I'd say "generate floating-point operations that might affect the
> > set of floating point status flags", the word "trapping" is IMHO
> > misleading.
> > Not sure if "set of floating point status flags" is the correct term,
> > but it's what the C standard seems to refer to when talking about
> > things you get with fegetexceptflag. feraiseexcept refers to
> > "floating-point exceptions". Unfortunately the -fno-trapping-math
> > documentation is similarly confusing (and maybe even wrong, I read
> > it to conform to 'non-stop' IEEE arithmetic).
>
> Thanks for suggesting the right terminology. I think that:
>
> +@opindex mpartial-vector-math
> +@item -mpartial-vector-math
> +This option enables GCC to generate floating-point operations that might
> +affect the set of floating point status flags on partial vectors, where
> +vector elements reside in the low part of the 128-bit SSE register. Unless
> +@option{-fno-trapping-math} is specified, the compiler guarantees correct
> +behavior by sanitizing all input operands to have zeroes in the unused
> +upper part of the vector register. Note that by using built-in functions
> +or inline assembly with partial vector arguments, NaNs, denormal or invalid
> +values can leak into the upper part of the vector, causing possible
> +performance issues when @option{-fno-trapping-math} is in effect. These
> +issues can be mitigated by manually sanitizing the upper part of the partial
> +vector argument register or by using @option{-mdaz-ftz} to set
> +denormals-are-zero (DAZ) flag in the MXCSR register.
>
> now explains in adequate detail what the option does. IMO, the
> "floating-point operations that might affect the set of floating point
> status flags" correctly identifies affected operations, so an example,
> as suggested below, is not necessary.
>
> > I'd maybe give an example of a FP operation that's _not_ affected
> > by the flag (copysign?).
>
> Please note that I have renamed the option to "-mpartial-vector-math"
> with a short target-specific description:

Ah yes, that's a less confusing name but then it might suggest
that -mno-partial-vector-math would disable all of that, including
integer ops, not only the patterns possibly affecting the exception
flags? Note I don't have a better suggestion and this is clearly
better than the one mentioning mmx.

> +partial-vector-math
> +Target Var(ix86_partial_vec_math) Init(1)
> +Enable floating-point status flags setting SSE vector operations on
> partial vectors
>
> which I think summarises the option (without the word "trapping"). The
> same approach will be taken for Float16 operations, so the approach is
> not specific to MMX vectors.
>
> > Otherwise it looks OK to me.
>
> Thanks, I have attached the RFC V2 patch; I plan to submit a formal
> patch later today.

Thanks. With AVX512VL there might also be the option to use a mask
(with the penalty of a very much larger instruction encoding).

Richard.
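
For illustration only, here is a minimal sketch of why the unused upper
half matters for the status flags. It is not taken from the patch. It
assumes an x86-64 target with SSE and the default MXCSR (all exceptions
masked); the helper name low2_add_flags and the test values are
hypothetical. Two floats live in the low half of a 128-bit register, a
full-width addps is executed, and the sticky MXCSR exception flags are
read back. The volatile arrays are only there to keep the compiler from
folding the addition at compile time.

```c
#include <float.h>
#include <xmmintrin.h>

/* Hypothetical helper: add two 2-element float vectors held in the low
   half of 128-bit registers whose upper two lanes both contain `upper`.
   Stores the two meaningful results in out[] and returns the MXCSR
   exception flags raised by the full-width addition. */
unsigned low2_add_flags(float upper, float out[2])
{
    volatile float a[4] = {1.0f, 2.0f, upper, upper};
    volatile float b[4] = {0.5f, 0.25f, upper, upper};
    volatile float r[4];
    float ta[4], tb[4], tr[4];

    _MM_SET_EXCEPTION_STATE(0);              /* clear the sticky flags */

    for (int i = 0; i < 4; i++) {            /* volatile reads: values are */
        ta[i] = a[i];                        /* opaque to the optimizer    */
        tb[i] = b[i];
    }

    /* One full-width SSE add; all four lanes are computed. */
    _mm_storeu_ps(tr, _mm_add_ps(_mm_loadu_ps(ta), _mm_loadu_ps(tb)));

    for (int i = 0; i < 4; i++)              /* volatile writes keep the add */
        r[i] = tr[i];                        /* before the flag read         */

    unsigned flags = _MM_GET_EXCEPTION_STATE();

    out[0] = r[0];                           /* only the low half is data */
    out[1] = r[1];
    return flags;
}
```

With zeroes in the upper lanes (the sanitized case), the call
low2_add_flags(0.0f, out) should report no flags, since both low-lane
sums are exact. With FLT_MAX left in the upper lanes,
low2_add_flags(FLT_MAX, out) should report at least _MM_EXCEPT_OVERFLOW
even though the two low results are identical. That spurious flag is
what the sanitization avoids, and what -fno-trapping-math declares to be
of no interest.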