From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=MK31=DZ=suse.de=rguenther@sourceware.org>
Received: from smtp-out1.suse.de (smtp-out1.suse.de [IPv6:2001:67c:2178:6::1c])
	by sourceware.org (Postfix) with ESMTPS id 746753858D20
	for <gcc-patches@gcc.gnu.org>; Tue,  8 Aug 2023 10:08:23 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 746753858D20
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de
Received: from relay2.suse.de (relay2.suse.de [149.44.160.134])
	by smtp-out1.suse.de (Postfix) with ESMTP id 93327221C8;
	Tue,  8 Aug 2023 10:08:22 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa;
	t=1691489302; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=b4ahb5l82psZhaK6drVrGCR1DVKFe2V3Ad2g6N3EbQo=;
	b=nJDDgVDrFnDi5OkoUqycuD+KePeLX+BGw7MJbf6rO6zBk3ZkJGxHxTxf8Yi3BTKXCkBcjR
	UH0n84VS6uD1GVmEiwi18YttcOUIDoVDUqJthiBW7f424RHzvzX+525jHNr3m/3fxt5MMo
	u29FA2K6SGWBKXExX47cuHO9Yg1v19g=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
	s=susede2_ed25519; t=1691489302;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=b4ahb5l82psZhaK6drVrGCR1DVKFe2V3Ad2g6N3EbQo=;
	b=OQ4aoBUPFleB+DjuDT52CsELaboJc36uUXEC0ExnMmJEAYZe7KMfIydRb9+BEmjialX6dd
	R6PYoHtYQTVVsmBw==
Received: from wotan.suse.de (wotan.suse.de [10.160.0.1])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by relay2.suse.de (Postfix) with ESMTPS id 348EE2C142;
	Tue,  8 Aug 2023 10:08:22 +0000 (UTC)
Date: Tue, 8 Aug 2023 10:08:22 +0000 (UTC)
From: Richard Biener <rguenther@suse.de>
To: Uros Bizjak <ubizjak@gmail.com>
cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>, 
    Jan Hubicka <hubicka@ucw.cz>, Hongtao Liu <hongtao.liu@intel.com>
Subject: Re: [RFC PATCH] i386: Do not sanitize upper part of V2SFmode reg
 with -fno-trapping-math [PR110832]
In-Reply-To: <CAFULd4YhGRqs9ByoQQjXwEB+ndi9VHmkv=1RUu2GBUskT8c2GQ@mail.gmail.com>
Message-ID: <nycvar.YFH.7.77.849.2308081003560.12935@jbgna.fhfr.qr>
References: <CAFULd4abm7fZrKOYWMibFDM=uBk1TET0vSn7=5=-tYhcVrRdUA@mail.gmail.com> <nycvar.YFH.7.77.849.2307310937420.12935@jbgna.fhfr.qr> <CAFULd4ZUXTDoAgGN_xi0tgWCm=gC5Vmd1nM+0KHa+MDPRC5V9A@mail.gmail.com> <nycvar.YFH.7.77.849.2308080758430.12935@jbgna.fhfr.qr>
 <CAFULd4YhGRqs9ByoQQjXwEB+ndi9VHmkv=1RUu2GBUskT8c2GQ@mail.gmail.com>
User-Agent: Alpine 2.22 (LSU 394 2020-01-19)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Spam-Status: No, score=-5.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Tue, 8 Aug 2023, Uros Bizjak wrote:

> On Tue, Aug 8, 2023 at 10:07 AM Richard Biener <rguenther@suse.de> wrote:
> >
> > On Mon, 7 Aug 2023, Uros Bizjak wrote:
> >
> > > On Mon, Jul 31, 2023 at 11:40 AM Richard Biener <rguenther@suse.de> wrote:
> > > >
> > > > On Sun, 30 Jul 2023, Uros Bizjak wrote:
> > > >
> > > > > Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF
> > > > > named patterns in order to avoid generation of partial vector V4SFmode
> > > > > trapping instructions.
> > > > >
> > > > > The new option is enabled by default, because even with sanitization,
> > > > > a small but consistent speed up of 2 to 3% with Polyhedron capacita
> > > > > benchmark can be achieved vs. scalar code.
> > > > >
> > > > > Using -fno-trapping-math improves Polyhedron capacita runtime 8 to 9%
> > > > > vs. scalar code.  This is what clang does by default, as it defaults
> > > > > to -fno-trapping-math.
> > > >
> > > > I like the new option, note you lack invoke.texi documentation where
> > > > I'd also elaborate a bit on the interaction with -fno-trapping-math
> > > > and the possible performance impact when NaNs or denormals leak
> > > > into the upper halves and cross-reference -mdaz-ftz.
> > >
> > > The attached doc patch is invoke.texi entry for -mmmxfp-with-sse
> > > option. It is written in a way to also cover half-float vectors. WDYT?
> >
> > "generate trapping floating-point operations"
> >
> > I'd say "generate floating-point operations that might affect the
> > set of floating point status flags", the word "trapping" is IMHO
> > misleading.
> > Not sure if "set of floating point status flags" is the correct term,
> > but it's what the C standard seems to refer to when talking about
> > things you get with fegetexceptflag.  feraiseexcept refers to
> > "floating-point exceptions".  Unfortunately the -fno-trapping-math
> > documentation is similarly confusing (and maybe even wrong, I read
> > it to conform to 'non-stop' IEEE arithmetic).
> 
> Thanks for suggesting the right terminology. I think that:
> 
> +@opindex mpartial-vector-math
> +@item -mpartial-vector-math
> +This option enables GCC to generate floating-point operations that might
> +affect the set of floating point status flags on partial vectors, where
> +vector elements reside in the low part of the 128-bit SSE register.  Unless
> +@option{-fno-trapping-math} is specified, the compiler guarantees correct
> +behavior by sanitizing all input operands to have zeroes in the unused
> +upper part of the vector register.  Note that by using built-in functions
> +or inline assembly with partial vector arguments, NaNs, denormal or invalid
> +values can leak into the upper part of the vector, causing possible
> +performance issues when @option{-fno-trapping-math} is in effect.  These
> +issues can be mitigated by manually sanitizing the upper part of the partial
> +vector argument register or by using @option{-mdaz-ftz} to set
> +denormals-are-zero (DAZ) flag in the MXCSR register.
> 
> now explains in adequate detail what the option does. IMO, the
> "floating-point operations that might affect the set of floating point
> status flags" correctly identifies affected operations, so an example,
> as suggested below, is not necessary.
> 
> > I'd maybe give an example of a FP operation that's _not_ affected
> > by the flag (copysign?).
> 
> Please note that I have renamed the option to "-mpartial-vector-math"
> with a short target-specific description:

Ah yes, that's a less confusing name, but it might suggest that
-mno-partial-vector-math disables all partial-vector operations,
including integer ops, and not only the patterns that can affect the
exception flags.  Note I don't have a better suggestion, and this is
clearly better than the name mentioning mmx.
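
To make "patterns possibly affecting the exception flags" concrete,
here is a minimal sketch (mine, not part of the patch) of how a
full-width addps on a V2SF value can raise flags from leftovers in the
upper half, and how zeroing the upper 64 bits avoids that.  The helper
names flags_raised_by_add, sanitize_upper and make_dirty are made up
for illustration, it assumes x86-64 with SSE2, and the movq-style
zeroing is only a stand-in for whatever sanitization the backend
actually emits.

```c
#include <emmintrin.h>  /* SSE2: _mm_move_epi64 and the ps/si128 casts */
#include <xmmintrin.h>  /* SSE: _mm_add_ps and the MXCSR access macros */
#include <float.h>

/* Hypothetical helper: perform a full-width (V4SF) add and return the
   MXCSR exception flags that the operation raised.  */
static unsigned int
flags_raised_by_add (__m128 a, __m128 b)
{
  /* volatile keeps the compiler from folding or moving the add.  */
  volatile __m128 va = a, vb = b;
  volatile __m128 sink;

  _MM_SET_EXCEPTION_STATE (0);        /* clear the sticky flags */
  sink = _mm_add_ps (va, vb);         /* all four lanes are computed */
  (void) sink;
  return _MM_GET_EXCEPTION_STATE ();  /* flags set by any lane */
}

/* Zero the upper 64 bits of the register (movq xmm, xmm), keeping the
   two low floats that form the partial vector.  */
static __m128
sanitize_upper (__m128 x)
{
  return _mm_castsi128_ps (_mm_move_epi64 (_mm_castps_si128 (x)));
}

/* Low half holds the V2SF payload {1.0f, 2.0f}; the upper half holds
   leftover FLT_MAX values that overflow when added to themselves.  */
static __m128
make_dirty (void)
{
  return _mm_set_ps (FLT_MAX, FLT_MAX, 2.0f, 1.0f);
}
```

With these helpers, flags_raised_by_add (make_dirty (), make_dirty ())
should report _MM_EXCEPT_OVERFLOW even though the two low lanes add
exactly, while the same call on sanitize_upper'ed inputs should report
no flags at all.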

> +partial-vector-math
> +Target Var(ix86_partial_vec_math) Init(1)
> +Enable floating-point status flags setting SSE vector operations on partial vectors
> 
> which I think summarises the option (without the word "trapping"). The
> same approach will be taken for Float16 operations, so the approach is
> not specific to MMX vectors.
> 
> > Otherwise it looks OK to me.
> 
> Thanks, I have attached the RFC V2 patch; I plan to submit a formal
> patch later today.

Thanks.  With AVX512VL there might also be the option to use
a mask (at the cost of a much larger instruction encoding).
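
As a rough sketch of what I mean (written with intrinsics rather than
as a backend pattern, so purely illustrative): a zero-masked add with
only the two low lanes enabled should not raise flags for the
masked-off upper lanes, so no separate sanitizing step is needed.  The
names masked_add_flags_impl and masked_add_flags are hypothetical, and
the runtime check is only there so the example does not fault on CPUs
without AVX512VL.

```c
#include <immintrin.h>
#include <float.h>

/* Hypothetical helper, compiled for AVX512VL regardless of the
   command-line ISA flags.  Returns the MXCSR exception flags raised
   by a masked add on a "dirty" partial vector.  */
__attribute__ ((target ("avx512f,avx512vl")))
static unsigned int
masked_add_flags_impl (void)
{
  /* Low half is the V2SF payload {1.0f, 2.0f}; the upper half holds
     values that would overflow if they were actually added.  */
  volatile __m128 a = _mm_set_ps (FLT_MAX, FLT_MAX, 2.0f, 1.0f);
  volatile __m128 b = a;
  volatile __m128 sink;

  _MM_SET_EXCEPTION_STATE (0);
  /* Mask 0x3 enables lanes 0 and 1 only; lanes 2 and 3 are zeroed
     and do not take part in the operation.  */
  sink = _mm_maskz_add_ps ((__mmask8) 0x3, a, b);
  (void) sink;
  return _MM_GET_EXCEPTION_STATE ();
}

/* Returns -1 when the CPU lacks AVX512VL, otherwise the raised flags.  */
static int
masked_add_flags (void)
{
  if (!__builtin_cpu_supports ("avx512vl"))
    return -1;
  return (int) masked_add_flags_impl ();
}
```

The trade-off is the one mentioned above: the masked form needs the
EVEX encoding, which is noticeably longer than the legacy SSE one.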

Richard.