On Tue, Aug 8, 2023 at 10:07 AM Richard Biener <rguenther@suse.de> wrote:
>
> On Mon, 7 Aug 2023, Uros Bizjak wrote:
>
> > On Mon, Jul 31, 2023 at 11:40 AM Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Sun, 30 Jul 2023, Uros Bizjak wrote:
> > >
> > > > Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF
> > > > named patterns in order to avoid generation of partial vector V4SFmode
> > > > trapping instructions.
> > > >
> > > > The new option is enabled by default, because even with sanitization,
> > > > a small but consistent speed up of 2 to 3% with Polyhedron capacita
> > > > benchmark can be achieved vs. scalar code.
> > > >
> > > > Using -fno-trapping-math improves Polyhedron capacita runtime 8 to 9%
> > > > vs. scalar code.  This is what clang does by default, as it defaults
> > > > to -fno-trapping-math.
> > >
> > > I like the new option, note you lack invoke.texi documentation where
> > > I'd also elaborate a bit on the interaction with -fno-trapping-math
> > > and the possible performance impact when NaNs or denormals leak
> > > into the upper halves and cross-reference -mdaz-ftz.
> >
> > The attached doc patch is invoke.texi entry for -mmmxfp-with-sse
> > option. It is written in a way to also cover half-float vectors. WDYT?
>
> "generate trapping floating-point operations"
>
> I'd say "generate floating-point operations that might affect the
> set of floating point status flags", the word "trapping" is IMHO
> misleading.
> Not sure if "set of floating point status flags" is the correct term,
> but it's what the C standard seems to refer to when talking about
> things you get with fegetexceptflag.  feraiseexcept refers to
> "floating-point exceptions".  Unfortunately the -fno-trapping-math
> documentation is similarly confusing (and maybe even wrong, I read
> it to conform to 'non-stop' IEEE arithmetic).

Thanks for suggesting the right terminology. I think that:

+@opindex mpartial-vector-math
+@item -mpartial-vector-math
+This option enables GCC to generate floating-point operations that might
+affect the set of floating point status flags on partial vectors, where
+vector elements reside in the low part of the 128-bit SSE register.  Unless
+@option{-fno-trapping-math} is specified, the compiler guarantees correct
+behavior by sanitizing all input operands to have zeroes in the unused
+upper part of the vector register.  Note that by using built-in functions
+or inline assembly with partial vector arguments, NaNs, denormal or invalid
+values can leak into the upper part of the vector, causing possible
+performance issues when @option{-fno-trapping-math} is in effect.  These
+issues can be mitigated by manually sanitizing the upper part of the partial
+vector argument register or by using @option{-mdaz-ftz} to set the
+denormals-are-zero (DAZ) flag in the MXCSR register.

now explains in adequate detail what the option does. IMO, the
"floating-point operations that might affect the set of floating point
status flags" correctly identifies affected operations, so an example,
as suggested below, is not necessary.

> I'd maybe give an example of a FP operation that's _not_ affected
> by the flag (copysign?).
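
For the list archive only (not proposed for invoke.texi), here is a
minimal sketch of what such an unaffected operation looks like. It
assumes the GCC vector extension; the names v2sf, v2si and
copysign_v2sf are made up for illustration and are not part of the
patch. The operation is pure bit manipulation, so it should not be able
to raise floating-point status flags whatever sits in the unused upper
half of the register:

```c
#include <stdint.h>

/* Hypothetical two-element vector types (GCC vector extension).  On
   x86-64 a v2sf occupies the low 64 bits of a 128-bit SSE register.  */
typedef float   v2sf __attribute__ ((vector_size (8)));
typedef int32_t v2si __attribute__ ((vector_size (8)));

/* Illustrative helper: return MAG with the sign bits of SGN.  Only
   integer AND/OR on the bit patterns is used, so no floating-point
   arithmetic is performed.  */
v2sf
copysign_v2sf (v2sf mag, v2sf sgn)
{
  const v2si sign_mask = { INT32_MIN, INT32_MIN };
  v2si m = (v2si) mag;   /* same-size vector cast reinterprets the bits */
  v2si s = (v2si) sgn;
  return (v2sf) ((m & ~sign_mask) | (s & sign_mask));
}
```

A division or addition on the same v2sf type is the opposite case: it
can set status flags from garbage in the upper lanes, which is why
those patterns are the ones the option covers.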

Please note that I have renamed the option to "-mpartial-vector-math"
with a short target-specific description:

+partial-vector-math
+Target Var(ix86_partial_vec_math) Init(1)
+Enable floating-point status flags setting SSE vector operations on partial vectors

which I think summarises the option (without the word "trapping"). The
same approach will be taken for Float16 operations, so the approach is
not specific to MMX vectors.

> Otherwise it looks OK to me.

Thanks, I have attached the RFC V2 patch; I plan to submit a formal
patch later today.

Uros.