On Tue, Aug 8, 2023 at 10:07 AM Richard Biener wrote:
>
> On Mon, 7 Aug 2023, Uros Bizjak wrote:
>
> > On Mon, Jul 31, 2023 at 11:40 AM Richard Biener wrote:
> > >
> > > On Sun, 30 Jul 2023, Uros Bizjak wrote:
> > >
> > > > Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF
> > > > named patterns in order to avoid generation of partial vector V4SFmode
> > > > trapping instructions.
> > > >
> > > > The new option is enabled by default, because even with sanitization,
> > > > a small but consistent speed up of 2 to 3% with Polyhedron capacita
> > > > benchmark can be achieved vs. scalar code.
> > > >
> > > > Using -fno-trapping-math improves Polyhedron capacita runtime 8 to 9%
> > > > vs. scalar code. This is what clang does by default, as it defaults
> > > > to -fno-trapping-math.
> > >
> > > I like the new option, note you lack invoke.texi documentation where
> > > I'd also elaborate a bit on the interaction with -fno-trapping-math
> > > and the possible performance impact when NaNs or denormals leak
> > > into the upper halves and cross-reference -mdaz-ftz.
> >
> > The attached doc patch is invoke.texi entry for -mmmxfp-with-sse
> > option. It is written in a way to also cover half-float vectors. WDYT?
>
> "generate trapping floating-point operations"
>
> I'd say "generate floating-point operations that might affect the
> set of floating point status flags", the word "trapping" is IMHO
> misleading.
> Not sure if "set of floating point status flags" is the correct term,
> but it's what the C standard seems to refer to when talking about
> things you get with fegetexceptflag. feraiseexcept refers to
> "floating-point exceptions". Unfortunately the -fno-trapping-math
> documentation is similarly confusing (and maybe even wrong, I read
> it to conform to 'non-stop' IEEE arithmetic).

Thanks for suggesting the right terminology.
I think that:

+@opindex mpartial-vector-math
+@item -mpartial-vector-math
+This option enables GCC to generate floating-point operations that might
+affect the set of floating point status flags on partial vectors, where
+vector elements reside in the low part of the 128-bit SSE register. Unless
+@option{-fno-trapping-math} is specified, the compiler guarantees correct
+behavior by sanitizing all input operands to have zeroes in the unused
+upper part of the vector register. Note that by using built-in functions
+or inline assembly with partial vector arguments, NaNs, denormal or invalid
+values can leak into the upper part of the vector, causing possible
+performance issues when @option{-fno-trapping-math} is in effect. These
+issues can be mitigated by manually sanitizing the upper part of the partial
+vector argument register or by using @option{-mdaz-ftz} to set
+denormals-are-zero (DAZ) flag in the MXCSR register.

now explains in adequate detail what the option does. IMO, the
"floating-point operations that might affect the set of floating point
status flags" correctly identifies affected operations, so an example,
as suggested below, is not necessary.

> I'd maybe give an example of a FP operation that's _not_ affected
> by the flag (copysign?).

Please note that I have renamed the option to "-mpartial-vector-math"
with a short target-specific description:

+partial-vector-math
+Target Var(ix86_partial_vec_math) Init(1)
+Enable floating-point status flags setting SSE vector operations on partial vectors

which I think summarises the option (without the word "trapping"). The
same approach will be taken for Float16 operations, so the approach is
not specific to MMX vectors.

> Otherwise it looks OK to me.

Thanks, I have attached the RFC V2 patch; I plan to submit a formal
patch later today.

Uros.