From: Uros Bizjak <ubizjak@gmail.com>
To: Richard Biener <rguenther@suse.de>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
Jan Hubicka <hubicka@ucw.cz>,
Hongtao Liu <hongtao.liu@intel.com>
Subject: Re: [RFC PATCH] i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math [PR110832]
Date: Tue, 8 Aug 2023 11:06:26 +0200 [thread overview]
Message-ID: <CAFULd4YhGRqs9ByoQQjXwEB+ndi9VHmkv=1RUu2GBUskT8c2GQ@mail.gmail.com> (raw)
In-Reply-To: <nycvar.YFH.7.77.849.2308080758430.12935@jbgna.fhfr.qr>
[-- Attachment #1: Type: text/plain, Size: 3841 bytes --]
On Tue, Aug 8, 2023 at 10:07 AM Richard Biener <rguenther@suse.de> wrote:
>
> On Mon, 7 Aug 2023, Uros Bizjak wrote:
>
> > On Mon, Jul 31, 2023 at 11:40 AM Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Sun, 30 Jul 2023, Uros Bizjak wrote:
> > >
> > > > Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF
> > > > named patterns in order to avoid generation of partial vector V4SFmode
> > > > trapping instructions.
> > > >
> > > > The new option is enabled by default, because even with sanitization,
> > > > a small but consistent speed up of 2 to 3% with Polyhedron capacita
> > > > benchmark can be achieved vs. scalar code.
> > > >
> > > > Using -fno-trapping-math improves Polyhedron capacita runtime 8 to 9%
> > > > vs. scalar code. This is what clang does by default, as it defaults
> > > > to -fno-trapping-math.
> > >
> > > I like the new option, note you lack invoke.texi documentation where
> > > I'd also elaborate a bit on the interaction with -fno-trapping-math
> > > and the possible performance impact then NaNs or denormals leak
> > > into the upper halves and cross-reference -mdaz-ftz.
> >
> > The attached doc patch is invoke.texi entry for -mmmxfp-with-sse
> > option. It is written in a way to also cover half-float vectors. WDYT?
>
> "generate trapping floating-point operations"
>
> I'd say "generate floating-point operations that might affect the
> set of floating point status flags", the word "trapping" is IMHO
> misleading.
> Not sure if "set of floating point status flags" is the correct term,
> but it's what the C standard seems to refer to when talking about
> things you get with fegetexceptflag. feraiseexcept refers to
> "floating-point exceptions". Unfortunately the -fno-trapping-math
> documentation is similarly confusing (and maybe even wrong, I read
> it to conform to 'non-stop' IEEE arithmetic).
Thanks for suggesting the right terminology. I think that:
+@opindex mpartial-vector-math
+@item -mpartial-vector-math
+This option enables GCC to generate floating-point operations that might
+affect the set of floating point status flags on partial vectors, where
+vector elements reside in the low part of the 128-bit SSE register. Unless
+@option{-fno-trapping-math} is specified, the compiler guarantees correct
+behavior by sanitizing all input operands to have zeroes in the unused
+upper part of the vector register. Note that by using built-in functions
+or inline assembly with partial vector arguments, NaNs, denormal or invalid
+values can leak into the upper part of the vector, causing possible
+performance issues when @option{-fno-trapping-math} is in effect. These
+issues can be mitigated by manually sanitizing the upper part of the partial
+vector argument register or by using @option{-mdaz-ftz} to set the
+denormals-are-zero (DAZ) flag in the MXCSR register.
now explains in adequate detail what the option does. IMO, the
"floating-point operations that might affect the set of floating point
status flags" correctly identifies affected operations, so an example,
as suggested below, is not necessary.
> I'd maybe give an example of a FP operation that's _not_ affected
> by the flag (copysign?).
Please note that I have renamed the option to "-mpartial-vector-math"
with a short target-specific description:
+partial-vector-math
+Target Var(ix86_partial_vec_math) Init(1)
+Enable floating-point status flags setting SSE vector operations on partial vectors
which I think summarises the option (without the word "trapping"). The
same approach will be taken for Float16 operations, so it is not
specific to MMX vectors.
> Otherwise it looks OK to me.
Thanks, I have attached the RFC V2 patch; I plan to submit a formal
patch later today.
Uros.
[-- Attachment #2: pr110832-v2.diff.txt --]
[-- Type: text/plain, Size: 14654 bytes --]
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 1cc8563477a..8d9a1ae93f3 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -632,6 +632,10 @@ Enum(prefer_vector_width) String(256) Value(PVW_AVX256)
EnumValue
Enum(prefer_vector_width) String(512) Value(PVW_AVX512)
+partial-vector-math
+Target Var(ix86_partial_vec_math) Init(1)
+Enable floating-point status flags setting SSE vector operations on partial vectors
+
mmove-max=
Target RejectNegative Joined Var(ix86_move_max) Enum(prefer_vector_width) Init(PVW_NONE) Save
Maximum number of bits that can be moved from memory to memory efficiently.
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index b49554e9b8f..95f7a0113e7 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -595,7 +595,18 @@ (define_expand "movq_<mode>_to_sse"
(match_operand:V2FI_V4HF 1 "nonimmediate_operand")
(match_dup 2)))]
"TARGET_SSE2"
- "operands[2] = CONST0_RTX (<MODE>mode);")
+{
+ if (<MODE>mode == V2SFmode
+ && !flag_trapping_math)
+ {
+ rtx op1 = force_reg (<MODE>mode, operands[1]);
+ emit_move_insn (operands[0], lowpart_subreg (<mmxdoublevecmode>mode,
+ op1, <MODE>mode));
+ DONE;
+ }
+
+ operands[2] = CONST0_RTX (<MODE>mode);
+})
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
@@ -648,7 +659,7 @@ (define_expand "<insn>v2sf3"
(plusminusmult:V2SF
(match_operand:V2SF 1 "nonimmediate_operand")
(match_operand:V2SF 2 "nonimmediate_operand")))]
- "TARGET_MMX_WITH_SSE"
+ "TARGET_MMX_WITH_SSE && ix86_partial_vec_math"
{
rtx op2 = gen_reg_rtx (V4SFmode);
rtx op1 = gen_reg_rtx (V4SFmode);
@@ -726,7 +737,7 @@ (define_expand "divv2sf3"
[(set (match_operand:V2SF 0 "register_operand")
(div:V2SF (match_operand:V2SF 1 "register_operand")
(match_operand:V2SF 2 "register_operand")))]
- "TARGET_MMX_WITH_SSE"
+ "TARGET_MMX_WITH_SSE && ix86_partial_vec_math"
{
rtx op2 = gen_reg_rtx (V4SFmode);
rtx op1 = gen_reg_rtx (V4SFmode);
@@ -748,7 +759,7 @@ (define_expand "<code>v2sf3"
(smaxmin:V2SF
(match_operand:V2SF 1 "register_operand")
(match_operand:V2SF 2 "register_operand")))]
- "TARGET_MMX_WITH_SSE"
+ "TARGET_MMX_WITH_SSE && ix86_partial_vec_math"
{
rtx op2 = gen_reg_rtx (V4SFmode);
rtx op1 = gen_reg_rtx (V4SFmode);
@@ -850,7 +861,7 @@ (define_insn "mmx_rcpit2v2sf3"
(define_expand "sqrtv2sf2"
[(set (match_operand:V2SF 0 "register_operand")
(sqrt:V2SF (match_operand:V2SF 1 "nonimmediate_operand")))]
- "TARGET_MMX_WITH_SSE"
+ "TARGET_MMX_WITH_SSE && ix86_partial_vec_math"
{
rtx op1 = gen_reg_rtx (V4SFmode);
rtx op0 = gen_reg_rtx (V4SFmode);
@@ -931,7 +942,7 @@ (define_insn_and_split "*mmx_haddv2sf3_low"
(vec_select:SF
(match_dup 1)
(parallel [(match_operand:SI 3 "const_0_to_1_operand")]))))]
- "TARGET_SSE3 && TARGET_MMX_WITH_SSE
+ "TARGET_SSE3 && TARGET_MMX_WITH_SSE && ix86_partial_vec_math
&& INTVAL (operands[2]) != INTVAL (operands[3])
&& ix86_pre_reload_split ()"
"#"
@@ -977,7 +988,7 @@ (define_insn_and_split "*mmx_hsubv2sf3_low"
(vec_select:SF
(match_dup 1)
(parallel [(const_int 1)]))))]
- "TARGET_SSE3 && TARGET_MMX_WITH_SSE
+ "TARGET_SSE3 && TARGET_MMX_WITH_SSE && ix86_partial_vec_math
&& ix86_pre_reload_split ()"
"#"
"&& 1"
@@ -1039,7 +1050,7 @@ (define_expand "vec_addsubv2sf3"
(match_operand:V2SF 2 "nonimmediate_operand"))
(plus:V2SF (match_dup 1) (match_dup 2))
(const_int 1)))]
- "TARGET_SSE3 && TARGET_MMX_WITH_SSE"
+ "TARGET_SSE3 && TARGET_MMX_WITH_SSE && ix86_partial_vec_math"
{
rtx op2 = gen_reg_rtx (V4SFmode);
rtx op1 = gen_reg_rtx (V4SFmode);
@@ -1102,7 +1113,7 @@ (define_expand "vec_cmpv2sfv2si"
(match_operator:V2SI 1 ""
[(match_operand:V2SF 2 "nonimmediate_operand")
(match_operand:V2SF 3 "nonimmediate_operand")]))]
- "TARGET_MMX_WITH_SSE"
+ "TARGET_MMX_WITH_SSE && ix86_partial_vec_math"
{
rtx ops[4];
ops[3] = gen_reg_rtx (V4SFmode);
@@ -1128,7 +1139,7 @@ (define_expand "vcond<mode>v2sf"
(match_operand:V2SF 5 "nonimmediate_operand")])
(match_operand:V2FI 1 "general_operand")
(match_operand:V2FI 2 "general_operand")))]
- "TARGET_MMX_WITH_SSE"
+ "TARGET_MMX_WITH_SSE && ix86_partial_vec_math"
{
rtx ops[6];
ops[5] = gen_reg_rtx (V4SFmode);
@@ -1318,7 +1329,7 @@ (define_expand "fmav2sf4"
(match_operand:V2SF 2 "nonimmediate_operand")
(match_operand:V2SF 3 "nonimmediate_operand")))]
"(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL)
- && TARGET_MMX_WITH_SSE"
+ && TARGET_MMX_WITH_SSE && ix86_partial_vec_math"
{
rtx op3 = gen_reg_rtx (V4SFmode);
rtx op2 = gen_reg_rtx (V4SFmode);
@@ -1343,7 +1354,7 @@ (define_expand "fmsv2sf4"
(neg:V2SF
(match_operand:V2SF 3 "nonimmediate_operand"))))]
"(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL)
- && TARGET_MMX_WITH_SSE"
+ && TARGET_MMX_WITH_SSE && ix86_partial_vec_math"
{
rtx op3 = gen_reg_rtx (V4SFmode);
rtx op2 = gen_reg_rtx (V4SFmode);
@@ -1368,7 +1379,7 @@ (define_expand "fnmav2sf4"
(match_operand:V2SF 2 "nonimmediate_operand")
(match_operand:V2SF 3 "nonimmediate_operand")))]
"(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL)
- && TARGET_MMX_WITH_SSE"
+ && TARGET_MMX_WITH_SSE && ix86_partial_vec_math"
{
rtx op3 = gen_reg_rtx (V4SFmode);
rtx op2 = gen_reg_rtx (V4SFmode);
@@ -1394,7 +1405,7 @@ (define_expand "fnmsv2sf4"
(neg:V2SF
(match_operand:V2SF 3 "nonimmediate_operand"))))]
"(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL)
- && TARGET_MMX_WITH_SSE"
+ && TARGET_MMX_WITH_SSE && ix86_partial_vec_math"
{
rtx op3 = gen_reg_rtx (V4SFmode);
rtx op2 = gen_reg_rtx (V4SFmode);
@@ -1420,7 +1431,7 @@ (define_expand "fnmsv2sf4"
(define_expand "fix_truncv2sfv2si2"
[(set (match_operand:V2SI 0 "register_operand")
(fix:V2SI (match_operand:V2SF 1 "nonimmediate_operand")))]
- "TARGET_MMX_WITH_SSE"
+ "TARGET_MMX_WITH_SSE && ix86_partial_vec_math"
{
rtx op1 = gen_reg_rtx (V4SFmode);
rtx op0 = gen_reg_rtx (V4SImode);
@@ -1436,7 +1447,7 @@ (define_expand "fix_truncv2sfv2si2"
(define_expand "fixuns_truncv2sfv2si2"
[(set (match_operand:V2SI 0 "register_operand")
(unsigned_fix:V2SI (match_operand:V2SF 1 "nonimmediate_operand")))]
- "TARGET_AVX512VL && TARGET_MMX_WITH_SSE"
+ "TARGET_AVX512VL && TARGET_MMX_WITH_SSE && ix86_partial_vec_math"
{
rtx op1 = gen_reg_rtx (V4SFmode);
rtx op0 = gen_reg_rtx (V4SImode);
@@ -1461,7 +1472,7 @@ (define_insn "mmx_fix_truncv2sfv2si2"
(define_expand "floatv2siv2sf2"
[(set (match_operand:V2SF 0 "register_operand")
(float:V2SF (match_operand:V2SI 1 "nonimmediate_operand")))]
- "TARGET_MMX_WITH_SSE"
+ "TARGET_MMX_WITH_SSE && ix86_partial_vec_math"
{
rtx op1 = gen_reg_rtx (V4SImode);
rtx op0 = gen_reg_rtx (V4SFmode);
@@ -1477,7 +1488,7 @@ (define_expand "floatv2siv2sf2"
(define_expand "floatunsv2siv2sf2"
[(set (match_operand:V2SF 0 "register_operand")
(unsigned_float:V2SF (match_operand:V2SI 1 "nonimmediate_operand")))]
- "TARGET_AVX512VL && TARGET_MMX_WITH_SSE"
+ "TARGET_AVX512VL && TARGET_MMX_WITH_SSE && ix86_partial_vec_math"
{
rtx op1 = gen_reg_rtx (V4SImode);
rtx op0 = gen_reg_rtx (V4SFmode);
@@ -1754,7 +1765,7 @@ (define_expand "vec_initv2sfsf"
(define_expand "nearbyintv2sf2"
[(match_operand:V2SF 0 "register_operand")
(match_operand:V2SF 1 "nonimmediate_operand")]
- "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
+ "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE && ix86_partial_vec_math"
{
rtx op1 = gen_reg_rtx (V4SFmode);
rtx op0 = gen_reg_rtx (V4SFmode);
@@ -1770,7 +1781,7 @@ (define_expand "nearbyintv2sf2"
(define_expand "rintv2sf2"
[(match_operand:V2SF 0 "register_operand")
(match_operand:V2SF 1 "nonimmediate_operand")]
- "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
+ "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE && ix86_partial_vec_math"
{
rtx op1 = gen_reg_rtx (V4SFmode);
rtx op0 = gen_reg_rtx (V4SFmode);
@@ -1786,8 +1797,8 @@ (define_expand "rintv2sf2"
(define_expand "lrintv2sfv2si2"
[(match_operand:V2SI 0 "register_operand")
(match_operand:V2SF 1 "nonimmediate_operand")]
- "TARGET_SSE4_1 && !flag_trapping_math
- && TARGET_MMX_WITH_SSE"
+ "TARGET_SSE4_1 && !flag_trapping_math
+ && TARGET_MMX_WITH_SSE && ix86_partial_vec_math"
{
rtx op1 = gen_reg_rtx (V4SFmode);
rtx op0 = gen_reg_rtx (V4SImode);
@@ -1804,7 +1815,7 @@ (define_expand "ceilv2sf2"
[(match_operand:V2SF 0 "register_operand")
(match_operand:V2SF 1 "nonimmediate_operand")]
"TARGET_SSE4_1 && !flag_trapping_math
- && TARGET_MMX_WITH_SSE"
+ && TARGET_MMX_WITH_SSE && ix86_partial_vec_math"
{
rtx op1 = gen_reg_rtx (V4SFmode);
rtx op0 = gen_reg_rtx (V4SFmode);
@@ -1820,8 +1831,8 @@ (define_expand "ceilv2sf2"
(define_expand "lceilv2sfv2si2"
[(match_operand:V2SI 0 "register_operand")
(match_operand:V2SF 1 "nonimmediate_operand")]
- "TARGET_SSE4_1 && !flag_trapping_math
- && TARGET_MMX_WITH_SSE"
+ "TARGET_SSE4_1 && !flag_trapping_math
+ && TARGET_MMX_WITH_SSE && ix86_partial_vec_math"
{
rtx op1 = gen_reg_rtx (V4SFmode);
rtx op0 = gen_reg_rtx (V4SImode);
@@ -1838,7 +1849,7 @@ (define_expand "floorv2sf2"
[(match_operand:V2SF 0 "register_operand")
(match_operand:V2SF 1 "nonimmediate_operand")]
"TARGET_SSE4_1 && !flag_trapping_math
- && TARGET_MMX_WITH_SSE"
+ && TARGET_MMX_WITH_SSE && ix86_partial_vec_math"
{
rtx op1 = gen_reg_rtx (V4SFmode);
rtx op0 = gen_reg_rtx (V4SFmode);
@@ -1854,8 +1865,8 @@ (define_expand "floorv2sf2"
(define_expand "lfloorv2sfv2si2"
[(match_operand:V2SI 0 "register_operand")
(match_operand:V2SF 1 "nonimmediate_operand")]
- "TARGET_SSE4_1 && !flag_trapping_math
- && TARGET_MMX_WITH_SSE"
+ "TARGET_SSE4_1 && !flag_trapping_math
+ && TARGET_MMX_WITH_SSE && ix86_partial_vec_math"
{
rtx op1 = gen_reg_rtx (V4SFmode);
rtx op0 = gen_reg_rtx (V4SImode);
@@ -1872,7 +1883,7 @@ (define_expand "btruncv2sf2"
[(match_operand:V2SF 0 "register_operand")
(match_operand:V2SF 1 "nonimmediate_operand")]
"TARGET_SSE4_1 && !flag_trapping_math
- && TARGET_MMX_WITH_SSE"
+ && TARGET_MMX_WITH_SSE && ix86_partial_vec_math"
{
rtx op1 = gen_reg_rtx (V4SFmode);
rtx op0 = gen_reg_rtx (V4SFmode);
@@ -1889,7 +1900,7 @@ (define_expand "roundv2sf2"
[(match_operand:V2SF 0 "register_operand")
(match_operand:V2SF 1 "nonimmediate_operand")]
"TARGET_SSE4_1 && !flag_trapping_math
- && TARGET_MMX_WITH_SSE"
+ && TARGET_MMX_WITH_SSE && ix86_partial_vec_math"
{
rtx op1 = gen_reg_rtx (V4SFmode);
rtx op0 = gen_reg_rtx (V4SFmode);
@@ -1905,8 +1916,8 @@ (define_expand "roundv2sf2"
(define_expand "lroundv2sfv2si2"
[(match_operand:V2SI 0 "register_operand")
(match_operand:V2SF 1 "nonimmediate_operand")]
- "TARGET_SSE4_1 && !flag_trapping_math
- && TARGET_MMX_WITH_SSE"
+ "TARGET_SSE4_1 && !flag_trapping_math
+ && TARGET_MMX_WITH_SSE && ix86_partial_vec_math"
{
rtx op1 = gen_reg_rtx (V4SFmode);
rtx op0 = gen_reg_rtx (V4SImode);
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 674f956f4b8..f5081c0cfb9 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1419,6 +1419,7 @@ See RS/6000 and PowerPC Options.
-mcld -mcx16 -msahf -mmovbe -mcrc32 -mmwait
-mrecip -mrecip=@var{opt}
-mvzeroupper -mprefer-avx128 -mprefer-vector-width=@var{opt}
+-mpartial-vector-math
-mmove-max=@var{bits} -mstore-max=@var{bits}
-mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx
-mavx2 -mavx512f -mavx512pf -mavx512er -mavx512cd -mavx512vl
@@ -33754,6 +33755,23 @@ This option instructs GCC to use 128-bit AVX instructions instead of
This option instructs GCC to use @var{opt}-bit vector width in instructions
instead of default on the selected platform.
+@opindex mpartial-vector-math
+@item -mpartial-vector-math
+This option enables GCC to generate floating-point operations that might
+affect the set of floating point status flags on partial vectors, where
+vector elements reside in the low part of the 128-bit SSE register. Unless
+@option{-fno-trapping-math} is specified, the compiler guarantees correct
+behavior by sanitizing all input operands to have zeroes in the unused
+upper part of the vector register. Note that by using built-in functions
+or inline assembly with partial vector arguments, NaNs, denormal or invalid
+values can leak into the upper part of the vector, causing possible
+performance issues when @option{-fno-trapping-math} is in effect. These
+issues can be mitigated by manually sanitizing the upper part of the partial
+vector argument register or by using @option{-mdaz-ftz} to set the
+denormals-are-zero (DAZ) flag in the MXCSR register.
+
+This option is enabled by default.
+
@opindex mmove-max
@item -mmove-max=@var{bits}
This option instructs GCC to set the maximum number of bits can be
diff --git a/gcc/testsuite/gcc.target/i386/pr110832-1.c b/gcc/testsuite/gcc.target/i386/pr110832-1.c
new file mode 100644
index 00000000000..3df22e3b5a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr110832-1.c
@@ -0,0 +1,12 @@
+/* PR target/110832 */
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -msse2 -mno-partial-vector-math" } */
+
+typedef float __attribute__((vector_size(8))) v2sf;
+
+v2sf test (v2sf a, v2sf b)
+{
+ return a + b;
+}
+
+/* { dg-final { scan-assembler-not "addps" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr110832-2.c b/gcc/testsuite/gcc.target/i386/pr110832-2.c
new file mode 100644
index 00000000000..4d16488b4fb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr110832-2.c
@@ -0,0 +1,13 @@
+/* PR target/110832 */
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -ftrapping-math -msse2 -mpartial-vector-math -dp" } */
+
+typedef float __attribute__((vector_size(8))) v2sf;
+
+v2sf test (v2sf a, v2sf b)
+{
+ return a + b;
+}
+
+/* { dg-final { scan-assembler "addps" } } */
+/* { dg-final { scan-assembler-times "\\*vec_concatv4sf_0" 2 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr110832-3.c b/gcc/testsuite/gcc.target/i386/pr110832-3.c
new file mode 100644
index 00000000000..02cb4fc8100
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr110832-3.c
@@ -0,0 +1,13 @@
+/* PR target/110832 */
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -fno-trapping-math -msse2 -mpartial-vector-math -dp" } */
+
+typedef float __attribute__((vector_size(8))) v2sf;
+
+v2sf test (v2sf a, v2sf b)
+{
+ return a + b;
+}
+
+/* { dg-final { scan-assembler "addps" } } */
+/* { dg-final { scan-assembler-not "\\*vec_concatv4sf_0" } } */
Thread overview:
2023-07-30 20:12 Uros Bizjak
2023-07-31 9:40 ` Richard Biener
2023-07-31 10:13 ` Uros Bizjak
2023-08-07 15:59 ` Uros Bizjak
2023-08-08 8:07 ` Richard Biener
2023-08-08 9:06 ` Uros Bizjak [this message]
2023-08-08 10:08 ` Richard Biener
2023-08-08 11:03 ` Uros Bizjak