From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by sourceware.org (Postfix) with ESMTPS id 33EC43857C67 for ; Tue, 22 Feb 2022 09:51:47 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 33EC43857C67 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id E33DA1F39A; Tue, 22 Feb 2022 09:51:45 +0000 (UTC) Received: from murzim.suse.de (murzim.suse.de [10.160.4.192]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id BD1AFA3B8D; Tue, 22 Feb 2022 09:51:45 +0000 (UTC) Date: Tue, 22 Feb 2022 10:51:45 +0100 (CET) From: Richard Biener To: Hongtao Liu cc: GCC Patches , Richard Sandiford , "Liu, Hongtao" Subject: Re: [PATCH 3/3] target/99881 - x86 vector cost of CTOR from integer regs In-Reply-To: Message-ID: <6n58qs6s-ono5-42s-o0q-s7ss22p8866s@fhfr.qr> References: <20220218140102.13F4E13C91@imap2.suse-dmz.suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-10.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 22 Feb 2022 09:51:50 -0000 On Tue, 22 Feb 2022, Richard Biener wrote: > On Tue, 22 Feb 2022, Hongtao Liu wrote: > > > On Mon, Feb 21, 2022 at 5:10 PM Richard Biener wrote: > > > > > > On Mon, 21 Feb 2022, Hongtao Liu wrote: > > > > > > > On Fri, Feb 18, 2022 at 10:01 PM Richard Biener via Gcc-patches > > > > wrote: > > > > > > > > > > This uses the now passed SLP node to the vectorizer costing hook > > > > > to adjust vector construction costs for the cost of moving an > > > > > integer component from a GPR to a vector register when that's > > > > > required for building a vector from components. A cruical difference > > > > > here is whether the component is loaded from memory or extracted > > > > > from a vector register as in those cases no intermediate GPR is involved. > > > > > > > > > > The pr99881.c testcase can be Un-XFAILed with this patch, the > > > > > pr91446.c testcase now produces scalar code which looks superior > > > > > to me so I've adjusted it as well. > > > > > > > > > > I'm currently re-bootstrapping and testing on x86_64-unknown-linux-gnu > > > > > after adding the BIT_FIELD_REF vector extracting special casing. > > > > Does the patch handle PR101929? > > > > > > The patch will regress the testcase posted in PR101929 again: > > > > > > _255 1 times scalar_store costs 12 in body > > > _261 1 times scalar_store costs 12 in body > > > _258 1 times scalar_store costs 12 in body > > > _264 1 times scalar_store costs 12 in body > > > t0_247 + t2_251 1 times scalar_stmt costs 4 in body > > > t1_472 + t3_444 1 times scalar_stmt costs 4 in body > > > t0_406 - t2_451 1 times scalar_stmt costs 4 in body > > > t1_472 - t3_444 1 times scalar_stmt costs 4 in body > > > -node 0x4182f48 1 times vec_construct costs 16 in prologue > > > -node 0x41882b0 1 times vec_construct costs 16 in prologue > > > +node 0x4182f48 1 times vec_construct costs 28 in prologue > > > +node 0x41882b0 1 times vec_construct costs 28 in prologue > > > t0_406 + t2_451 1 times vector_stmt costs 4 in body > > > t1_472 - t3_444 1 times vector_stmt costs 4 in body > > > node 0x41829f8 1 times vec_perm costs 4 in body > > > _436 1 times vector_store costs 16 in body > > > t.c:37:9: note: Cost model analysis for part in loop 0: > > > - Vector cost: 60 > > > + Vector cost: 84 > > > Scalar cost: 64 > > > +t.c:37:9: missed: not vectorized: vectorization is not profitable. > > > > > > We're constructing V4SI from patterns like { _407, _480, _407, _480 } > > > where the components are results of integer adds (so the result is > > > definitely in a GPR). We are costing the construction as > > > 4 * sse_op + 2 * sse_to_integer which with skylake cost is > > > 4 * COSTS_N_INSNS (1) + 2 * 6. > > > > > > Whether the vectorization itself is profitable is likely questionable > > > but then it's true that the construction of V4SI is more costly > > > in terms of uops than a construction of V4SF. > > > > > > Now, we can - for the first time - now see the actual construction > > > pattern and ideal construction might be two GPR->xmm moves > > > + two splats + one unpack or maybe two GPR->xmm moves + one > > > unpack + splat of DI (or other means of duplicating the lowpart). > > Yes, the patch is technically right. I'm ok with the patch. > > Thanks, I've pushed it now. I've also tested the suggested adjustment > doing > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > index b2bf90576d5..acf2cc977b4 100644 > --- a/gcc/config/i386/i386.cc > +++ b/gcc/config/i386/i386.cc > @@ -22595,7 +22595,7 @@ ix86_builtin_vectorization_cost (enum > vect_cost_for_stmt type_of_cost, > case vec_construct: > { > /* N element inserts into SSE vectors. */ > - int cost = TYPE_VECTOR_SUBPARTS (vectype) * ix86_cost->sse_op; > + int cost = (TYPE_VECTOR_SUBPARTS (vectype) - 1) * > ix86_cost->sse_op; > /* One vinserti128 for combining two SSE vectors for AVX256. */ > if (GET_MODE_BITSIZE (mode) == 256) > cost += ix86_vec_cost (mode, ix86_cost->addss); > > successfully (with no effect on the PR101929 case as expected), I > will queue that for stage1 since it isn't known to fix any > regression (but I will keep it as option in case something pops up). > > I'll also have a more detailed look into the x264_r case to see > if there's something we can do about the regression that will now > show up (and I'll watch autotesters). I found a way to get back the vectorization doing diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc index 9188d727e33..7f1f12fb6c6 100644 --- a/gcc/tree-vect-slp.cc +++ b/gcc/tree-vect-slp.cc @@ -2374,7 +2375,7 @@ fail: n_vector_builds++; } } - if (all_uniform_p + if ((all_uniform_p && !two_operators) || n_vector_builds > 1 || (n_vector_builds == children.length () && is_a (stmt_info->stmt))) which in itself is reasonable since the result of the operation is a non-uniform vector. That causes us to do _37 = {a3_245_6(D), a3_245_6(D), a3_245_6(D), a3_245_6(D)}; _36 = {a2_232_5(D), a2_232_5(D), a2_232_5(D), a2_232_5(D)}; vect__252_8.7_39 = _36 - _37; vect__250_7.6_38 = _36 + _37; _40 = VEC_PERM_EXPR ; and then vector lowering does what vect_optimize_slp should do as well (but doesn't): _26 = a2_232_5(D) - a3_245_6(D); vect__252_8.7_39 = {_26, _26, _26, _26}; _25 = a2_232_5(D) + a3_245_6(D); vect__250_7.6_38 = {_25, _25, _25, _25}; _40 = VEC_PERM_EXPR ; and later stmt folding will get us back the non-uniform vector construction by folding the VEC_PERM_EXPR where we simply assume that we can optimally expand the constructor. But the important part is that this way we evade vec_construct costing and instead get node 0x40e11a8 1 times scalar_to_vec costs 4 in prologue where of course it's obvious that we miss to account for the GPR -> XMM penalty here in the backend. When we are not fixing that we get the kernel for x264_r optimized like before. A self-contained testcase for this opportunity is void foo (unsigned * __restrict tmp, unsigned a0_206, unsigned a1_219, unsigned a2_232, unsigned a3_245) { unsigned _246 = a0_206 + a1_219; unsigned _248 = a0_206 - a1_219; unsigned _250 = a2_232 + a3_245; unsigned _252 = a2_232 - a3_245; int t0_247 = (int) _246; int t1_249 = (int) _248; int t2_251 = (int) _250; int t3_253 = (int) _252; int _254 = t0_247 + t2_251; int _260 = t1_249 + t3_253; int _257 = t0_247 - t2_251; int _263 = t1_249 - t3_253; unsigned _255 = (unsigned int) _254; unsigned _261 = (unsigned int) _260; unsigned _258 = (unsigned int) _257; unsigned _264 = (unsigned int) _263; tmp[0] = _255; tmp[1] = _261; tmp[2] = _258; tmp[3] = _264; } which does show that the vectorization is not profitable since we get 0000000000000000 : 0: 89 c8 mov %ecx,%eax 2: 44 01 c1 add %r8d,%ecx 5: 44 29 c0 sub %r8d,%eax 8: c5 f9 6e d9 vmovd %ecx,%xmm3 c: c4 e3 61 22 c8 01 vpinsrd $0x1,%eax,%xmm3,%xmm1 12: 89 f0 mov %esi,%eax 14: 01 d6 add %edx,%esi 16: 29 d0 sub %edx,%eax 18: c5 f9 6e e6 vmovd %esi,%xmm4 1c: c4 e3 59 22 c0 01 vpinsrd $0x1,%eax,%xmm4,%xmm0 22: c5 f1 6c c9 vpunpcklqdq %xmm1,%xmm1,%xmm1 26: c5 f9 6c c0 vpunpcklqdq %xmm0,%xmm0,%xmm0 2a: c5 f9 fe d1 vpaddd %xmm1,%xmm0,%xmm2 2e: c5 f9 fa c1 vpsubd %xmm1,%xmm0,%xmm0 32: c5 e8 c6 c0 e4 vshufps $0xe4,%xmm0,%xmm2,%xmm0 37: c5 fa 7f 07 vmovdqu %xmm0,(%rdi) 3b: c3 ret vs the scalar code 0000000000000000 : 0: 48 89 f8 mov %rdi,%rax 3: 8d 3c 16 lea (%rsi,%rdx,1),%edi 6: 29 d6 sub %edx,%esi 8: 42 8d 14 01 lea (%rcx,%r8,1),%edx c: 44 29 c1 sub %r8d,%ecx f: 44 8d 04 17 lea (%rdi,%rdx,1),%r8d 13: 44 89 00 mov %r8d,(%rax) 16: 29 d7 sub %edx,%edi 18: 44 8d 04 0e lea (%rsi,%rcx,1),%r8d 1c: 29 ce sub %ecx,%esi 1e: 44 89 40 04 mov %r8d,0x4(%rax) 22: 89 78 08 mov %edi,0x8(%rax) 25: 89 70 0c mov %esi,0xc(%rax) 28: c3 ret but we could choose to ignore this for the sake of avoiding the x264 regression (shall it appear which I think it will) and try to address this in GCC 12. Richard. > Thanks, > Richard. > > > > > It still wouldn't help here of course, and we would not have RTL > > > expansion adhere to this scheme. > > > > > > Trying to capture secondary effects (the FRE opportunity unleashed) > > > in costing of this particular SLP subgraph is difficult and probably > > > undesirable though. Adjusting any of the target specific costs > > > is likely not going to work because of the original narrow profitability > > > gap (60 vs. 64). > > > > > > For the particular kernel the overall vectorization strathegy needs > > > to improve (PR98138 is about this I think). I know we've reverted > > > the previous change that attempted to cost for GPR -> XMM. It did > > > > > > case vec_construct: > > > { > > > /* N element inserts into SSE vectors. */ > > > + int cost > > > + = TYPE_VECTOR_SUBPARTS (vectype) * (fp ? > > > + ix86_cost->sse_op > > > + : > > > ix86_cost->integer_to_sse); > > > + > > > - int cost = TYPE_VECTOR_SUBPARTS (vectype) * ix86_cost->sse_op; > > > > > > thus treated integer_to_sse as GPR insert into vector cost for > > > integer components in general while the now proposed change > > > _adds_ integer to XMM move costs (but only once for duplicate > > > elements and not for the cases we know are from memory / vector regs). > > > With the proposed patch we can probably be more "correct" above > > > and only cost TYPE_VECTOR_SUBPARTS (vectype) - 1 inserts given > > > for FP the first one is free and for int we're costing it separately. > > > Again it won't help here. > > > > > > Thanks, > > > Richard. > > > > > > > > > > > > > I suppose we can let autotesters look for SPEC performance fallout. > > > > > > > > > > OK if testing succeeds? > > > > > > > > > > Thanks, > > > > > Richard. > > > > > > > > > > 2022-02-18 Richard Biener > > > > > > > > > > PR tree-optimization/104582 > > > > > PR target/99881 > > > > > * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): > > > > > Cost GPR to vector register moves for integer vector construction. > > > > > > > > > > * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c: New. > > > > > * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-2.c: Likewise. > > > > > * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c: Likewise. > > > > > * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c: Likewise. > > > > > * gcc.target/i386/pr99881.c: Un-XFAIL. > > > > > * gcc.target/i386/pr91446.c: Adjust to not expect vectorization. > > > > > --- > > > > > gcc/config/i386/i386.cc | 45 ++++++++++++++++++- > > > > > .../costmodel/x86_64/costmodel-pr104582-1.c | 15 +++++++ > > > > > .../costmodel/x86_64/costmodel-pr104582-2.c | 13 ++++++ > > > > > .../costmodel/x86_64/costmodel-pr104582-3.c | 13 ++++++ > > > > > .../costmodel/x86_64/costmodel-pr104582-4.c | 15 +++++++ > > > > > gcc/testsuite/gcc.target/i386/pr91446.c | 2 +- > > > > > gcc/testsuite/gcc.target/i386/pr99881.c | 2 +- > > > > > 7 files changed, 102 insertions(+), 3 deletions(-) > > > > > create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c > > > > > create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-2.c > > > > > create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c > > > > > create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c > > > > > > > > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > > > > > index 0830dbd7dca..b2bf90576d5 100644 > > > > > --- a/gcc/config/i386/i386.cc > > > > > +++ b/gcc/config/i386/i386.cc > > > > > @@ -22997,7 +22997,7 @@ ix86_vectorize_create_costs (vec_info *vinfo, bool costing_for_scalar) > > > > > > > > > > unsigned > > > > > ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind, > > > > > - stmt_vec_info stmt_info, slp_tree, > > > > > + stmt_vec_info stmt_info, slp_tree node, > > > > > tree vectype, int misalign, > > > > > vect_cost_model_location where) > > > > > { > > > > > @@ -23160,6 +23160,49 @@ ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind, > > > > > stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign); > > > > > stmt_cost *= (TYPE_VECTOR_SUBPARTS (vectype) + 1); > > > > > } > > > > > + else if (kind == vec_construct > > > > > + && node > > > > > + && SLP_TREE_DEF_TYPE (node) == vect_external_def > > > > > + && INTEGRAL_TYPE_P (TREE_TYPE (vectype))) > > > > > + { > > > > > + stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign); > > > > > + unsigned i; > > > > > + tree op; > > > > > + FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (node), i, op) > > > > > + if (TREE_CODE (op) == SSA_NAME) > > > > > + TREE_VISITED (op) = 0; > > > > > + FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (node), i, op) > > > > > + { > > > > > + if (TREE_CODE (op) != SSA_NAME > > > > > + || TREE_VISITED (op)) > > > > > + continue; > > > > > + TREE_VISITED (op) = 1; > > > > > + gimple *def = SSA_NAME_DEF_STMT (op); > > > > > + tree tem; > > > > > + if (is_gimple_assign (def) > > > > > + && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def)) > > > > > + && ((tem = gimple_assign_rhs1 (def)), true) > > > > > + && TREE_CODE (tem) == SSA_NAME > > > > > + /* A sign-change expands to nothing. */ > > > > > + && tree_nop_conversion_p (TREE_TYPE (gimple_assign_lhs (def)), > > > > > + TREE_TYPE (tem))) > > > > > + def = SSA_NAME_DEF_STMT (tem); > > > > > + /* When the component is loaded from memory we can directly > > > > > + move it to a vector register, otherwise we have to go > > > > > + via a GPR or via vpinsr which involves similar cost. > > > > > + Likewise with a BIT_FIELD_REF extracting from a vector > > > > > + register we can hope to avoid using a GPR. */ > > > > > + if (!is_gimple_assign (def) > > > > > + || (!gimple_assign_load_p (def) > > > > > + && (gimple_assign_rhs_code (def) != BIT_FIELD_REF > > > > > + || !VECTOR_TYPE_P (TREE_TYPE > > > > > + (TREE_OPERAND (gimple_assign_rhs1 (def), 0)))))) > > > > > + stmt_cost += ix86_cost->sse_to_integer; > > > > > + } > > > > > + FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (node), i, op) > > > > > + if (TREE_CODE (op) == SSA_NAME) > > > > > + TREE_VISITED (op) = 0; > > > > > + } > > > > > if (stmt_cost == -1) > > > > > stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign); > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c > > > > > new file mode 100644 > > > > > index 00000000000..992a845ad7a > > > > > --- /dev/null > > > > > +++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c > > > > > @@ -0,0 +1,15 @@ > > > > > +/* { dg-do compile } */ > > > > > +/* { dg-additional-options "-msse -fdump-tree-slp2-details" } */ > > > > > + > > > > > +struct S { unsigned long a, b; } s; > > > > > + > > > > > +void > > > > > +foo (unsigned long *a, unsigned long *b) > > > > > +{ > > > > > + unsigned long a_ = *a; > > > > > + unsigned long b_ = *b; > > > > > + s.a = a_; > > > > > + s.b = b_; > > > > > +} > > > > > + > > > > > +/* { dg-final { scan-tree-dump "basic block part vectorized" "slp2" } } */ > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-2.c > > > > > new file mode 100644 > > > > > index 00000000000..7637cdb4a97 > > > > > --- /dev/null > > > > > +++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-2.c > > > > > @@ -0,0 +1,13 @@ > > > > > +/* { dg-do compile } */ > > > > > +/* { dg-additional-options "-msse -fdump-tree-slp2-details" } */ > > > > > + > > > > > +struct S { unsigned long a, b; } s; > > > > > + > > > > > +void > > > > > +foo (unsigned long a, unsigned long b) > > > > > +{ > > > > > + s.a = a; > > > > > + s.b = b; > > > > > +} > > > > > + > > > > > +/* { dg-final { scan-tree-dump-not "basic block part vectorized" "slp2" } } */ > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c > > > > > new file mode 100644 > > > > > index 00000000000..999c4905708 > > > > > --- /dev/null > > > > > +++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c > > > > > @@ -0,0 +1,13 @@ > > > > > +/* { dg-do compile } */ > > > > > +/* { dg-additional-options "-msse -fdump-tree-slp2-details" } */ > > > > > + > > > > > +struct S { double a, b; } s; > > > > > + > > > > > +void > > > > > +foo (double a, double b) > > > > > +{ > > > > > + s.a = a; > > > > > + s.b = b; > > > > > +} > > > > > + > > > > > +/* { dg-final { scan-tree-dump "basic block part vectorized" "slp2" } } */ > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c > > > > > new file mode 100644 > > > > > index 00000000000..cc471e1ed73 > > > > > --- /dev/null > > > > > +++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c > > > > > @@ -0,0 +1,15 @@ > > > > > +/* { dg-do compile } */ > > > > > +/* { dg-additional-options "-msse -fdump-tree-slp2-details" } */ > > > > > + > > > > > +struct S { unsigned long a, b; } s; > > > > > + > > > > > +void > > > > > +foo (signed long *a, unsigned long *b) > > > > > +{ > > > > > + unsigned long a_ = *a; > > > > > + unsigned long b_ = *b; > > > > > + s.a = a_; > > > > > + s.b = b_; > > > > > +} > > > > > + > > > > > +/* { dg-final { scan-tree-dump "basic block part vectorized" "slp2" } } */ > > > > > diff --git a/gcc/testsuite/gcc.target/i386/pr91446.c b/gcc/testsuite/gcc.target/i386/pr91446.c > > > > > index 0243ca3ea68..067bf43f698 100644 > > > > > --- a/gcc/testsuite/gcc.target/i386/pr91446.c > > > > > +++ b/gcc/testsuite/gcc.target/i386/pr91446.c > > > > > @@ -21,4 +21,4 @@ foo (unsigned long long width, unsigned long long height, > > > > > bar (&t); > > > > > } > > > > > > > > > > -/* { dg-final { scan-assembler-times "vmovdqa\[^\n\r\]*xmm\[0-9\]" 2 } } */ > > > > > +/* { dg-final { scan-assembler-times "xmm\[0-9\]" 0 } } */ > > > > > diff --git a/gcc/testsuite/gcc.target/i386/pr99881.c b/gcc/testsuite/gcc.target/i386/pr99881.c > > > > > index 3e087eb2ed7..a1ec1d1ba8a 100644 > > > > > --- a/gcc/testsuite/gcc.target/i386/pr99881.c > > > > > +++ b/gcc/testsuite/gcc.target/i386/pr99881.c > > > > > @@ -1,7 +1,7 @@ > > > > > /* PR target/99881. */ > > > > > /* { dg-do compile { target { ! ia32 } } } */ > > > > > /* { dg-options "-Ofast -march=skylake" } */ > > > > > -/* { dg-final { scan-assembler-not "xmm\[0-9\]" { xfail *-*-* } } } */ > > > > > +/* { dg-final { scan-assembler-not "xmm\[0-9\]" } } */ > > > > > > > > > > void > > > > > foo (int* __restrict a, int n, int c) > > > > > -- > > > > > 2.34.1 > > > > > > > > > > > > > > > > > > > > > > -- > > > Richard Biener > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, > > > Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg) > > > > > > > > > > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)