From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf1-x135.google.com (mail-lf1-x135.google.com [IPv6:2a00:1450:4864:20::135]) by sourceware.org (Postfix) with ESMTPS id 7EECC3858C98 for ; Mon, 4 Dec 2023 14:10:59 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 7EECC3858C98 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 7EECC3858C98 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::135 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701699062; cv=none; b=Z2wR2BlrvcXSnxyBbItBmoPKtiCqPyZAuypuvtmmNx7GYv9rFsaSeucL2riUKjnP/ypmmVBUwoPILHhXcR1hNJr958KKqvvEhgERV7PB2EyDZPzO387byDD+uvverrMQB22NyQyScrTmPhuHO81lpsAhw/rxo4mj05m+yS0GNJI= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701699062; c=relaxed/simple; bh=dZWv4mhnnxA7Sc6eRHPvuld9MEjrlduPZyZZM9NLHSM=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=EEO+UuwC2NYYTXaMk/MVrxbznlufc5CSIFWrBUcwz2Meo/1iEr9eLdJeuBJWSd/CUMJDae9iG0/eC3J4wFpeAjhpouH2dvTi2SmaWH13S/aeQJACdcJX52lP3QaK8YQsXIH++CXXFONe7ZRykHG83CDKbaL4XxL+4eD5OP+YxpU= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-lf1-x135.google.com with SMTP id 2adb3069b0e04-50bf4f97752so1416058e87.1 for ; Mon, 04 Dec 2023 06:10:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1701699058; x=1702303858; darn=gcc.gnu.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=hndS2WMiCR9rZKzvucLmmyJMfN0OtjXM+99cRypn+iA=; b=lcMMpugFNHKvoA8lkkjrQkT/XBQaUMFqHrWZqtWfImh+anM9ytzuai/sQXZtsO4c4D UXoMzkiLABUdSGrZXqG4klf/TMl2oxCmRW/z2Gz+YqR2tyFlSZQvx+QaPVZrrJRBjmUp E0/O0Gd1oVhfpyI3ohlZn6+/FahZGRHuu+5fmV5ZvgSl01oTnRz+GOLxDE7YBG02Fe+d i13fqOBDlptRdtGbpyJ7ROQYfL3REj+sT+dtiXf1XQgTMD4hlMhjuY7OP+sHRIh1fXPl nSPNEJcBZqobSzNy9Z3F724Oqn5IPJB8MI5xuT9Mq7EaXggL/EJPuQsjeIT1w8QabFKK wZ0g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701699058; x=1702303858; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=hndS2WMiCR9rZKzvucLmmyJMfN0OtjXM+99cRypn+iA=; b=J7zKxvfXLINL7BYUta3Ah7mgPXq/duo8LN98ZwGCLFnfJhXmqJrVTdb8rTqHrcj21y ndMXGbWwHgZqfJAq4H0meCEacU+/jV/SLSsm/3SehIUKZb7H93LKB/PA+/kCifha5jr0 EYaGD/X3CkY5TNz79wzyKZNYto8LU13/aOYxLeiE/BfosTl+z6PG4gF9PQFcjJMjKO0l DLtL5SfxlqZtZqvpI8RvyrjaGFfBd8guCX9i9lpEgMiCKNEggMxZQ4ZeHo1gupzBsOMM Xuy2m8hP7xUxOE25ZGCNb7bT7jWBzDMb17rRS4vPivkbEraTYB/NxGSK9d8XQc+GmeOF SJYg== X-Gm-Message-State: AOJu0YzFIf7cXso3vX4weqZgvTXhdNRK2sf21luGijVs8WkanPVMdYm4 U+sY4v53raDjFTftA9NHl6CDPTajuo6AoMGrIGQ= X-Google-Smtp-Source: AGHT+IGS610Cok7YvXvSsTnoRaQpaP3hbMwyrP+wTux5TdTqxrXSKBioDa2Myjs/8YMH0j4npqZKWdEuEXDQ7MPyXew= X-Received: by 2002:a05:6512:3d08:b0:50b:fe99:b477 with SMTP id d8-20020a0565123d0800b0050bfe99b477mr325948lfv.59.1701699057283; Mon, 04 Dec 2023 06:10:57 -0800 (PST) MIME-Version: 1.0 References: <20231204053208.908533-1-hongtao.liu@intel.com> In-Reply-To: <20231204053208.908533-1-hongtao.liu@intel.com> From: Richard Biener Date: Mon, 4 Dec 2023 15:10:44 +0100 Message-ID: Subject: Re: [PATCH] Don't vectorize when vector stmts are only vec_contruct and stores To: liuhongt Cc: gcc-patches@gcc.gnu.org, crazylht@gmail.com, hjl.tools@gmail.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-7.2 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, Dec 4, 2023 at 6:32=E2=80=AFAM liuhongt wro= te: > > .i.e. for below cases. > a[0] =3D b1; > a[1] =3D b2; > .. > a[n] =3D bn; > > There're extra dependences when contructing the vector, but not for > scalar store. According to experiments, it's generally worse. > > The patch adds an cut-off heuristic when vec_stmt is just > vec_construct and vector store. It improves SPEC2017 a little bit. > > BenchMarks Ratio > 500.perlbench_r 2.60% > 502.gcc_r 0.30% > 505.mcf_r 0.40% > 520.omnetpp_r -1.00% > 523.xalancbmk_r 0.90% > 525.x264_r 0.00% > 531.deepsjeng_r 0.30% > 541.leela_r 0.90% > 548.exchange2_r 3.20% > 557.xz_r 1.40% > 503.bwaves_r 0.00% > 507.cactuBSSN_r 0.00% > 508.namd_r 0.30% > 510.parest_r 0.00% > 511.povray_r 0.20% > 519.lbm_r SAME BIN > 521.wrf_r -0.30% > 526.blender_r -1.20% > 527.cam4_r -0.20% > 538.imagick_r 4.00% > 544.nab_r 0.40% > 549.fotonik3d_r 0.00% > 554.roms_r 0.00% > Geomean-int 0.90% > Geomean-fp 0.30% > Geomean-all 0.50% > > And > Regressed testcases: > > gcc.target/i386/part-vect-absneghf.c > gcc.target/i386/part-vect-copysignhf.c > gcc.target/i386/part-vect-xorsignhf.c > > Regressed under -m32 since it generates 2 vector > .ABS/NEG/XORSIGN/COPYSIGN vs original 1 64-bit vec_construct. The > original testcases are used to test vectorization capability for > .ABS/NEG/XORG/COPYSIGN, so just restrict testcase to TARGET_64BIT. > > gcc.target/i386/pr111023-2.c > gcc.target/i386/pr111023.c > Regressed under -m32 > > testcase as below > > void > v8hi_v8qi (v8hi *dst, v16qi src) > { > short tem[8]; > tem[0] =3D src[0]; > tem[1] =3D src[1]; > tem[2] =3D src[2]; > tem[3] =3D src[3]; > tem[4] =3D src[4]; > tem[5] =3D src[5]; > tem[6] =3D src[6]; > tem[7] =3D src[7]; > dst[0] =3D *(v8hi *) tem; > } > > under 64-bit target, vectorizer realize it's just permutation of > original src vector, but under -m32, vectorizer relies on > vec_construct for vectorization. I think optimziation for this case > under 32-bit target maynot impact much, so just add > -fno-vect-cost-model. > > gcc.target/i386/pr91446.c: This testcase is guard for cost model of > vector store, not vectorization capability, so just adjust testcase. > > gcc.target/i386/pr108938-3.c: This testcase relies on vec_construct to > optimize for bswap, like other optimziation vectorizer can't realize > optimization after it. So the current solution is add > -fno-vect-cost-model to the testcase. > > costmodel-pr104582-1.c > costmodel-pr104582-2.c > costmodel-pr104582-4.c > > Failed since it's now not vectorized, looked at the PR, it's exactly > what's wanted, so adjust testcase to scan-tree-dump-not. > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? So the original motivation to not more aggressively prune store-from-CTOR vectorization in the vectorizer itself is that the vector store is possibly better for STLF (larger stores are good, larger loads eventually problematic). I'd also expect the costs to play out to not make those profitable. OTOH, if you have a series of 'double' stores you can convert to a series of V2DF stores you _may_ be faster if this reduces pressure on the store unit. Esp. V2DF is cheap to construct with one movhpd. So I don't think we want to try to pattern match it this way? In fact the SLP vectorization cases could all arrive with an SLP node specified (vectorizable_store would have to be changed here), which means you could check for an vect_external_def child instead? But as said, I would hope that we can arrive at a better way assessing the CONSTRUCTOR cost. IMHO one big issue is that load and store cost are comparatively high compared to simple stmt ops so it's very hard to offset saving many stores with "ops". That's because we generally think of 'cost' to model latency but as you say stores don't really have latency - we only have store bandwidth of the store unit and of course issue width (but that's true for other ops as well). I wonder what happens if we set both scalar and vector store cost to zero? Or maybe one (to count one issue slot)? Richard. > gcc/ChangeLog: > > PR target/99881 > PR target/104582 > * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): > Check if kind is vec_construct or vector store. > (ix86_vector_costs::finish_cost): Don't do vectorization when > vector stmts are only vec_construct and stores. > (ix86_vector_costs::ix86_vect_construct_store_only_p): New > function. > (ix86_vector_costs::ix86_vect_cut_off): Ditto. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/part-vect-absneghf.c: Restrict testcase to > TARGET_64BIT. > * gcc.target/i386/part-vect-copysignhf.c: Ditto. > * gcc.target/i386/part-vect-xorsignhf.c: Ditto. > * gcc.target/i386/pr91446.c: Adjust testcase. > * gcc.target/i386/pr108938-3.c: Add -fno-vect-cost-model. > * gcc.target/i386/pr111023-2.c: Ditto. > * gcc.target/i386/pr111023.c: Ditto. > * gcc.target/i386/pr99881.c: Remove xfail. > * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c: Changed > to Scan-tree-dump-not. > * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c: Ditto. > * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c: Ditto. > --- > gcc/config/i386/i386.cc | 81 ++++++++++++++++++- > .../costmodel/x86_64/costmodel-pr104582-1.c | 2 +- > .../costmodel/x86_64/costmodel-pr104582-3.c | 2 +- > .../costmodel/x86_64/costmodel-pr104582-4.c | 2 +- > .../gcc.target/i386/part-vect-absneghf.c | 4 +- > .../gcc.target/i386/part-vect-copysignhf.c | 4 +- > .../gcc.target/i386/part-vect-xorsignhf.c | 4 +- > gcc/testsuite/gcc.target/i386/pr108938-3.c | 2 +- > gcc/testsuite/gcc.target/i386/pr111023-2.c | 2 +- > gcc/testsuite/gcc.target/i386/pr111023.c | 2 +- > gcc/testsuite/gcc.target/i386/pr91446.c | 14 ++-- > gcc/testsuite/gcc.target/i386/pr99881.c | 2 +- > 12 files changed, 99 insertions(+), 22 deletions(-) > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > index dcaea6c2096..a4b23e29eba 100644 > --- a/gcc/config/i386/i386.cc > +++ b/gcc/config/i386/i386.cc > @@ -24573,6 +24573,10 @@ public: > > private: > > + /* Don't do vectorization for certain patterns. */ > + void ix86_vect_cut_off (); > + > + bool ix86_vect_construct_store_only_p (vect_cost_for_stmt, stmt_vec_in= fo); > /* Estimate register pressure of the vectorized code. */ > void ix86_vect_estimate_reg_pressure (); > /* Number of GENERAL_REGS/SSE_REGS used in the vectorizer, it's used f= or > @@ -24581,12 +24585,15 @@ private: > where we know it's not loaded from memory. */ > unsigned m_num_gpr_needed[3]; > unsigned m_num_sse_needed[3]; > + > + bool m_vec_construct_store_only; > }; > > ix86_vector_costs::ix86_vector_costs (vec_info* vinfo, bool costing_for_= scalar) > : vector_costs (vinfo, costing_for_scalar), > m_num_gpr_needed (), > - m_num_sse_needed () > + m_num_sse_needed (), > + m_vec_construct_store_only (true) > { > } > > @@ -24609,6 +24616,10 @@ ix86_vector_costs::add_stmt_cost (int count, vec= t_cost_for_stmt kind, > =3D (kind =3D=3D scalar_stmt || kind =3D=3D scalar_load || kind =3D= =3D scalar_store); > int stmt_cost =3D - 1; > > + if (m_vec_construct_store_only > + && !ix86_vect_construct_store_only_p (kind, stmt_info)) > + m_vec_construct_store_only =3D false; > + > bool fp =3D false; > machine_mode mode =3D scalar_p ? SImode : TImode; > > @@ -24865,8 +24876,45 @@ ix86_vector_costs::ix86_vect_estimate_reg_pressu= re () > } > } > > +/* Return true if KIND and STMT_INFO is either vec_construct or vector > + store, stmt_info could be promote/demote between vec_construct and > + vector store, also count that. */ > +bool > +ix86_vector_costs::ix86_vect_construct_store_only_p (vect_cost_for_stmt = kind, > + stmt_vec_info stmt_in= fo) > +{ > + switch (kind) > + { > + case vec_construct: > + case vector_store: > + case unaligned_store: > + case vector_scatter_store: > + case vec_promote_demote: > + return true; > + > + /* Vectorizer will try VEC_PACK_TRUNK_EXPR for things likes > + char* a; > + short b1, b2, b3, b4; > + a[0] =3D b1; > + a[1] =3D b2; > + a[2] =3D b3; > + a[3] =3D b4; > + Also don't vectorized it. */ > + case vector_stmt: > + if (stmt_info && stmt_info->stmt > + && gimple_code (stmt_info->stmt) =3D=3D GIMPLE_ASSIGN > + && gimple_assign_rhs_code (stmt_info->stmt) =3D=3D NOP_EXPR) > + return true; > + > + default: > + break; > + } > + > + return false; > +} > + > void > -ix86_vector_costs::finish_cost (const vector_costs *scalar_costs) > +ix86_vector_costs::ix86_vect_cut_off () > { > loop_vec_info loop_vinfo =3D dyn_cast (m_vinfo); > if (loop_vinfo && !m_costing_for_scalar) > @@ -24885,10 +24933,39 @@ ix86_vector_costs::finish_cost (const vector_co= sts *scalar_costs) > && (exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo).to_constant= ()) > > ceil_log2 (LOOP_VINFO_INT_NITERS (loop_vinfo)))) > m_costs[vect_body] =3D INT_MAX; > + return; > + } > + > + /* Don't do vectorization for things like > + > + a[0] =3D b1; > + a[1] =3D b2; > + .. > + a[n] =3D bn; > + > + There're extra dependences when contructing the vector, but not for > + scalar store. According to experiments, it's generally worse. */ > + if (m_vec_construct_store_only) > + { > + m_costs[0] =3D m_costs[1] =3D m_costs[2] =3D INT_MAX; > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_NOTE, vect_location, > + "Skip vectorization for stmts which only contain= s" > + " vec_construct and vector store.\n"); > + return; > } > > + return; > +} > + > +void > +ix86_vector_costs::finish_cost (const vector_costs *scalar_costs) > +{ > + > ix86_vect_estimate_reg_pressure (); > > + ix86_vect_cut_off (); > + > vector_costs::finish_cost (scalar_costs); > } > > diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr10458= 2-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c > index 992a845ad7a..f940af70b72 100644 > --- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c > +++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c > @@ -12,4 +12,4 @@ foo (unsigned long *a, unsigned long *b) > s.b =3D b_; > } > > -/* { dg-final { scan-tree-dump "basic block part vectorized" "slp2" } } = */ > +/* { dg-final { scan-tree-dump-not "basic block part vectorized" "slp2" = } } */ > diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr10458= 2-3.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c > index 999c4905708..eff60b2a82a 100644 > --- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c > +++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c > @@ -10,4 +10,4 @@ foo (double a, double b) > s.b =3D b; > } > > -/* { dg-final { scan-tree-dump "basic block part vectorized" "slp2" } } = */ > +/* { dg-final { scan-tree-dump-not "basic block part vectorized" "slp2" = } } */ > diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr10458= 2-4.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c > index cc471e1ed73..2f354338e96 100644 > --- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c > +++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c > @@ -12,4 +12,4 @@ foo (signed long *a, unsigned long *b) > s.b =3D b_; > } > > -/* { dg-final { scan-tree-dump "basic block part vectorized" "slp2" } } = */ > +/* { dg-final { scan-tree-dump-not "basic block part vectorized" "slp2" = } } */ > diff --git a/gcc/testsuite/gcc.target/i386/part-vect-absneghf.c b/gcc/tes= tsuite/gcc.target/i386/part-vect-absneghf.c > index 48aed14d604..4052210ec39 100644 > --- a/gcc/testsuite/gcc.target/i386/part-vect-absneghf.c > +++ b/gcc/testsuite/gcc.target/i386/part-vect-absneghf.c > @@ -85,7 +85,7 @@ do_test (void) > abort (); > } > > -/* { dg-final { scan-tree-dump-times "vectorized using 8 byte vectors" 2= "slp2" } } */ > -/* { dg-final { scan-tree-dump-times "vectorized using 4 byte vectors" 2= "slp2" } } */ > +/* { dg-final { scan-tree-dump-times "vectorized using 8 byte vectors" 2= "slp2" { target { ! ia32 } } } } */ > +/* { dg-final { scan-tree-dump-times "vectorized using 4 byte vectors" 2= "slp2" { target { ! ia32 } } } } */ > /* { dg-final { scan-tree-dump-times {(?n)ABS_EXPR /* { dg-final { scan-tree-dump-times {(?n)=3D -vect} 2 "optimized" { tar= get { ! ia32 } } } } */ > diff --git a/gcc/testsuite/gcc.target/i386/part-vect-copysignhf.c b/gcc/t= estsuite/gcc.target/i386/part-vect-copysignhf.c > index 811617bc3dd..006941f6651 100644 > --- a/gcc/testsuite/gcc.target/i386/part-vect-copysignhf.c > +++ b/gcc/testsuite/gcc.target/i386/part-vect-copysignhf.c > @@ -55,6 +55,6 @@ do_test (void) > abort (); > } > > -/* { dg-final { scan-tree-dump-times "vectorized using 8 byte vectors" 1= "slp2" } } */ > -/* { dg-final { scan-tree-dump-times "vectorized using 4 byte vectors" 1= "slp2" } } */ > +/* { dg-final { scan-tree-dump-times "vectorized using 8 byte vectors" 1= "slp2" { target { ! ia32 } } } } */ > +/* { dg-final { scan-tree-dump-times "vectorized using 4 byte vectors" 1= "slp2" { target { ! ia32 } } } } */ > /* { dg-final { scan-tree-dump-times ".COPYSIGN" 2 "optimized" { target = { ! ia32 } } } } */ > diff --git a/gcc/testsuite/gcc.target/i386/part-vect-xorsignhf.c b/gcc/te= stsuite/gcc.target/i386/part-vect-xorsignhf.c > index a8ec60a088a..a57dc5aba23 100644 > --- a/gcc/testsuite/gcc.target/i386/part-vect-xorsignhf.c > +++ b/gcc/testsuite/gcc.target/i386/part-vect-xorsignhf.c > @@ -55,6 +55,6 @@ do_test (void) > abort (); > } > > -/* { dg-final { scan-tree-dump-times "vectorized using 8 byte vectors" 1= "slp2" } } */ > -/* { dg-final { scan-tree-dump-times "vectorized using 4 byte vectors" 1= "slp2" } } */ > +/* { dg-final { scan-tree-dump-times "vectorized using 8 byte vectors" 1= "slp2" { target { ! ia32 } } } } */ > +/* { dg-final { scan-tree-dump-times "vectorized using 4 byte vectors" 1= "slp2" { target { ! ia32 } } } } */ > /* { dg-final { scan-tree-dump-times ".XORSIGN" 2 "optimized" { target {= ! ia32 } } } } */ > diff --git a/gcc/testsuite/gcc.target/i386/pr108938-3.c b/gcc/testsuite/g= cc.target/i386/pr108938-3.c > index 32ac544c7ed..24725a9ab1d 100644 > --- a/gcc/testsuite/gcc.target/i386/pr108938-3.c > +++ b/gcc/testsuite/gcc.target/i386/pr108938-3.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O2 -ftree-vectorize -mno-movbe" } */ > +/* { dg-options "-O2 -ftree-vectorize -mno-movbe -fno-vect-cost-model" }= */ > /* { dg-final { scan-assembler-times "bswap\[\t ]+" 2 { target { ! ia32 = } } } } */ > /* { dg-final { scan-assembler-times "bswap\[\t ]+" 3 { target ia32 } } = } */ > > diff --git a/gcc/testsuite/gcc.target/i386/pr111023-2.c b/gcc/testsuite/g= cc.target/i386/pr111023-2.c > index 6c69f947544..db25668a9ae 100644 > --- a/gcc/testsuite/gcc.target/i386/pr111023-2.c > +++ b/gcc/testsuite/gcc.target/i386/pr111023-2.c > @@ -1,6 +1,6 @@ > /* PR target/111023 */ > /* { dg-do compile } */ > -/* { dg-options "-O2 -mtune=3Dicelake-server -ftree-vectorize -msse2 -mn= o-sse4.1" } */ > +/* { dg-options "-O2 -mtune=3Dicelake-server -ftree-vectorize -msse2 -mn= o-sse4.1 -fno-vect-cost-model" } */ > > typedef char v16qi __attribute__((vector_size (16))); > typedef short v8hi __attribute__((vector_size (16))); > diff --git a/gcc/testsuite/gcc.target/i386/pr111023.c b/gcc/testsuite/gcc= .target/i386/pr111023.c > index 6144c371f32..18cd579d937 100644 > --- a/gcc/testsuite/gcc.target/i386/pr111023.c > +++ b/gcc/testsuite/gcc.target/i386/pr111023.c > @@ -1,6 +1,6 @@ > /* PR target/111023 */ > /* { dg-do compile } */ > -/* { dg-options "-O2 -mtune=3Dicelake-server -ftree-vectorize -msse2 -mn= o-sse4.1" } */ > +/* { dg-options "-O2 -mtune=3Dicelake-server -ftree-vectorize -msse2 -mn= o-sse4.1 -fno-vect-cost-model" } */ > > typedef unsigned char v16qi __attribute__((vector_size (16))); > typedef unsigned short v8hi __attribute__((vector_size (16))); > diff --git a/gcc/testsuite/gcc.target/i386/pr91446.c b/gcc/testsuite/gcc.= target/i386/pr91446.c > index 0243ca3ea68..3e0f8a4b315 100644 > --- a/gcc/testsuite/gcc.target/i386/pr91446.c > +++ b/gcc/testsuite/gcc.target/i386/pr91446.c > @@ -10,15 +10,15 @@ typedef struct > extern void bar (info *); > > void > -foo (unsigned long long width, unsigned long long height, > - long long x, long long y) > +foo (unsigned long long* width, > + long long* x) > { > info t; > - t.width =3D width; > - t.height =3D height; > - t.x =3D x; > - t.y =3D y; > + t.width =3D width[0]; > + t.height =3D width[1]; > + t.x =3D x[0]; > + t.y =3D x[1]; > bar (&t); > } > > -/* { dg-final { scan-assembler-times "vmovdqa\[^\n\r\]*xmm\[0-9\]" 2 } }= */ > +/* { dg-final { scan-assembler-times "vmovdq\[au\]\[^\n\r\]*xmm\[0-9\]" = 4 } } */ > diff --git a/gcc/testsuite/gcc.target/i386/pr99881.c b/gcc/testsuite/gcc.= target/i386/pr99881.c > index 3e087eb2ed7..1a59325ac1c 100644 > --- a/gcc/testsuite/gcc.target/i386/pr99881.c > +++ b/gcc/testsuite/gcc.target/i386/pr99881.c > @@ -1,7 +1,7 @@ > /* PR target/99881. */ > /* { dg-do compile { target { ! ia32 } } } */ > /* { dg-options "-Ofast -march=3Dskylake" } */ > -/* { dg-final { scan-assembler-not "xmm\[0-9\]" { xfail *-*-* } } } */ > +/* { dg-final { scan-assembler-not "xmm\[0-9\]" } } */ > > void > foo (int* __restrict a, int n, int c) > -- > 2.31.1 >