From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=5lJ6=HP=gmail.com=richard.guenther@sourceware.org>
Received: from mail-lf1-x135.google.com (mail-lf1-x135.google.com [IPv6:2a00:1450:4864:20::135])
	by sourceware.org (Postfix) with ESMTPS id 7EECC3858C98
	for <gcc-patches@gcc.gnu.org>; Mon,  4 Dec 2023 14:10:59 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 7EECC3858C98
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 7EECC3858C98
Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::135
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701699062; cv=none;
	b=Z2wR2BlrvcXSnxyBbItBmoPKtiCqPyZAuypuvtmmNx7GYv9rFsaSeucL2riUKjnP/ypmmVBUwoPILHhXcR1hNJr958KKqvvEhgERV7PB2EyDZPzO387byDD+uvverrMQB22NyQyScrTmPhuHO81lpsAhw/rxo4mj05m+yS0GNJI=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
	t=1701699062; c=relaxed/simple;
	bh=dZWv4mhnnxA7Sc6eRHPvuld9MEjrlduPZyZZM9NLHSM=;
	h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=EEO+UuwC2NYYTXaMk/MVrxbznlufc5CSIFWrBUcwz2Meo/1iEr9eLdJeuBJWSd/CUMJDae9iG0/eC3J4wFpeAjhpouH2dvTi2SmaWH13S/aeQJACdcJX52lP3QaK8YQsXIH++CXXFONe7ZRykHG83CDKbaL4XxL+4eD5OP+YxpU=
ARC-Authentication-Results: i=1; server2.sourceware.org
Received: by mail-lf1-x135.google.com with SMTP id 2adb3069b0e04-50bf4f97752so1416058e87.1
        for <gcc-patches@gcc.gnu.org>; Mon, 04 Dec 2023 06:10:59 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20230601; t=1701699058; x=1702303858; darn=gcc.gnu.org;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:from:to:cc:subject:date
         :message-id:reply-to;
        bh=hndS2WMiCR9rZKzvucLmmyJMfN0OtjXM+99cRypn+iA=;
        b=lcMMpugFNHKvoA8lkkjrQkT/XBQaUMFqHrWZqtWfImh+anM9ytzuai/sQXZtsO4c4D
         UXoMzkiLABUdSGrZXqG4klf/TMl2oxCmRW/z2Gz+YqR2tyFlSZQvx+QaPVZrrJRBjmUp
         E0/O0Gd1oVhfpyI3ohlZn6+/FahZGRHuu+5fmV5ZvgSl01oTnRz+GOLxDE7YBG02Fe+d
         i13fqOBDlptRdtGbpyJ7ROQYfL3REj+sT+dtiXf1XQgTMD4hlMhjuY7OP+sHRIh1fXPl
         nSPNEJcBZqobSzNy9Z3F724Oqn5IPJB8MI5xuT9Mq7EaXggL/EJPuQsjeIT1w8QabFKK
         wZ0g==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1701699058; x=1702303858;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=hndS2WMiCR9rZKzvucLmmyJMfN0OtjXM+99cRypn+iA=;
        b=J7zKxvfXLINL7BYUta3Ah7mgPXq/duo8LN98ZwGCLFnfJhXmqJrVTdb8rTqHrcj21y
         ndMXGbWwHgZqfJAq4H0meCEacU+/jV/SLSsm/3SehIUKZb7H93LKB/PA+/kCifha5jr0
         EYaGD/X3CkY5TNz79wzyKZNYto8LU13/aOYxLeiE/BfosTl+z6PG4gF9PQFcjJMjKO0l
         DLtL5SfxlqZtZqvpI8RvyrjaGFfBd8guCX9i9lpEgMiCKNEggMxZQ4ZeHo1gupzBsOMM
         Xuy2m8hP7xUxOE25ZGCNb7bT7jWBzDMb17rRS4vPivkbEraTYB/NxGSK9d8XQc+GmeOF
         SJYg==
X-Gm-Message-State: AOJu0YzFIf7cXso3vX4weqZgvTXhdNRK2sf21luGijVs8WkanPVMdYm4
	U+sY4v53raDjFTftA9NHl6CDPTajuo6AoMGrIGQ=
X-Google-Smtp-Source: AGHT+IGS610Cok7YvXvSsTnoRaQpaP3hbMwyrP+wTux5TdTqxrXSKBioDa2Myjs/8YMH0j4npqZKWdEuEXDQ7MPyXew=
X-Received: by 2002:a05:6512:3d08:b0:50b:fe99:b477 with SMTP id
 d8-20020a0565123d0800b0050bfe99b477mr325948lfv.59.1701699057283; Mon, 04 Dec
 2023 06:10:57 -0800 (PST)
MIME-Version: 1.0
References: <20231204053208.908533-1-hongtao.liu@intel.com>
In-Reply-To: <20231204053208.908533-1-hongtao.liu@intel.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Mon, 4 Dec 2023 15:10:44 +0100
Message-ID: <CAFiYyc2sZCSAHd+tS_eHk8O+b8_THCsbycdVjrgwPTJe=3BZmQ@mail.gmail.com>
Subject: Re: [PATCH] Don't vectorize when vector stmts are only vec_contruct
 and stores
To: liuhongt <hongtao.liu@intel.com>
Cc: gcc-patches@gcc.gnu.org, crazylht@gmail.com, hjl.tools@gmail.com
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-7.2 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Mon, Dec 4, 2023 at 6:32=E2=80=AFAM liuhongt <hongtao.liu@intel.com> wro=
te:
>
> .i.e. for below cases.
>    a[0] =3D b1;
>    a[1] =3D b2;
>    ..
>    a[n] =3D bn;
>
> There're extra dependences when contructing the vector, but not for
> scalar store. According to experiments, it's generally worse.
>
> The patch adds an cut-off heuristic when vec_stmt is just
> vec_construct and vector store. It improves SPEC2017 a little bit.
>
> BenchMarks              Ratio
> 500.perlbench_r         2.60%
> 502.gcc_r               0.30%
> 505.mcf_r               0.40%
> 520.omnetpp_r           -1.00%
> 523.xalancbmk_r         0.90%
> 525.x264_r              0.00%
> 531.deepsjeng_r         0.30%
> 541.leela_r             0.90%
> 548.exchange2_r         3.20%
> 557.xz_r                1.40%
> 503.bwaves_r            0.00%
> 507.cactuBSSN_r         0.00%
> 508.namd_r              0.30%
> 510.parest_r            0.00%
> 511.povray_r            0.20%
> 519.lbm_r               SAME BIN
> 521.wrf_r               -0.30%
> 526.blender_r           -1.20%
> 527.cam4_r              -0.20%
> 538.imagick_r           4.00%
> 544.nab_r               0.40%
> 549.fotonik3d_r         0.00%
> 554.roms_r              0.00%
> Geomean-int             0.90%
> Geomean-fp              0.30%
> Geomean-all             0.50%
>
> And
> Regressed testcases:
>
> gcc.target/i386/part-vect-absneghf.c
> gcc.target/i386/part-vect-copysignhf.c
> gcc.target/i386/part-vect-xorsignhf.c
>
> Regressed under -m32 since it generates 2 vector
> .ABS/NEG/XORSIGN/COPYSIGN vs original 1 64-bit vec_construct. The
> original testcases are used to test vectorization capability for
> .ABS/NEG/XORG/COPYSIGN, so just restrict testcase to TARGET_64BIT.
>
> gcc.target/i386/pr111023-2.c
> gcc.target/i386/pr111023.c
> Regressed under -m32
>
> testcase as below
>
> void
> v8hi_v8qi (v8hi *dst, v16qi src)
> {
>   short tem[8];
>   tem[0] =3D src[0];
>   tem[1] =3D src[1];
>   tem[2] =3D src[2];
>   tem[3] =3D src[3];
>   tem[4] =3D src[4];
>   tem[5] =3D src[5];
>   tem[6] =3D src[6];
>   tem[7] =3D src[7];
>   dst[0] =3D *(v8hi *) tem;
> }
>
> under 64-bit target, vectorizer realize it's just permutation of
> original src vector, but under -m32, vectorizer relies on
> vec_construct for vectorization. I think optimziation for this case
> under 32-bit target maynot impact much, so just add
> -fno-vect-cost-model.
>
> gcc.target/i386/pr91446.c: This testcase is guard for cost model of
> vector store, not vectorization capability, so just adjust testcase.
>
> gcc.target/i386/pr108938-3.c: This testcase relies on vec_construct to
> optimize for bswap, like other optimziation vectorizer can't realize
> optimization after it. So the current solution is add
> -fno-vect-cost-model to the testcase.
>
> costmodel-pr104582-1.c
> costmodel-pr104582-2.c
> costmodel-pr104582-4.c
>
> Failed since it's now not vectorized, looked at the PR, it's exactly
> what's wanted, so adjust testcase to scan-tree-dump-not.
>
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?

So the original motivation to not more aggressively prune
store-from-CTOR vectorization in the vectorizer itself is that
the vector store is possibly better for STLF (larger stores are
good, larger loads eventually problematic).

I'd also expect the costs to play out to not make those profitable.

OTOH, if you have a series of 'double' stores you can convert to
a series of V2DF stores you _may_ be faster if this reduces
pressure on the store unit.  Esp. V2DF is cheap to construct
with one movhpd.

So I don't think we want to try to pattern match it this way?

In fact the SLP vectorization cases could all arrive with an
SLP node specified (vectorizable_store would have to be
changed here), which means you could check for an
vect_external_def child instead?

But as said, I would hope that we can arrive at a better way
assessing the CONSTRUCTOR cost.  IMHO one big issue
is that load and store cost are comparatively high compared
to simple stmt ops so it's very hard to offset saving many
stores with "ops".  That's because we generally think of
'cost' to model latency but as you say stores don't really
have latency - we only have store bandwidth of the store
unit and of course issue width (but that's true for other ops
as well).  I wonder what happens if we set both scalar and
vector store cost to zero?  Or maybe one (to count one
issue slot)?

Richard.


> gcc/ChangeLog:
>
>         PR target/99881
>         PR target/104582
>         * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
>         Check if kind is vec_construct or vector store.
>         (ix86_vector_costs::finish_cost): Don't do vectorization when
>         vector stmts are only vec_construct and stores.
>         (ix86_vector_costs::ix86_vect_construct_store_only_p): New
>         function.
>         (ix86_vector_costs::ix86_vect_cut_off): Ditto.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/part-vect-absneghf.c: Restrict testcase to
>         TARGET_64BIT.
>         * gcc.target/i386/part-vect-copysignhf.c: Ditto.
>         * gcc.target/i386/part-vect-xorsignhf.c: Ditto.
>         * gcc.target/i386/pr91446.c: Adjust testcase.
>         * gcc.target/i386/pr108938-3.c: Add -fno-vect-cost-model.
>         * gcc.target/i386/pr111023-2.c: Ditto.
>         * gcc.target/i386/pr111023.c: Ditto.
>         * gcc.target/i386/pr99881.c: Remove xfail.
>         * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c: Changed
>         to Scan-tree-dump-not.
>         * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c: Ditto.
>         * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c: Ditto.
> ---
>  gcc/config/i386/i386.cc                       | 81 ++++++++++++++++++-
>  .../costmodel/x86_64/costmodel-pr104582-1.c   |  2 +-
>  .../costmodel/x86_64/costmodel-pr104582-3.c   |  2 +-
>  .../costmodel/x86_64/costmodel-pr104582-4.c   |  2 +-
>  .../gcc.target/i386/part-vect-absneghf.c      |  4 +-
>  .../gcc.target/i386/part-vect-copysignhf.c    |  4 +-
>  .../gcc.target/i386/part-vect-xorsignhf.c     |  4 +-
>  gcc/testsuite/gcc.target/i386/pr108938-3.c    |  2 +-
>  gcc/testsuite/gcc.target/i386/pr111023-2.c    |  2 +-
>  gcc/testsuite/gcc.target/i386/pr111023.c      |  2 +-
>  gcc/testsuite/gcc.target/i386/pr91446.c       | 14 ++--
>  gcc/testsuite/gcc.target/i386/pr99881.c       |  2 +-
>  12 files changed, 99 insertions(+), 22 deletions(-)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index dcaea6c2096..a4b23e29eba 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -24573,6 +24573,10 @@ public:
>
>  private:
>
> +  /* Don't do vectorization for certain patterns.  */
> +  void ix86_vect_cut_off ();
> +
> +  bool ix86_vect_construct_store_only_p (vect_cost_for_stmt, stmt_vec_in=
fo);
>    /* Estimate register pressure of the vectorized code.  */
>    void ix86_vect_estimate_reg_pressure ();
>    /* Number of GENERAL_REGS/SSE_REGS used in the vectorizer, it's used f=
or
> @@ -24581,12 +24585,15 @@ private:
>       where we know it's not loaded from memory.  */
>    unsigned m_num_gpr_needed[3];
>    unsigned m_num_sse_needed[3];
> +
> +  bool m_vec_construct_store_only;
>  };
>
>  ix86_vector_costs::ix86_vector_costs (vec_info* vinfo, bool costing_for_=
scalar)
>    : vector_costs (vinfo, costing_for_scalar),
>      m_num_gpr_needed (),
> -    m_num_sse_needed ()
> +    m_num_sse_needed (),
> +    m_vec_construct_store_only (true)
>  {
>  }
>
> @@ -24609,6 +24616,10 @@ ix86_vector_costs::add_stmt_cost (int count, vec=
t_cost_for_stmt kind,
>      =3D (kind =3D=3D scalar_stmt || kind =3D=3D scalar_load || kind =3D=
=3D scalar_store);
>    int stmt_cost =3D - 1;
>
> +  if (m_vec_construct_store_only
> +      && !ix86_vect_construct_store_only_p (kind, stmt_info))
> +    m_vec_construct_store_only =3D false;
> +
>    bool fp =3D false;
>    machine_mode mode =3D scalar_p ? SImode : TImode;
>
> @@ -24865,8 +24876,45 @@ ix86_vector_costs::ix86_vect_estimate_reg_pressu=
re ()
>      }
>  }
>
> +/* Return true if KIND and STMT_INFO is either vec_construct or vector
> +   store, stmt_info could be promote/demote between vec_construct and
> +   vector store, also count that.  */
> +bool
> +ix86_vector_costs::ix86_vect_construct_store_only_p (vect_cost_for_stmt =
kind,
> +                                                   stmt_vec_info stmt_in=
fo)
> +{
> +  switch (kind)
> +    {
> +    case vec_construct:
> +    case vector_store:
> +    case unaligned_store:
> +    case vector_scatter_store:
> +    case vec_promote_demote:
> +      return true;
> +
> +      /* Vectorizer will try VEC_PACK_TRUNK_EXPR for things likes
> +        char* a;
> +        short b1, b2, b3, b4;
> +        a[0] =3D b1;
> +        a[1] =3D b2;
> +        a[2] =3D b3;
> +        a[3] =3D b4;
> +        Also don't vectorized it.  */
> +    case vector_stmt:
> +      if (stmt_info && stmt_info->stmt
> +         && gimple_code (stmt_info->stmt) =3D=3D GIMPLE_ASSIGN
> +         && gimple_assign_rhs_code (stmt_info->stmt) =3D=3D NOP_EXPR)
> +       return true;
> +
> +    default:
> +      break;
> +    }
> +
> +  return false;
> +}
> +
>  void
> -ix86_vector_costs::finish_cost (const vector_costs *scalar_costs)
> +ix86_vector_costs::ix86_vect_cut_off ()
>  {
>    loop_vec_info loop_vinfo =3D dyn_cast<loop_vec_info> (m_vinfo);
>    if (loop_vinfo && !m_costing_for_scalar)
> @@ -24885,10 +24933,39 @@ ix86_vector_costs::finish_cost (const vector_co=
sts *scalar_costs)
>           && (exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo).to_constant=
 ())
>               > ceil_log2 (LOOP_VINFO_INT_NITERS (loop_vinfo))))
>         m_costs[vect_body] =3D INT_MAX;
> +      return;
> +    }
> +
> +  /* Don't do vectorization for things like
> +
> +     a[0] =3D b1;
> +     a[1] =3D b2;
> +     ..
> +     a[n] =3D bn;
> +
> +     There're extra dependences when contructing the vector, but not for
> +     scalar store. According to experiments, it's generally worse.  */
> +  if (m_vec_construct_store_only)
> +    {
> +      m_costs[0] =3D m_costs[1] =3D m_costs[2] =3D INT_MAX;
> +      if (dump_enabled_p ())
> +       dump_printf_loc (MSG_NOTE, vect_location,
> +                        "Skip vectorization for stmts which only contain=
s"
> +                        " vec_construct and vector store.\n");
> +      return;
>      }
>
> +  return;
> +}
> +
> +void
> +ix86_vector_costs::finish_cost (const vector_costs *scalar_costs)
> +{
> +
>    ix86_vect_estimate_reg_pressure ();
>
> +  ix86_vect_cut_off ();
> +
>    vector_costs::finish_cost (scalar_costs);
>  }
>
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr10458=
2-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c
> index 992a845ad7a..f940af70b72 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c
> @@ -12,4 +12,4 @@ foo (unsigned long *a, unsigned long *b)
>    s.b =3D b_;
>  }
>
> -/* { dg-final { scan-tree-dump "basic block part vectorized" "slp2" } } =
*/
> +/* { dg-final { scan-tree-dump-not "basic block part vectorized" "slp2" =
} } */
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr10458=
2-3.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c
> index 999c4905708..eff60b2a82a 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c
> @@ -10,4 +10,4 @@ foo (double a, double b)
>    s.b =3D b;
>  }
>
> -/* { dg-final { scan-tree-dump "basic block part vectorized" "slp2" } } =
*/
> +/* { dg-final { scan-tree-dump-not "basic block part vectorized" "slp2" =
} } */
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr10458=
2-4.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c
> index cc471e1ed73..2f354338e96 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c
> @@ -12,4 +12,4 @@ foo (signed long *a, unsigned long *b)
>    s.b =3D b_;
>  }
>
> -/* { dg-final { scan-tree-dump "basic block part vectorized" "slp2" } } =
*/
> +/* { dg-final { scan-tree-dump-not "basic block part vectorized" "slp2" =
} } */
> diff --git a/gcc/testsuite/gcc.target/i386/part-vect-absneghf.c b/gcc/tes=
tsuite/gcc.target/i386/part-vect-absneghf.c
> index 48aed14d604..4052210ec39 100644
> --- a/gcc/testsuite/gcc.target/i386/part-vect-absneghf.c
> +++ b/gcc/testsuite/gcc.target/i386/part-vect-absneghf.c
> @@ -85,7 +85,7 @@ do_test (void)
>        abort ();
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized using 8 byte vectors" 2=
 "slp2" } } */
> -/* { dg-final { scan-tree-dump-times "vectorized using 4 byte vectors" 2=
 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized using 8 byte vectors" 2=
 "slp2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized using 4 byte vectors" 2=
 "slp2" { target { ! ia32 } } } } */
>  /* { dg-final { scan-tree-dump-times {(?n)ABS_EXPR <vect} 2 "optimized" =
{ target { ! ia32 } } } } */
>  /* { dg-final { scan-tree-dump-times {(?n)=3D -vect} 2 "optimized" { tar=
get { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/part-vect-copysignhf.c b/gcc/t=
estsuite/gcc.target/i386/part-vect-copysignhf.c
> index 811617bc3dd..006941f6651 100644
> --- a/gcc/testsuite/gcc.target/i386/part-vect-copysignhf.c
> +++ b/gcc/testsuite/gcc.target/i386/part-vect-copysignhf.c
> @@ -55,6 +55,6 @@ do_test (void)
>        abort ();
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized using 8 byte vectors" 1=
 "slp2" } } */
> -/* { dg-final { scan-tree-dump-times "vectorized using 4 byte vectors" 1=
 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized using 8 byte vectors" 1=
 "slp2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized using 4 byte vectors" 1=
 "slp2" { target { ! ia32 } } } } */
>  /* { dg-final { scan-tree-dump-times ".COPYSIGN" 2 "optimized" { target =
{ ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/part-vect-xorsignhf.c b/gcc/te=
stsuite/gcc.target/i386/part-vect-xorsignhf.c
> index a8ec60a088a..a57dc5aba23 100644
> --- a/gcc/testsuite/gcc.target/i386/part-vect-xorsignhf.c
> +++ b/gcc/testsuite/gcc.target/i386/part-vect-xorsignhf.c
> @@ -55,6 +55,6 @@ do_test (void)
>        abort ();
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized using 8 byte vectors" 1=
 "slp2" } } */
> -/* { dg-final { scan-tree-dump-times "vectorized using 4 byte vectors" 1=
 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized using 8 byte vectors" 1=
 "slp2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized using 4 byte vectors" 1=
 "slp2" { target { ! ia32 } } } } */
>  /* { dg-final { scan-tree-dump-times ".XORSIGN" 2 "optimized" { target {=
 ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr108938-3.c b/gcc/testsuite/g=
cc.target/i386/pr108938-3.c
> index 32ac544c7ed..24725a9ab1d 100644
> --- a/gcc/testsuite/gcc.target/i386/pr108938-3.c
> +++ b/gcc/testsuite/gcc.target/i386/pr108938-3.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-vectorize -mno-movbe" } */
> +/* { dg-options "-O2 -ftree-vectorize -mno-movbe -fno-vect-cost-model" }=
 */
>  /* { dg-final { scan-assembler-times "bswap\[\t ]+" 2 { target { ! ia32 =
} } } } */
>  /* { dg-final { scan-assembler-times "bswap\[\t ]+" 3 { target ia32 } } =
} */
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr111023-2.c b/gcc/testsuite/g=
cc.target/i386/pr111023-2.c
> index 6c69f947544..db25668a9ae 100644
> --- a/gcc/testsuite/gcc.target/i386/pr111023-2.c
> +++ b/gcc/testsuite/gcc.target/i386/pr111023-2.c
> @@ -1,6 +1,6 @@
>  /* PR target/111023 */
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -mtune=3Dicelake-server -ftree-vectorize -msse2 -mn=
o-sse4.1" } */
> +/* { dg-options "-O2 -mtune=3Dicelake-server -ftree-vectorize -msse2 -mn=
o-sse4.1 -fno-vect-cost-model" } */
>
>  typedef char v16qi __attribute__((vector_size (16)));
>  typedef short v8hi __attribute__((vector_size (16)));
> diff --git a/gcc/testsuite/gcc.target/i386/pr111023.c b/gcc/testsuite/gcc=
.target/i386/pr111023.c
> index 6144c371f32..18cd579d937 100644
> --- a/gcc/testsuite/gcc.target/i386/pr111023.c
> +++ b/gcc/testsuite/gcc.target/i386/pr111023.c
> @@ -1,6 +1,6 @@
>  /* PR target/111023 */
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -mtune=3Dicelake-server -ftree-vectorize -msse2 -mn=
o-sse4.1" } */
> +/* { dg-options "-O2 -mtune=3Dicelake-server -ftree-vectorize -msse2 -mn=
o-sse4.1 -fno-vect-cost-model" } */
>
>  typedef unsigned char v16qi __attribute__((vector_size (16)));
>  typedef unsigned short v8hi __attribute__((vector_size (16)));
> diff --git a/gcc/testsuite/gcc.target/i386/pr91446.c b/gcc/testsuite/gcc.=
target/i386/pr91446.c
> index 0243ca3ea68..3e0f8a4b315 100644
> --- a/gcc/testsuite/gcc.target/i386/pr91446.c
> +++ b/gcc/testsuite/gcc.target/i386/pr91446.c
> @@ -10,15 +10,15 @@ typedef struct
>  extern void bar (info *);
>
>  void
> -foo (unsigned long long width, unsigned long long height,
> -     long long x, long long y)
> +foo (unsigned long long* width,
> +     long long* x)
>  {
>    info t;
> -  t.width =3D width;
> -  t.height =3D height;
> -  t.x =3D x;
> -  t.y =3D y;
> +  t.width =3D width[0];
> +  t.height =3D width[1];
> +  t.x =3D x[0];
> +  t.y =3D x[1];
>    bar (&t);
>  }
>
> -/* { dg-final { scan-assembler-times "vmovdqa\[^\n\r\]*xmm\[0-9\]" 2 } }=
 */
> +/* { dg-final { scan-assembler-times "vmovdq\[au\]\[^\n\r\]*xmm\[0-9\]" =
4 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr99881.c b/gcc/testsuite/gcc.=
target/i386/pr99881.c
> index 3e087eb2ed7..1a59325ac1c 100644
> --- a/gcc/testsuite/gcc.target/i386/pr99881.c
> +++ b/gcc/testsuite/gcc.target/i386/pr99881.c
> @@ -1,7 +1,7 @@
>  /* PR target/99881.  */
>  /* { dg-do compile { target { ! ia32 } } } */
>  /* { dg-options "-Ofast -march=3Dskylake" } */
> -/* { dg-final { scan-assembler-not "xmm\[0-9\]" { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-not "xmm\[0-9\]"  } } */
>
>  void
>  foo (int* __restrict a, int n, int c)
> --
> 2.31.1
>