Re: [PATCH] Don't vectorize when vector stmts are only vec_contruct and stores

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Hongtao Liu <crazylht@gmail.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: liuhongt <hongtao.liu@intel.com>,
	gcc-patches@gcc.gnu.org, hjl.tools@gmail.com
Subject: Re: [PATCH] Don't vectorize when vector stmts are only vec_contruct and stores
Date: Wed, 6 Dec 2023 10:16:55 +0800	[thread overview]
Message-ID: <CAMZc-bwA1z0qDSmLBFxL1rEmabkw3LCg8DCbo5jhsHfTe1ZxbQ@mail.gmail.com> (raw)
In-Reply-To: <CAFiYyc2sZCSAHd+tS_eHk8O+b8_THCsbycdVjrgwPTJe=3BZmQ@mail.gmail.com>

On Mon, Dec 4, 2023 at 10:10 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Mon, Dec 4, 2023 at 6:32 AM liuhongt <hongtao.liu@intel.com> wrote:
> >
> > .i.e. for below cases.
> >    a[0] = b1;
> >    a[1] = b2;
> >    ..
> >    a[n] = bn;
> >
> > There're extra dependences when contructing the vector, but not for
> > scalar store. According to experiments, it's generally worse.
> >
> > The patch adds an cut-off heuristic when vec_stmt is just
> > vec_construct and vector store. It improves SPEC2017 a little bit.
> >
> > BenchMarks              Ratio
> > 500.perlbench_r         2.60%
> > 502.gcc_r               0.30%
> > 505.mcf_r               0.40%
> > 520.omnetpp_r           -1.00%
> > 523.xalancbmk_r         0.90%
> > 525.x264_r              0.00%
> > 531.deepsjeng_r         0.30%
> > 541.leela_r             0.90%
> > 548.exchange2_r         3.20%
> > 557.xz_r                1.40%
> > 503.bwaves_r            0.00%
> > 507.cactuBSSN_r         0.00%
> > 508.namd_r              0.30%
> > 510.parest_r            0.00%
> > 511.povray_r            0.20%
> > 519.lbm_r               SAME BIN
> > 521.wrf_r               -0.30%
> > 526.blender_r           -1.20%
> > 527.cam4_r              -0.20%
> > 538.imagick_r           4.00%
> > 544.nab_r               0.40%
> > 549.fotonik3d_r         0.00%
> > 554.roms_r              0.00%
> > Geomean-int             0.90%
> > Geomean-fp              0.30%
> > Geomean-all             0.50%
> >
> > And
> > Regressed testcases:
> >
> > gcc.target/i386/part-vect-absneghf.c
> > gcc.target/i386/part-vect-copysignhf.c
> > gcc.target/i386/part-vect-xorsignhf.c
> >
> > Regressed under -m32 since it generates 2 vector
> > .ABS/NEG/XORSIGN/COPYSIGN vs original 1 64-bit vec_construct. The
> > original testcases are used to test vectorization capability for
> > .ABS/NEG/XORG/COPYSIGN, so just restrict testcase to TARGET_64BIT.
> >
> > gcc.target/i386/pr111023-2.c
> > gcc.target/i386/pr111023.c
> > Regressed under -m32
> >
> > testcase as below
> >
> > void
> > v8hi_v8qi (v8hi *dst, v16qi src)
> > {
> >   short tem[8];
> >   tem[0] = src[0];
> >   tem[1] = src[1];
> >   tem[2] = src[2];
> >   tem[3] = src[3];
> >   tem[4] = src[4];
> >   tem[5] = src[5];
> >   tem[6] = src[6];
> >   tem[7] = src[7];
> >   dst[0] = *(v8hi *) tem;
> > }
> >
> > under 64-bit target, vectorizer realize it's just permutation of
> > original src vector, but under -m32, vectorizer relies on
> > vec_construct for vectorization. I think optimziation for this case
> > under 32-bit target maynot impact much, so just add
> > -fno-vect-cost-model.
> >
> > gcc.target/i386/pr91446.c: This testcase is guard for cost model of
> > vector store, not vectorization capability, so just adjust testcase.
> >
> > gcc.target/i386/pr108938-3.c: This testcase relies on vec_construct to
> > optimize for bswap, like other optimziation vectorizer can't realize
> > optimization after it. So the current solution is add
> > -fno-vect-cost-model to the testcase.
> >
> > costmodel-pr104582-1.c
> > costmodel-pr104582-2.c
> > costmodel-pr104582-4.c
> >
> > Failed since it's now not vectorized, looked at the PR, it's exactly
> > what's wanted, so adjust testcase to scan-tree-dump-not.
> >
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?
>
> So the original motivation to not more aggressively prune
> store-from-CTOR vectorization in the vectorizer itself is that
> the vector store is possibly better for STLF (larger stores are
> good, larger loads eventually problematic).

That's exactly what I worried about, and I didn't observe any STLF
stall in SPEC2017, I'll try with more benchmarks.
But on the other hand, the cost model is not suitable for solving this
problem, at best it only circumvents part of this.

>
> I'd also expect the costs to play out to not make those profitable.
>
> OTOH, if you have a series of 'double' stores you can convert to
> a series of V2DF stores you _may_ be faster if this reduces
> pressure on the store unit.  Esp. V2DF is cheap to construct
> with one movhpd.

>
> So I don't think we want to try to pattern match it this way?
>
> In fact the SLP vectorization cases could all arrive with an
> SLP node specified (vectorizable_store would have to be
> changed here), which means you could check for an
> vect_external_def child instead?
>
> But as said, I would hope that we can arrive at a better way
> assessing the CONSTRUCTOR cost.  IMHO one big issue
> is that load and store cost are comparatively high compared
> to simple stmt ops so it's very hard to offset saving many
> stores with "ops".  That's because we generally think of
> 'cost' to model latency but as you say stores don't really
> have latency - we only have store bandwidth of the store

Yes.

> unit and of course issue width (but that's true for other ops
> as well).  I wonder what happens if we set both scalar and
> vector store cost to zero?  Or maybe one (to count one
> issue slot)?

I tried to reduce the cost of the scalar store, but it regressed in other cases.
But I haven't tried with zero/one, it sounds like a good idea, Let me try that.

>
> Richard.
>
>
> > gcc/ChangeLog:
> >
> >         PR target/99881
> >         PR target/104582
> >         * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
> >         Check if kind is vec_construct or vector store.
> >         (ix86_vector_costs::finish_cost): Don't do vectorization when
> >         vector stmts are only vec_construct and stores.
> >         (ix86_vector_costs::ix86_vect_construct_store_only_p): New
> >         function.
> >         (ix86_vector_costs::ix86_vect_cut_off): Ditto.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/i386/part-vect-absneghf.c: Restrict testcase to
> >         TARGET_64BIT.
> >         * gcc.target/i386/part-vect-copysignhf.c: Ditto.
> >         * gcc.target/i386/part-vect-xorsignhf.c: Ditto.
> >         * gcc.target/i386/pr91446.c: Adjust testcase.
> >         * gcc.target/i386/pr108938-3.c: Add -fno-vect-cost-model.
> >         * gcc.target/i386/pr111023-2.c: Ditto.
> >         * gcc.target/i386/pr111023.c: Ditto.
> >         * gcc.target/i386/pr99881.c: Remove xfail.
> >         * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c: Changed
> >         to Scan-tree-dump-not.
> >         * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c: Ditto.
> >         * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c: Ditto.
> > ---
> >  gcc/config/i386/i386.cc                       | 81 ++++++++++++++++++-
> >  .../costmodel/x86_64/costmodel-pr104582-1.c   |  2 +-
> >  .../costmodel/x86_64/costmodel-pr104582-3.c   |  2 +-
> >  .../costmodel/x86_64/costmodel-pr104582-4.c   |  2 +-
> >  .../gcc.target/i386/part-vect-absneghf.c      |  4 +-
> >  .../gcc.target/i386/part-vect-copysignhf.c    |  4 +-
> >  .../gcc.target/i386/part-vect-xorsignhf.c     |  4 +-
> >  gcc/testsuite/gcc.target/i386/pr108938-3.c    |  2 +-
> >  gcc/testsuite/gcc.target/i386/pr111023-2.c    |  2 +-
> >  gcc/testsuite/gcc.target/i386/pr111023.c      |  2 +-
> >  gcc/testsuite/gcc.target/i386/pr91446.c       | 14 ++--
> >  gcc/testsuite/gcc.target/i386/pr99881.c       |  2 +-
> >  12 files changed, 99 insertions(+), 22 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index dcaea6c2096..a4b23e29eba 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -24573,6 +24573,10 @@ public:
> >
> >  private:
> >
> > +  /* Don't do vectorization for certain patterns.  */
> > +  void ix86_vect_cut_off ();
> > +
> > +  bool ix86_vect_construct_store_only_p (vect_cost_for_stmt, stmt_vec_info);
> >    /* Estimate register pressure of the vectorized code.  */
> >    void ix86_vect_estimate_reg_pressure ();
> >    /* Number of GENERAL_REGS/SSE_REGS used in the vectorizer, it's used for
> > @@ -24581,12 +24585,15 @@ private:
> >       where we know it's not loaded from memory.  */
> >    unsigned m_num_gpr_needed[3];
> >    unsigned m_num_sse_needed[3];
> > +
> > +  bool m_vec_construct_store_only;
> >  };
> >
> >  ix86_vector_costs::ix86_vector_costs (vec_info* vinfo, bool costing_for_scalar)
> >    : vector_costs (vinfo, costing_for_scalar),
> >      m_num_gpr_needed (),
> > -    m_num_sse_needed ()
> > +    m_num_sse_needed (),
> > +    m_vec_construct_store_only (true)
> >  {
> >  }
> >
> > @@ -24609,6 +24616,10 @@ ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
> >      = (kind == scalar_stmt || kind == scalar_load || kind == scalar_store);
> >    int stmt_cost = - 1;
> >
> > +  if (m_vec_construct_store_only
> > +      && !ix86_vect_construct_store_only_p (kind, stmt_info))
> > +    m_vec_construct_store_only = false;
> > +
> >    bool fp = false;
> >    machine_mode mode = scalar_p ? SImode : TImode;
> >
> > @@ -24865,8 +24876,45 @@ ix86_vector_costs::ix86_vect_estimate_reg_pressure ()
> >      }
> >  }
> >
> > +/* Return true if KIND and STMT_INFO is either vec_construct or vector
> > +   store, stmt_info could be promote/demote between vec_construct and
> > +   vector store, also count that.  */
> > +bool
> > +ix86_vector_costs::ix86_vect_construct_store_only_p (vect_cost_for_stmt kind,
> > +                                                   stmt_vec_info stmt_info)
> > +{
> > +  switch (kind)
> > +    {
> > +    case vec_construct:
> > +    case vector_store:
> > +    case unaligned_store:
> > +    case vector_scatter_store:
> > +    case vec_promote_demote:
> > +      return true;
> > +
> > +      /* Vectorizer will try VEC_PACK_TRUNK_EXPR for things likes
> > +        char* a;
> > +        short b1, b2, b3, b4;
> > +        a[0] = b1;
> > +        a[1] = b2;
> > +        a[2] = b3;
> > +        a[3] = b4;
> > +        Also don't vectorized it.  */
> > +    case vector_stmt:
> > +      if (stmt_info && stmt_info->stmt
> > +         && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN
> > +         && gimple_assign_rhs_code (stmt_info->stmt) == NOP_EXPR)
> > +       return true;
> > +
> > +    default:
> > +      break;
> > +    }
> > +
> > +  return false;
> > +}
> > +
> >  void
> > -ix86_vector_costs::finish_cost (const vector_costs *scalar_costs)
> > +ix86_vector_costs::ix86_vect_cut_off ()
> >  {
> >    loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (m_vinfo);
> >    if (loop_vinfo && !m_costing_for_scalar)
> > @@ -24885,10 +24933,39 @@ ix86_vector_costs::finish_cost (const vector_costs *scalar_costs)
> >           && (exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo).to_constant ())
> >               > ceil_log2 (LOOP_VINFO_INT_NITERS (loop_vinfo))))
> >         m_costs[vect_body] = INT_MAX;
> > +      return;
> > +    }
> > +
> > +  /* Don't do vectorization for things like
> > +
> > +     a[0] = b1;
> > +     a[1] = b2;
> > +     ..
> > +     a[n] = bn;
> > +
> > +     There're extra dependences when contructing the vector, but not for
> > +     scalar store. According to experiments, it's generally worse.  */
> > +  if (m_vec_construct_store_only)
> > +    {
> > +      m_costs[0] = m_costs[1] = m_costs[2] = INT_MAX;
> > +      if (dump_enabled_p ())
> > +       dump_printf_loc (MSG_NOTE, vect_location,
> > +                        "Skip vectorization for stmts which only contains"
> > +                        " vec_construct and vector store.\n");
> > +      return;
> >      }
> >
> > +  return;
> > +}
> > +
> > +void
> > +ix86_vector_costs::finish_cost (const vector_costs *scalar_costs)
> > +{
> > +
> >    ix86_vect_estimate_reg_pressure ();
> >
> > +  ix86_vect_cut_off ();
> > +
> >    vector_costs::finish_cost (scalar_costs);
> >  }
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c
> > index 992a845ad7a..f940af70b72 100644
> > --- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c
> > +++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c
> > @@ -12,4 +12,4 @@ foo (unsigned long *a, unsigned long *b)
> >    s.b = b_;
> >  }
> >
> > -/* { dg-final { scan-tree-dump "basic block part vectorized" "slp2" } } */
> > +/* { dg-final { scan-tree-dump-not "basic block part vectorized" "slp2" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c
> > index 999c4905708..eff60b2a82a 100644
> > --- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c
> > +++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c
> > @@ -10,4 +10,4 @@ foo (double a, double b)
> >    s.b = b;
> >  }
> >
> > -/* { dg-final { scan-tree-dump "basic block part vectorized" "slp2" } } */
> > +/* { dg-final { scan-tree-dump-not "basic block part vectorized" "slp2" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c
> > index cc471e1ed73..2f354338e96 100644
> > --- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c
> > +++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c
> > @@ -12,4 +12,4 @@ foo (signed long *a, unsigned long *b)
> >    s.b = b_;
> >  }
> >
> > -/* { dg-final { scan-tree-dump "basic block part vectorized" "slp2" } } */
> > +/* { dg-final { scan-tree-dump-not "basic block part vectorized" "slp2" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/part-vect-absneghf.c b/gcc/testsuite/gcc.target/i386/part-vect-absneghf.c
> > index 48aed14d604..4052210ec39 100644
> > --- a/gcc/testsuite/gcc.target/i386/part-vect-absneghf.c
> > +++ b/gcc/testsuite/gcc.target/i386/part-vect-absneghf.c
> > @@ -85,7 +85,7 @@ do_test (void)
> >        abort ();
> >  }
> >
> > -/* { dg-final { scan-tree-dump-times "vectorized using 8 byte vectors" 2 "slp2" } } */
> > -/* { dg-final { scan-tree-dump-times "vectorized using 4 byte vectors" 2 "slp2" } } */
> > +/* { dg-final { scan-tree-dump-times "vectorized using 8 byte vectors" 2 "slp2" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-tree-dump-times "vectorized using 4 byte vectors" 2 "slp2" { target { ! ia32 } } } } */
> >  /* { dg-final { scan-tree-dump-times {(?n)ABS_EXPR <vect} 2 "optimized" { target { ! ia32 } } } } */
> >  /* { dg-final { scan-tree-dump-times {(?n)= -vect} 2 "optimized" { target { ! ia32 } } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/part-vect-copysignhf.c b/gcc/testsuite/gcc.target/i386/part-vect-copysignhf.c
> > index 811617bc3dd..006941f6651 100644
> > --- a/gcc/testsuite/gcc.target/i386/part-vect-copysignhf.c
> > +++ b/gcc/testsuite/gcc.target/i386/part-vect-copysignhf.c
> > @@ -55,6 +55,6 @@ do_test (void)
> >        abort ();
> >  }
> >
> > -/* { dg-final { scan-tree-dump-times "vectorized using 8 byte vectors" 1 "slp2" } } */
> > -/* { dg-final { scan-tree-dump-times "vectorized using 4 byte vectors" 1 "slp2" } } */
> > +/* { dg-final { scan-tree-dump-times "vectorized using 8 byte vectors" 1 "slp2" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-tree-dump-times "vectorized using 4 byte vectors" 1 "slp2" { target { ! ia32 } } } } */
> >  /* { dg-final { scan-tree-dump-times ".COPYSIGN" 2 "optimized" { target { ! ia32 } } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/part-vect-xorsignhf.c b/gcc/testsuite/gcc.target/i386/part-vect-xorsignhf.c
> > index a8ec60a088a..a57dc5aba23 100644
> > --- a/gcc/testsuite/gcc.target/i386/part-vect-xorsignhf.c
> > +++ b/gcc/testsuite/gcc.target/i386/part-vect-xorsignhf.c
> > @@ -55,6 +55,6 @@ do_test (void)
> >        abort ();
> >  }
> >
> > -/* { dg-final { scan-tree-dump-times "vectorized using 8 byte vectors" 1 "slp2" } } */
> > -/* { dg-final { scan-tree-dump-times "vectorized using 4 byte vectors" 1 "slp2" } } */
> > +/* { dg-final { scan-tree-dump-times "vectorized using 8 byte vectors" 1 "slp2" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-tree-dump-times "vectorized using 4 byte vectors" 1 "slp2" { target { ! ia32 } } } } */
> >  /* { dg-final { scan-tree-dump-times ".XORSIGN" 2 "optimized" { target { ! ia32 } } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/pr108938-3.c b/gcc/testsuite/gcc.target/i386/pr108938-3.c
> > index 32ac544c7ed..24725a9ab1d 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr108938-3.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr108938-3.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O2 -ftree-vectorize -mno-movbe" } */
> > +/* { dg-options "-O2 -ftree-vectorize -mno-movbe -fno-vect-cost-model" } */
> >  /* { dg-final { scan-assembler-times "bswap\[\t ]+" 2 { target { ! ia32 } } } } */
> >  /* { dg-final { scan-assembler-times "bswap\[\t ]+" 3 { target ia32 } } } */
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/pr111023-2.c b/gcc/testsuite/gcc.target/i386/pr111023-2.c
> > index 6c69f947544..db25668a9ae 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr111023-2.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr111023-2.c
> > @@ -1,6 +1,6 @@
> >  /* PR target/111023 */
> >  /* { dg-do compile } */
> > -/* { dg-options "-O2 -mtune=icelake-server -ftree-vectorize -msse2 -mno-sse4.1" } */
> > +/* { dg-options "-O2 -mtune=icelake-server -ftree-vectorize -msse2 -mno-sse4.1 -fno-vect-cost-model" } */
> >
> >  typedef char v16qi __attribute__((vector_size (16)));
> >  typedef short v8hi __attribute__((vector_size (16)));
> > diff --git a/gcc/testsuite/gcc.target/i386/pr111023.c b/gcc/testsuite/gcc.target/i386/pr111023.c
> > index 6144c371f32..18cd579d937 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr111023.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr111023.c
> > @@ -1,6 +1,6 @@
> >  /* PR target/111023 */
> >  /* { dg-do compile } */
> > -/* { dg-options "-O2 -mtune=icelake-server -ftree-vectorize -msse2 -mno-sse4.1" } */
> > +/* { dg-options "-O2 -mtune=icelake-server -ftree-vectorize -msse2 -mno-sse4.1 -fno-vect-cost-model" } */
> >
> >  typedef unsigned char v16qi __attribute__((vector_size (16)));
> >  typedef unsigned short v8hi __attribute__((vector_size (16)));
> > diff --git a/gcc/testsuite/gcc.target/i386/pr91446.c b/gcc/testsuite/gcc.target/i386/pr91446.c
> > index 0243ca3ea68..3e0f8a4b315 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr91446.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr91446.c
> > @@ -10,15 +10,15 @@ typedef struct
> >  extern void bar (info *);
> >
> >  void
> > -foo (unsigned long long width, unsigned long long height,
> > -     long long x, long long y)
> > +foo (unsigned long long* width,
> > +     long long* x)
> >  {
> >    info t;
> > -  t.width = width;
> > -  t.height = height;
> > -  t.x = x;
> > -  t.y = y;
> > +  t.width = width[0];
> > +  t.height = width[1];
> > +  t.x = x[0];
> > +  t.y = x[1];
> >    bar (&t);
> >  }
> >
> > -/* { dg-final { scan-assembler-times "vmovdqa\[^\n\r\]*xmm\[0-9\]" 2 } } */
> > +/* { dg-final { scan-assembler-times "vmovdq\[au\]\[^\n\r\]*xmm\[0-9\]" 4 } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/pr99881.c b/gcc/testsuite/gcc.target/i386/pr99881.c
> > index 3e087eb2ed7..1a59325ac1c 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr99881.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr99881.c
> > @@ -1,7 +1,7 @@
> >  /* PR target/99881.  */
> >  /* { dg-do compile { target { ! ia32 } } } */
> >  /* { dg-options "-Ofast -march=skylake" } */
> > -/* { dg-final { scan-assembler-not "xmm\[0-9\]" { xfail *-*-* } } } */
> > +/* { dg-final { scan-assembler-not "xmm\[0-9\]"  } } */
> >
> >  void
> >  foo (int* __restrict a, int n, int c)
> > --
> > 2.31.1
> >



-- 
BR,
Hongtao

     prev parent reply	other threads:[~2023-12-06  2:17 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-04  5:32 liuhongt
2023-12-04 14:10 ` Richard Biener
2023-12-06  2:16   ` Hongtao Liu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMZc-bwA1z0qDSmLBFxL1rEmabkw3LCg8DCbo5jhsHfTe1ZxbQ@mail.gmail.com \
    --to=crazylht@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=hjl.tools@gmail.com \
    --cc=hongtao.liu@intel.com \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).