public inbox for gcc-patches@gcc.gnu.org
From: Stefan Schulze Frielinghaus <stefansf@linux.ibm.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [RFC] ldist: Recognize rawmemchr loop patterns
Date: Thu, 25 Feb 2021 18:01:33 +0100
Message-ID: <YDfX7V4iLtHjs1um@localhost.localdomain>
In-Reply-To: <YCj7GniMRQ5DkS37@localhost.localdomain>

Ping

On Sun, Feb 14, 2021 at 11:27:40AM +0100, Stefan Schulze Frielinghaus wrote:
> On Tue, Feb 09, 2021 at 09:57:58AM +0100, Richard Biener wrote:
> > On Mon, Feb 8, 2021 at 3:11 PM Stefan Schulze Frielinghaus via
> > Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > This patch adds support for recognizing loops which mimic the behaviour
> > > of function rawmemchr, and replaces them with an internal function call
> > > in case the target provides an implementation.  In contrast to the
> > > original rawmemchr function, this patch also supports variants where the
> > > memory pointed to and the pattern are interpreted as 8-, 16-, or 32-bit
> > > wide elements, respectively.
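> > >
> > > For illustration, the loops in question essentially have the following
> > > shape (a minimal sketch; the attached testcases instantiate this for
> > > 8-, 16-, and 32-bit elements):
> > >
> > >   char *rawmemchr_like (char *s, char c)
> > >   {
> > >     while (*s != c)   /* load and compare against a constant pattern */
> > >       ++s;            /* advance by the element size */
> > >     return s;         /* the final pointer is the loop result */
> > >   }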
> > >
> > > This patch is not final and I'm looking for some feedback:
> > >
> > > Previously, only loops which mimic the behaviour of the functions memset,
> > > memcpy, and memmove were detected and replaced by corresponding function
> > > calls.  One characteristic of those loops/partitions is that they don't
> > > have a reduction.  In contrast, loops which mimic the behaviour of
> > > rawmemchr compute a result and therefore do have a reduction.  My current
> > > attempt is to ensure that the reduction statement is not used in any other
> > > partition, and only in that case to ignore the reduction and replace the
> > > loop by a function call.  We then only need to replace the reduction
> > > variable which carried the loop result with the lhs of the internal
> > > function call.  This should ensure that the transformation is correct
> > > independently of how partitions are fused/distributed in the end.  Any
> > > thoughts about this?
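> > >
> > > For the sketch above the transformation conceptually turns
> > >
> > >   p = start;
> > >   while (*p != c)
> > >     ++p;
> > >   use (p);
> > >
> > > into (pseudo-C; the actual patch emits an internal function call and
> > > redirects all uses of the former reduction variable to its lhs):
> > >
> > >   p = RAWMEMCHR (start, c);
> > >   use (p);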
> > 
> > Currently we're forcing reduction partitions last (and force there to be a
> > single one by fusing all partitions containing a reduction) because code
> > generation
> > does not properly update SSA form for the reduction results.  ISTR that
> > might be just because we do not copy the LC PHI nodes or do not adjust
> > them when copying.  That might not be an issue in case you replace the
> > partition with a call.  I guess you can try to have a testcase with
> > two rawmemchr patterns and a regular loop part that has to be scheduled
> > in between both for correctness.
> 
> Ah ok, in that case I updated my patch by removing the constraint that
> the reduction statement must be in precisely one partition.  Please find
> attached the testcases I have come up with so far.  Since transforming a
> loop into a rawmemchr function call is backend dependent, I planned to
> include those only in my backend patch.  I wasn't able to come up with
> any testcase where a loop is distributed into multiple partitions and
> where one is classified as a rawmemchr builtin.  The latter boils down
> to a for loop with an empty body only, in which case I suspect that
> loop distribution shouldn't be done anyway.
> 
> > > Furthermore, I simply added two new members (pattern, fn) to structure
> > > builtin_info, which I consider rather hacky.  For the long run I thought
> > > about splitting up structure builtin_info into a union where each member
> > > is a structure for a particular builtin of a partition, i.e., something
> > > like this:
> > >
> > > union builtin_info
> > > {
> > >   struct binfo_memset *memset;
> > >   struct binfo_memcpymove *memcpymove;
> > >   struct binfo_rawmemchr *rawmemchr;
> > > };
> > >
> > > That way a structure for one builtin does not get "polluted" by a
> > > different one.  Any thoughts about this?
> > 
> > Probably makes sense if the list of recognized patterns grows further.
> > 
> > I see you use internal functions rather than builtin functions.  I guess
> > that's OK.  But you use new target hooks for expansion where I think
> > new optab entries similar to cmpmem would be more appropriate
> > where the distinction between 8, 16 or 32 bits can be encoded in
> > the modes.
> 
> The optab implementation is really nice: it allows me to use iterators
> in the backend, which in the end saves me some boilerplate code compared
> to the previous implementation :)
> 
> Now that I'm using optabs, I only need one additional member (pattern)
> in the builtin_info struct.  Thus I didn't want to overcomplicate things
> and kept the single-struct approach as is.
> 
> For the long run, should I resubmit this patch once stage 1 opens, or how
> would you propose to proceed?
> 
> Thanks for your review so far!
> 
> Cheers,
> Stefan
> 
> > 
> > Richard.
> > 
> > > Cheers,
> > > Stefan
> > > ---
> > >  gcc/internal-fn.c            |  42 ++++++
> > >  gcc/internal-fn.def          |   3 +
> > >  gcc/target-insns.def         |   3 +
> > >  gcc/tree-loop-distribution.c | 257 ++++++++++++++++++++++++++++++-----
> > >  4 files changed, 272 insertions(+), 33 deletions(-)
> > >
> > > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> > > index dd7173126fb..9cd62544a1a 100644
> > > --- a/gcc/internal-fn.c
> > > +++ b/gcc/internal-fn.c
> > > @@ -2917,6 +2917,48 @@ expand_VEC_CONVERT (internal_fn, gcall *)
> > >    gcc_unreachable ();
> > >  }
> > >
> > > +static void
> > > +expand_RAWMEMCHR8 (internal_fn, gcall *stmt)
> > > +{
> > > +  if (targetm.have_rawmemchr8 ())
> > > +    {
> > > +      rtx result = expand_expr (gimple_call_lhs (stmt), NULL_RTX, VOIDmode, EXPAND_WRITE);
> > > +      rtx start = expand_normal (gimple_call_arg (stmt, 0));
> > > +      rtx pattern = expand_normal (gimple_call_arg (stmt, 1));
> > > +      emit_insn (targetm.gen_rawmemchr8 (result, start, pattern));
> > > +    }
> > > +  else
> > > +    gcc_unreachable();
> > > +}
> > > +
> > > +static void
> > > +expand_RAWMEMCHR16 (internal_fn, gcall *stmt)
> > > +{
> > > +  if (targetm.have_rawmemchr16 ())
> > > +    {
> > > +      rtx result = expand_expr (gimple_call_lhs (stmt), NULL_RTX, VOIDmode, EXPAND_WRITE);
> > > +      rtx start = expand_normal (gimple_call_arg (stmt, 0));
> > > +      rtx pattern = expand_normal (gimple_call_arg (stmt, 1));
> > > +      emit_insn (targetm.gen_rawmemchr16 (result, start, pattern));
> > > +    }
> > > +  else
> > > +    gcc_unreachable();
> > > +}
> > > +
> > > +static void
> > > +expand_RAWMEMCHR32 (internal_fn, gcall *stmt)
> > > +{
> > > +  if (targetm.have_rawmemchr32 ())
> > > +    {
> > > +      rtx result = expand_expr (gimple_call_lhs (stmt), NULL_RTX, VOIDmode, EXPAND_WRITE);
> > > +      rtx start = expand_normal (gimple_call_arg (stmt, 0));
> > > +      rtx pattern = expand_normal (gimple_call_arg (stmt, 1));
> > > +      emit_insn (targetm.gen_rawmemchr32 (result, start, pattern));
> > > +    }
> > > +  else
> > > +    gcc_unreachable();
> > > +}
> > > +
> > >  /* Expand the IFN_UNIQUE function according to its first argument.  */
> > >
> > >  static void
> > > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > > index daeace7a34e..34247859704 100644
> > > --- a/gcc/internal-fn.def
> > > +++ b/gcc/internal-fn.def
> > > @@ -348,6 +348,9 @@ DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> > >  DEF_INTERNAL_FN (TSAN_FUNC_EXIT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
> > >  DEF_INTERNAL_FN (VA_ARG, ECF_NOTHROW | ECF_LEAF, NULL)
> > >  DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> > > +DEF_INTERNAL_FN (RAWMEMCHR8, ECF_PURE | ECF_LEAF | ECF_NOTHROW, NULL)
> > > +DEF_INTERNAL_FN (RAWMEMCHR16, ECF_PURE | ECF_LEAF | ECF_NOTHROW, NULL)
> > > +DEF_INTERNAL_FN (RAWMEMCHR32, ECF_PURE | ECF_LEAF | ECF_NOTHROW, NULL)
> > >
> > >  /* An unduplicable, uncombinable function.  Generally used to preserve
> > >     a CFG property in the face of jump threading, tail merging or
> > > diff --git a/gcc/target-insns.def b/gcc/target-insns.def
> > > index 672c35698d7..9248554cbf3 100644
> > > --- a/gcc/target-insns.def
> > > +++ b/gcc/target-insns.def
> > > @@ -106,3 +106,6 @@ DEF_TARGET_INSN (trap, (void))
> > >  DEF_TARGET_INSN (unique, (void))
> > >  DEF_TARGET_INSN (untyped_call, (rtx x0, rtx x1, rtx x2))
> > >  DEF_TARGET_INSN (untyped_return, (rtx x0, rtx x1))
> > > +DEF_TARGET_INSN (rawmemchr8, (rtx x0, rtx x1, rtx x2))
> > > +DEF_TARGET_INSN (rawmemchr16, (rtx x0, rtx x1, rtx x2))
> > > +DEF_TARGET_INSN (rawmemchr32, (rtx x0, rtx x1, rtx x2))
> > > diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
> > > index 7ee19fc8677..f5b24bf53bc 100644
> > > --- a/gcc/tree-loop-distribution.c
> > > +++ b/gcc/tree-loop-distribution.c
> > > @@ -218,7 +218,7 @@ enum partition_kind {
> > >         be unnecessary and removed once distributed memset can be understood
> > >         and analyzed in data reference analysis.  See PR82604 for more.  */
> > >      PKIND_PARTIAL_MEMSET,
> > > -    PKIND_MEMSET, PKIND_MEMCPY, PKIND_MEMMOVE
> > > +    PKIND_MEMSET, PKIND_MEMCPY, PKIND_MEMMOVE, PKIND_RAWMEMCHR
> > >  };
> > >
> > >  /* Type of distributed loop.  */
> > > @@ -244,6 +244,8 @@ struct builtin_info
> > >       is only used in memset builtin distribution for now.  */
> > >    tree dst_base_base;
> > >    unsigned HOST_WIDE_INT dst_base_offset;
> > > +  tree pattern;
> > > +  internal_fn fn;
> > >  };
> > >
> > >  /* Partition for loop distribution.  */
> > > @@ -588,7 +590,8 @@ class loop_distribution
> > >    bool
> > >    classify_partition (loop_p loop,
> > >                       struct graph *rdg, partition *partition,
> > > -                     bitmap stmt_in_all_partitions);
> > > +                     bitmap stmt_in_all_partitions,
> > > +                     vec<struct partition *> *partitions);
> > >
> > >
> > >    /* Returns true when PARTITION1 and PARTITION2 access the same memory
> > > @@ -1232,6 +1235,67 @@ generate_memcpy_builtin (class loop *loop, partition *partition)
> > >      }
> > >  }
> > >
> > > +/* Generate a call to rawmemchr{8,16,32} for PARTITION in LOOP.  */
> > > +
> > > +static void
> > > +generate_rawmemchr_builtin (class loop *loop, partition *partition)
> > > +{
> > > +  gimple_stmt_iterator gsi;
> > > +  tree mem, pattern;
> > > +  struct builtin_info *builtin = partition->builtin;
> > > +  gimple *fn_call;
> > > +
> > > +  data_reference_p dr = builtin->src_dr;
> > > +  tree base = builtin->src_base;
> > > +
> > > +  tree result_old = TREE_OPERAND (DR_REF (dr), 0);
> > > +  tree result_new = copy_ssa_name (result_old);
> > > +
> > > +  /* The new statements will be placed before LOOP.  */
> > > +  gsi = gsi_last_bb (loop_preheader_edge (loop)->src);
> > > +
> > > +  mem = force_gimple_operand_gsi (&gsi, base, true, NULL_TREE, false, GSI_CONTINUE_LINKING);
> > > +  pattern = builtin->pattern;
> > > +  if (TREE_CODE (pattern) == INTEGER_CST)
> > > +    pattern = fold_convert (integer_type_node, pattern);
> > > +  fn_call = gimple_build_call_internal (builtin->fn, 2, mem, pattern);
> > > +  gimple_call_set_lhs (fn_call, result_new);
> > > +  gimple_set_location (fn_call, partition->loc);
> > > +  gsi_insert_after (&gsi, fn_call, GSI_CONTINUE_LINKING);
> > > +
> > > +  imm_use_iterator iter;
> > > +  gimple *stmt;
> > > +  use_operand_p use_p;
> > > +  FOR_EACH_IMM_USE_STMT (stmt, iter, result_old)
> > > +    {
> > > +      FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
> > > +       SET_USE (use_p, result_new);
> > > +
> > > +      update_stmt (stmt);
> > > +    }
> > > +
> > > +  fold_stmt (&gsi);
> > > +
> > > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > > +    switch (builtin->fn)
> > > +      {
> > > +      case IFN_RAWMEMCHR8:
> > > +       fprintf (dump_file, "generated rawmemchr8\n");
> > > +       break;
> > > +
> > > +      case IFN_RAWMEMCHR16:
> > > +       fprintf (dump_file, "generated rawmemchr16\n");
> > > +       break;
> > > +
> > > +      case IFN_RAWMEMCHR32:
> > > +       fprintf (dump_file, "generated rawmemchr32\n");
> > > +       break;
> > > +
> > > +      default:
> > > +       gcc_unreachable ();
> > > +      }
> > > +}
> > > +
> > >  /* Remove and destroy the loop LOOP.  */
> > >
> > >  static void
> > > @@ -1334,6 +1398,10 @@ generate_code_for_partition (class loop *loop,
> > >        generate_memcpy_builtin (loop, partition);
> > >        break;
> > >
> > > +    case PKIND_RAWMEMCHR:
> > > +      generate_rawmemchr_builtin (loop, partition);
> > > +      break;
> > > +
> > >      default:
> > >        gcc_unreachable ();
> > >      }
> > > @@ -1525,44 +1593,53 @@ find_single_drs (class loop *loop, struct graph *rdg, partition *partition,
> > >         }
> > >      }
> > >
> > > -  if (!single_st)
> > > +  if (!single_ld && !single_st)
> > >      return false;
> > >
> > > -  /* Bail out if this is a bitfield memory reference.  */
> > > -  if (TREE_CODE (DR_REF (single_st)) == COMPONENT_REF
> > > -      && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (single_st), 1)))
> > > -    return false;
> > > -
> > > -  /* Data reference must be executed exactly once per iteration of each
> > > -     loop in the loop nest.  We only need to check dominance information
> > > -     against the outermost one in a perfect loop nest because a bb can't
> > > -     dominate outermost loop's latch without dominating inner loop's.  */
> > > -  basic_block bb_st = gimple_bb (DR_STMT (single_st));
> > > -  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_st))
> > > -    return false;
> > > +  basic_block bb_ld = NULL;
> > > +  basic_block bb_st = NULL;
> > >
> > >    if (single_ld)
> > >      {
> > > -      gimple *store = DR_STMT (single_st), *load = DR_STMT (single_ld);
> > > -      /* Direct aggregate copy or via an SSA name temporary.  */
> > > -      if (load != store
> > > -         && gimple_assign_lhs (load) != gimple_assign_rhs1 (store))
> > > -       return false;
> > > -
> > >        /* Bail out if this is a bitfield memory reference.  */
> > >        if (TREE_CODE (DR_REF (single_ld)) == COMPONENT_REF
> > >           && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (single_ld), 1)))
> > >         return false;
> > >
> > > -      /* Load and store must be in the same loop nest.  */
> > > -      basic_block bb_ld = gimple_bb (DR_STMT (single_ld));
> > > -      if (bb_st->loop_father != bb_ld->loop_father)
> > > +      /* Data reference must be executed exactly once per iteration of each
> > > +        loop in the loop nest.  We only need to check dominance information
> > > +        against the outermost one in a perfect loop nest because a bb can't
> > > +        dominate outermost loop's latch without dominating inner loop's.  */
> > > +      bb_ld = gimple_bb (DR_STMT (single_ld));
> > > +      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_ld))
> > > +       return false;
> > > +    }
> > > +
> > > +  if (single_st)
> > > +    {
> > > +      /* Bail out if this is a bitfield memory reference.  */
> > > +      if (TREE_CODE (DR_REF (single_st)) == COMPONENT_REF
> > > +         && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (single_st), 1)))
> > >         return false;
> > >
> > >        /* Data reference must be executed exactly once per iteration.
> > > -        Same as single_st, we only need to check against the outermost
> > > +        Same as single_ld, we only need to check against the outermost
> > >          loop.  */
> > > -      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_ld))
> > > +      bb_st = gimple_bb (DR_STMT (single_st));
> > > +      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_st))
> > > +       return false;
> > > +    }
> > > +
> > > +  if (single_ld && single_st)
> > > +    {
> > > +      gimple *store = DR_STMT (single_st), *load = DR_STMT (single_ld);
> > > +      /* Direct aggregate copy or via an SSA name temporary.  */
> > > +      if (load != store
> > > +         && gimple_assign_lhs (load) != gimple_assign_rhs1 (store))
> > > +       return false;
> > > +
> > > +      /* Load and store must be in the same loop nest.  */
> > > +      if (bb_st->loop_father != bb_ld->loop_father)
> > >         return false;
> > >
> > >        edge e = single_exit (bb_st->loop_father);
> > > @@ -1681,6 +1758,84 @@ alloc_builtin (data_reference_p dst_dr, data_reference_p src_dr,
> > >    return builtin;
> > >  }
> > >
> > > +/* Given data reference DR in loop nest LOOP, classify if it forms builtin
> > > +   rawmemchr{8,16,32} call.  */
> > > +
> > > +static bool
> > > +classify_builtin_rawmemchr (loop_p loop, partition *partition, data_reference_p dr, tree loop_result)
> > > +{
> > > +  tree dr_ref = DR_REF (dr);
> > > +  tree dr_access_base = build_fold_addr_expr (dr_ref);
> > > +  tree dr_access_size = TYPE_SIZE_UNIT (TREE_TYPE (dr_ref));
> > > +  gimple *dr_stmt = DR_STMT (dr);
> > > +  tree rhs1 = gimple_assign_rhs1 (dr_stmt);
> > > +  affine_iv iv;
> > > +  tree pattern;
> > > +
> > > +  if (TREE_OPERAND (rhs1, 0) != loop_result)
> > > +    return false;
> > > +
> > > +  /* A limitation of the current implementation is that we only support
> > > +     constant patterns.  */
> > > +  gcond *cond_stmt = as_a <gcond *> (last_stmt (loop->header));
> > > +  pattern = gimple_cond_rhs (cond_stmt);
> > > +  if (gimple_cond_code (cond_stmt) != NE_EXPR
> > > +      || gimple_cond_lhs (cond_stmt) != gimple_assign_lhs (dr_stmt)
> > > +      || TREE_CODE (pattern) != INTEGER_CST)
> > > +    return false;
> > > +
> > > +  /* Bail out if no affine induction variable with constant step can be
> > > +     determined.  */
> > > +  if (!simple_iv (loop, loop, dr_access_base, &iv, false))
> > > +    return false;
> > > +
> > > +  /* Bail out if memory accesses are not consecutive.  */
> > > +  if (!operand_equal_p (iv.step, dr_access_size, 0))
> > > +    return false;
> > > +
> > > +  /* Bail out if direction of memory accesses is not growing.  */
> > > +  if (get_range_pos_neg (iv.step) != 1)
> > > +    return false;
> > > +
> > > +  internal_fn fn;
> > > +  switch (TREE_INT_CST_LOW (iv.step))
> > > +    {
> > > +    case 1:
> > > +      if (!targetm.have_rawmemchr8 ())
> > > +       return false;
> > > +      fn = IFN_RAWMEMCHR8;
> > > +      break;
> > > +
> > > +    case 2:
> > > +      if (!targetm.have_rawmemchr16 ())
> > > +       return false;
> > > +      fn = IFN_RAWMEMCHR16;
> > > +      break;
> > > +
> > > +    case 4:
> > > +      if (!targetm.have_rawmemchr32 ())
> > > +       return false;
> > > +      fn = IFN_RAWMEMCHR32;
> > > +      break;
> > > +
> > > +    default:
> > > +      return false;
> > > +    }
> > > +
> > > +  struct builtin_info *builtin;
> > > +  builtin = alloc_builtin (NULL, NULL, NULL_TREE, NULL_TREE, NULL_TREE);
> > > +  builtin->src_dr = dr;
> > > +  builtin->src_base = iv.base;
> > > +  builtin->pattern = pattern;
> > > +  builtin->fn = fn;
> > > +
> > > +  partition->loc = gimple_location (dr_stmt);
> > > +  partition->builtin = builtin;
> > > +  partition->kind = PKIND_RAWMEMCHR;
> > > +
> > > +  return true;
> > > +}
> > > +
> > >  /* Given data reference DR in loop nest LOOP, classify if it forms builtin
> > >     memset call.  */
> > >
> > > @@ -1792,12 +1947,16 @@ loop_distribution::classify_builtin_ldst (loop_p loop, struct graph *rdg,
> > >  bool
> > >  loop_distribution::classify_partition (loop_p loop,
> > >                                        struct graph *rdg, partition *partition,
> > > -                                      bitmap stmt_in_all_partitions)
> > > +                                      bitmap stmt_in_all_partitions,
> > > +                                      vec<struct partition *> *partitions)
> > >  {
> > >    bitmap_iterator bi;
> > >    unsigned i;
> > >    data_reference_p single_ld = NULL, single_st = NULL;
> > >    bool volatiles_p = false, has_reduction = false;
> > > +  unsigned nreductions = 0;
> > > +  gimple *reduction_stmt = NULL;
> > > +  bool has_interpar_reduction = false;
> > >
> > >    EXECUTE_IF_SET_IN_BITMAP (partition->stmts, 0, i, bi)
> > >      {
> > > @@ -1821,6 +1980,19 @@ loop_distribution::classify_partition (loop_p loop,
> > >             partition->reduction_p = true;
> > >           else
> > >             has_reduction = true;
> > > +
> > > +         /* Determine whether the reduction statement occurs in other
> > > +            partitions than the current one.  */
> > > +         struct partition *piter;
> > > +         for (unsigned j = 0; partitions->iterate (j, &piter); ++j)
> > > +           {
> > > +             if (piter == partition)
> > > +               continue;
> > > +             if (bitmap_bit_p (piter->stmts, i))
> > > +               has_interpar_reduction = true;
> > > +           }
> > > +         reduction_stmt = stmt;
> > > +         ++nreductions;
> > >         }
> > >      }
> > >
> > > @@ -1840,6 +2012,30 @@ loop_distribution::classify_partition (loop_p loop,
> > >    if (!find_single_drs (loop, rdg, partition, &single_st, &single_ld))
> > >      return has_reduction;
> > >
> > > +  /* If we determined a single load and a single reduction statement which does
> > > +     not occur in any other partition, then try to classify this partition as a
> > > +     rawmemchr builtin.  */
> > > +  if (single_ld != NULL
> > > +      && single_st == NULL
> > > +      && nreductions == 1
> > > +      && !has_interpar_reduction
> > > +      && is_gimple_assign (reduction_stmt))
> > > +    {
> > > +      /* If we classified the partition as a builtin, then ignoring the single
> > > +        reduction is safe, since the reduction variable is not used in other
> > > +        partitions.  */
> > > +      tree reduction_var = gimple_assign_lhs (reduction_stmt);
> > > +      return !classify_builtin_rawmemchr (loop, partition, single_ld, reduction_var);
> > > +    }
> > > +
> > > +  if (single_st == NULL)
> > > +    return has_reduction;
> > > +
> > > +  /* Don't distribute loop if niters is unknown.  */
> > > +  tree niters = number_of_latch_executions (loop);
> > > +  if (niters == NULL_TREE || niters == chrec_dont_know)
> > > +    return has_reduction;
> > > +
> > >    partition->loc = gimple_location (DR_STMT (single_st));
> > >
> > >    /* Classify the builtin kind.  */
> > > @@ -2979,7 +3175,7 @@ loop_distribution::distribute_loop (class loop *loop, vec<gimple *> stmts,
> > >    FOR_EACH_VEC_ELT (partitions, i, partition)
> > >      {
> > >        reduction_in_all
> > > -       |= classify_partition (loop, rdg, partition, stmt_in_all_partitions);
> > > +       |= classify_partition (loop, rdg, partition, stmt_in_all_partitions, &partitions);
> > >        any_builtin |= partition_builtin_p (partition);
> > >      }
> > >
> > > @@ -3290,11 +3486,6 @@ loop_distribution::execute (function *fun)
> > >               && !optimize_loop_for_speed_p (loop)))
> > >         continue;
> > >
> > > -      /* Don't distribute loop if niters is unknown.  */
> > > -      tree niters = number_of_latch_executions (loop);
> > > -      if (niters == NULL_TREE || niters == chrec_dont_know)
> > > -       continue;
> > > -
> > >        /* Get the perfect loop nest for distribution.  */
> > >        loop = prepare_perfect_loop_nest (loop);
> > >        for (; loop; loop = loop->inner)
> > > --
> > > 2.23.0
> > >

> commit bf792239150
> Author: Stefan Schulze Frielinghaus <stefansf@linux.ibm.com>
> Date:   Mon Feb 8 10:35:30 2021 +0100
> 
>     ldist: Recognize rawmemchr loop patterns
>     
>     This patch adds support for recognizing loops which mimic the behaviour
>     of function rawmemchr, and replaces them with an internal function call
>     in case the target provides an implementation.  In contrast to the
>     original rawmemchr function, this patch also supports variants where the
>     memory pointed to and the pattern are interpreted as 8-, 16-, or 32-bit
>     wide elements, respectively.
> 
> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> index dd7173126fb..18e12b863c6 100644
> --- a/gcc/internal-fn.c
> +++ b/gcc/internal-fn.c
> @@ -2917,6 +2917,31 @@ expand_VEC_CONVERT (internal_fn, gcall *)
>    gcc_unreachable ();
>  }
>  
> +void
> +expand_RAWMEMCHR (internal_fn, gcall *stmt)
> +{
> +  expand_operand ops[3];
> +
> +  tree lhs = gimple_call_lhs (stmt);
> +  tree lhs_type = TREE_TYPE (lhs);
> +  rtx lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> +  create_output_operand (&ops[0], lhs_rtx, TYPE_MODE (lhs_type));
> +
> +  for (unsigned int i = 0; i < 2; ++i)
> +    {
> +      tree rhs = gimple_call_arg (stmt, i);
> +      tree rhs_type = TREE_TYPE (rhs);
> +      rtx rhs_rtx = expand_normal (rhs);
> +      create_input_operand (&ops[i + 1], rhs_rtx, TYPE_MODE (rhs_type));
> +    }
> +
> +  insn_code icode = direct_optab_handler (rawmemchr_optab, ops[2].mode);
> +
> +  expand_insn (icode, 3, ops);
> +  if (!rtx_equal_p (lhs_rtx, ops[0].value))
> +    emit_move_insn (lhs_rtx, ops[0].value);
> +}
> +
>  /* Expand the IFN_UNIQUE function according to its first argument.  */
>  
>  static void
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index daeace7a34e..95c76795648 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -348,6 +348,7 @@ DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
>  DEF_INTERNAL_FN (TSAN_FUNC_EXIT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
>  DEF_INTERNAL_FN (VA_ARG, ECF_NOTHROW | ECF_LEAF, NULL)
>  DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> +DEF_INTERNAL_FN (RAWMEMCHR, ECF_PURE | ECF_LEAF | ECF_NOTHROW, NULL)
>  
>  /* An unduplicable, uncombinable function.  Generally used to preserve
>     a CFG property in the face of jump threading, tail merging or
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index b192a9d070b..f7c69f914ce 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -267,6 +267,7 @@ OPTAB_D (cpymem_optab, "cpymem$a")
>  OPTAB_D (movmem_optab, "movmem$a")
>  OPTAB_D (setmem_optab, "setmem$a")
>  OPTAB_D (strlen_optab, "strlen$a")
> +OPTAB_D (rawmemchr_optab, "rawmemchr$I$a")
>  
>  OPTAB_DC(fma_optab, "fma$a4", FMA)
>  OPTAB_D (fms_optab, "fms$a4")
> diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
> index 7ee19fc8677..09f200da61f 100644
> --- a/gcc/tree-loop-distribution.c
> +++ b/gcc/tree-loop-distribution.c
> @@ -115,6 +115,10 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-vectorizer.h"
>  #include "tree-eh.h"
>  #include "gimple-fold.h"
> +#include "rtl.h"
> +#include "memmodel.h"
> +#include "insn-codes.h"
> +#include "optabs.h"
>  
>  
>  #define MAX_DATAREFS_NUM \
> @@ -218,7 +222,7 @@ enum partition_kind {
>         be unnecessary and removed once distributed memset can be understood
>         and analyzed in data reference analysis.  See PR82604 for more.  */
>      PKIND_PARTIAL_MEMSET,
> -    PKIND_MEMSET, PKIND_MEMCPY, PKIND_MEMMOVE
> +    PKIND_MEMSET, PKIND_MEMCPY, PKIND_MEMMOVE, PKIND_RAWMEMCHR
>  };
>  
>  /* Type of distributed loop.  */
> @@ -244,6 +248,8 @@ struct builtin_info
>       is only used in memset builtin distribution for now.  */
>    tree dst_base_base;
>    unsigned HOST_WIDE_INT dst_base_offset;
> +  /* Pattern is used only in rawmemchr builtin distribution for now.  */
> +  tree pattern;
>  };
>  
>  /* Partition for loop distribution.  */
> @@ -1232,6 +1238,66 @@ generate_memcpy_builtin (class loop *loop, partition *partition)
>      }
>  }
>  
> +/* Generate a call to rawmemchr for PARTITION in LOOP.  */
> +
> +static void
> +generate_rawmemchr_builtin (class loop *loop, partition *partition)
> +{
> +  gimple_stmt_iterator gsi;
> +  tree mem, pattern;
> +  struct builtin_info *builtin = partition->builtin;
> +  gimple *fn_call;
> +
> +  data_reference_p dr = builtin->src_dr;
> +  tree base = builtin->src_base;
> +
> +  tree result_old = build_fold_addr_expr (DR_REF (dr));
> +  tree result_new = copy_ssa_name (result_old);
> +
> +  /* The new statements will be placed before LOOP.  */
> +  gsi = gsi_last_bb (loop_preheader_edge (loop)->src);
> +
> +  mem = force_gimple_operand_gsi (&gsi, base, true, NULL_TREE, false,
> +				  GSI_CONTINUE_LINKING);
> +  pattern = builtin->pattern;
> +  fn_call = gimple_build_call_internal (IFN_RAWMEMCHR, 2, mem, pattern);
> +  gimple_call_set_lhs (fn_call, result_new);
> +  gimple_set_location (fn_call, partition->loc);
> +  gsi_insert_after (&gsi, fn_call, GSI_CONTINUE_LINKING);
> +
> +  imm_use_iterator iter;
> +  gimple *stmt;
> +  use_operand_p use_p;
> +  FOR_EACH_IMM_USE_STMT (stmt, iter, result_old)
> +    {
> +      FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
> +	SET_USE (use_p, result_new);
> +
> +      update_stmt (stmt);
> +    }
> +
> +  fold_stmt (&gsi);
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    switch (TYPE_MODE (TREE_TYPE (pattern)))
> +      {
> +      case QImode:
> +	fprintf (dump_file, "generated rawmemchrqi\n");
> +	break;
> +
> +      case HImode:
> +	fprintf (dump_file, "generated rawmemchrhi\n");
> +	break;
> +
> +      case SImode:
> +	fprintf (dump_file, "generated rawmemchrsi\n");
> +	break;
> +
> +      default:
> +	gcc_unreachable ();
> +      }
> +}
> +
>  /* Remove and destroy the loop LOOP.  */
>  
>  static void
> @@ -1334,6 +1400,10 @@ generate_code_for_partition (class loop *loop,
>        generate_memcpy_builtin (loop, partition);
>        break;
>  
> +    case PKIND_RAWMEMCHR:
> +      generate_rawmemchr_builtin (loop, partition);
> +      break;
> +
>      default:
>        gcc_unreachable ();
>      }
> @@ -1525,44 +1595,53 @@ find_single_drs (class loop *loop, struct graph *rdg, partition *partition,
>  	}
>      }
>  
> -  if (!single_st)
> +  if (!single_ld && !single_st)
>      return false;
>  
> -  /* Bail out if this is a bitfield memory reference.  */
> -  if (TREE_CODE (DR_REF (single_st)) == COMPONENT_REF
> -      && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (single_st), 1)))
> -    return false;
> -
> -  /* Data reference must be executed exactly once per iteration of each
> -     loop in the loop nest.  We only need to check dominance information
> -     against the outermost one in a perfect loop nest because a bb can't
> -     dominate outermost loop's latch without dominating inner loop's.  */
> -  basic_block bb_st = gimple_bb (DR_STMT (single_st));
> -  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_st))
> -    return false;
> +  basic_block bb_ld = NULL;
> +  basic_block bb_st = NULL;
>  
>    if (single_ld)
>      {
> -      gimple *store = DR_STMT (single_st), *load = DR_STMT (single_ld);
> -      /* Direct aggregate copy or via an SSA name temporary.  */
> -      if (load != store
> -	  && gimple_assign_lhs (load) != gimple_assign_rhs1 (store))
> -	return false;
> -
>        /* Bail out if this is a bitfield memory reference.  */
>        if (TREE_CODE (DR_REF (single_ld)) == COMPONENT_REF
>  	  && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (single_ld), 1)))
>  	return false;
>  
> -      /* Load and store must be in the same loop nest.  */
> -      basic_block bb_ld = gimple_bb (DR_STMT (single_ld));
> -      if (bb_st->loop_father != bb_ld->loop_father)
> +      /* Data reference must be executed exactly once per iteration of each
> +	 loop in the loop nest.  We only need to check dominance information
> +	 against the outermost one in a perfect loop nest because a bb can't
> +	 dominate outermost loop's latch without dominating inner loop's.  */
> +      bb_ld = gimple_bb (DR_STMT (single_ld));
> +      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_ld))
> +	return false;
> +    }
> +
> +  if (single_st)
> +    {
> +      /* Bail out if this is a bitfield memory reference.  */
> +      if (TREE_CODE (DR_REF (single_st)) == COMPONENT_REF
> +	  && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (single_st), 1)))
>  	return false;
>  
>        /* Data reference must be executed exactly once per iteration.
> -	 Same as single_st, we only need to check against the outermost
> +	 Same as single_ld, we only need to check against the outermost
>  	 loop.  */
> -      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_ld))
> +      bb_st = gimple_bb (DR_STMT (single_st));
> +      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_st))
> +	return false;
> +    }
> +
> +  if (single_ld && single_st)
> +    {
> +      gimple *store = DR_STMT (single_st), *load = DR_STMT (single_ld);
> +      /* Direct aggregate copy or via an SSA name temporary.  */
> +      if (load != store
> +	  && gimple_assign_lhs (load) != gimple_assign_rhs1 (store))
> +	return false;
> +
> +      /* Load and store must be in the same loop nest.  */
> +      if (bb_st->loop_father != bb_ld->loop_father)
>  	return false;
>  
>        edge e = single_exit (bb_st->loop_father);
> @@ -1681,6 +1760,68 @@ alloc_builtin (data_reference_p dst_dr, data_reference_p src_dr,
>    return builtin;
>  }
>  
> +/* Given data reference DR in loop nest LOOP, classify if it forms builtin
> +   rawmemchr call.  */
> +
> +static bool
> +classify_builtin_rawmemchr (loop_p loop, partition *partition,
> +			    data_reference_p dr, tree loop_result)
> +{
> +  tree dr_ref = DR_REF (dr);
> +  tree dr_access_base = build_fold_addr_expr (dr_ref);
> +  tree dr_access_size = TYPE_SIZE_UNIT (TREE_TYPE (dr_ref));
> +  gimple *dr_stmt = DR_STMT (dr);
> +  affine_iv iv;
> +  tree pattern;
> +
> +  if (dr_access_base != loop_result)
> +    return false;
> +
> +  /* A limitation of the current implementation is that we only support
> +     constant patterns.  */
> +  gcond *cond_stmt = as_a <gcond *> (last_stmt (loop->header));
> +  pattern = gimple_cond_rhs (cond_stmt);
> +  if (gimple_cond_code (cond_stmt) != NE_EXPR
> +      || gimple_cond_lhs (cond_stmt) != gimple_assign_lhs (dr_stmt)
> +      || TREE_CODE (pattern) != INTEGER_CST)
> +    return false;
> +
> +  /* Bail out if no affine induction variable with constant step can be
> +     determined.  */
> +  if (!simple_iv (loop, loop, dr_access_base, &iv, false))
> +    return false;
> +
> +  /* Bail out if memory accesses are not consecutive.  */
> +  if (!operand_equal_p (iv.step, dr_access_size, 0))
> +    return false;
> +
> +  /* Bail out if direction of memory accesses is not growing.  */
> +  if (get_range_pos_neg (iv.step) != 1)
> +    return false;
> +
> +  /* Bail out if target does not provide rawmemchr for a certain mode.  */
> +  machine_mode mode;
> +  switch (TREE_INT_CST_LOW (iv.step))
> +    {
> +    case 1: mode = QImode; break;
> +    case 2: mode = HImode; break;
> +    case 4: mode = SImode; break;
> +    default: return false;
> +    }
> +  if (direct_optab_handler (rawmemchr_optab, mode) == CODE_FOR_nothing)
> +    return false;
> +
> +  struct builtin_info *builtin;
> +  builtin = alloc_builtin (NULL, dr, NULL_TREE, iv.base, NULL_TREE);
> +  builtin->pattern = pattern;
> +
> +  partition->loc = gimple_location (dr_stmt);
> +  partition->builtin = builtin;
> +  partition->kind = PKIND_RAWMEMCHR;
> +
> +  return true;
> +}
> +
>  /* Given data reference DR in loop nest LOOP, classify if it forms builtin
>     memset call.  */
>  
> @@ -1798,6 +1939,8 @@ loop_distribution::classify_partition (loop_p loop,
>    unsigned i;
>    data_reference_p single_ld = NULL, single_st = NULL;
>    bool volatiles_p = false, has_reduction = false;
> +  unsigned nreductions = 0;
> +  gimple *reduction_stmt = NULL;
>  
>    EXECUTE_IF_SET_IN_BITMAP (partition->stmts, 0, i, bi)
>      {
> @@ -1821,6 +1964,10 @@ loop_distribution::classify_partition (loop_p loop,
>  	    partition->reduction_p = true;
>  	  else
>  	    has_reduction = true;
> +
> +	  /* Determine whether STMT is the only reduction statement or not.  */
> +	  reduction_stmt = stmt;
> +	  ++nreductions;
>  	}
>      }
>  
> @@ -1840,6 +1987,27 @@ loop_distribution::classify_partition (loop_p loop,
>    if (!find_single_drs (loop, rdg, partition, &single_st, &single_ld))
>      return has_reduction;
>  
> +  /* If we determined a single load and a single reduction statement, then try
> +     to classify this partition as a rawmemchr builtin.  */
> +  if (single_ld != NULL
> +      && single_st == NULL
> +      && nreductions == 1
> +      && is_gimple_assign (reduction_stmt))
> +    {
> +      /* If we classified the partition as a builtin, then ignoring the single
> +	 reduction is safe, since the whole partition is replaced by a call.  */
> +      tree reduction_var = gimple_assign_lhs (reduction_stmt);
> +      return !classify_builtin_rawmemchr (loop, partition, single_ld, reduction_var);
> +    }
> +
> +  if (single_st == NULL)
> +    return has_reduction;
> +
> +  /* Don't distribute loop if niters is unknown.  */
> +  tree niters = number_of_latch_executions (loop);
> +  if (niters == NULL_TREE || niters == chrec_dont_know)
> +    return has_reduction;
> +
>    partition->loc = gimple_location (DR_STMT (single_st));
>  
>    /* Classify the builtin kind.  */
> @@ -3290,11 +3458,6 @@ loop_distribution::execute (function *fun)
>  	      && !optimize_loop_for_speed_p (loop)))
>  	continue;
>  
> -      /* Don't distribute loop if niters is unknown.  */
> -      tree niters = number_of_latch_executions (loop);
> -      if (niters == NULL_TREE || niters == chrec_dont_know)
> -	continue;
> -
>        /* Get the perfect loop nest for distribution.  */
>        loop = prepare_perfect_loop_nest (loop);
>        for (; loop; loop = loop->inner)

> /* { dg-do compile } */
> /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-details" } */
> 
> #include <cstdint>
> 
> template<typename T, T pattern>
> T *rawmemchr(T *s)
> {
>   while (*s != pattern)
>     ++s;
>   return s;
> }
> 
> template uint8_t *rawmemchr<uint8_t, 0xab>(uint8_t *);
> template uint16_t *rawmemchr<uint16_t, 0xabcd>(uint16_t *);
> template uint32_t *rawmemchr<uint32_t, 0xabcdef15>(uint32_t *);
> 
> template int8_t *rawmemchr<int8_t, (int8_t)0xab>(int8_t *);
> template int16_t *rawmemchr<int16_t, (int16_t)0xabcd>(int16_t *);
> template int32_t *rawmemchr<int32_t, (int32_t)0xabcdef15>(int32_t *);
> 
> /* { dg-final { scan-tree-dump-times "generated rawmemchrqi" 2 "ldist" } } */
> /* { dg-final { scan-tree-dump-times "generated rawmemchrhi" 2 "ldist" } } */
> /* { dg-final { scan-tree-dump-times "generated rawmemchrsi" 2 "ldist" } } */

> /* { dg-do run } */
> /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-details -mzarch -march=z13" } */
> 
> #include <cstring>
> #include <cassert>
> #include <cstdint>
> #include <iostream>
> 
> template <typename T, T pattern>
> __attribute__((noinline,noclone))
> T* rawmemchr (T *s)
> {
>   while (*s != pattern)
>     ++s;
>   return s;
> }
> 
> template <typename T, T pattern>
> void doit()
> {
>   T *buf = new T[4096 * 2];
>   assert (buf != NULL);
>   memset (buf, 0xa, 4096 * 2);
>   // ensure q is 4096-byte aligned
>   T *q = buf + (4096 - ((uintptr_t)buf & 4095));
>   T *p;
> 
>   // unaligned + block boundary + 1st load
>   p = (T *) ((uintptr_t)q - 8);
>   p[2] = pattern;
>   assert ((rawmemchr<T, pattern> (&p[0]) == &p[2]));
>   p[2] = (T) 0xaaaaaaaa;
> 
>   // unaligned + block boundary + 2nd load
>   p = (T *) ((uintptr_t)q - 8);
>   p[6] = pattern;
>   assert ((rawmemchr<T, pattern> (&p[0]) == &p[6]));
>   p[6] = (T) 0xaaaaaaaa;
> 
> 
> 
>   // unaligned + 1st load
>   q[5] = pattern;
>   assert ((rawmemchr<T, pattern>(&q[2]) == &q[5]));
>   q[5] = (T) 0xaaaaaaaa;
> 
>   // unaligned + 2nd load
>   q[14] = pattern;
>   assert ((rawmemchr<T, pattern>(&q[2]) == &q[14]));
>   q[14] = (T) 0xaaaaaaaa;
> 
>   // unaligned + 3rd load
>   q[19] = pattern;
>   assert ((rawmemchr<T, pattern>(&q[2]) == &q[19]));
>   q[19] = (T) 0xaaaaaaaa;
> 
>   // unaligned + 4th load
>   q[25] = pattern;
>   assert ((rawmemchr<T, pattern>(&q[2]) == &q[25]));
>   q[25] = (T) 0xaaaaaaaa;
> 
> 
> 
>   // aligned + 1st load
>   q[5] = pattern;
>   assert ((rawmemchr<T, pattern>(&q[0]) == &q[5]));
>   q[5] = (T) 0xaaaaaaaa;
> 
>   // aligned + 2nd load
>   q[14] = pattern;
>   assert ((rawmemchr<T, pattern>(&q[0]) == &q[14]));
>   q[14] = (T) 0xaaaaaaaa;
> 
>   // aligned + 3rd load
>   q[19] = pattern;
>   assert ((rawmemchr<T, pattern>(&q[0]) == &q[19]));
>   q[19] = (T) 0xaaaaaaaa;
> 
>   // aligned + 4th load
>   q[25] = pattern;
>   assert ((rawmemchr<T, pattern>(&q[0]) == &q[25]));
>   q[25] = (T) 0xaaaaaaaa;
> 
>   delete[] buf;
> }
> 
> int main(void)
> {
>   doit<uint8_t, 0xde> ();
>   doit<int8_t, (int8_t)0xde> ();
>   doit<uint16_t, 0xdead> ();
>   doit<int16_t, (int16_t)0xdead> ();
>   doit<uint32_t, 0xdeadbeef> ();
>   doit<int32_t, (int32_t)0xdeadbeef> ();
>   return 0;
> }
> 
> /* { dg-final { scan-tree-dump-times "generated rawmemchrqi" 2 "ldist" } } */
> /* { dg-final { scan-tree-dump-times "generated rawmemchrhi" 2 "ldist" } } */
> /* { dg-final { scan-tree-dump-times "generated rawmemchrsi" 2 "ldist" } } */



Thread overview: 29+ messages
2021-02-08 12:47 Stefan Schulze Frielinghaus
2021-02-09  8:57 ` Richard Biener
2021-02-14 10:27   ` Stefan Schulze Frielinghaus
2021-02-25 17:01     ` Stefan Schulze Frielinghaus [this message]
2021-02-25 23:49       ` Jeff Law
2021-03-02 12:29     ` Richard Biener
2021-03-03 17:17       ` Stefan Schulze Frielinghaus
2021-03-16 17:13         ` Stefan Schulze Frielinghaus
2021-04-08  8:23           ` Stefan Schulze Frielinghaus
2021-05-04 17:25             ` Stefan Schulze Frielinghaus
2021-05-05  9:36           ` Richard Biener
2021-05-05 10:03             ` Richard Biener
2021-05-07 12:32             ` Stefan Schulze Frielinghaus
2021-05-20  9:24               ` Richard Biener
2021-05-20 18:37                 ` Stefan Schulze Frielinghaus
2021-06-14 17:26                   ` Stefan Schulze Frielinghaus
2021-06-16 14:22                     ` Richard Biener
2021-06-25 10:23                       ` Stefan Schulze Frielinghaus
2021-08-06 14:02                         ` Stefan Schulze Frielinghaus
2021-08-20 10:35                         ` Richard Biener
2021-09-03  8:00                           ` Stefan Schulze Frielinghaus
2021-09-06  9:56                             ` Richard Biener
2021-09-13 14:53                               ` Stefan Schulze Frielinghaus
2021-09-17  8:08                                 ` Richard Biener
2021-10-11 16:02                                   ` Stefan Schulze Frielinghaus
2022-01-31 13:16                                   ` Tom de Vries
2022-01-31 15:00                                     ` Richard Biener
2022-01-31 16:26                                       ` [PATCH][ldist] Don't add lib calls with -fno-tree-loop-distribute-patterns Tom de Vries
2022-02-01  7:04                                         ` Richard Biener
