Date: Sun, 14 Feb 2021 11:27:38 +0100
From: Stefan Schulze Frielinghaus <stefansf@linux.ibm.com>
To: Richard Biener
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [RFC] ldist: Recognize rawmemchr loop patterns
References: <20210208124716.487709-1-stefansf@linux.ibm.com>
On Tue, Feb 09, 2021 at 09:57:58AM +0100, Richard Biener wrote:
> On Mon, Feb 8, 2021 at 3:11 PM Stefan Schulze Frielinghaus via
> Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >
> > This patch adds support for recognizing loops which mimic the behaviour
> > of function rawmemchr, and replaces those with an internal function call
> > in case a target provides them.  In contrast to the original rawmemchr
> > function, this patch also supports different instances where the memory
> > pointed to and the pattern are interpreted as 8, 16, and 32 bit sized,
> > respectively.
> >
> > This patch is not final and I'm looking for some feedback:
> >
> > Previously, only loops which mimic the behaviours of functions memset,
> > memcpy, and memmove have been detected and replaced by corresponding
> > function calls.  One characteristic of those loops/partitions is that
> > they don't have a reduction.  In contrast, loops which mimic the
> > behaviour of rawmemchr compute a result and therefore have a reduction.
> > My current attempt is to ensure that the reduction statement is not used
> > in any other partition and only in that case ignore the reduction and
> > replace the loop by a function call.  We then only need to replace the
> > reduction variable of the loop which contained the loop result by the
> > variable of the lhs of the internal function call.  This should ensure
> > that the transformation is correct independently of how partitions are
> > fused/distributed in the end.  Any thoughts about this?
>
> Currently we're forcing reduction partitions last (and force to have a single
> one by fusing all partitions containing a reduction) because code-generation
> does not properly update SSA form for the reduction results.  ISTR that
> might be just because we do not copy the LC PHI nodes or do not adjust
> them when copying.  That might not be an issue in case you replace the
> partition with a call.  I guess you can try to have a testcase with
> two rawmemchr patterns and a regular loop part that has to be scheduled
> in between both for correctness.

Ah ok, in that case I updated my patch by removing the constraint that
the reduction statement must be in precisely one partition.  Please find
attached the testcases I came up with so far.  Since transforming a loop
into a rawmemchr function call is backend dependent, I planned to include
those only in my backend patch.

I wasn't able to come up with any testcase where a loop is distributed
into multiple partitions and where one is classified as a rawmemchr
builtin.  The latter boils down to a for loop with an empty body only, in
which case I suspect that loop distribution shouldn't be done anyway.
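To make that shape concrete: the loops that end up classified as rawmemchr
partitions look as follows (a minimal sketch of my own, mirroring the
attached testcases; note the search value must be a compile-time constant,
since the patch only supports constant patterns):

  #include <stdint.h>

  /* 8-bit instance: the loop body is empty apart from advancing S, so the
     only computed value is the final S -- a single reduction.  */
  uint8_t *
  find_zero_u8 (uint8_t *s)
  {
    while (*s != 0)
      ++s;
    return s;
  }

  /* The same shape over 16-bit elements selects the 16-bit instance.  */
  uint16_t *
  find_u16 (uint16_t *s)
  {
    while (*s != 0xab)
      ++s;
    return s;
  }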
> > Furthermore, I simply added two new members (pattern, fn) to structure
> > builtin_info which I consider rather hacky.  For the long run I thought
> > about splitting up structure builtin_info into a union where each member
> > is a structure for a particular builtin of a partition, i.e., something
> > like this:
> >
> > union builtin_info
> > {
> >   struct binfo_memset *memset;
> >   struct binfo_memcpymove *memcpymove;
> >   struct binfo_rawmemchr *rawmemchr;
> > };
> >
> > Such that a structure for one builtin does not get "polluted" by a
> > different one.  Any thoughts about this?
>
> Probably makes sense if the list of recognized patterns grows further.
>
> I see you use internal functions rather than builtin functions.  I guess
> that's OK.  But you use new target hooks for expansion where I think
> new optab entries similar to cmpmem would be more appropriate
> where the distinction between 8, 16 or 32 bits can be encoded in
> the modes.

The optab implementation is really nice: it allows me to use iterators in
the backend, which in the end saves me some boilerplate code compared to
the previous implementation :)

While using optabs now, I only require one additional member (pattern) in
the builtin_info struct.  Thus I didn't want to overcomplicate things and
kept the single struct approach as is.
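Schematically, the optab-based selection boils down to the following
(an excerpt-style sketch of my own, summarizing the v2 patch attached
below):

  /* The element width is carried by the machine mode of the access: a
     step of 1/2/4 bytes selects QImode/HImode/SImode, and the loop is
     only replaced if the target implements the corresponding
     rawmemchrqi/rawmemchrhi/rawmemchrsi pattern.  */
  machine_mode mode;
  switch (TREE_INT_CST_LOW (iv.step))
    {
    case 1: mode = QImode; break;
    case 2: mode = HImode; break;
    case 4: mode = SImode; break;
    default: return false;
    }
  if (direct_optab_handler (rawmemchr_optab, mode) == CODE_FOR_nothing)
    return false;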
For the long run, should I resubmit this patch once stage 1 opens, or how
would you propose to proceed?

Thanks for your review so far!

Cheers,
Stefan

> Richard.
>
> > Cheers,
> > Stefan
> > ---
> >  gcc/internal-fn.c            |  42 ++++++
> >  gcc/internal-fn.def          |   3 +
> >  gcc/target-insns.def         |   3 +
> >  gcc/tree-loop-distribution.c | 257 ++++++++++++++++++++++++++++++-----
> >  4 files changed, 272 insertions(+), 33 deletions(-)
> >
> > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> > index dd7173126fb..9cd62544a1a 100644
> > --- a/gcc/internal-fn.c
> > +++ b/gcc/internal-fn.c
> > @@ -2917,6 +2917,48 @@ expand_VEC_CONVERT (internal_fn, gcall *)
> >    gcc_unreachable ();
> >  }
> >
> > +static void
> > +expand_RAWMEMCHR8 (internal_fn, gcall *stmt)
> > +{
> > +  if (targetm.have_rawmemchr8 ())
> > +    {
> > +      rtx result = expand_expr (gimple_call_lhs (stmt), NULL_RTX, VOIDmode, EXPAND_WRITE);
> > +      rtx start = expand_normal (gimple_call_arg (stmt, 0));
> > +      rtx pattern = expand_normal (gimple_call_arg (stmt, 1));
> > +      emit_insn (targetm.gen_rawmemchr8 (result, start, pattern));
> > +    }
> > +  else
> > +    gcc_unreachable();
> > +}
> > +
> > +static void
> > +expand_RAWMEMCHR16 (internal_fn, gcall *stmt)
> > +{
> > +  if (targetm.have_rawmemchr16 ())
> > +    {
> > +      rtx result = expand_expr (gimple_call_lhs (stmt), NULL_RTX, VOIDmode, EXPAND_WRITE);
> > +      rtx start = expand_normal (gimple_call_arg (stmt, 0));
> > +      rtx pattern = expand_normal (gimple_call_arg (stmt, 1));
> > +      emit_insn (targetm.gen_rawmemchr16 (result, start, pattern));
> > +    }
> > +  else
> > +    gcc_unreachable();
> > +}
> > +
> > +static void
> > +expand_RAWMEMCHR32 (internal_fn, gcall *stmt)
> > +{
> > +  if (targetm.have_rawmemchr32 ())
> > +    {
> > +      rtx result = expand_expr (gimple_call_lhs (stmt), NULL_RTX, VOIDmode, EXPAND_WRITE);
> > +      rtx start = expand_normal (gimple_call_arg (stmt, 0));
> > +      rtx pattern = expand_normal (gimple_call_arg (stmt, 1));
> > +      emit_insn (targetm.gen_rawmemchr32 (result, start, pattern));
> > +    }
> > +  else
> > +    gcc_unreachable();
> > +}
> > +
> >  /* Expand the IFN_UNIQUE function according to its first argument.  */
> >
> >  static void
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index daeace7a34e..34247859704 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -348,6 +348,9 @@ DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> >  DEF_INTERNAL_FN (TSAN_FUNC_EXIT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
> >  DEF_INTERNAL_FN (VA_ARG, ECF_NOTHROW | ECF_LEAF, NULL)
> >  DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> > +DEF_INTERNAL_FN (RAWMEMCHR8, ECF_PURE | ECF_LEAF | ECF_NOTHROW, NULL)
> > +DEF_INTERNAL_FN (RAWMEMCHR16, ECF_PURE | ECF_LEAF | ECF_NOTHROW, NULL)
> > +DEF_INTERNAL_FN (RAWMEMCHR32, ECF_PURE | ECF_LEAF | ECF_NOTHROW, NULL)
> >
> >  /* An unduplicable, uncombinable function.  Generally used to preserve
> >     a CFG property in the face of jump threading, tail merging or
> > diff --git a/gcc/target-insns.def b/gcc/target-insns.def
> > index 672c35698d7..9248554cbf3 100644
> > --- a/gcc/target-insns.def
> > +++ b/gcc/target-insns.def
> > @@ -106,3 +106,6 @@ DEF_TARGET_INSN (trap, (void))
> >  DEF_TARGET_INSN (unique, (void))
> >  DEF_TARGET_INSN (untyped_call, (rtx x0, rtx x1, rtx x2))
> >  DEF_TARGET_INSN (untyped_return, (rtx x0, rtx x1))
> > +DEF_TARGET_INSN (rawmemchr8, (rtx x0, rtx x1, rtx x2))
> > +DEF_TARGET_INSN (rawmemchr16, (rtx x0, rtx x1, rtx x2))
> > +DEF_TARGET_INSN (rawmemchr32, (rtx x0, rtx x1, rtx x2))
> > diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
> > index 7ee19fc8677..f5b24bf53bc 100644
> > --- a/gcc/tree-loop-distribution.c
> > +++ b/gcc/tree-loop-distribution.c
> > @@ -218,7 +218,7 @@ enum partition_kind {
> >       be unnecessary and removed once distributed memset can be understood
> >       and analyzed in data reference analysis.  See PR82604 for more.  */
> >    PKIND_PARTIAL_MEMSET,
> > -  PKIND_MEMSET, PKIND_MEMCPY, PKIND_MEMMOVE
> > +  PKIND_MEMSET, PKIND_MEMCPY, PKIND_MEMMOVE, PKIND_RAWMEMCHR
> >  };
> >
> >  /* Type of distributed loop.  */
> > @@ -244,6 +244,8 @@ struct builtin_info
> >       is only used in memset builtin distribution for now.  */
> >    tree dst_base_base;
> >    unsigned HOST_WIDE_INT dst_base_offset;
> > +  tree pattern;
> > +  internal_fn fn;
> >  };
> >
> >  /* Partition for loop distribution.  */
> > @@ -588,7 +590,8 @@ class loop_distribution
> >    bool
> >    classify_partition (loop_p loop,
> >                        struct graph *rdg, partition *partition,
> > -                      bitmap stmt_in_all_partitions);
> > +                      bitmap stmt_in_all_partitions,
> > +                      vec<struct partition *> *partitions);
> >
> >
> >  /* Returns true when PARTITION1 and PARTITION2 access the same memory
> > @@ -1232,6 +1235,67 @@ generate_memcpy_builtin (class loop *loop, partition *partition)
> >      }
> >  }
> >
> > +/* Generate a call to rawmemchr{8,16,32} for PARTITION in LOOP.  */
> > +
> > +static void
> > +generate_rawmemchr_builtin (class loop *loop, partition *partition)
> > +{
> > +  gimple_stmt_iterator gsi;
> > +  tree mem, pattern;
> > +  struct builtin_info *builtin = partition->builtin;
> > +  gimple *fn_call;
> > +
> > +  data_reference_p dr = builtin->src_dr;
> > +  tree base = builtin->src_base;
> > +
> > +  tree result_old = TREE_OPERAND (DR_REF (dr), 0);
> > +  tree result_new = copy_ssa_name (result_old);
> > +
> > +  /* The new statements will be placed before LOOP.  */
> > +  gsi = gsi_last_bb (loop_preheader_edge (loop)->src);
> > +
> > +  mem = force_gimple_operand_gsi (&gsi, base, true, NULL_TREE, false, GSI_CONTINUE_LINKING);
> > +  pattern = builtin->pattern;
> > +  if (TREE_CODE (pattern) == INTEGER_CST)
> > +    pattern = fold_convert (integer_type_node, pattern);
> > +  fn_call = gimple_build_call_internal (builtin->fn, 2, mem, pattern);
> > +  gimple_call_set_lhs (fn_call, result_new);
> > +  gimple_set_location (fn_call, partition->loc);
> > +  gsi_insert_after (&gsi, fn_call, GSI_CONTINUE_LINKING);
> > +
> > +  imm_use_iterator iter;
> > +  gimple *stmt;
> > +  use_operand_p use_p;
> > +  FOR_EACH_IMM_USE_STMT (stmt, iter, result_old)
> > +    {
> > +      FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
> > +        SET_USE (use_p, result_new);
> > +
> > +      update_stmt (stmt);
> > +    }
> > +
> > +  fold_stmt (&gsi);
> > +
> > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > +    switch (builtin->fn)
> > +      {
> > +      case IFN_RAWMEMCHR8:
> > +        fprintf (dump_file, "generated rawmemchr8\n");
> > +        break;
> > +
> > +      case IFN_RAWMEMCHR16:
> > +        fprintf (dump_file, "generated rawmemchr16\n");
> > +        break;
> > +
> > +      case IFN_RAWMEMCHR32:
> > +        fprintf (dump_file, "generated rawmemchr32\n");
> > +        break;
> > +
> > +      default:
> > +        gcc_unreachable ();
> > +      }
> > +}
> > +
> >  /* Remove and destroy the loop LOOP.  */
> >
> >  static void
> > @@ -1334,6 +1398,10 @@ generate_code_for_partition (class loop *loop,
> >        generate_memcpy_builtin (loop, partition);
> >        break;
> >
> > +    case PKIND_RAWMEMCHR:
> > +      generate_rawmemchr_builtin (loop, partition);
> > +      break;
> > +
> >      default:
> >        gcc_unreachable ();
> >      }
> > @@ -1525,44 +1593,53 @@ find_single_drs (class loop *loop, struct graph *rdg, partition *partition,
> >      }
> >    }
> >
> > -  if (!single_st)
> > +  if (!single_ld && !single_st)
> >      return false;
> >
> > -  /* Bail out if this is a bitfield memory reference.  */
> > -  if (TREE_CODE (DR_REF (single_st)) == COMPONENT_REF
> > -      && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (single_st), 1)))
> > -    return false;
> > -
> > -  /* Data reference must be executed exactly once per iteration of each
> > -     loop in the loop nest.  We only need to check dominance information
> > -     against the outermost one in a perfect loop nest because a bb can't
> > -     dominate outermost loop's latch without dominating inner loop's.  */
> > -  basic_block bb_st = gimple_bb (DR_STMT (single_st));
> > -  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_st))
> > -    return false;
> > +  basic_block bb_ld = NULL;
> > +  basic_block bb_st = NULL;
> >
> >    if (single_ld)
> >      {
> > -      gimple *store = DR_STMT (single_st), *load = DR_STMT (single_ld);
> > -      /* Direct aggregate copy or via an SSA name temporary.  */
> > -      if (load != store
> > -          && gimple_assign_lhs (load) != gimple_assign_rhs1 (store))
> > -        return false;
> > -
> >        /* Bail out if this is a bitfield memory reference.  */
> >        if (TREE_CODE (DR_REF (single_ld)) == COMPONENT_REF
> >            && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (single_ld), 1)))
> >          return false;
> >
> > -      /* Load and store must be in the same loop nest.  */
> > -      basic_block bb_ld = gimple_bb (DR_STMT (single_ld));
> > -      if (bb_st->loop_father != bb_ld->loop_father)
> > +      /* Data reference must be executed exactly once per iteration of each
> > +         loop in the loop nest.  We only need to check dominance information
> > +         against the outermost one in a perfect loop nest because a bb can't
> > +         dominate outermost loop's latch without dominating inner loop's.  */
> > +      bb_ld = gimple_bb (DR_STMT (single_ld));
> > +      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_ld))
> > +        return false;
> > +    }
> > +
> > +  if (single_st)
> > +    {
> > +      /* Bail out if this is a bitfield memory reference.  */
> > +      if (TREE_CODE (DR_REF (single_st)) == COMPONENT_REF
> > +          && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (single_st), 1)))
> >          return false;
> >
> >        /* Data reference must be executed exactly once per iteration.
> > -         Same as single_st, we only need to check against the outermost
> > +         Same as single_ld, we only need to check against the outermost
> >           loop.  */
> > -      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_ld))
> > +      bb_st = gimple_bb (DR_STMT (single_st));
> > +      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_st))
> > +        return false;
> > +    }
> > +
> > +  if (single_ld && single_st)
> > +    {
> > +      gimple *store = DR_STMT (single_st), *load = DR_STMT (single_ld);
> > +      /* Direct aggregate copy or via an SSA name temporary.  */
> > +      if (load != store
> > +          && gimple_assign_lhs (load) != gimple_assign_rhs1 (store))
> > +        return false;
> > +
> > +      /* Load and store must be in the same loop nest.  */
> > +      if (bb_st->loop_father != bb_ld->loop_father)
> >          return false;
> >
> >        edge e = single_exit (bb_st->loop_father);
> > @@ -1681,6 +1758,84 @@ alloc_builtin (data_reference_p dst_dr, data_reference_p src_dr,
> >    return builtin;
> >  }
> >
> > +/* Given data reference DR in loop nest LOOP, classify if it forms builtin
> > +   rawmemchr{8,16,32} call.  */
> > +
> > +static bool
> > +classify_builtin_rawmemchr (loop_p loop, partition *partition, data_reference_p dr, tree loop_result)
> > +{
> > +  tree dr_ref = DR_REF (dr);
> > +  tree dr_access_base = build_fold_addr_expr (dr_ref);
> > +  tree dr_access_size = TYPE_SIZE_UNIT (TREE_TYPE (dr_ref));
> > +  gimple *dr_stmt = DR_STMT (dr);
> > +  tree rhs1 = gimple_assign_rhs1 (dr_stmt);
> > +  affine_iv iv;
> > +  tree pattern;
> > +
> > +  if (TREE_OPERAND (rhs1, 0) != loop_result)
> > +    return false;
> > +
> > +  /* A limitation of the current implementation is that we only support
> > +     constant patterns.  */
> > +  gcond *cond_stmt = as_a <gcond *> (last_stmt (loop->header));
> > +  pattern = gimple_cond_rhs (cond_stmt);
> > +  if (gimple_cond_code (cond_stmt) != NE_EXPR
> > +      || gimple_cond_lhs (cond_stmt) != gimple_assign_lhs (dr_stmt)
> > +      || TREE_CODE (pattern) != INTEGER_CST)
> > +    return false;
> > +
> > +  /* Bail out if no affine induction variable with constant step can be
> > +     determined.  */
> > +  if (!simple_iv (loop, loop, dr_access_base, &iv, false))
> > +    return false;
> > +
> > +  /* Bail out if memory accesses are not consecutive.  */
> > +  if (!operand_equal_p (iv.step, dr_access_size, 0))
> > +    return false;
> > +
> > +  /* Bail out if direction of memory accesses is not growing.  */
> > +  if (get_range_pos_neg (iv.step) != 1)
> > +    return false;
> > +
> > +  internal_fn fn;
> > +  switch (TREE_INT_CST_LOW (iv.step))
> > +    {
> > +    case 1:
> > +      if (!targetm.have_rawmemchr8 ())
> > +        return false;
> > +      fn = IFN_RAWMEMCHR8;
> > +      break;
> > +
> > +    case 2:
> > +      if (!targetm.have_rawmemchr16 ())
> > +        return false;
> > +      fn = IFN_RAWMEMCHR16;
> > +      break;
> > +
> > +    case 4:
> > +      if (!targetm.have_rawmemchr32 ())
> > +        return false;
> > +      fn = IFN_RAWMEMCHR32;
> > +      break;
> > +
> > +    default:
> > +      return false;
> > +    }
> > +
> > +  struct builtin_info *builtin;
> > +  builtin = alloc_builtin (NULL, NULL, NULL_TREE, NULL_TREE, NULL_TREE);
> > +  builtin->src_dr = dr;
> > +  builtin->src_base = iv.base;
> > +  builtin->pattern = pattern;
> > +  builtin->fn = fn;
> > +
> > +  partition->loc = gimple_location (dr_stmt);
> > +  partition->builtin = builtin;
> > +  partition->kind = PKIND_RAWMEMCHR;
> > +
> > +  return true;
> > +}
> > +
> >  /* Given data reference DR in loop nest LOOP, classify if it forms builtin
> >     memset call.  */
> >
> > @@ -1792,12 +1947,16 @@ loop_distribution::classify_builtin_ldst (loop_p loop, struct graph *rdg,
> >  bool
> >  loop_distribution::classify_partition (loop_p loop,
> >                                         struct graph *rdg, partition *partition,
> > -                                       bitmap stmt_in_all_partitions)
> > +                                       bitmap stmt_in_all_partitions,
> > +                                       vec<struct partition *> *partitions)
> >  {
> >    bitmap_iterator bi;
> >    unsigned i;
> >    data_reference_p single_ld = NULL, single_st = NULL;
> >    bool volatiles_p = false, has_reduction = false;
> > +  unsigned nreductions = 0;
> > +  gimple *reduction_stmt = NULL;
> > +  bool has_interpar_reduction = false;
> >
> >    EXECUTE_IF_SET_IN_BITMAP (partition->stmts, 0, i, bi)
> >      {
> > @@ -1821,6 +1980,19 @@ loop_distribution::classify_partition (loop_p loop,
> >          partition->reduction_p = true;
> >        else
> >          has_reduction = true;
> > +
> > +      /* Determine whether the reduction statement occurs in other
> > +         partitions than the current one.  */
> > +      struct partition *piter;
> > +      for (unsigned j = 0; partitions->iterate (j, &piter); ++j)
> > +        {
> > +          if (piter == partition)
> > +            continue;
> > +          if (bitmap_bit_p (piter->stmts, i))
> > +            has_interpar_reduction = true;
> > +        }
> > +      reduction_stmt = stmt;
> > +      ++nreductions;
> >      }
> >    }
> >
> > @@ -1840,6 +2012,30 @@ loop_distribution::classify_partition (loop_p loop,
> >    if (!find_single_drs (loop, rdg, partition, &single_st, &single_ld))
> >      return has_reduction;
> >
> > +  /* If we determined a single load and a single reduction statement which does
> > +     not occur in any other partition, then try to classify this partition as a
> > +     rawmemchr builtin.  */
> > +  if (single_ld != NULL
> > +      && single_st == NULL
> > +      && nreductions == 1
> > +      && !has_interpar_reduction
> > +      && is_gimple_assign (reduction_stmt))
> > +    {
> > +      /* If we classified the partition as a builtin, then ignoring the single
> > +         reduction is safe, since the reduction variable is not used in other
> > +         partitions.  */
> > +      tree reduction_var = gimple_assign_lhs (reduction_stmt);
> > +      return !classify_builtin_rawmemchr (loop, partition, single_ld, reduction_var);
> > +    }
> > +
> > +  if (single_st == NULL)
> > +    return has_reduction;
> > +
> > +  /* Don't distribute loop if niters is unknown.  */
> > +  tree niters = number_of_latch_executions (loop);
> > +  if (niters == NULL_TREE || niters == chrec_dont_know)
> > +    return has_reduction;
> > +
> >    partition->loc = gimple_location (DR_STMT (single_st));
> >
> >    /* Classify the builtin kind.  */
> > @@ -2979,7 +3175,7 @@ loop_distribution::distribute_loop (class loop *loop, vec<gimple *> stmts,
> >    FOR_EACH_VEC_ELT (partitions, i, partition)
> >      {
> >        reduction_in_all
> > -        |= classify_partition (loop, rdg, partition, stmt_in_all_partitions);
> > +        |= classify_partition (loop, rdg, partition, stmt_in_all_partitions, &partitions);
> >        any_builtin |= partition_builtin_p (partition);
> >      }
> >
> > @@ -3290,11 +3486,6 @@ loop_distribution::execute (function *fun)
> >                && !optimize_loop_for_speed_p (loop)))
> >          continue;
> >
> > -      /* Don't distribute loop if niters is unknown.  */
> > -      tree niters = number_of_latch_executions (loop);
> > -      if (niters == NULL_TREE || niters == chrec_dont_know)
> > -        continue;
> > -
> >        /* Get the perfect loop nest for distribution.  */
> >        loop = prepare_perfect_loop_nest (loop);
> >        for (; loop; loop = loop->inner)
> > --
> > 2.23.0

[Attachment: rawmemchr.diff]

commit bf792239150
Author: Stefan Schulze Frielinghaus <stefansf@linux.ibm.com>
Date:   Mon Feb 8 10:35:30 2021 +0100

    ldist: Recognize rawmemchr loop patterns

    This patch adds support for recognizing loops which mimic the behaviour
    of function rawmemchr, and replaces those with an internal function call
    in case a target provides them.  In contrast to the original rawmemchr
    function, this patch also supports different instances where the memory
    pointed to and the pattern are interpreted as 8, 16, and 32 bit sized,
    respectively.
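Viewed as a source-to-source transformation, the effect of the pass is
roughly the following (a sketch of my own in pseudo-source; in reality
the replacement is a GIMPLE internal-function call, not a user-callable
function):

  /* Before: an empty-body search loop whose only result is the final
     pointer value.  */
  while (*p != 0xab)
    ++p;
  use (p);

  /* After: the whole partition is replaced by a single call; 0xab and
     the element width select the rawmemchr instance.  */
  p = IFN_RAWMEMCHR (p, 0xab);
  use (p);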
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index dd7173126fb..18e12b863c6 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2917,6 +2917,31 @@ expand_VEC_CONVERT (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
+void
+expand_RAWMEMCHR (internal_fn, gcall *stmt)
+{
+  expand_operand ops[3];
+
+  tree lhs = gimple_call_lhs (stmt);
+  tree lhs_type = TREE_TYPE (lhs);
+  rtx lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  create_output_operand (&ops[0], lhs_rtx, TYPE_MODE (lhs_type));
+
+  for (unsigned int i = 0; i < 2; ++i)
+    {
+      tree rhs = gimple_call_arg (stmt, i);
+      tree rhs_type = TREE_TYPE (rhs);
+      rtx rhs_rtx = expand_normal (rhs);
+      create_input_operand (&ops[i + 1], rhs_rtx, TYPE_MODE (rhs_type));
+    }
+
+  insn_code icode = direct_optab_handler (rawmemchr_optab, ops[2].mode);
+
+  expand_insn (icode, 3, ops);
+  if (!rtx_equal_p (lhs_rtx, ops[0].value))
+    emit_move_insn (lhs_rtx, ops[0].value);
+}
+
 /* Expand the IFN_UNIQUE function according to its first argument.  */
 
 static void
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index daeace7a34e..95c76795648 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -348,6 +348,7 @@ DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (TSAN_FUNC_EXIT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (VA_ARG, ECF_NOTHROW | ECF_LEAF, NULL)
 DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (RAWMEMCHR, ECF_PURE | ECF_LEAF | ECF_NOTHROW, NULL)
 
 /* An unduplicable, uncombinable function.  Generally used to preserve
    a CFG property in the face of jump threading, tail merging or
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b192a9d070b..f7c69f914ce 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -267,6 +267,7 @@ OPTAB_D (cpymem_optab, "cpymem$a")
 OPTAB_D (movmem_optab, "movmem$a")
 OPTAB_D (setmem_optab, "setmem$a")
 OPTAB_D (strlen_optab, "strlen$a")
+OPTAB_D (rawmemchr_optab, "rawmemchr$I$a")
 
 OPTAB_DC(fma_optab, "fma$a4", FMA)
 OPTAB_D (fms_optab, "fms$a4")
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 7ee19fc8677..09f200da61f 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -115,6 +115,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "tree-eh.h"
 #include "gimple-fold.h"
+#include "rtl.h"
+#include "memmodel.h"
+#include "insn-codes.h"
+#include "optabs.h"
 
 
 #define MAX_DATAREFS_NUM \
@@ -218,7 +222,7 @@ enum partition_kind {
      be unnecessary and removed once distributed memset can be understood
      and analyzed in data reference analysis.  See PR82604 for more.  */
   PKIND_PARTIAL_MEMSET,
-  PKIND_MEMSET, PKIND_MEMCPY, PKIND_MEMMOVE
+  PKIND_MEMSET, PKIND_MEMCPY, PKIND_MEMMOVE, PKIND_RAWMEMCHR
 };
 
 /* Type of distributed loop.  */
@@ -244,6 +248,8 @@ struct builtin_info
      is only used in memset builtin distribution for now.  */
   tree dst_base_base;
   unsigned HOST_WIDE_INT dst_base_offset;
+  /* Pattern is used only in rawmemchr builtin distribution for now.  */
+  tree pattern;
 };
 
 /* Partition for loop distribution.  */
@@ -1232,6 +1238,66 @@ generate_memcpy_builtin (class loop *loop, partition *partition)
     }
 }
 
+/* Generate a call to rawmemchr for PARTITION in LOOP.  */
+
+static void
+generate_rawmemchr_builtin (class loop *loop, partition *partition)
+{
+  gimple_stmt_iterator gsi;
+  tree mem, pattern;
+  struct builtin_info *builtin = partition->builtin;
+  gimple *fn_call;
+
+  data_reference_p dr = builtin->src_dr;
+  tree base = builtin->src_base;
+
+  tree result_old = build_fold_addr_expr (DR_REF (dr));
+  tree result_new = copy_ssa_name (result_old);
+
+  /* The new statements will be placed before LOOP.  */
+  gsi = gsi_last_bb (loop_preheader_edge (loop)->src);
+
+  mem = force_gimple_operand_gsi (&gsi, base, true, NULL_TREE, false,
+                                  GSI_CONTINUE_LINKING);
+  pattern = builtin->pattern;
+  fn_call = gimple_build_call_internal (IFN_RAWMEMCHR, 2, mem, pattern);
+  gimple_call_set_lhs (fn_call, result_new);
+  gimple_set_location (fn_call, partition->loc);
+  gsi_insert_after (&gsi, fn_call, GSI_CONTINUE_LINKING);
+
+  imm_use_iterator iter;
+  gimple *stmt;
+  use_operand_p use_p;
+  FOR_EACH_IMM_USE_STMT (stmt, iter, result_old)
+    {
+      FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
+        SET_USE (use_p, result_new);
+
+      update_stmt (stmt);
+    }
+
+  fold_stmt (&gsi);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    switch (TYPE_MODE (TREE_TYPE (pattern)))
+      {
+      case QImode:
+        fprintf (dump_file, "generated rawmemchrqi\n");
+        break;
+
+      case HImode:
+        fprintf (dump_file, "generated rawmemchrhi\n");
+        break;
+
+      case SImode:
+        fprintf (dump_file, "generated rawmemchrsi\n");
+        break;
+
+      default:
+        gcc_unreachable ();
+      }
+}
+
 /* Remove and destroy the loop LOOP.  */
 
 static void
@@ -1334,6 +1400,10 @@ generate_code_for_partition (class loop *loop,
       generate_memcpy_builtin (loop, partition);
       break;
 
+    case PKIND_RAWMEMCHR:
+      generate_rawmemchr_builtin (loop, partition);
+      break;
+
     default:
       gcc_unreachable ();
     }
@@ -1525,44 +1595,53 @@ find_single_drs (class loop *loop, struct graph *rdg, partition *partition,
     }
   }
 
-  if (!single_st)
+  if (!single_ld && !single_st)
     return false;
 
-  /* Bail out if this is a bitfield memory reference.  */
-  if (TREE_CODE (DR_REF (single_st)) == COMPONENT_REF
-      && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (single_st), 1)))
-    return false;
-
-  /* Data reference must be executed exactly once per iteration of each
-     loop in the loop nest.  We only need to check dominance information
-     against the outermost one in a perfect loop nest because a bb can't
-     dominate outermost loop's latch without dominating inner loop's.  */
-  basic_block bb_st = gimple_bb (DR_STMT (single_st));
-  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_st))
-    return false;
+  basic_block bb_ld = NULL;
+  basic_block bb_st = NULL;
 
   if (single_ld)
     {
-      gimple *store = DR_STMT (single_st), *load = DR_STMT (single_ld);
-      /* Direct aggregate copy or via an SSA name temporary.  */
-      if (load != store
-          && gimple_assign_lhs (load) != gimple_assign_rhs1 (store))
-        return false;
-
       /* Bail out if this is a bitfield memory reference.  */
       if (TREE_CODE (DR_REF (single_ld)) == COMPONENT_REF
           && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (single_ld), 1)))
         return false;
 
-      /* Load and store must be in the same loop nest.  */
-      basic_block bb_ld = gimple_bb (DR_STMT (single_ld));
-      if (bb_st->loop_father != bb_ld->loop_father)
+      /* Data reference must be executed exactly once per iteration of each
+         loop in the loop nest.  We only need to check dominance information
+         against the outermost one in a perfect loop nest because a bb can't
+         dominate outermost loop's latch without dominating inner loop's.  */
+      bb_ld = gimple_bb (DR_STMT (single_ld));
+      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_ld))
+        return false;
+    }
+
+  if (single_st)
+    {
+      /* Bail out if this is a bitfield memory reference.  */
+      if (TREE_CODE (DR_REF (single_st)) == COMPONENT_REF
+          && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (single_st), 1)))
        return false;
 
       /* Data reference must be executed exactly once per iteration.
-         Same as single_st, we only need to check against the outermost
+         Same as single_ld, we only need to check against the outermost
         loop.  */
-      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_ld))
+      bb_st = gimple_bb (DR_STMT (single_st));
+      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_st))
+        return false;
+    }
+
+  if (single_ld && single_st)
+    {
+      gimple *store = DR_STMT (single_st), *load = DR_STMT (single_ld);
+      /* Direct aggregate copy or via an SSA name temporary.  */
+      if (load != store
+          && gimple_assign_lhs (load) != gimple_assign_rhs1 (store))
+        return false;
+
+      /* Load and store must be in the same loop nest.  */
+      if (bb_st->loop_father != bb_ld->loop_father)
        return false;
 
       edge e = single_exit (bb_st->loop_father);
@@ -1681,6 +1760,68 @@ alloc_builtin (data_reference_p dst_dr, data_reference_p src_dr,
   return builtin;
 }
 
+/* Given data reference DR in loop nest LOOP, classify if it forms builtin
+   rawmemchr call.  */
+
+static bool
+classify_builtin_rawmemchr (loop_p loop, partition *partition,
+                            data_reference_p dr, tree loop_result)
+{
+  tree dr_ref = DR_REF (dr);
+  tree dr_access_base = build_fold_addr_expr (dr_ref);
+  tree dr_access_size = TYPE_SIZE_UNIT (TREE_TYPE (dr_ref));
+  gimple *dr_stmt = DR_STMT (dr);
+  affine_iv iv;
+  tree pattern;
+
+  if (dr_access_base != loop_result)
+    return false;
+
+  /* A limitation of the current implementation is that we only support
+     constant patterns.  */
+  gcond *cond_stmt = as_a <gcond *> (last_stmt (loop->header));
+  pattern = gimple_cond_rhs (cond_stmt);
+  if (gimple_cond_code (cond_stmt) != NE_EXPR
+      || gimple_cond_lhs (cond_stmt) != gimple_assign_lhs (dr_stmt)
+      || TREE_CODE (pattern) != INTEGER_CST)
+    return false;
+
+  /* Bail out if no affine induction variable with constant step can be
+     determined.  */
+  if (!simple_iv (loop, loop, dr_access_base, &iv, false))
+    return false;
+
+  /* Bail out if memory accesses are not consecutive.  */
+  if (!operand_equal_p (iv.step, dr_access_size, 0))
+    return false;
+
+  /* Bail out if direction of memory accesses is not growing.  */
+  if (get_range_pos_neg (iv.step) != 1)
+    return false;
+
+  /* Bail out if target does not provide rawmemchr for a certain mode.  */
+  machine_mode mode;
+  switch (TREE_INT_CST_LOW (iv.step))
+    {
+    case 1: mode = QImode; break;
+    case 2: mode = HImode; break;
+    case 4: mode = SImode; break;
+    default: return false;
+    }
+  if (direct_optab_handler (rawmemchr_optab, mode) == CODE_FOR_nothing)
+    return false;
+
+  struct builtin_info *builtin;
+  builtin = alloc_builtin (NULL, dr, NULL_TREE, iv.base, NULL_TREE);
+  builtin->pattern = pattern;
+
+  partition->loc = gimple_location (dr_stmt);
+  partition->builtin = builtin;
+  partition->kind = PKIND_RAWMEMCHR;
+
+  return true;
+}
+
 /* Given data reference DR in loop nest LOOP, classify if it forms builtin
    memset call.  */
 
@@ -1798,6 +1939,8 @@ loop_distribution::classify_partition (loop_p loop,
   unsigned i;
   data_reference_p single_ld = NULL, single_st = NULL;
   bool volatiles_p = false, has_reduction = false;
+  unsigned nreductions = 0;
+  gimple *reduction_stmt = NULL;
 
   EXECUTE_IF_SET_IN_BITMAP (partition->stmts, 0, i, bi)
     {
@@ -1821,6 +1964,10 @@ loop_distribution::classify_partition (loop_p loop,
         partition->reduction_p = true;
       else
         has_reduction = true;
+
+      /* Determine whether STMT is the only reduction statement or not.  */
+      reduction_stmt = stmt;
+      ++nreductions;
     }
   }
 
@@ -1840,6 +1987,27 @@ loop_distribution::classify_partition (loop_p loop,
   if (!find_single_drs (loop, rdg, partition, &single_st, &single_ld))
     return has_reduction;
 
+  /* If we determined a single load and a single reduction statement, then try
+     to classify this partition as a rawmemchr builtin.  */
+  if (single_ld != NULL
+      && single_st == NULL
+      && nreductions == 1
+      && is_gimple_assign (reduction_stmt))
+    {
+      /* If we classified the partition as a builtin, then ignoring the single
+         reduction is safe, since the whole partition is replaced by a call.  */
+      tree reduction_var = gimple_assign_lhs (reduction_stmt);
+      return !classify_builtin_rawmemchr (loop, partition, single_ld, reduction_var);
+    }
+
+  if (single_st == NULL)
+    return has_reduction;
+
+  /* Don't distribute loop if niters is unknown.  */
+  tree niters = number_of_latch_executions (loop);
+  if (niters == NULL_TREE || niters == chrec_dont_know)
+    return has_reduction;
+
   partition->loc = gimple_location (DR_STMT (single_st));
 
   /* Classify the builtin kind.  */
@@ -3290,11 +3458,6 @@ loop_distribution::execute (function *fun)
               && !optimize_loop_for_speed_p (loop)))
         continue;
 
-      /* Don't distribute loop if niters is unknown.  */
-      tree niters = number_of_latch_executions (loop);
-      if (niters == NULL_TREE || niters == chrec_dont_know)
-        continue;
-
       /* Get the perfect loop nest for distribution.  */
       loop = prepare_perfect_loop_nest (loop);
       for (; loop; loop = loop->inner)

[Attachment: rawmemchr-1.C]

/* { dg-do compile } */
/* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-details" } */

#include <stdint.h>

/* Note: the template argument lists were garbled in the list archive; the
   parameter list and the 0x12 pattern value are reconstructions.  */
template <typename T, T pattern>
T *rawmemchr(T *s)
{
  while (*s != pattern)
    ++s;
  return s;
}

template uint8_t *rawmemchr<uint8_t, 0x12>(uint8_t *);
template uint16_t *rawmemchr<uint16_t, 0x12>(uint16_t *);
template uint32_t *rawmemchr<uint32_t, 0x12>(uint32_t *);
template int8_t *rawmemchr<int8_t, 0x12>(int8_t *);
template int16_t *rawmemchr<int16_t, 0x12>(int16_t *);
template int32_t *rawmemchr<int32_t, 0x12>(int32_t *);

/* { dg-final { scan-tree-dump-times "generated rawmemchrqi" 2 "ldist" } } */
/* { dg-final { scan-tree-dump-times "generated rawmemchrhi" 2 "ldist" } } */
/* { dg-final { scan-tree-dump-times "generated rawmemchrsi" 2 "ldist" } } */

[Attachment: rawmemchr-2.C]

/* { dg-do run } */
/* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-details -mzarch -march=z13" } */

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Note: the template argument lists were garbled in the list archive; the
   parameter lists and the 0x12 pattern value are reconstructions.  */
template <typename T, T pattern>
__attribute__((noinline,noclone))
T* rawmemchr (T *s)
{
  while (*s != pattern)
    ++s;
  return s;
}

template <typename T, T pattern>
void doit()
{
  T *buf = new T[4096 * 2];
  assert (buf != NULL);
  memset (buf, 0xa, 4096 * 2);
  // ensure q is 4096-byte aligned
  T *q = buf + (4096 - ((uintptr_t)buf & 4095));
  T *p;

  // unaligned + block boundary + 1st load
  p = (T *) ((uintptr_t)q - 8);
  p[2] = pattern;
  assert ((rawmemchr<T, pattern> (&p[0]) == &p[2]));
  p[2] = (T) 0xaaaaaaaa;

  // unaligned + block boundary + 2nd load
  p = (T *) ((uintptr_t)q - 8);
  p[6] = pattern;
  assert ((rawmemchr<T, pattern> (&p[0]) == &p[6]));
  p[6] = (T) 0xaaaaaaaa;

  // unaligned + 1st load
  q[5] = pattern;
  assert ((rawmemchr<T, pattern> (&q[2]) == &q[5]));
  q[5] = (T) 0xaaaaaaaa;

  // unaligned + 2nd load
  q[14] = pattern;
  assert ((rawmemchr<T, pattern> (&q[2]) == &q[14]));
  q[14] = (T) 0xaaaaaaaa;

  // unaligned + 3rd load
  q[19] = pattern;
  assert ((rawmemchr<T, pattern> (&q[2]) == &q[19]));
  q[19] = (T) 0xaaaaaaaa;

  // unaligned + 4th load
  q[25] = pattern;
  assert ((rawmemchr<T, pattern> (&q[2]) == &q[25]));
  q[25] = (T) 0xaaaaaaaa;

  // aligned + 1st load
  q[5] = pattern;
  assert ((rawmemchr<T, pattern> (&q[0]) == &q[5]));
  q[5] = (T) 0xaaaaaaaa;

  // aligned + 2nd load
  q[14] = pattern;
  assert ((rawmemchr<T, pattern> (&q[0]) == &q[14]));
  q[14] = (T) 0xaaaaaaaa;

  // aligned + 3rd load
  q[19] = pattern;
  assert ((rawmemchr<T, pattern> (&q[0]) == &q[19]));
  q[19] = (T) 0xaaaaaaaa;

  // aligned + 4th load
  q[25] = pattern;
  assert ((rawmemchr<T, pattern> (&q[0]) == &q[25]));
  q[25] = (T) 0xaaaaaaaa;

  delete[] buf;  // was "delete buf;": array form is required for new[]
}

int main(void)
{
  doit<uint8_t, 0x12> ();
  doit<int8_t, 0x12> ();
  doit<uint16_t, 0x12> ();
  doit<int16_t, 0x12> ();
  doit<uint32_t, 0x12> ();
  doit<int32_t, 0x12> ();
  return 0;
}

/* { dg-final { scan-tree-dump-times "generated rawmemchrqi" 2 "ldist" } } */
/* { dg-final { scan-tree-dump-times "generated rawmemchrhi" 2 "ldist" } } */
/* { dg-final { scan-tree-dump-times "generated rawmemchrsi" 2 "ldist" } } */