From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lj1-x235.google.com (mail-lj1-x235.google.com [IPv6:2a00:1450:4864:20::235]) by sourceware.org (Postfix) with ESMTPS id 6CB57385E827 for ; Wed, 29 May 2024 11:12:56 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 6CB57385E827 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 6CB57385E827 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::235 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1716981181; cv=none; b=As7OktNt8wabwcGL4FRHV/jj1SPAa8gMrgAM7lcwYbuVU9r8Q3xOL7+2A9SCWjQqTqt0JFnMOA1c36PN9m1yuRHYgHGUbF3VfCx/vp08TAalh0Mfxdsox6GMIVmKLzS03Tz8lY5IfCKF1UeYjdLtQqOalPY0vMVWS7cYl9tTSr8= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1716981181; c=relaxed/simple; bh=bdl6+AUDZS6yKi8IzFLS0jRTx4XXpb+L5A1ELUwsYjY=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=WeErkMQeSFnMzF2mLx3/xIP7WfOhm5RBsnHTrRHeSbWZRN9JmsQwBrxhlDri9jIApsPiJDi8o3THqIJBQ5PQ2Fj1ndSwph8KaDW2j/WyHDw5ByDWPhjfuD4CYJT8XyyJbgNWMS7Qo1bJK2vIAYsJjRu9HxUge77E68as+rkkbP0= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-lj1-x235.google.com with SMTP id 38308e7fff4ca-2e72b8931caso19787481fa.0 for ; Wed, 29 May 2024 04:12:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1716981175; x=1717585975; darn=gcc.gnu.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=v9tJ6ZJ13LwMNnVrGJhm2lnw+AwYvDca4uKF6JmemsE=; b=kTA5FJjBcU95tIRJjCNk7mTlZMl7FJpHUGyGF/eT1hR8N4CQ2VUnlTsGpCLznDmJYZ 7Yewgk2jJLNU+1/7abGAwCNPE/8WkU/2joXglMb4fOTqhvVPktvnr25JyDTWpy92ErDz aqaqGbYfouoDUJPWSQedRrGkYWvCouu5fpGgqSMDR5qjA1O/LUSovHzv364Fsgnm+TBK yQ5UmUK7So9hx1TTAXZaFZ2PjeAw1BXZzuBVQGox+bESqwq2nR0KAxzWR9bA5N7XN1t6 EOvw0xcvBo8cxvj8JTk5tkboofUCGsnymtPUKzDmPrq7i0XsN7/hyw4/3eccI4V0Smrd Z+ZA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1716981175; x=1717585975; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=v9tJ6ZJ13LwMNnVrGJhm2lnw+AwYvDca4uKF6JmemsE=; b=esoKuQS8k+wxar/lwuKfvhAGsLc76e6OPoLvHZ4+ea3hwu12BNHCXRo1j7x4P96Kxr QmcI+TF5TSU0glbHj6czbK2wnMyEY+ITuWGPVkw0anshqVapppAuYYRBgzfWoh2bz+aS DO0uagOB/VqJrngBE2Zmv/ogIH6RvMzQoAgou+973AmX659Ftm5ib+GqW6jPTsFDSxQi v0r4zkVtPbIztvVg6iGk/iQDY+zzdGcyltvOjPADwgKsT2r2ugKRli95vTX0j9DtKy4F twpwPcUjoOyb4J8yrSYjFxBdEAz50SvpPLEsHELPUFnyqofhh/n+Mln8gbWwMyVpwxRg tfwA== X-Gm-Message-State: AOJu0YzYYt9uktAggzDE++81oQUg1b4wwxhu6ep9KwutNo/uYVPLgIhp Q3lWN7+snwAE5cm3gDWESVEPkxkqIH2fSsDRfXzEEOw36nDaMXVtd2VIz/z5DmskG+Ox6Ed0z3t Jyfb0zyxLHHq+p3NU5s4M1iLp/8sQ7QgV X-Google-Smtp-Source: AGHT+IE8KBmUDzXWVIOtcgv0aoaWjnorWLLeTzy5mgdnMABAYR2E0E6FRqucYzXLT6UGn6LpqZd/ho1gnEIAPej4Y9k= X-Received: by 2002:a2e:9102:0:b0:2ea:3216:7af8 with SMTP id 38308e7fff4ca-2ea32168890mr11624661fa.28.1716981174339; Wed, 29 May 2024 04:12:54 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Mariam Arutunian Date: Wed, 29 May 2024 15:12:42 +0400 Message-ID: Subject: Re: [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection To: Jeff Law Cc: GCC Patches Content-Type: multipart/alternative; boundary="000000000000fb927c061995d492" X-Spam-Status: No, score=-6.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: --000000000000fb927c061995d492 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Tue, May 28, 2024 at 8:20=E2=80=AFAM Jeff Law wr= ote: > > > On 5/24/24 2:42 AM, Mariam Arutunian wrote: > > This patch adds a new compiler pass aimed at identifying naive CRC > > implementations, > > characterized by the presence of a loop calculating a CRC (polynomial > > long division). > > Upon detection of a potential CRC, the pass prints an informational > message. > > > > Performs CRC optimization if optimization level is >=3D 2, > > besides optimizations for size and if fno_gimple_crc_optimization given. > > > > This pass is added for the detection and optimization of naive CRC > > implementations, > > improving the efficiency of CRC-related computations. > > > > This patch includes only initial fast checks for filtering out non-CRCs, > > detected possible CRCs verification and optimization parts will be > > provided in subsequent patches. > > > > gcc/ > > > > * Makefile.in (OBJS): Add gimple-crc-optimization.o. > > * common.opt (fgimple-crc-optimization): New option. > > * doc/invoke.texi (-fgimple-crc-optimization): Add documentation. > > * gimple-crc-optimization.cc: New file. > > * gimple.cc (set_phi_stmts_not_visited): New function. > > (set_gimple_stmts_not_visited): Likewise. > > (set_bbs_stmts_not_visited): Likewise. > > * gimple.h (set_gimple_stmts_not_visited): New extern function > > declaration. > > (set_phi_stmts_not_visited): New extern function declaration. > > (set_bbs_stmts_not_visited): New extern function declaration. > > * opts.cc (default_options_table): Add OPT_fgimple_crc_optimizatio= n. > > (enable_fdo_optimizations): Enable gimple-crc-optimization. > > * passes.def (pass_crc_optimization): Add new pass. > > * timevar.def (TV_GIMPLE_CRC_OPTIMIZATION): New timevar. > > * tree-pass.h (make_pass_crc_optimization): New extern function > > declaration. > > > > Signed-off-by: Mariam Arutunian > > > > > diff --git a/gcc/common.opt b/gcc/common.opt > > index 2c078fdd1f8..53f7ab255dd 100644 > > --- a/gcc/common.opt > > +++ b/gcc/common.opt > > @@ -1757,6 +1757,12 @@ Common Var(flag_gcse_after_reload) Optimization > > Perform global common subexpression elimination after register > allocation has > > finished. > > > > +fgimple-crc-optimization > > +Common Var(flag_gimple_crc_optimization) Optimization > > +Detect loops calculating CRC and replace with faster implementation. > > +If the target supports carry-less-multiplication instruction, generate > CRC using > > +it; otherwise generate table-based CRC. > This probably needs a minor update since your code can also generate a > CRC instruction on x86 when it detects a CRC loop with the right > polynomial. > Thanks. I forgot to modify this part. > > > > > > + > > + /* Returns true if there is only two conditional blocks in the loop > > + (one may be for the CRC bit check and the other for the loop > counter). > > + This may filter out some real CRCs, where more than one condition > > + is checked for the CRC calculation. */ > > + static bool loop_contains_two_conditional_bb (basic_block *loop_bbs, > > + unsigned loop_num_nodes); > It's been a while, so if we're rehashing something we already worked > through, I apologize. > > IIRC we looked at the problem of canonicalizing the loop into a form > where we didn't necessarily have conditional blocks, instead we had > branchless sequences for the conditional xor and dealing with the high > bit in the crc. My recollection was that the coremark CRC loop would > always canonicalize, but that in general we still saw multiple CRC > implementations that did not canonicalize and thus we still needed the > more complex matching. Correct? > > The loop in CoreMark is not fully canonicalized in that form, as there are still branches present for the conditional XOR operation. I checked that using the -O2 and -O3 flags. > > > + > > + /* Checks whether found XOR_STMT is for calculating CRC. > > + The function CRC_FUN calculates CRC only if there is a shift > operation > > + in the crc loop. */ > > + bool xor_calculates_crc (function *crc_fun, basic_block *loop_bbs, > > + const gimple *xor_stmt); > So the second sentence in the comment doesn't really seem to relate to > this function. It also seems like we potentially have two xors in a CRC > loop. One which xors one bit of the data with one bit of the crc. The > other is the conditional xor. Which does this refer to? I'm guesing > the latter since the former can likely hoist out of the loop given a > sufficiently smart loop optimizer. > > I'll try to provide a better description of this function. Here, XOR refers to the conditional XOR. > > > > + > > + /* Checks that the variable used in the condition COND is the assumed > CRC > > + (or depends on the assumed CRC). > > + Also sets data member m_phi_for_data if it isn't set and exists. > */ > > + bool is_crc_checked (gcond *cond, basic_block *loop_bbs); > > + > > + /* Returns true if condition COND checks MSB/LSB bit is 1. > > + Otherwise, return false. */ > > + static bool cond_true_is_checked_for_bit_one (const gcond *cond); > > + > > + /* Returns opposite block of the XOR_BB from PRED_BB's dest blocks. > */ > > + static basic_block get_xor_bb_opposite (basic_block pred_bb, > > + basic_block xor_bb); > > + > > + /* Checks whether the pair of xor's shift exists in the opposite > > + basic block (OPPOSITE_BB). > > + If there is a shift and xor in the same block, > > + then in the opposite block must be another shift. */ > > + bool exists_shift_for_opp_xor_shift (basic_block opposite_bb); > > + > > + /* Goes down by def-use chain (within the CRC loop) and returns the > statement > > + where variable (dependent on xor-ed variable) is shifted with 1. > > + Between xor and shift operations only phi statements are allowed. > > + Otherwise, returns nullptr. */ > > + gimple *find_shift_after_xor (tree xored_crc); > > + > > + /* Returns the statement which does shift 1 operation. > > + If there is no such statement, returns nullptr. > > + STMTS - are the statements within the loop before xor operation on > > + which possible CRC variable depends. */ > > + gimple *find_shift_before_xor (const auto_vec &stmts); > > + > > + /* Returns true if ASSIGN_STMT does shift with 1. > > + Otherwise, returns false. */ > > + bool can_be_crc_shift (gimple *assign_stmt); > > + > > + /* Returns true if the operation done in ASSIGN_STMT can occur during > CRC > > + calculation. Otherwise, returns false. */ > > + bool can_not_be_crc_stmt (gimple *assign_stmt); > > + > > + /* Returns true if the statement with STMT_CODE may occur in CRC > calculation. > > + Otherwise, returns false. */ > > + static bool is_acceptable_stmt_code (const tree_code &stmt_code); > > + > > + /* Prints extracted details of CRC calculation. */ > > + void dump_crc_information (); > > + > > + public: > > + unsigned int execute (function *fun); > > +}; > > + > > > > + > > +/* Goes down by def-use chain (within the CRC loop) and returns the > statement > > + where variable (dependent on xor-ed variable) is shifted with 1. > > + Between xor and shift operations only phi statements are allowed. > > + Otherwise, returns nullptr. */ > Follow def-use chains of XORED_CRC and return the statement where > XORED_CRC is shifted by one bit position. Only PHI statements are > allowed between XORED_CRC and the shift in the def-use chain. > > If no such statement is found, return NULL. > > Seems like a better function comment here. > > Yes). Thanks. > > > + > > +gimple * > > +crc_optimization::find_shift_after_xor (tree xored_crc) > > +{ > > + imm_use_iterator imm_iter; > > + use_operand_p use_p; > > + > > + if (TREE_CODE (xored_crc) !=3D SSA_NAME) > > + return nullptr; > If we always expect XORED_CRC to be an SSA_NAME, we might be able to use > gcc_assert TREE_CODE (XORED_CRC) =3D=3D SSA_NAME); > I'm not sure that it always has to be an SSA_NAME. The difference is the assert means the property should always hold. > Whereas the IF as written just says that if we don't have an SSA_NAME, > give up trying to optimize. > Yes, I assumed that if it's not an SSA_NAME, we should just skip the optimization. > > +> +/* Returns opposite block of the XOR_BB from PRED_BB's dest > blocks. */ > > + > > +basic_block > > +crc_optimization::get_xor_bb_opposite (basic_block pred_bb, basic_block > xor_bb) > > +{ > > + if (EDGE_COUNT (pred_bb->succs) !=3D 2) > > + return nullptr; > > + > > + if (EDGE_SUCC (pred_bb, 0)->dest !=3D xor_bb) > > + return EDGE_SUCC (pred_bb, 0)->dest; > > + return EDGE_SUCC (pred_bb, 1)->dest; > > +} > Consider using a gcc_assert to ensure that one of the two successors is > xor_bb. > > You might also want to reject edge->flags & EDGE_COMPLEX is nonzero.> + > > Ok. > > > +/* Returns true if condition COND checks MSB/LSB bit is 1. > > + Otherwise, return false. */ > > + > > +bool > > +crc_optimization::cond_true_is_checked_for_bit_one (const gcond *cond) > > +{ > > + if (!cond) > > + return false; > > + > > + tree rhs =3D gimple_cond_rhs (cond); > > + enum tree_code code =3D gimple_cond_code (cond); > > + > > + /* If the condition is something =3D=3D 1 -> return true. */ > > + if (integer_onep (rhs) && code =3D=3D EQ_EXPR) > > + return true; > Generally we prefer to have the cheapest test first. So checking the > code against EQ_EXPR first is marginally better. > > Ok. > > + > > + /* If the condition is something !=3D 0 or something < 0 -> return > true. */ > > + if (integer_zerop (rhs) && (code =3D=3D NE_EXPR || code =3D=3D LT_EX= PR)) > > + return true; > We generally discourage having && and || conditions on the same line. > I'd rewrite as > if ((code =3D=3D NE_EXPR || CODE =3D=3D LT_EXPR) > && integer_zerop (rhs)) > > Do you need to check for EQ_EXPR with rhs having the value 0? What > about NE_EXPR with the value 1? > > I think there is no need to check those cases. If the condition isn't one of the cases checked above, then it will be the cases you mentioned (the opposite form). With this function, I check whether the condition verifies if the MSB of the lhs value is one. If yes, the xor should occur on the true branch; otherwise, it should occur on the false branch. > +> +/* Checks that the condition is for checking CRC. > > + Returns true if xor is done under the condition of MSB/LSB being 1, > and > > + the condition's variable and xor-ed variable depend on the same > variable. > > + Otherwise, returns false. > > + XOR_BB is the basic block, where the xor operation is done. > > + PRED_BB is the predecessor basic block of the XOR_BB, it is assumed > that > > + the last stmt of PRED_BB checks the condition under which xor is > done. */ > > + > > +bool > > +crc_optimization::crc_cond (basic_block pred_bb, basic_block xor_bb, > > + basic_block *loop_bbs) > > +{ > > + /* Check whether PRED_BB contains condition. We will consider only > those > > + cases when xor is done immediately under the condition. */ > > + gcond *cond =3D safe_dyn_cast (*gsi_last_bb (pred_bb)); > Hmm. I think this works, but it's a bit obscure as it depends on > operator* returning a pointer to the statement. I didn't even know that > worked. > > Something like this might be more obvious: > > gsi_stmt (gsi_last_bb (pred_bb)) should give you the last statement. > Then you can see if it's a gimple cond. > > Thanks. > > > + > > +/* Returns true if the statement with STMT_CODE may occur in CRC > calculation. > > + Otherwise, returns false. */ > > + > > +bool > > +crc_optimization::is_acceptable_stmt_code (const tree_code &stmt_code) > > +{ > > + return stmt_code =3D=3D BIT_IOR_EXPR > > + || stmt_code =3D=3D BIT_AND_EXPR > > + || stmt_code =3D=3D BIT_XOR_EXPR > > + || stmt_code =3D=3D MINUS_EXPR > > + || stmt_code =3D=3D PLUS_EXPR > > + || stmt_code =3D=3D RSHIFT_EXPR > > + || stmt_code =3D=3D LSHIFT_EXPR > > + || TREE_CODE_CLASS (stmt_code) =3D=3D tcc_unary; > Go ahead and put parenthesis around all those tests :-) > > Ok) > > > + > > +/* Set M_PHI_FOR_CRC and M_PHI_FOR_DATA fields. > > + Returns false if there are more than two (as in CRC calculation only > CRC's > > + and data's phi may exist) or no phi statements in STMTS (at least > there must > > + be CRC's phi). > > + Otherwise, returns true. */ > > + > > +bool > > +crc_optimization::set_crc_and_data_phi (auto_vec &stmts) > > +{ > > + for (auto stmt_it =3D stmts.begin (); stmt_it !=3D stmts.end (); > stmt_it++) > > + { > > + if (is_a (*stmt_it) && bb_loop_header_p (gimple_bb > (*stmt_it))) > > + { > > + if (!m_phi_for_crc) > > + m_phi_for_crc =3D as_a (*stmt_it); > > + else if (!m_phi_for_data) > > + m_phi_for_data =3D as_a (*stmt_it); > > + else > > + { > > + if (dump_file && (dump_flags & TDF_DETAILS)) > > + fprintf (dump_file, "Xor-ed variable depends on more than > 2 " > > + "phis.\n"); > > + return false; > > + } > > + } > > + } > > + return m_phi_for_crc; > Hmm. For a given PHI, how do we know if it's for the data item or the > crc item, or something else (like a loop counter) entirely? > I trace the def-use chain upwards from the XOR statement to determine which PHI node corresponds to CRC and data. Since we assume the loop calculates CRC, I expect only variables representing data and CRC to participate in these operations. In the implementations I support, the loop counter is used only for the iteration. Any misidentification of CRC and data would occur only if the loop doesn't calculate CRC, in which case next checks would fail, leading the algorithm to identify it as not CRC. Here, the PHI nodes for CRC and data might be mixed in places. I just assume that the first found PHI is CRC, second data. I correctly determine them later with the *swap_crc_and_data_if_needed* function. > > > > diff --git a/gcc/gimple.cc b/gcc/gimple.cc > > index a9f968cb038..76518d8897d 100644 > > --- a/gcc/gimple.cc > > +++ b/gcc/gimple.cc > > @@ -3426,6 +3426,44 @@ gimple_or_expr_nonartificial_location (gimple > *stmt, tree expr) > > return expansion_point_location_if_in_system_header (loc); > > } > > > > +/* Set GIMPLE_PHI statements of the BB not visited. */ > > + > > +void > > +set_phi_stmts_not_visited (basic_block bb) > [ ... ] > Something tells me we probably need to convert some code to use these > new API points. Not a requirement for this to go forward, but a useful > cleanup. > Sure, I don't recall finding such cases before, but I'll take a closer look later. > > Overall it looks pretty good. I don't see anything that looks horribly > wrong. My biggest worry is that set_crc_and_data_phi routine. But > there's other small things noted above that we need to clean up. > > So let's get the questions answered and some cleanups done, then repost. > > Ok. Thanks, Mariam > Thanks, > Jeff > --000000000000fb927c061995d492--