From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-vk1-xa33.google.com (mail-vk1-xa33.google.com [IPv6:2607:f8b0:4864:20::a33]) by sourceware.org (Postfix) with ESMTPS id 2AA7D3858C78 for ; Tue, 28 May 2024 04:20:36 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 2AA7D3858C78 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 2AA7D3858C78 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::a33 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1716870038; cv=none; b=xTPipid1qYPhQLiui+/2oSRmLf6N2lQe0nj/1S/IDnq8dX3Q4HvYzIm69sx4Wx0EFscuCW2nK6deZ35PRooOv9sx60klRVrPes90/Ou09b/myLMlpyYiXtlgW1r7ij7rkvEc7FzwTg6nId0CAdqPrnlIfYG3XjFbZU7+bXl1lFo= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1716870038; c=relaxed/simple; bh=+SkrVaeeRm3NQ628qYwQk6vlZ0pAqL1jTKTiBWoT/9w=; h=DKIM-Signature:Message-ID:Date:MIME-Version:Subject:To:From; b=Bo0JmlLbOFH9crNka4jIozN2o1uJ10zIcS7Ua7ohVRJWzODaP1t62Fu2xHKSm8pqjZm91mP+CbGvEInAwSp8F8xNLtX7WRBxJ1OwvhUZ/t8xLmgptG6O2fqjGVv12f02suTZclF0zxU9htHWi4eX6SIq4XTf/mn8Y/XO67NLF88= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-vk1-xa33.google.com with SMTP id 71dfb90a1353d-4e1c721c040so150749e0c.3 for ; Mon, 27 May 2024 21:20:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1716870035; x=1717474835; darn=gcc.gnu.org; h=content-transfer-encoding:in-reply-to:from:references:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=vLVgvlmv7wObZ6k7lcZlbzq/WHeyXy8K/0AyoOXn4/g=; b=PurdJ6/d1T+UuySWfgcgJGQ0md91DflPpZZr77z53pmvMqwmUFinweFeZeU+ddjXqx 4QK+0s9QFYPlgviWHFlZrJ3sTuUrpl8I1F4evqhtg98AGIVKgMpPSedtf+A4zbaiRhVs 7OJY0/gYELwqJ4b6Damzesnmkh8obmxAMkTWQcsitJ+yzTauSi2aFrc7IXITloshw0AR ZtVxWTBOIhEOautOudd+z95RAfi86sY1gk/A+Yd+6540wO728sFkjPkXo1ygrHxwb8Ee ubNPrB36FmniCRzchr/ArfSAiP0die/dZA97/iAtQqZgm5wRwC92VDCP0krxN38pUluo LJZw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1716870035; x=1717474835; h=content-transfer-encoding:in-reply-to:from:references:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=vLVgvlmv7wObZ6k7lcZlbzq/WHeyXy8K/0AyoOXn4/g=; b=BTq19Xbs7U9+ZMR5MQqiKNaTtZiIn3rvkgEg9kEzyR3ct9MxMln2A3Ym4IBU59IsQk GrdwxJewbFHO8w9TVXER9BngXFagNi5IFfol1i1TMuO150K1/u8PZa7a3svLgrCBfXGg caLjmNGaUme5DUjPMVKPGMl26zfshMiTJrY0gCjGmK6ooJNEJBPYGZKklTPQWjwfu6WM y2+tDsYMhqhq99CKGXbeajvDWjmuJyx/s1YonxWSbRjgTdx+z+mdKT5i/WelaZ6zBACX +23JPvQYdC1XWevcnfISRTf2/gJTGXiLE2hWVyR/gJCcUCahEG4SWUwJsuHYcevPOQN3 8rxQ== X-Forwarded-Encrypted: i=1; AJvYcCU6tE+pcP5Pu2CtMfu275GgDPTvqFS52I3rYatHxW2FXvzwRi63sZoYIwweQ5T8Q4xj5Cq1Udv5DgP96cNqwAs03WQNmDn40w== X-Gm-Message-State: AOJu0Yy8hnBkWaLyHinzcizRoZnFTexYULBZLeltSoldP3wMNCuDrtxu tmRzZwTBCDbk+xBGSX2IpTW+T1WtNo+cfnbMaBiR7pmMgKm4Tq976hJ1o6W6 X-Google-Smtp-Source: AGHT+IG93Lg2klaCGcO7DjNThq0ydZSvs+5JdRPjrA5bwI0hvTX71Z3Rkai010VxEXhuQ6yEWbao0w== X-Received: by 2002:a05:6122:3122:b0:4da:e6ee:5533 with SMTP id 71dfb90a1353d-4e4f02fbb91mr10950095e0c.16.1716870035097; Mon, 27 May 2024 21:20:35 -0700 (PDT) Received: from [172.31.0.109] ([136.36.72.243]) by smtp.gmail.com with ESMTPSA id af79cd13be357-794abcc0f04sm350452185a.42.2024.05.27.21.20.33 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 27 May 2024 21:20:34 -0700 (PDT) Message-ID: Date: Mon, 27 May 2024 22:20:33 -0600 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Beta Subject: Re: [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection Content-Language: en-US To: Mariam Arutunian , GCC Patches References: From: Jeff Law In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-7.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,FREEMAIL_REPLY,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 5/24/24 2:42 AM, Mariam Arutunian wrote: > This patch adds a new compiler pass aimed at identifying naive CRC > implementations, > characterized by the presence of a loop calculating a CRC (polynomial > long division). > Upon detection of a potential CRC, the pass prints an informational message. > > Performs CRC optimization if optimization level is >= 2, > besides optimizations for size and if fno_gimple_crc_optimization given. > > This pass is added for the detection and optimization of naive CRC > implementations, > improving the efficiency of CRC-related computations. > > This patch includes only initial fast checks for filtering out non-CRCs, > detected possible CRCs verification and optimization parts will be > provided in subsequent patches. > >   gcc/ > >     * Makefile.in (OBJS): Add gimple-crc-optimization.o. >     * common.opt (fgimple-crc-optimization): New option. >     * doc/invoke.texi (-fgimple-crc-optimization): Add documentation. >     * gimple-crc-optimization.cc: New file. >     * gimple.cc (set_phi_stmts_not_visited): New function. >     (set_gimple_stmts_not_visited): Likewise. >     (set_bbs_stmts_not_visited): Likewise. >     * gimple.h (set_gimple_stmts_not_visited): New extern function > declaration. >     (set_phi_stmts_not_visited): New extern function declaration. >     (set_bbs_stmts_not_visited): New extern function declaration. >     * opts.cc (default_options_table): Add OPT_fgimple_crc_optimization. >     (enable_fdo_optimizations): Enable gimple-crc-optimization. >     * passes.def (pass_crc_optimization): Add new pass. >     * timevar.def (TV_GIMPLE_CRC_OPTIMIZATION): New timevar. >     * tree-pass.h (make_pass_crc_optimization): New extern function > declaration. > > Signed-off-by: Mariam Arutunian > > diff --git a/gcc/common.opt b/gcc/common.opt > index 2c078fdd1f8..53f7ab255dd 100644 > --- a/gcc/common.opt > +++ b/gcc/common.opt > @@ -1757,6 +1757,12 @@ Common Var(flag_gcse_after_reload) Optimization > Perform global common subexpression elimination after register allocation has > finished. > > +fgimple-crc-optimization > +Common Var(flag_gimple_crc_optimization) Optimization > +Detect loops calculating CRC and replace with faster implementation. > +If the target supports carry-less-multiplication instruction, generate CRC using > +it; otherwise generate table-based CRC. This probably needs a minor update since your code can also generate a CRC instruction on x86 when it detects a CRC loop with the right polynomial. > + > + /* Returns true if there is only two conditional blocks in the loop > + (one may be for the CRC bit check and the other for the loop counter). > + This may filter out some real CRCs, where more than one condition > + is checked for the CRC calculation. */ > + static bool loop_contains_two_conditional_bb (basic_block *loop_bbs, > + unsigned loop_num_nodes); It's been a while, so if we're rehashing something we already worked through, I apologize. IIRC we looked at the problem of canonicalizing the loop into a form where we didn't necessarily have conditional blocks, instead we had branchless sequences for the conditional xor and dealing with the high bit in the crc. My recollection was that the coremark CRC loop would always canonicalize, but that in general we still saw multiple CRC implementations that did not canonicalize and thus we still needed the more complex matching. Correct? > + > + /* Checks whether found XOR_STMT is for calculating CRC. > + The function CRC_FUN calculates CRC only if there is a shift operation > + in the crc loop. */ > + bool xor_calculates_crc (function *crc_fun, basic_block *loop_bbs, > + const gimple *xor_stmt); So the second sentence in the comment doesn't really seem to relate to this function. It also seems like we potentially have two xors in a CRC loop. One which xors one bit of the data with one bit of the crc. The other is the conditional xor. Which does this refer to? I'm guesing the latter since the former can likely hoist out of the loop given a sufficiently smart loop optimizer. > + > + /* Checks that the variable used in the condition COND is the assumed CRC > + (or depends on the assumed CRC). > + Also sets data member m_phi_for_data if it isn't set and exists. */ > + bool is_crc_checked (gcond *cond, basic_block *loop_bbs); > + > + /* Returns true if condition COND checks MSB/LSB bit is 1. > + Otherwise, return false. */ > + static bool cond_true_is_checked_for_bit_one (const gcond *cond); > + > + /* Returns opposite block of the XOR_BB from PRED_BB's dest blocks. */ > + static basic_block get_xor_bb_opposite (basic_block pred_bb, > + basic_block xor_bb); > + > + /* Checks whether the pair of xor's shift exists in the opposite > + basic block (OPPOSITE_BB). > + If there is a shift and xor in the same block, > + then in the opposite block must be another shift. */ > + bool exists_shift_for_opp_xor_shift (basic_block opposite_bb); > + > + /* Goes down by def-use chain (within the CRC loop) and returns the statement > + where variable (dependent on xor-ed variable) is shifted with 1. > + Between xor and shift operations only phi statements are allowed. > + Otherwise, returns nullptr. */ > + gimple *find_shift_after_xor (tree xored_crc); > + > + /* Returns the statement which does shift 1 operation. > + If there is no such statement, returns nullptr. > + STMTS - are the statements within the loop before xor operation on > + which possible CRC variable depends. */ > + gimple *find_shift_before_xor (const auto_vec &stmts); > + > + /* Returns true if ASSIGN_STMT does shift with 1. > + Otherwise, returns false. */ > + bool can_be_crc_shift (gimple *assign_stmt); > + > + /* Returns true if the operation done in ASSIGN_STMT can occur during CRC > + calculation. Otherwise, returns false. */ > + bool can_not_be_crc_stmt (gimple *assign_stmt); > + > + /* Returns true if the statement with STMT_CODE may occur in CRC calculation. > + Otherwise, returns false. */ > + static bool is_acceptable_stmt_code (const tree_code &stmt_code); > + > + /* Prints extracted details of CRC calculation. */ > + void dump_crc_information (); > + > + public: > + unsigned int execute (function *fun); > +}; > + > + > +/* Goes down by def-use chain (within the CRC loop) and returns the statement > + where variable (dependent on xor-ed variable) is shifted with 1. > + Between xor and shift operations only phi statements are allowed. > + Otherwise, returns nullptr. */ Follow def-use chains of XORED_CRC and return the statement where XORED_CRC is shifted by one bit position. Only PHI statements are allowed between XORED_CRC and the shift in the def-use chain. If no such statement is found, return NULL. Seems like a better function comment here. > + > +gimple * > +crc_optimization::find_shift_after_xor (tree xored_crc) > +{ > + imm_use_iterator imm_iter; > + use_operand_p use_p; > + > + if (TREE_CODE (xored_crc) != SSA_NAME) > + return nullptr; If we always expect XORED_CRC to be an SSA_NAME, we might be able to use gcc_assert TREE_CODE (XORED_CRC) == SSA_NAME); The difference is the assert means the property should always hold. Whereas the IF as written just says that if we don't have an SSA_NAME, give up trying to optimize. > +> +/* Returns opposite block of the XOR_BB from PRED_BB's dest blocks. */ > + > +basic_block > +crc_optimization::get_xor_bb_opposite (basic_block pred_bb, basic_block xor_bb) > +{ > + if (EDGE_COUNT (pred_bb->succs) != 2) > + return nullptr; > + > + if (EDGE_SUCC (pred_bb, 0)->dest != xor_bb) > + return EDGE_SUCC (pred_bb, 0)->dest; > + return EDGE_SUCC (pred_bb, 1)->dest; > +} Consider using a gcc_assert to ensure that one of the two successors is xor_bb. You might also want to reject edge->flags & EDGE_COMPLEX is nonzero.> + > +/* Returns true if condition COND checks MSB/LSB bit is 1. > + Otherwise, return false. */ > + > +bool > +crc_optimization::cond_true_is_checked_for_bit_one (const gcond *cond) > +{ > + if (!cond) > + return false; > + > + tree rhs = gimple_cond_rhs (cond); > + enum tree_code code = gimple_cond_code (cond); > + > + /* If the condition is something == 1 -> return true. */ > + if (integer_onep (rhs) && code == EQ_EXPR) > + return true; Generally we prefer to have the cheapest test first. So checking the code against EQ_EXPR first is marginally better. > + > + /* If the condition is something != 0 or something < 0 -> return true. */ > + if (integer_zerop (rhs) && (code == NE_EXPR || code == LT_EXPR)) > + return true; We generally discourage having && and || conditions on the same line. I'd rewrite as if ((code == NE_EXPR || CODE == LT_EXPR) && integer_zerop (rhs)) Do you need to check for EQ_EXPR with rhs having the value 0? What about NE_EXPR with the value 1? > +> +/* Checks that the condition is for checking CRC. > + Returns true if xor is done under the condition of MSB/LSB being 1, and > + the condition's variable and xor-ed variable depend on the same variable. > + Otherwise, returns false. > + XOR_BB is the basic block, where the xor operation is done. > + PRED_BB is the predecessor basic block of the XOR_BB, it is assumed that > + the last stmt of PRED_BB checks the condition under which xor is done. */ > + > +bool > +crc_optimization::crc_cond (basic_block pred_bb, basic_block xor_bb, > + basic_block *loop_bbs) > +{ > + /* Check whether PRED_BB contains condition. We will consider only those > + cases when xor is done immediately under the condition. */ > + gcond *cond = safe_dyn_cast (*gsi_last_bb (pred_bb)); Hmm. I think this works, but it's a bit obscure as it depends on operator* returning a pointer to the statement. I didn't even know that worked. Something like this might be more obvious: gsi_stmt (gsi_last_bb (pred_bb)) should give you the last statement. Then you can see if it's a gimple cond. > + > +/* Returns true if the statement with STMT_CODE may occur in CRC calculation. > + Otherwise, returns false. */ > + > +bool > +crc_optimization::is_acceptable_stmt_code (const tree_code &stmt_code) > +{ > + return stmt_code == BIT_IOR_EXPR > + || stmt_code == BIT_AND_EXPR > + || stmt_code == BIT_XOR_EXPR > + || stmt_code == MINUS_EXPR > + || stmt_code == PLUS_EXPR > + || stmt_code == RSHIFT_EXPR > + || stmt_code == LSHIFT_EXPR > + || TREE_CODE_CLASS (stmt_code) == tcc_unary; Go ahead and put parenthesis around all those tests :-) > + > +/* Set M_PHI_FOR_CRC and M_PHI_FOR_DATA fields. > + Returns false if there are more than two (as in CRC calculation only CRC's > + and data's phi may exist) or no phi statements in STMTS (at least there must > + be CRC's phi). > + Otherwise, returns true. */ > + > +bool > +crc_optimization::set_crc_and_data_phi (auto_vec &stmts) > +{ > + for (auto stmt_it = stmts.begin (); stmt_it != stmts.end (); stmt_it++) > + { > + if (is_a (*stmt_it) && bb_loop_header_p (gimple_bb (*stmt_it))) > + { > + if (!m_phi_for_crc) > + m_phi_for_crc = as_a (*stmt_it); > + else if (!m_phi_for_data) > + m_phi_for_data = as_a (*stmt_it); > + else > + { > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, "Xor-ed variable depends on more than 2 " > + "phis.\n"); > + return false; > + } > + } > + } > + return m_phi_for_crc; Hmm. For a given PHI, how do we know if it's for the data item or the crc item, or something else (like a loop counter) entirely? > diff --git a/gcc/gimple.cc b/gcc/gimple.cc > index a9f968cb038..76518d8897d 100644 > --- a/gcc/gimple.cc > +++ b/gcc/gimple.cc > @@ -3426,6 +3426,44 @@ gimple_or_expr_nonartificial_location (gimple *stmt, tree expr) > return expansion_point_location_if_in_system_header (loc); > } > > +/* Set GIMPLE_PHI statements of the BB not visited. */ > + > +void > +set_phi_stmts_not_visited (basic_block bb) [ ... ] Something tells me we probably need to convert some code to use these new API points. Not a requirement for this to go forward, but a useful cleanup. Overall it looks pretty good. I don't see anything that looks horribly wrong. My biggest worry is that set_crc_and_data_phi routine. But there's other small things noted above that we need to clean up. So let's get the questions answered and some cleanups done, then repost. Thanks, Jeff