[RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection
@ 2024-05-24  8:42 Mariam Arutunian
  2024-05-28  4:20 ` Jeff Law
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Mariam Arutunian @ 2024-05-24  8:42 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1.1: Type: text/plain, Size: 1683 bytes --]

This patch adds a new compiler pass aimed at identifying naive CRC
implementations,
characterized by the presence of a loop calculating a CRC (polynomial long
division).
Upon detection of a potential CRC, the pass prints an informational message.

Performs CRC optimization if optimization level is >= 2,
besides optimizations for size and if fno_gimple_crc_optimization given.

This pass is added for the detection and optimization of naive CRC
implementations,
improving the efficiency of CRC-related computations.

This patch includes only initial fast checks for filtering out non-CRCs,
detected possible CRCs verification and optimization parts will be provided
in subsequent patches.

  gcc/

    * Makefile.in (OBJS): Add gimple-crc-optimization.o.
    * common.opt (fgimple-crc-optimization): New option.
    * doc/invoke.texi (-fgimple-crc-optimization): Add documentation.
    * gimple-crc-optimization.cc: New file.
    * gimple.cc (set_phi_stmts_not_visited): New function.
    (set_gimple_stmts_not_visited): Likewise.
    (set_bbs_stmts_not_visited): Likewise.
    * gimple.h (set_gimple_stmts_not_visited): New extern function
declaration.
    (set_phi_stmts_not_visited): New extern function declaration.
    (set_bbs_stmts_not_visited): New extern function declaration.
    * opts.cc (default_options_table): Add OPT_fgimple_crc_optimization.
    (enable_fdo_optimizations): Enable gimple-crc-optimization.
    * passes.def (pass_crc_optimization): Add new pass.
    * timevar.def (TV_GIMPLE_CRC_OPTIMIZATION): New timevar.
    * tree-pass.h (make_pass_crc_optimization): New extern function
declaration.

Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com>

[-- Attachment #2: 0008-Add-a-new-pass-for-naive-CRC-loops-detection.patch --]
[-- Type: application/x-patch, Size: 39685 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection
  2024-05-24  8:42 [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection Mariam Arutunian
@ 2024-05-28  4:20 ` Jeff Law
  2024-05-29 11:12   ` Mariam Arutunian
  2024-05-28  7:01 ` Richard Biener
  2024-05-29 12:43 ` David Malcolm
  2 siblings, 1 reply; 12+ messages in thread
From: Jeff Law @ 2024-05-28  4:20 UTC (permalink / raw)
  To: Mariam Arutunian, GCC Patches



On 5/24/24 2:42 AM, Mariam Arutunian wrote:
> This patch adds a new compiler pass aimed at identifying naive CRC 
> implementations,
> characterized by the presence of a loop calculating a CRC (polynomial 
> long division).
> Upon detection of a potential CRC, the pass prints an informational message.
> 
> Performs CRC optimization if optimization level is >= 2,
> besides optimizations for size and if fno_gimple_crc_optimization given.
> 
> This pass is added for the detection and optimization of naive CRC 
> implementations,
> improving the efficiency of CRC-related computations.
> 
> This patch includes only initial fast checks for filtering out non-CRCs,
> detected possible CRCs verification and optimization parts will be 
> provided in subsequent patches.
> 
>    gcc/
> 
>      * Makefile.in (OBJS): Add gimple-crc-optimization.o.
>      * common.opt (fgimple-crc-optimization): New option.
>      * doc/invoke.texi (-fgimple-crc-optimization): Add documentation.
>      * gimple-crc-optimization.cc: New file.
>      * gimple.cc (set_phi_stmts_not_visited): New function.
>      (set_gimple_stmts_not_visited): Likewise.
>      (set_bbs_stmts_not_visited): Likewise.
>      * gimple.h (set_gimple_stmts_not_visited): New extern function 
> declaration.
>      (set_phi_stmts_not_visited): New extern function declaration.
>      (set_bbs_stmts_not_visited): New extern function declaration.
>      * opts.cc (default_options_table): Add OPT_fgimple_crc_optimization.
>      (enable_fdo_optimizations): Enable gimple-crc-optimization.
>      * passes.def (pass_crc_optimization): Add new pass.
>      * timevar.def (TV_GIMPLE_CRC_OPTIMIZATION): New timevar.
>      * tree-pass.h (make_pass_crc_optimization): New extern function 
> declaration.
> 
> Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com 
> <mailto:mariamarutunian@gmail.com>>

> diff --git a/gcc/common.opt b/gcc/common.opt
> index 2c078fdd1f8..53f7ab255dd 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1757,6 +1757,12 @@ Common Var(flag_gcse_after_reload) Optimization
>  Perform global common subexpression elimination after register allocation has
>  finished.
>  
> +fgimple-crc-optimization
> +Common Var(flag_gimple_crc_optimization) Optimization
> +Detect loops calculating CRC and replace with faster implementation.
> +If the target supports carry-less-multiplication instruction, generate CRC using
> +it; otherwise generate table-based CRC.
This probably needs a minor update since your code can also generate a 
CRC instruction on x86 when it detects a CRC loop with the right polynomial.





> +
> +  /* Returns true if there is only two conditional blocks in the loop
> +     (one may be for the CRC bit check and the other for the loop counter).
> +     This may filter out some real CRCs, where more than one condition
> +     is checked for the CRC calculation.  */
> +  static bool loop_contains_two_conditional_bb (basic_block *loop_bbs,
> +						unsigned loop_num_nodes);
It's been a while, so if we're rehashing something we already worked 
through, I apologize.

IIRC we looked at the problem of canonicalizing the loop into a form 
where we didn't necessarily have conditional blocks, instead we had 
branchless sequences for the conditional xor and dealing with the high 
bit in the crc.  My recollection was that the coremark CRC loop would 
always canonicalize, but that in general we still saw multiple CRC 
implementations that did not canonicalize and thus we still needed the 
more complex matching.  Correct?


> +
> +  /* Checks whether found XOR_STMT is for calculating CRC.
> +     The function CRC_FUN calculates CRC only if there is a shift operation
> +     in the crc loop.  */
> +  bool xor_calculates_crc (function *crc_fun, basic_block *loop_bbs,
> +			   const gimple *xor_stmt);
So the second sentence in the comment doesn't really seem to relate to 
this function.  It also seems like we potentially have two xors in a CRC 
loop.  One which xors one bit of the data with one bit of the crc.  The 
other is the conditional xor.  Which does this refer to?  I'm guesing 
the latter since the former can likely hoist out of the loop given a 
sufficiently smart loop optimizer.



> +
> +  /* Checks that the variable used in the condition COND is the assumed CRC
> +     (or depends on the assumed CRC).
> +     Also sets data member m_phi_for_data if it isn't set and exists.  */
> +  bool is_crc_checked (gcond *cond, basic_block *loop_bbs);
> +
> +  /* Returns true if condition COND checks MSB/LSB bit is 1.
> +     Otherwise, return false.  */
> +  static bool cond_true_is_checked_for_bit_one (const gcond *cond);
> +
> +  /* Returns opposite block of the XOR_BB from PRED_BB's dest blocks.  */
> +  static basic_block get_xor_bb_opposite (basic_block pred_bb,
> +					  basic_block xor_bb);
> +
> +  /* Checks whether the pair of xor's shift exists in the opposite
> +     basic block (OPPOSITE_BB).
> +     If there is a shift and xor in the same block,
> +     then in the opposite block must be another shift.  */
> +  bool exists_shift_for_opp_xor_shift (basic_block opposite_bb);
> +
> +  /* Goes down by def-use chain (within the CRC loop) and returns the statement
> +     where variable (dependent on xor-ed variable) is shifted with 1.
> +     Between xor and shift operations only phi statements are allowed.
> +     Otherwise, returns nullptr.  */
> +  gimple *find_shift_after_xor (tree xored_crc);
> +
> +  /* Returns the statement which does shift 1 operation.
> +     If there is no such statement, returns nullptr.
> +     STMTS - are the statements within the loop before xor operation on
> +     which possible CRC variable depends.  */
> +  gimple *find_shift_before_xor (const auto_vec <gimple *> &stmts);
> +
> +  /* Returns true if ASSIGN_STMT does shift with 1.
> +     Otherwise, returns false.  */
> +  bool can_be_crc_shift (gimple *assign_stmt);
> +
> +  /* Returns true if the operation done in ASSIGN_STMT can occur during CRC
> +     calculation.  Otherwise, returns false.  */
> +  bool can_not_be_crc_stmt (gimple *assign_stmt);
> +
> +  /* Returns true if the statement with STMT_CODE may occur in CRC calculation.
> +     Otherwise, returns false.  */
> +  static bool is_acceptable_stmt_code (const tree_code &stmt_code);
> +
> +  /* Prints extracted details of CRC calculation.  */
> +  void dump_crc_information ();
> +
> + public:
> +  unsigned int execute (function *fun);
> +};
> +


> +
> +/* Goes down by def-use chain (within the CRC loop) and returns the statement
> +   where variable (dependent on xor-ed variable) is shifted with 1.
> +   Between xor and shift operations only phi statements are allowed.
> +   Otherwise, returns nullptr.  */
Follow def-use chains of XORED_CRC and return the statement where 
XORED_CRC is shifted by one bit position.  Only PHI statements are 
allowed between XORED_CRC and the shift in the def-use chain.

If no such statement is found, return NULL.

Seems like a better function comment here.


> +
> +gimple *
> +crc_optimization::find_shift_after_xor (tree xored_crc)
> +{
> +  imm_use_iterator imm_iter;
> +  use_operand_p use_p;
> +
> +  if (TREE_CODE (xored_crc) != SSA_NAME)
> +    return nullptr;
If we always expect XORED_CRC to be an SSA_NAME, we might be able to use
gcc_assert TREE_CODE (XORED_CRC) == SSA_NAME);

The difference is the assert means the property should always hold. 
Whereas the IF as written just says that if we don't have an SSA_NAME, 
give up trying to optimize.
 > +> +/* Returns opposite block of the XOR_BB from PRED_BB's dest 
blocks.  */
> +
> +basic_block
> +crc_optimization::get_xor_bb_opposite (basic_block pred_bb, basic_block xor_bb)
> +{
> +  if (EDGE_COUNT (pred_bb->succs) != 2)
> +    return nullptr;
> +
> +  if (EDGE_SUCC (pred_bb, 0)->dest != xor_bb)
> +    return EDGE_SUCC (pred_bb, 0)->dest;
> +  return EDGE_SUCC (pred_bb, 1)->dest;
> +}
Consider using a gcc_assert to ensure that one of the two successors is 
xor_bb.

You might also want to reject edge->flags & EDGE_COMPLEX is nonzero.> +


> +/* Returns true if condition COND checks MSB/LSB bit is 1.
> +   Otherwise, return false.  */
> +
> +bool
> +crc_optimization::cond_true_is_checked_for_bit_one (const gcond *cond)
> +{
> +  if (!cond)
> +    return false;
> +
> +  tree rhs = gimple_cond_rhs (cond);
> +  enum tree_code code = gimple_cond_code (cond);
> +
> +  /* If the condition is something == 1 -> return true.  */
> +  if (integer_onep (rhs) && code == EQ_EXPR)
> +    return true;
Generally we prefer to have the cheapest test first.  So checking the 
code against EQ_EXPR first is marginally better.

> +
> +  /* If the condition is something != 0  or something < 0 -> return true.  */
> +  if (integer_zerop (rhs) && (code == NE_EXPR || code == LT_EXPR))
> +    return true;
We generally discourage having && and || conditions on the same line. 
I'd rewrite as
   if ((code == NE_EXPR || CODE == LT_EXPR)
       && integer_zerop (rhs))

Do you need to check for EQ_EXPR with rhs having the value 0?  What 
about NE_EXPR with the value 1?

 > +> +/* Checks that the condition is for checking CRC.
> +   Returns true if xor is done under the condition of MSB/LSB being 1, and
> +   the condition's variable and xor-ed variable depend on the same variable.
> +   Otherwise, returns false.
> +   XOR_BB is the basic block, where the xor operation is done.
> +   PRED_BB is the predecessor basic block of the XOR_BB, it is assumed that
> +   the last stmt of PRED_BB checks the condition under which xor is done.  */
> +
> +bool
> +crc_optimization::crc_cond (basic_block pred_bb, basic_block xor_bb,
> +			    basic_block *loop_bbs)
> +{
> +  /* Check whether PRED_BB contains condition.  We will consider only those
> +     cases when xor is done immediately under the condition.  */
> +  gcond *cond = safe_dyn_cast<gcond *> (*gsi_last_bb (pred_bb));
Hmm.  I think this works, but it's a bit obscure as it depends on 
operator* returning a pointer to the statement.  I didn't even know that 
worked.

Something like this might be more obvious:

gsi_stmt (gsi_last_bb (pred_bb)) should give you the last statement. 
Then you can see if it's a gimple cond.



> +
> +/* Returns true if the statement with STMT_CODE may occur in CRC calculation.
> +   Otherwise, returns false.  */
> +
> +bool
> +crc_optimization::is_acceptable_stmt_code (const tree_code &stmt_code)
> +{
> +  return stmt_code == BIT_IOR_EXPR
> +	 || stmt_code == BIT_AND_EXPR
> +	 || stmt_code == BIT_XOR_EXPR
> +	 || stmt_code == MINUS_EXPR
> +	 || stmt_code == PLUS_EXPR
> +	 || stmt_code == RSHIFT_EXPR
> +	 || stmt_code == LSHIFT_EXPR
> +	 || TREE_CODE_CLASS (stmt_code) == tcc_unary;
Go ahead and put parenthesis around all those tests :-)



> +
> +/* Set M_PHI_FOR_CRC and M_PHI_FOR_DATA fields.
> +   Returns false if there are more than two (as in CRC calculation only CRC's
> +   and data's phi may exist) or no phi statements in STMTS (at least there must
> +   be CRC's phi).
> +   Otherwise, returns true.  */
> +
> +bool
> +crc_optimization::set_crc_and_data_phi (auto_vec<gimple *> &stmts)
> +{
> +  for (auto stmt_it = stmts.begin (); stmt_it != stmts.end (); stmt_it++)
> +    {
> +      if (is_a<gphi *> (*stmt_it) && bb_loop_header_p (gimple_bb (*stmt_it)))
> +	{
> +	  if (!m_phi_for_crc)
> +	    m_phi_for_crc = as_a<gphi *> (*stmt_it);
> +	  else if (!m_phi_for_data)
> +	    m_phi_for_data = as_a<gphi *> (*stmt_it);
> +	  else
> +	    {
> +	      if (dump_file && (dump_flags & TDF_DETAILS))
> +		fprintf (dump_file, "Xor-ed variable depends on more than 2 "
> +				    "phis.\n");
> +	      return false;
> +	    }
> +	}
> +    }
> +  return m_phi_for_crc;
Hmm.  For a given PHI, how do we know if it's for the data item or the 
crc item, or something else (like a loop counter) entirely?



> diff --git a/gcc/gimple.cc b/gcc/gimple.cc
> index a9f968cb038..76518d8897d 100644
> --- a/gcc/gimple.cc
> +++ b/gcc/gimple.cc
> @@ -3426,6 +3426,44 @@ gimple_or_expr_nonartificial_location (gimple *stmt, tree expr)
>    return expansion_point_location_if_in_system_header (loc);
>  }
>  
> +/* Set GIMPLE_PHI statements of the BB not visited.  */
> +
> +void
> +set_phi_stmts_not_visited (basic_block bb)
[ ... ]
Something tells me we probably need to convert some code to use these 
new API points.  Not a requirement for this to go forward, but a useful 
cleanup.


Overall it looks pretty good.  I don't see anything that looks horribly 
wrong.  My biggest worry is that set_crc_and_data_phi routine.  But 
there's other small things noted above that we need to clean up.

So let's get the questions answered and some cleanups done, then repost.

Thanks,
Jeff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection
  2024-05-24  8:42 [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection Mariam Arutunian
  2024-05-28  4:20 ` Jeff Law
@ 2024-05-28  7:01 ` Richard Biener
  2024-05-29 22:30   ` Jeff Law
  2024-05-29 12:43 ` David Malcolm
  2 siblings, 1 reply; 12+ messages in thread
From: Richard Biener @ 2024-05-28  7:01 UTC (permalink / raw)
  To: Mariam Arutunian; +Cc: GCC Patches

On Fri, May 24, 2024 at 10:46 AM Mariam Arutunian
<mariamarutunian@gmail.com> wrote:
>
> This patch adds a new compiler pass aimed at identifying naive CRC implementations,
> characterized by the presence of a loop calculating a CRC (polynomial long division).
> Upon detection of a potential CRC, the pass prints an informational message.
>
> Performs CRC optimization if optimization level is >= 2,
> besides optimizations for size and if fno_gimple_crc_optimization given.
>
> This pass is added for the detection and optimization of naive CRC implementations,
> improving the efficiency of CRC-related computations.
>
> This patch includes only initial fast checks for filtering out non-CRCs,
> detected possible CRCs verification and optimization parts will be provided in subsequent patches.

Just a few quick questions - I'm waiting for a revision with Jeffs
comments cleared before
having a closer look.  The patch does nothing but analyze right now,
correct?  I assume
a later patch will fill in stuff in ::execute and use the return value
of loop_may_calculate_crc
(it's a bit odd to review such a "split" thing).

I think what this does fits final value replacement which lives in
tree-scalar-evolution.cc
and works from the loop-closed PHIs, trying to replace those.  I'm not
sure we want to
have a separate pass for this.  Consider a loop calculating two or
four CRCs in parallel,
replacing LC PHIs one-by-one should be able to handle this.

Richard.

>   gcc/
>
>     * Makefile.in (OBJS): Add gimple-crc-optimization.o.
>     * common.opt (fgimple-crc-optimization): New option.
>     * doc/invoke.texi (-fgimple-crc-optimization): Add documentation.
>     * gimple-crc-optimization.cc: New file.
>     * gimple.cc (set_phi_stmts_not_visited): New function.
>     (set_gimple_stmts_not_visited): Likewise.
>     (set_bbs_stmts_not_visited): Likewise.
>     * gimple.h (set_gimple_stmts_not_visited): New extern function declaration.
>     (set_phi_stmts_not_visited): New extern function declaration.
>     (set_bbs_stmts_not_visited): New extern function declaration.
>     * opts.cc (default_options_table): Add OPT_fgimple_crc_optimization.
>     (enable_fdo_optimizations): Enable gimple-crc-optimization.
>     * passes.def (pass_crc_optimization): Add new pass.
>     * timevar.def (TV_GIMPLE_CRC_OPTIMIZATION): New timevar.
>     * tree-pass.h (make_pass_crc_optimization): New extern function declaration.
>
> Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection
  2024-05-28  4:20 ` Jeff Law
@ 2024-05-29 11:12   ` Mariam Arutunian
  2024-06-08 22:00     ` Jeff Law
  0 siblings, 1 reply; 12+ messages in thread
From: Mariam Arutunian @ 2024-05-29 11:12 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 15762 bytes --]

On Tue, May 28, 2024 at 8:20 AM Jeff Law <jeffreyalaw@gmail.com> wrote:

>
>
> On 5/24/24 2:42 AM, Mariam Arutunian wrote:
> > This patch adds a new compiler pass aimed at identifying naive CRC
> > implementations,
> > characterized by the presence of a loop calculating a CRC (polynomial
> > long division).
> > Upon detection of a potential CRC, the pass prints an informational
> message.
> >
> > Performs CRC optimization if optimization level is >= 2,
> > besides optimizations for size and if fno_gimple_crc_optimization given.
> >
> > This pass is added for the detection and optimization of naive CRC
> > implementations,
> > improving the efficiency of CRC-related computations.
> >
> > This patch includes only initial fast checks for filtering out non-CRCs,
> > detected possible CRCs verification and optimization parts will be
> > provided in subsequent patches.
> >
> >    gcc/
> >
> >      * Makefile.in (OBJS): Add gimple-crc-optimization.o.
> >      * common.opt (fgimple-crc-optimization): New option.
> >      * doc/invoke.texi (-fgimple-crc-optimization): Add documentation.
> >      * gimple-crc-optimization.cc: New file.
> >      * gimple.cc (set_phi_stmts_not_visited): New function.
> >      (set_gimple_stmts_not_visited): Likewise.
> >      (set_bbs_stmts_not_visited): Likewise.
> >      * gimple.h (set_gimple_stmts_not_visited): New extern function
> > declaration.
> >      (set_phi_stmts_not_visited): New extern function declaration.
> >      (set_bbs_stmts_not_visited): New extern function declaration.
> >      * opts.cc (default_options_table): Add OPT_fgimple_crc_optimization.
> >      (enable_fdo_optimizations): Enable gimple-crc-optimization.
> >      * passes.def (pass_crc_optimization): Add new pass.
> >      * timevar.def (TV_GIMPLE_CRC_OPTIMIZATION): New timevar.
> >      * tree-pass.h (make_pass_crc_optimization): New extern function
> > declaration.
> >
> > Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com
> > <mailto:mariamarutunian@gmail.com>>
>
> > diff --git a/gcc/common.opt b/gcc/common.opt
> > index 2c078fdd1f8..53f7ab255dd 100644
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> > @@ -1757,6 +1757,12 @@ Common Var(flag_gcse_after_reload) Optimization
> >  Perform global common subexpression elimination after register
> allocation has
> >  finished.
> >
> > +fgimple-crc-optimization
> > +Common Var(flag_gimple_crc_optimization) Optimization
> > +Detect loops calculating CRC and replace with faster implementation.
> > +If the target supports carry-less-multiplication instruction, generate
> CRC using
> > +it; otherwise generate table-based CRC.
> This probably needs a minor update since your code can also generate a
> CRC instruction on x86 when it detects a CRC loop with the right
> polynomial.
>


Thanks. I forgot to modify this part.


>
>

>
>
> > +
> > +  /* Returns true if there is only two conditional blocks in the loop
> > +     (one may be for the CRC bit check and the other for the loop
> counter).
> > +     This may filter out some real CRCs, where more than one condition
> > +     is checked for the CRC calculation.  */
> > +  static bool loop_contains_two_conditional_bb (basic_block *loop_bbs,
> > +                                             unsigned loop_num_nodes);
> It's been a while, so if we're rehashing something we already worked
> through, I apologize.
>
> IIRC we looked at the problem of canonicalizing the loop into a form
> where we didn't necessarily have conditional blocks, instead we had
> branchless sequences for the conditional xor and dealing with the high
> bit in the crc.  My recollection was that the coremark CRC loop would
> always canonicalize, but that in general we still saw multiple CRC
> implementations that did not canonicalize and thus we still needed the
> more complex matching.  Correct?
>
>
The loop in CoreMark is not fully canonicalized in that form,
as there are still branches present for the conditional XOR operation.
I checked that using the -O2 and -O3 flags.

>
> > +
> > +  /* Checks whether found XOR_STMT is for calculating CRC.
> > +     The function CRC_FUN calculates CRC only if there is a shift
> operation
> > +     in the crc loop.  */
> > +  bool xor_calculates_crc (function *crc_fun, basic_block *loop_bbs,
> > +                        const gimple *xor_stmt);
> So the second sentence in the comment doesn't really seem to relate to
> this function.  It also seems like we potentially have two xors in a CRC
> loop.  One which xors one bit of the data with one bit of the crc.  The
> other is the conditional xor.  Which does this refer to?  I'm guesing
> the latter since the former can likely hoist out of the loop given a
> sufficiently smart loop optimizer.
>
>

I'll try to provide a better description of this function.
Here, XOR refers to the conditional XOR.

>
>
> > +
> > +  /* Checks that the variable used in the condition COND is the assumed
> CRC
> > +     (or depends on the assumed CRC).
> > +     Also sets data member m_phi_for_data if it isn't set and exists.
> */
> > +  bool is_crc_checked (gcond *cond, basic_block *loop_bbs);
> > +
> > +  /* Returns true if condition COND checks MSB/LSB bit is 1.
> > +     Otherwise, return false.  */
> > +  static bool cond_true_is_checked_for_bit_one (const gcond *cond);
> > +
> > +  /* Returns opposite block of the XOR_BB from PRED_BB's dest blocks.
> */
> > +  static basic_block get_xor_bb_opposite (basic_block pred_bb,
> > +                                       basic_block xor_bb);
> > +
> > +  /* Checks whether the pair of xor's shift exists in the opposite
> > +     basic block (OPPOSITE_BB).
> > +     If there is a shift and xor in the same block,
> > +     then in the opposite block must be another shift.  */
> > +  bool exists_shift_for_opp_xor_shift (basic_block opposite_bb);
> > +
> > +  /* Goes down by def-use chain (within the CRC loop) and returns the
> statement
> > +     where variable (dependent on xor-ed variable) is shifted with 1.
> > +     Between xor and shift operations only phi statements are allowed.
> > +     Otherwise, returns nullptr.  */
> > +  gimple *find_shift_after_xor (tree xored_crc);
> > +
> > +  /* Returns the statement which does shift 1 operation.
> > +     If there is no such statement, returns nullptr.
> > +     STMTS - are the statements within the loop before xor operation on
> > +     which possible CRC variable depends.  */
> > +  gimple *find_shift_before_xor (const auto_vec <gimple *> &stmts);
> > +
> > +  /* Returns true if ASSIGN_STMT does shift with 1.
> > +     Otherwise, returns false.  */
> > +  bool can_be_crc_shift (gimple *assign_stmt);
> > +
> > +  /* Returns true if the operation done in ASSIGN_STMT can occur during
> CRC
> > +     calculation.  Otherwise, returns false.  */
> > +  bool can_not_be_crc_stmt (gimple *assign_stmt);
> > +
> > +  /* Returns true if the statement with STMT_CODE may occur in CRC
> calculation.
> > +     Otherwise, returns false.  */
> > +  static bool is_acceptable_stmt_code (const tree_code &stmt_code);
> > +
> > +  /* Prints extracted details of CRC calculation.  */
> > +  void dump_crc_information ();
> > +
> > + public:
> > +  unsigned int execute (function *fun);
> > +};
> > +
>
>
> > +
> > +/* Goes down by def-use chain (within the CRC loop) and returns the
> statement
> > +   where variable (dependent on xor-ed variable) is shifted with 1.
> > +   Between xor and shift operations only phi statements are allowed.
> > +   Otherwise, returns nullptr.  */
> Follow def-use chains of XORED_CRC and return the statement where
> XORED_CRC is shifted by one bit position.  Only PHI statements are
> allowed between XORED_CRC and the shift in the def-use chain.
>
> If no such statement is found, return NULL.
>
> Seems like a better function comment here.
>
>
Yes). Thanks.


>
> > +
> > +gimple *
> > +crc_optimization::find_shift_after_xor (tree xored_crc)
> > +{
> > +  imm_use_iterator imm_iter;
> > +  use_operand_p use_p;
> > +
> > +  if (TREE_CODE (xored_crc) != SSA_NAME)
> > +    return nullptr;
> If we always expect XORED_CRC to be an SSA_NAME, we might be able to use
> gcc_assert TREE_CODE (XORED_CRC) == SSA_NAME);
>


I'm not sure that it always has to be an SSA_NAME.

The difference is the assert means the property should always hold.
> Whereas the IF as written just says that if we don't have an SSA_NAME,
> give up trying to optimize.
>

Yes, I assumed that if it's not an SSA_NAME, we should just skip the
optimization.

>  > +> +/* Returns opposite block of the XOR_BB from PRED_BB's dest
> blocks.  */
> > +
> > +basic_block
> > +crc_optimization::get_xor_bb_opposite (basic_block pred_bb, basic_block
> xor_bb)
> > +{
> > +  if (EDGE_COUNT (pred_bb->succs) != 2)
> > +    return nullptr;
> > +
> > +  if (EDGE_SUCC (pred_bb, 0)->dest != xor_bb)
> > +    return EDGE_SUCC (pred_bb, 0)->dest;
> > +  return EDGE_SUCC (pred_bb, 1)->dest;
> > +}
> Consider using a gcc_assert to ensure that one of the two successors is
> xor_bb.
>
> You might also want to reject edge->flags & EDGE_COMPLEX is nonzero.> +
>
>
Ok.


>
> > +/* Returns true if condition COND checks MSB/LSB bit is 1.
> > +   Otherwise, return false.  */
> > +
> > +bool
> > +crc_optimization::cond_true_is_checked_for_bit_one (const gcond *cond)
> > +{
> > +  if (!cond)
> > +    return false;
> > +
> > +  tree rhs = gimple_cond_rhs (cond);
> > +  enum tree_code code = gimple_cond_code (cond);
> > +
> > +  /* If the condition is something == 1 -> return true.  */
> > +  if (integer_onep (rhs) && code == EQ_EXPR)
> > +    return true;
> Generally we prefer to have the cheapest test first.  So checking the
> code against EQ_EXPR first is marginally better.
>
>
Ok.


> > +
> > +  /* If the condition is something != 0  or something < 0 -> return
> true.  */
> > +  if (integer_zerop (rhs) && (code == NE_EXPR || code == LT_EXPR))
> > +    return true;
> We generally discourage having && and || conditions on the same line.
> I'd rewrite as
>    if ((code == NE_EXPR || CODE == LT_EXPR)
>        && integer_zerop (rhs))
>
> Do you need to check for EQ_EXPR with rhs having the value 0?  What
> about NE_EXPR with the value 1?
>
>
I think there is no need to check those cases.
If the condition isn't one of the cases checked above, then it will be the
cases you mentioned (the opposite form).

With this function, I check whether the condition verifies if the MSB of
the lhs value is one.
If yes, the xor should occur on the true branch; otherwise, it should occur
on the false branch.

 > +> +/* Checks that the condition is for checking CRC.
> > +   Returns true if xor is done under the condition of MSB/LSB being 1,
> and
> > +   the condition's variable and xor-ed variable depend on the same
> variable.
> > +   Otherwise, returns false.
> > +   XOR_BB is the basic block, where the xor operation is done.
> > +   PRED_BB is the predecessor basic block of the XOR_BB, it is assumed
> that
> > +   the last stmt of PRED_BB checks the condition under which xor is
> done.  */
> > +
> > +bool
> > +crc_optimization::crc_cond (basic_block pred_bb, basic_block xor_bb,
> > +                         basic_block *loop_bbs)
> > +{
> > +  /* Check whether PRED_BB contains condition.  We will consider only
> those
> > +     cases when xor is done immediately under the condition.  */
> > +  gcond *cond = safe_dyn_cast<gcond *> (*gsi_last_bb (pred_bb));
> Hmm.  I think this works, but it's a bit obscure as it depends on
> operator* returning a pointer to the statement.  I didn't even know that
> worked.
>
> Something like this might be more obvious:
>
> gsi_stmt (gsi_last_bb (pred_bb)) should give you the last statement.
> Then you can see if it's a gimple cond.
>
>
Thanks.


>
> > +
> > +/* Returns true if the statement with STMT_CODE may occur in CRC
> calculation.
> > +   Otherwise, returns false.  */
> > +
> > +bool
> > +crc_optimization::is_acceptable_stmt_code (const tree_code &stmt_code)
> > +{
> > +  return stmt_code == BIT_IOR_EXPR
> > +      || stmt_code == BIT_AND_EXPR
> > +      || stmt_code == BIT_XOR_EXPR
> > +      || stmt_code == MINUS_EXPR
> > +      || stmt_code == PLUS_EXPR
> > +      || stmt_code == RSHIFT_EXPR
> > +      || stmt_code == LSHIFT_EXPR
> > +      || TREE_CODE_CLASS (stmt_code) == tcc_unary;
> Go ahead and put parenthesis around all those tests :-)
>
>
Ok)


>
> > +
> > +/* Set M_PHI_FOR_CRC and M_PHI_FOR_DATA fields.
> > +   Returns false if there are more than two (as in CRC calculation only
> CRC's
> > +   and data's phi may exist) or no phi statements in STMTS (at least
> there must
> > +   be CRC's phi).
> > +   Otherwise, returns true.  */
> > +
> > +bool
> > +crc_optimization::set_crc_and_data_phi (auto_vec<gimple *> &stmts)
> > +{
> > +  for (auto stmt_it = stmts.begin (); stmt_it != stmts.end ();
> stmt_it++)
> > +    {
> > +      if (is_a<gphi *> (*stmt_it) && bb_loop_header_p (gimple_bb
> (*stmt_it)))
> > +     {
> > +       if (!m_phi_for_crc)
> > +         m_phi_for_crc = as_a<gphi *> (*stmt_it);
> > +       else if (!m_phi_for_data)
> > +         m_phi_for_data = as_a<gphi *> (*stmt_it);
> > +       else
> > +         {
> > +           if (dump_file && (dump_flags & TDF_DETAILS))
> > +             fprintf (dump_file, "Xor-ed variable depends on more than
> 2 "
> > +                                 "phis.\n");
> > +           return false;
> > +         }
> > +     }
> > +    }
> > +  return m_phi_for_crc;
> Hmm.  For a given PHI, how do we know if it's for the data item or the
> crc item, or something else (like a loop counter) entirely?
>


I trace the def-use chain upwards from the XOR statement to determine which
PHI node corresponds to CRC and data.
Since we assume the loop calculates CRC, I expect only variables
representing data and CRC to participate in these operations.
In the implementations I support, the loop counter is used only for the
iteration.
Any misidentification of CRC and data would occur only if the loop doesn't
calculate CRC, in which case next checks would fail, leading the algorithm
to identify it as not CRC.

Here, the PHI nodes for CRC and data might be mixed in places.
I just assume that the first found PHI is CRC, second data.
I correctly determine them later with the *swap_crc_and_data_if_needed*
 function.


>
>

> > diff --git a/gcc/gimple.cc b/gcc/gimple.cc
> > index a9f968cb038..76518d8897d 100644
> > --- a/gcc/gimple.cc
> > +++ b/gcc/gimple.cc
> > @@ -3426,6 +3426,44 @@ gimple_or_expr_nonartificial_location (gimple
> *stmt, tree expr)
> >    return expansion_point_location_if_in_system_header (loc);
> >  }
> >
> > +/* Set GIMPLE_PHI statements of the BB not visited.  */
> > +
> > +void
> > +set_phi_stmts_not_visited (basic_block bb)
> [ ... ]
> Something tells me we probably need to convert some code to use these
> new API points.  Not a requirement for this to go forward, but a useful
> cleanup.
>

Sure, I don't recall finding such cases before, but I'll take a closer look
later.


>
> Overall it looks pretty good.  I don't see anything that looks horribly
> wrong.  My biggest worry is that set_crc_and_data_phi routine.  But
> there's other small things noted above that we need to clean up.
>
> So let's get the questions answered and some cleanups done, then repost.
>
>
Ok. Thanks,
Mariam


> Thanks,
> Jeff
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection
  2024-05-24  8:42 [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection Mariam Arutunian
  2024-05-28  4:20 ` Jeff Law
  2024-05-28  7:01 ` Richard Biener
@ 2024-05-29 12:43 ` David Malcolm
  2 siblings, 0 replies; 12+ messages in thread
From: David Malcolm @ 2024-05-29 12:43 UTC (permalink / raw)
  To: Mariam Arutunian, GCC Patches

On Fri, 2024-05-24 at 12:42 +0400, Mariam Arutunian wrote:
> This patch adds a new compiler pass aimed at identifying naive CRC
> implementations,
> characterized by the presence of a loop calculating a CRC (polynomial
> long
> division).
> Upon detection of a potential CRC, the pass prints an informational
> message.
> 
> Performs CRC optimization if optimization level is >= 2,
> besides optimizations for size and if fno_gimple_crc_optimization
> given.
> 
> This pass is added for the detection and optimization of naive CRC
> implementations,
> improving the efficiency of CRC-related computations.
> 
> This patch includes only initial fast checks for filtering out non-
> CRCs,
> detected possible CRCs verification and optimization parts will be
> provided
> in subsequent patches.
> 
>   gcc/
> 
>     * Makefile.in (OBJS): Add gimple-crc-optimization.o.
>     * common.opt (fgimple-crc-optimization): New option.
>     * doc/invoke.texi (-fgimple-crc-optimization): Add documentation.

A minor nitpick: patches that add new options (and their documentation)
ought to affect the corresponding .opt.urls, so that we can map from
the new option to the URL of its documentation.

Running "make regenerate-opt-urls" in the build/gcc subdirectory ought
to update common.opt.urls for you (provided you've done a "make html").

[...snip...]

Dave


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection
  2024-05-28  7:01 ` Richard Biener
@ 2024-05-29 22:30   ` Jeff Law
  2024-05-30  7:33     ` Richard Biener
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Law @ 2024-05-29 22:30 UTC (permalink / raw)
  To: gcc-patches



On 5/28/24 1:01 AM, Richard Biener wrote:
> On Fri, May 24, 2024 at 10:46 AM Mariam Arutunian
> <mariamarutunian@gmail.com> wrote:
>>
>> This patch adds a new compiler pass aimed at identifying naive CRC implementations,
>> characterized by the presence of a loop calculating a CRC (polynomial long division).
>> Upon detection of a potential CRC, the pass prints an informational message.
>>
>> Performs CRC optimization if optimization level is >= 2,
>> besides optimizations for size and if fno_gimple_crc_optimization given.
>>
>> This pass is added for the detection and optimization of naive CRC implementations,
>> improving the efficiency of CRC-related computations.
>>
>> This patch includes only initial fast checks for filtering out non-CRCs,
>> detected possible CRCs verification and optimization parts will be provided in subsequent patches.
> 
> Just a few quick questions - I'm waiting for a revision with Jeffs 
> comments cleared before having a closer look.  The patch does
> nothing but analyze right now, correct?  I assume a later patch will
> fill in stuff in ::execute and use the return value of
> loop_may_calculate_crc (it's a bit odd to review such a "split"
> thing).
We split it up on functional chunks.  I think if it gets approved it 
probably should go in atomically since it makes no sense to commit the 
first pass recognition filter without the validation step or the 
validation step without the codegen step.

So consider the break down strictly for review convenience.


> 
> I think what this does fits final value replacement which lives in 
> tree-scalar-evolution.cc and works from the loop-closed PHIs, trying
> to replace those.  I'm not sure we want to have a separate pass for
> this.  Consider a loop calculating two or four CRCs in parallel, 
> replacing LC PHIs one-by-one should be able to handle this.
I suspect that'll be quite hard for both the "does this generally look 
like a CRC loop" code as well as the "validate this is a CRC loop" code.

Mariam, your thoughts on whether or not those two phases could handle a 
loop with two CRC calculations inside, essentially creating two calls to 
our new builtins?

Jeff



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection
  2024-05-29 22:30   ` Jeff Law
@ 2024-05-30  7:33     ` Richard Biener
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Biener @ 2024-05-30  7:33 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches



> Am 30.05.2024 um 00:31 schrieb Jeff Law <jeffreyalaw@gmail.com>:
> 
> 
> 
>> On 5/28/24 1:01 AM, Richard Biener wrote:
>>> On Fri, May 24, 2024 at 10:46 AM Mariam Arutunian
>>> <mariamarutunian@gmail.com> wrote:
>>> 
>>> This patch adds a new compiler pass aimed at identifying naive CRC implementations,
>>> characterized by the presence of a loop calculating a CRC (polynomial long division).
>>> Upon detection of a potential CRC, the pass prints an informational message.
>>> 
>>> Performs CRC optimization if optimization level is >= 2,
>>> besides optimizations for size and if fno_gimple_crc_optimization given.
>>> 
>>> This pass is added for the detection and optimization of naive CRC implementations,
>>> improving the efficiency of CRC-related computations.
>>> 
>>> This patch includes only initial fast checks for filtering out non-CRCs,
>>> detected possible CRCs verification and optimization parts will be provided in subsequent patches.
>> Just a few quick questions - I'm waiting for a revision with Jeffs comments cleared before having a closer look.  The patch does
>> nothing but analyze right now, correct?  I assume a later patch will
>> fill in stuff in ::execute and use the return value of
>> loop_may_calculate_crc (it's a bit odd to review such a "split"
>> thing).
> We split it up on functional chunks.  I think if it gets approved it probably should go in atomically since it makes no sense to commit the first pass recognition filter without the validation step or the validation step without the codegen step.
> 
> So consider the break down strictly for review convenience.
> 
> 
>> I think what this does fits final value replacement which lives in tree-scalar-evolution.cc and works from the loop-closed PHIs, trying
>> to replace those.  I'm not sure we want to have a separate pass for
>> this.  Consider a loop calculating two or four CRCs in parallel, replacing LC PHIs one-by-one should be able to handle this.
> I suspect that'll be quite hard for both the "does this generally look like a CRC loop" code as well as the "validate this is a CRC loop" code.
> 
> Mariam, your thoughts on whether or not those two phases could handle a loop with two CRC calculations inside, essentially creating two calls to our new builtins?

The key would be to only simulate the use-def cycle from the loop-closed PHI (plus the loop control of course, but miter/SCEV should be enough there) and just replace that LC PHI, leaving loop DCE to DCE.

If we really want a separate pass (or utility to work on a single loop) then we might consider moving some of the final value replacement code that doesn’t work with only SCEV there as well.  There’s also special code in loop distribution for strlen recognition now, not exactly fitting in.

Note I had patches to do final value replacement on demand from CD-DCE when it figures a loop has no side effects besides of its reduction outputs (still want to pick this up at some point again).

Richard 

> Jeff
> 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection
  2024-05-29 11:12   ` Mariam Arutunian
@ 2024-06-08 22:00     ` Jeff Law
  2024-06-19 15:27       ` Mariam Arutunian
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Law @ 2024-06-08 22:00 UTC (permalink / raw)
  To: Mariam Arutunian; +Cc: GCC Patches



On 5/29/24 5:12 AM, Mariam Arutunian wrote:

> 
>     IIRC we looked at the problem of canonicalizing the loop into a form
>     where we didn't necessarily have conditional blocks, instead we had
>     branchless sequences for the conditional xor and dealing with the high
>     bit in the crc.  My recollection was that the coremark CRC loop would
>     always canonicalize, but that in general we still saw multiple CRC
>     implementations that did not canonicalize and thus we still needed the
>     more complex matching.  Correct?
> 
> 
> The loop in CoreMark is not fully canonicalized in that form,
> as there are still branches present for the conditional XOR operation.
> I checked that using the -O2 and -O3 flags.
A bit of a surprise.  Though it may be the case that some of the 
canonicalization steps are happening later in the pipeline.  No worries 
as I think we'd already concluded that we'd see at least some CRC 
implementations that wouldn't canonicalize down to branchless sequences 
for the conditional xor.


> 
> 
>      > +
>      > +gimple *
>      > +crc_optimization::find_shift_after_xor (tree xored_crc)
>      > +{
>      > +  imm_use_iterator imm_iter;
>      > +  use_operand_p use_p;
>      > +
>      > +  if (TREE_CODE (xored_crc) != SSA_NAME)
>      > +    return nullptr;
>     If we always expect XORED_CRC to be an SSA_NAME, we might be able to use
>     gcc_assert TREE_CODE (XORED_CRC) == SSA_NAME);
> 
> I'm not sure that it always has to be an SSA_NAME.
For a logical operation like XOR it should always have the form

SSA_NAME = SSA_NAME ^ (SSA_NAME | CONSTANT)

The constant might be a vector  constant, but the basic form won't 
change.  It's one of the nicer properties of gimple.  In contrast RTL 
would allow a variety of lvalues and rvalues, including MEMs, REGs, 
SUBREGs, extensions, other binary ops, etc etc.


>      > +
>      > +/* Set M_PHI_FOR_CRC and M_PHI_FOR_DATA fields.
>      > +   Returns false if there are more than two (as in CRC
>     calculation only CRC's
>      > +   and data's phi may exist) or no phi statements in STMTS (at
>     least there must
>      > +   be CRC's phi).
>      > +   Otherwise, returns true.  */
>      > +
>      > +bool
>      > +crc_optimization::set_crc_and_data_phi (auto_vec<gimple *> &stmts)
>      > +{
>      > +  for (auto stmt_it = stmts.begin (); stmt_it != stmts.end ();
>     stmt_it++)
>      > +    {
>      > +      if (is_a<gphi *> (*stmt_it) && bb_loop_header_p (gimple_bb
>     (*stmt_it)))
>      > +     {
>      > +       if (!m_phi_for_crc)
>      > +         m_phi_for_crc = as_a<gphi *> (*stmt_it);
>      > +       else if (!m_phi_for_data)
>      > +         m_phi_for_data = as_a<gphi *> (*stmt_it);
>      > +       else
>      > +         {
>      > +           if (dump_file && (dump_flags & TDF_DETAILS))
>      > +             fprintf (dump_file, "Xor-ed variable depends on
>     more than 2 "
>      > +                                 "phis.\n");
>      > +           return false;
>      > +         }
>      > +     }
>      > +    }
>      > +  return m_phi_for_crc;
>     Hmm.  For a given PHI, how do we know if it's for the data item or the
>     crc item, or something else (like a loop counter) entirely?
> 
> 
> 
> I trace the def-use chain upwards from the XOR statement to determine 
> which PHI node corresponds to CRC and data.
> Since we assume the loop calculates CRC, I expect only variables 
> representing data and CRC to participate in these operations.
> In the implementations I support, the loop counter is used only for the 
> iteration.
> Any misidentification of CRC and data would occur only if the loop 
> doesn't calculate CRC, in which case next checks would fail, leading the 
> algorithm to identify it as not CRC.
> 
> Here, the PHI nodes for CRC and data might be mixed in places.
> I just assume that the first found PHI is CRC, second data.
> I correctly determine them later with the | 
> *swap_crc_and_data_if_needed*| function.
Ah, OK.  That probably deserves a comment in this code.


jeff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection
  2024-06-08 22:00     ` Jeff Law
@ 2024-06-19 15:27       ` Mariam Arutunian
  0 siblings, 0 replies; 12+ messages in thread
From: Mariam Arutunian @ 2024-06-19 15:27 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 4825 bytes --]

On Sun, Jun 9, 2024 at 2:00 AM Jeff Law <jeffreyalaw@gmail.com> wrote:

>
>
> On 5/29/24 5:12 AM, Mariam Arutunian wrote:
>
> >
> >     IIRC we looked at the problem of canonicalizing the loop into a form
> >     where we didn't necessarily have conditional blocks, instead we had
> >     branchless sequences for the conditional xor and dealing with the
> high
> >     bit in the crc.  My recollection was that the coremark CRC loop would
> >     always canonicalize, but that in general we still saw multiple CRC
> >     implementations that did not canonicalize and thus we still needed
> the
> >     more complex matching.  Correct?
> >
> >
> > The loop in CoreMark is not fully canonicalized in that form,
> > as there are still branches present for the conditional XOR operation.
> > I checked that using the -O2 and -O3 flags.
> A bit of a surprise.  Though it may be the case that some of the
> canonicalization steps are happening later in the pipeline.  No worries
> as I think we'd already concluded that we'd see at least some CRC
> implementations that wouldn't canonicalize down to branchless sequences
> for the conditional xor.
>

Sorry, I had checked incorrectly. I had checked my added GCC test
(crc-5.c), where I call both the optimized and non-optimized versions, so I
mistakenly checked the non-optimized function.
But, yes, in my pass, the function isn't canonicalized; it is canonicalized
later.


>
>
> >
> >
> >      > +
> >      > +gimple *
> >      > +crc_optimization::find_shift_after_xor (tree xored_crc)
> >      > +{
> >      > +  imm_use_iterator imm_iter;
> >      > +  use_operand_p use_p;
> >      > +
> >      > +  if (TREE_CODE (xored_crc) != SSA_NAME)
> >      > +    return nullptr;
> >     If we always expect XORED_CRC to be an SSA_NAME, we might be able to
> use
> >     gcc_assert TREE_CODE (XORED_CRC) == SSA_NAME);
> >
> > I'm not sure that it always has to be an SSA_NAME.
> For a logical operation like XOR it should always have the form
>
> SSA_NAME = SSA_NAME ^ (SSA_NAME | CONSTANT)
>
> The constant might be a vector  constant, but the basic form won't
> change.  It's one of the nicer properties of gimple.  In contrast RTL
> would allow a variety of lvalues and rvalues, including MEMs, REGs,
> SUBREGs, extensions, other binary ops, etc etc.
>

Ok. Thanks for the explanation.


>
>
> >      > +
> >      > +/* Set M_PHI_FOR_CRC and M_PHI_FOR_DATA fields.
> >      > +   Returns false if there are more than two (as in CRC
> >     calculation only CRC's
> >      > +   and data's phi may exist) or no phi statements in STMTS (at
> >     least there must
> >      > +   be CRC's phi).
> >      > +   Otherwise, returns true.  */
> >      > +
> >      > +bool
> >      > +crc_optimization::set_crc_and_data_phi (auto_vec<gimple *>
> &stmts)
> >      > +{
> >      > +  for (auto stmt_it = stmts.begin (); stmt_it != stmts.end ();
> >     stmt_it++)
> >      > +    {
> >      > +      if (is_a<gphi *> (*stmt_it) && bb_loop_header_p (gimple_bb
> >     (*stmt_it)))
> >      > +     {
> >      > +       if (!m_phi_for_crc)
> >      > +         m_phi_for_crc = as_a<gphi *> (*stmt_it);
> >      > +       else if (!m_phi_for_data)
> >      > +         m_phi_for_data = as_a<gphi *> (*stmt_it);
> >      > +       else
> >      > +         {
> >      > +           if (dump_file && (dump_flags & TDF_DETAILS))
> >      > +             fprintf (dump_file, "Xor-ed variable depends on
> >     more than 2 "
> >      > +                                 "phis.\n");
> >      > +           return false;
> >      > +         }
> >      > +     }
> >      > +    }
> >      > +  return m_phi_for_crc;
> >     Hmm.  For a given PHI, how do we know if it's for the data item or
> the
> >     crc item, or something else (like a loop counter) entirely?
> >
> >
> >
> > I trace the def-use chain upwards from the XOR statement to determine
> > which PHI node corresponds to CRC and data.
> > Since we assume the loop calculates CRC, I expect only variables
> > representing data and CRC to participate in these operations.
> > In the implementations I support, the loop counter is used only for the
> > iteration.
> > Any misidentification of CRC and data would occur only if the loop
> > doesn't calculate CRC, in which case next checks would fail, leading the
> > algorithm to identify it as not CRC.
> >
> > Here, the PHI nodes for CRC and data might be mixed in places.
> > I just assume that the first found PHI is CRC, second data.
> > I correctly determine them later with the |
> > *swap_crc_and_data_if_needed*| function.
> Ah, OK.  That probably deserves a comment in this code.
>
>
Ok. I'll add a comment.


Thanks,
Mariam


>
> jeff
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection
  2024-06-08 21:47 ` Jeff Law
@ 2024-06-19 14:29   ` Mariam Arutunian
  0 siblings, 0 replies; 12+ messages in thread
From: Mariam Arutunian @ 2024-06-19 14:29 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Biener, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3753 bytes --]

On Sun, Jun 9, 2024 at 1:48 AM Jeff Law <jeffreyalaw@gmail.com> wrote:

>
>
> On 6/4/24 7:41 AM, Mariam Arutunian wrote:
> >/Mariam, your thoughts on whether or not those two phases could handle a
> > loop with two CRC calculations inside, essentially creating two calls to
> > our new builtins? /
> >
> > /
> > /
> >
> > It is feasible, but it would likely demand considerable effort and
> > additional work to implement effectively.
> Thanks for the confirmation.  I suspect it likely doesn't come up often
> in practice either.
>
>
> >
> >>The key would be to only simulate the use-def cycle from the loop-closed
> PHI (plus the loop control of course, but miter/SCEV should be enough
> there) and just replace that LC PHI, leaving loop DCE to DCE.
> >
> > Thank you, this is a good idea to just replace the PHI and leave the
> loop to DCE to remove only single CRC parts.
> It does seem like replacing the PHI when we have an optimizable case
> might simplify that aspect of the implementation.
>

Yes.


>
>
> >
> > The current pass only verifies cases where a single CRC calculation is
> performed within the loop. During the verification phase,
> > I ensure that there are no other calculations aside from those necessary
> for the considered CRC computation.
> >
> > Also, when I was investigating the bitwise CRC implementations used in
> different software, in all cases the loop was calculating just one CRC and
> no other calculations were done.
> > Thus, in almost all cases, the first phase will filter out non-CRCs, and
> during the second phase, only real CRCs with no other calculations will be
> executed.
> > This ensures that unnecessary statements won't be executed in most cases.
> But we may have had a degree of sampling bias here.  If I remember
> correctly I used the initial filtering pass as the "trigger" to report a
> potential CRC case.  If that initial filtering pass rejected cases with
> other calculations in the loop, then we never would have seen those.


Yes, you used initial filtering, but in that process, I do checks with
use-def chains. So, if there were separate computations not connected with
each other, the initial filtering most probably wouldn't reject those.


>
>
> > Leaving the loop to DCE will simplify the process of removing parts
> connected to a single CRC calculation.
> > However, since now we detect a loop that only calculates a single CRC,
> we can entirely remove it at this stage without additional checks.
> Let's evaluate this option as we get to the later patches in the series.
>   What I like about Richard's suggestion is that it "just works" and it
> will continue to work, even as the overall infrastructure changes.  In
> contrast a bespoke loop removal implementation in a specific pass may
> need adjustment if other aspects of our infrastructure change.
>

Ok.


>
> >If we really want a separate pass (or utility to work on a single
> > loop) then we might consider moving some of the final value replacement
> > code that doesn’t work with only SCEV there as well. There’s also
> > special code in loop distribution for strlen recognition now, not
> > exactly fitting in. >
> >
> >>Note I had patches to do final value replacement on demand from CD-DCE
> when it figures a loop has no side effects besides of its reduction outputs
> (still want to pick this up at some point again).
> >
> > Oh, this could provide useful insights for our implementation.
> Are you thinking of reusing that on-demand analysis to reduce the set of
> loops we analyze?
>

I thought its parts would be helpful in future changes of the verification
algorithm.

Thanks,
Mariam


> Jeff
>
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection
  2024-06-04 13:41 Mariam Arutunian
@ 2024-06-08 21:47 ` Jeff Law
  2024-06-19 14:29   ` Mariam Arutunian
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Law @ 2024-06-08 21:47 UTC (permalink / raw)
  To: Mariam Arutunian, Richard Biener; +Cc: GCC Patches



On 6/4/24 7:41 AM, Mariam Arutunian wrote:
>/Mariam, your thoughts on whether or not those two phases could handle a 
> loop with two CRC calculations inside, essentially creating two calls to 
> our new builtins? /
> 
> /
> /
> 
> It is feasible, but it would likely demand considerable effort and 
> additional work to implement effectively.
Thanks for the confirmation.  I suspect it likely doesn't come up often 
in practice either.


> 
>>The key would be to only simulate the use-def cycle from the loop-closed PHI (plus the loop control of course, but miter/SCEV should be enough there) and just replace that LC PHI, leaving loop DCE to DCE.
> 
> Thank you, this is a good idea to just replace the PHI and leave the loop to DCE to remove only single CRC parts.
It does seem like replacing the PHI when we have an optimizable case 
might simplify that aspect of the implementation.


> 
> The current pass only verifies cases where a single CRC calculation is performed within the loop. During the verification phase,
> I ensure that there are no other calculations aside from those necessary for the considered CRC computation.
> 
> Also, when I was investigating the bitwise CRC implementations used in different software, in all cases the loop was calculating just one CRC and no other calculations were done.
> Thus, in almost all cases, the first phase will filter out non-CRCs, and during the second phase, only real CRCs with no other calculations will be executed.
> This ensures that unnecessary statements won't be executed in most cases.
But we may have had a degree of sampling bias here.  If I remember 
correctly I used the initial filtering pass as the "trigger" to report a 
potential CRC case.  If that initial filtering pass rejected cases with 
other calculations in the loop, then we never would have seen those.

> 
> Leaving the loop to DCE will simplify the process of removing parts connected to a single CRC calculation.
> However, since now we detect a loop that only calculates a single CRC, we can entirely remove it at this stage without additional checks.
Let's evaluate this option as we get to the later patches in the series. 
  What I like about Richard's suggestion is that it "just works" and it 
will continue to work, even as the overall infrastructure changes.  In 
contrast a bespoke loop removal implementation in a specific pass may 
need adjustment if other aspects of our infrastructure change.





>If we really want a separate pass (or utility to work on a single 
> loop) then we might consider moving some of the final value replacement 
> code that doesn’t work with only SCEV there as well. There’s also 
> special code in loop distribution for strlen recognition now, not 
> exactly fitting in. >
> 
>>Note I had patches to do final value replacement on demand from CD-DCE when it figures a loop has no side effects besides of its reduction outputs (still want to pick this up at some point again).
> 
> Oh, this could provide useful insights for our implementation.
Are you thinking of reusing that on-demand analysis to reduce the set of 
loops we analyze?

Jeff


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection
@ 2024-06-04 13:41 Mariam Arutunian
  2024-06-08 21:47 ` Jeff Law
  0 siblings, 1 reply; 12+ messages in thread
From: Mariam Arutunian @ 2024-06-04 13:41 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 4958 bytes --]

Sorry for the late response; somehow, I didn't receive the last few messages.

>>* Am 30.05.2024 um 00:31 schrieb Jeff Law <jeffreyalaw@gmail.com <jeffreyalaw@gmail.com>>:
*>> >>* 
*>> >>>* On 5/28/24 1:01 AM, Richard Biener wrote:
*>>>>* On Fri, May 24, 2024 at 10:46 AM Mariam Arutunian
*>>>>* <mariamarutunian@gmail.com <mariamarutunian@gmail.com>> wrote:
*>>>> >>>>* This patch adds a new compiler pass aimed at identifying
naive CRC implementations,
*>>>>* characterized by the presence of a loop calculating a CRC
(polynomial long division).
*>>>>* Upon detection of a potential CRC, the pass prints an
informational message.
*>>>> >>>>* Performs CRC optimization if optimization level is >= 2,
*>>>>* besides optimizations for size and if fno_gimple_crc_optimization given.
*>>>> >>>>* This pass is added for the detection and optimization of
naive CRC implementations,
*>>>>* improving the efficiency of CRC-related computations.
*>>>> >>>>* This patch includes only initial fast checks for filtering
out non-CRCs,
*>>>>* detected possible CRCs verification and optimization parts will
be provided in subsequent patches.
*>>>* Just a few quick questions - I'm waiting for a revision with
Jeffs comments cleared before having a closer look.  The patch does
*>>>* nothing but analyze right now, correct?  I assume a later patch will
*>>>* fill in stuff in ::execute and use the return value of
*>>>* loop_may_calculate_crc (it's a bit odd to review such a "split"
*>>>* thing).
*>>* We split it up on functional chunks.  I think if it gets approved
it probably should go in atomically since it makes no sense to commit
the first pass recognition filter without the validation step or the
validation step without the codegen step.
*>> >>* So consider the break down strictly for review convenience.
*>> >> >>>* I think what this does fits final value replacement which
lives in tree-scalar-evolution.cc and works from the loop-closed PHIs,
trying
*>>>* to replace those.  I'm not sure we want to have a separate pass for
*>>>* this.  Consider a loop calculating two or four CRCs in parallel,
replacing LC PHIs one-by-one should be able to handle this.
*>>* I suspect that'll be quite hard for both the "does this generally
look like a CRC loop" code as well as the "validate this is a CRC
loop" code.
*>> >>* Mariam, your thoughts on whether or not those two phases could
handle a loop with two CRC calculations inside, essentially creating
two calls to our new builtins?
*

It is feasible, but it would likely demand considerable effort and
additional work to implement effectively.

>The key would be to only simulate the use-def cycle from the loop-closed PHI (plus the loop control of course, but miter/SCEV should be enough there) and just replace that LC PHI, leaving loop DCE to DCE.

Thank you, this is a good idea to just replace the PHI and leave the
loop to DCE to remove only single CRC parts.
However, if you mean using the symbolic executor to only simulate the
use-def cycle and loop control, instead of the whole control flow of
the loop, it won't be sufficient.
Some calculations may occur under certain conditions, which must also
be taken into account.

This situation is solvable, but it requires more complex work to
create a graph of the statements that must be executed.
We can consider this as a potential improvement for later.

The current pass only verifies cases where a single CRC calculation is
performed within the loop. During the verification phase,
I ensure that there are no other calculations aside from those
necessary for the considered CRC computation.

Also, when I was investigating the bitwise CRC implementations used in
different software, in all cases the loop was calculating just one CRC
and no other calculations were done.
Thus, in almost all cases, the first phase will filter out non-CRCs,
and during the second phase, only real CRCs with no other calculations
will be executed.
This ensures that unnecessary statements won't be executed in most cases.

Leaving the loop to DCE will simplify the process of removing parts
connected to a single CRC calculation.
However, since now we detect a loop that only calculates a single CRC,
we can entirely remove it at this stage without additional checks.

>If we really want a separate pass (or utility to work on a single loop) then we might consider moving some of the final value replacement code that doesn’t work with only SCEV there as well.  There’s also special code in loop distribution for strlen recognition now, not exactly fitting in.
>

>Note I had patches to do final value replacement on demand from CD-DCE when it figures a loop has no side effects besides of its reduction outputs (still want to pick this up at some point again).

Oh, this could provide useful insights for our implementation.

Thanks,
Mariam

>Richard > >>* Jeff *>>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-06-19 15:27 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-24  8:42 [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection Mariam Arutunian
2024-05-28  4:20 ` Jeff Law
2024-05-29 11:12   ` Mariam Arutunian
2024-06-08 22:00     ` Jeff Law
2024-06-19 15:27       ` Mariam Arutunian
2024-05-28  7:01 ` Richard Biener
2024-05-29 22:30   ` Jeff Law
2024-05-30  7:33     ` Richard Biener
2024-05-29 12:43 ` David Malcolm
2024-06-04 13:41 Mariam Arutunian
2024-06-08 21:47 ` Jeff Law
2024-06-19 14:29   ` Mariam Arutunian

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).