public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [RFC/RFA] [PATCH 01/12] Implement internal functions for efficient CRC computation
@ 2024-05-24  8:41 Mariam Arutunian
  2024-05-25 17:40 ` Jeff Law
  0 siblings, 1 reply; 4+ messages in thread
From: Mariam Arutunian @ 2024-05-24  8:41 UTC (permalink / raw)
  To: GCC Patches


[-- Attachment #1.1: Type: text/plain, Size: 1578 bytes --]

Add two new internal functions (IFN_CRC, IFN_CRC_REV), to provide faster
CRC generation.
One performs bit-forward and the other bit-reversed CRC computation.
If CRC optabs are supported, they are used for the CRC computation.
Otherwise, table-based CRC is generated.
The supported data and CRC sizes are 8, 16, 32, and 64 bits.
The polynomial is without the leading 1.
A table with 256 elements is used to store precomputed CRCs.
For the reflection of inputs and the output, a simple algorithm involving
SHIFT, AND, and OR operations is used.

Co-authored-by: Joern Rennecke <joern.rennecke@embecosm.com>

gcc/

   * doc/md.texi (crc@var{m}@var{n}4,
   crc_rev@var{m}@var{n}4): Document.
   * expr.cc (generate_crc_table): New function.
   (calculate_table_based_CRC): Likewise.
   (expand_crc_table_based): Likewise.
   (gen_common_operation_to_reflect): Likewise.
   (reflect_64_bit_value): Likewise.
   (reflect_32_bit_value): Likewise.
   (reflect_16_bit_value): Likewise.
   (reflect_8_bit_value): Likewise.
   (generate_reflecting_code_standard): Likewise.
   (expand_reversed_crc_table_based): Likewise.
   * expr.h (generate_reflecting_code_standard): New function declaration.
   (expand_crc_table_based): Likewise.
   (expand_reversed_crc_table_based): Likewise.
   * internal-fn.cc: (crc_direct): Define.
   (direct_crc_optab_supported_p): Likewise.
   (expand_crc_optab_fn): New function
   * internal-fn.def (CRC, CRC_REV): New internal functions.
   * optabs.def (crc_optab, crc_rev_optab): New optabs.

Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com>

[-- Attachment #2: 0001-Implement-internal-functions-for-efficient-CRC-compu.patch --]
[-- Type: application/x-patch, Size: 18695 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC/RFA] [PATCH 01/12] Implement internal functions for efficient CRC computation
  2024-05-24  8:41 [RFC/RFA] [PATCH 01/12] Implement internal functions for efficient CRC computation Mariam Arutunian
@ 2024-05-25 17:40 ` Jeff Law
  2024-05-27 13:51   ` Mariam Arutunian
  0 siblings, 1 reply; 4+ messages in thread
From: Jeff Law @ 2024-05-25 17:40 UTC (permalink / raw)
  To: Mariam Arutunian, GCC Patches



On 5/24/24 2:41 AM, Mariam Arutunian wrote:
> Add two new internal functions (IFN_CRC, IFN_CRC_REV), to provide faster 
> CRC generation.
> One performs bit-forward and the other bit-reversed CRC computation.
> If CRC optabs are supported, they are used for the CRC computation.
> Otherwise, table-based CRC is generated.
> The supported data and CRC sizes are 8, 16, 32, and 64 bits.
> The polynomial is without the leading 1.
> A table with 256 elements is used to store precomputed CRCs.
> For the reflection of inputs and the output, a simple algorithm involving
> SHIFT, AND, and OR operations is used.
> 
> Co-authored-by: Joern Rennecke <joern.rennecke@embecosm.com 
> <mailto:joern.rennecke@embecosm.com>>
> 
> gcc/
> 
>     * doc/md.texi (crc@var{m}@var{n}4,
>     crc_rev@var{m}@var{n}4): Document.
>     * expr.cc (generate_crc_table): New function.
>     (calculate_table_based_CRC): Likewise.
>     (expand_crc_table_based): Likewise.
>     (gen_common_operation_to_reflect): Likewise.
>     (reflect_64_bit_value): Likewise.
>     (reflect_32_bit_value): Likewise.
>     (reflect_16_bit_value): Likewise.
>     (reflect_8_bit_value): Likewise.
>     (generate_reflecting_code_standard): Likewise.
>     (expand_reversed_crc_table_based): Likewise.
>     * expr.h (generate_reflecting_code_standard): New function declaration.
>     (expand_crc_table_based): Likewise.
>     (expand_reversed_crc_table_based): Likewise.
>     * internal-fn.cc: (crc_direct): Define.
>     (direct_crc_optab_supported_p): Likewise.
>     (expand_crc_optab_fn): New function
>     * internal-fn.def (CRC, CRC_REV): New internal functions.
>     * optabs.def (crc_optab, crc_rev_optab): New optabs.
> 
> Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com 
> <mailto:mariamarutunian@gmail.com>>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 5730bda80dc..be68ef860f9 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -8557,6 +8557,20 @@ operand 2, greater than operand 2 or is unordered with operand 2.
>  
>  This pattern is not allowed to @code{FAIL}.
>  
> +@cindex @code{crc@var{m}@var{n}4} instruction pattern
> +@item @samp{crc@var{m}@var{n}4}
> +Calculate a bit-forward CRC using operands 1, 2 and 3,
> +then store the result in operand 0.
> +Operands 1 is the initial CRC, operands 2 is the data and operands 3 is the
> +polynomial without leading 1.
> +Operands 0, 1 and 3 have mode @var{n} and operand 2 has mode @var{m}, where
> +both modes are integers.  The size of CRC to be calculated is determined by the
> +mode; for example, if @var{n} is 'hi', a CRC16 is calculated.
> +
> +@cindex @code{crc_rev@var{m}@var{n}4} instruction pattern
> +@item @samp{crc_rev@var{m}@var{n}4}
> +Similar to @samp{crc@var{m}@var{n}4}, but calculates a bit-reversed CRC.
> +
So just to be clear, this is a case where the input (operand 2) may have 
a different mode than the output (operand 0).  That scenario is 
generally discouraged, with a few exceptions (the most common being 
shift counts which are often QImode objects while the 
value-to-be-shifted and the output value are potentially any scalar 
integer mode.

So I don't think this is a problem, just wanted to point it out to 
anyone else that may be looking at this code.


>  @end table
>  
>  @end ifset
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 1baa39b98eb..18368ae6b6c 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -14091,3 +14091,359 @@ int_expr_size (const_tree exp)
>  
>    return tree_to_shwi (size);
>  }
> +
> +/* Calculate CRC for the initial CRC and given POLYNOMIAL.
> +   CRC_BITS is CRC size.  */
> +
> +static unsigned HOST_WIDE_INT
> +calculate_crc (unsigned HOST_WIDE_INT crc,
> +	      unsigned HOST_WIDE_INT polynomial,
> +	      unsigned crc_bits)
Just a nit.  Line up the polynomial & crc_bits declarations with the crc 
declaration.


> +{
> +  crc = crc << (crc_bits - 8);
> +  for (int i = 8; i > 0; --i)
> +    {
> +      if ((crc >> (crc_bits - 1)) & 1)
> +	crc = (crc << 1) ^ polynomial;
> +      else
> +	crc <<= 1;
> +    }
> +
> +  crc <<=  (sizeof (crc) * BITS_PER_UNIT - crc_bits);
> +  crc >>=  (sizeof (crc) * BITS_PER_UNIT - crc_bits);
Another nit.  Just once space after the <<= or >>= operators.


> +
> +  return crc;
> +}
> +
> +/* Assemble CRC table with 256 elements for the given POLYNOM and CRC_BITS with
> +   given ID.
> +   ID is the identifier of the table, the name of the table is unique,
> +   contains CRC size and the polynomial.
> +   POLYNOM is the polynomial used to calculate the CRC table's elements.
> +   CRC_BITS is the size of CRC, may be 8, 16, ... . */
> +
> +rtx
> +assemble_crc_table (tree id, unsigned HOST_WIDE_INT polynom, unsigned crc_bits)
> +{
> +  unsigned table_el_n = 0x100;
> +  tree ar = build_array_type (make_unsigned_type (crc_bits),
> +			      build_index_type (size_int (table_el_n - 1)));
Nit.  Line up build_index_type at the same indention as make_unsigned_type.

Note that with TREE_READONLY set, there is at least some chance that the 
linker will find identical tables and merge them.  I haven't tested 
this, but I know it happens for other objects in the constant pools.


> +  sprintf (buf, "crc_table_for_crc_%u_polynomial_" HOST_WIDE_INT_PRINT_HEX,
> +	   crc_bits, polynom);
Another formatting nit.  Line up the arguments when you need to wrap a 
function call.  ie

foo (arg1, arg2....
      arg3, arg4....)

> +
> +/* Generate table-based CRC code for the given CRC, INPUT_DATA and the
> +   POLYNOMIAL (without leading 1).
> +
> +   First, using POLYNOMIAL's value generates CRC table of 256 elements,
> +   then generates the assembly for the following code,
> +   where crc_size and data_size may be 8, 16, 32, 64, depending on CRC:
> +
> +     for (int i = 0; i < data_size / 8; i++)
> +       crc = (crc << 8) ^ crc_table[(crc >> (crc_size - 8))
> +				^ (data >> (data_size - (i + 1) * 8) & 0xFF))];
> +
> +   So to take values from the table, we need 8-bit data.
> +   If input data size is not 8, then first we extract upper 8 bits,
> +   then the other 8 bits, and so on.  */
> +
> +void
> +calculate_table_based_CRC (rtx *crc, const rtx &input_data,
> +			   const rtx &polynomial,
> +			   machine_mode crc_mode, machine_mode data_mode)
Same nit as before.  Line up the parameters.   It looks like we need to 
check for that throughout this function when you wrapped function 
argument lists.



> +
> +/* Generate table-based CRC code for the given CRC, INPUT_DATA and the
> +   POLYNOMIAL (without leading 1).
> +
> +   CRC is OP1, data is OP2 and the polynomial is OP3.
> +   This must generate a CRC table and an assembly for the following code,
> +   where crc_size and data_size may be 8, 16, 32, 64:
> +   uint_crc_size_t
> +   crc_crc_size (uint_crc_size_t crc_init, uint_data_size_t data, size_t size)
> +   {
> +     uint_crc_size_t crc = crc_init;
> +     for (int i = 0; i < data_size / 8; i++)
> +       crc = (crc << 8)
> +       ^ crc_table[(crc >> (crc_size - 8))
> +			^ (data >> (data_size - (i + 1) * 8) & 0xFF))];
> +     return crc;
> +   }  */
> +
> +void
> +expand_crc_table_based (rtx op0, rtx op1, rtx op2, rtx op3,
> +			machine_mode data_mode)
> +{
> +  gcc_assert (!CONST_INT_P (op0));
> +  gcc_assert (CONST_INT_P (op3));
> +  machine_mode crc_mode = GET_MODE (op0);
> +  rtx crc = gen_reg_rtx (word_mode);
> +  convert_move (crc, op1, 0);
> +  calculate_table_based_CRC (&crc, op2, op3, crc_mode, data_mode);
> +  if (crc_mode == SImode && word_mode == DImode)
> +    {
> +      rtx a_low = gen_lowpart_SUBREG (crc_mode, crc);
> +      crc = gen_rtx_SIGN_EXTEND (word_mode, a_low);
> +    }
> +  rtx tgt = simplify_gen_subreg (word_mode, op0, crc_mode, 0);
The explicit checks for SI/DI mode don't look right here.  What I think 
you're looking for is crc_mode < word_mode.  If you wanted to be even 
omre precise you could check the sizes of the two modes.

Along the same lines, I don't think you want/need the call to 
simplify_gen_subreg when word_mode == crc_mode.  ISTM that you only want 
the subreg when crc_mode < word_mode.

Presumably we rejected the optimization earlier if crc_mode > word_mode?





> +  emit_move_insn (tgt, crc);
> +}
> +
> +/* Generate the common operation for reflecting values:
> +   *OP = (*OP & AND1_VALUE) << SHIFT_VAL | (*OP & AND2_VALUE) >> SHIFT_VAL;  */
> +
> +void
> +gen_common_operation_to_reflect (rtx *op,
> +				 unsigned HOST_WIDE_INT and1_value,
> +				 unsigned HOST_WIDE_INT and2_value,
> +				 unsigned shift_val)
> +{
> +  rtx op1 = expand_and (word_mode, *op, gen_int_mode (and1_value, word_mode),
> +			NULL_RTX);
> +  op1 = expand_shift (LSHIFT_EXPR, word_mode, op1, shift_val, op1, 0);
> +  rtx op2 = expand_and (word_mode, *op, gen_int_mode (and2_value, word_mode),
> +			NULL_RTX);
> +  op2 = expand_shift (RSHIFT_EXPR, word_mode, op2, shift_val, op2, 1);
> +  *op = expand_binop (word_mode, ior_optab, op1, op2, *op, 0, OPTAB_DIRECT);
> +}
Same nits as elsewhere.  Line up the parameters & arguments when you 
need to wrap lines.



> +
> +/* Reflect 64-bit value for the 64-bit target.  */
> +
> +void
> +reflect_64_bit_value (rtx *op)
> +{
> +  gen_common_operation_to_reflect (op, 0x00000000FFFFFFFF, 0xFFFFFFFF00000000,
> +				   32);
> +  gen_common_operation_to_reflect (op, 0x0000FFFF0000FFFF, 0xFFFF0000FFFF0000,
> +				   16);
> +  gen_common_operation_to_reflect (op, 0x00FF00FF00FF00FF, 0xFF00FF00FF00FF00,
> +				   8);
> +  gen_common_operation_to_reflect (op, 0x0F0F0F0F0F0F0F0F, 0xF0F0F0F0F0F0F0F0,
> +				   4);
> +  gen_common_operation_to_reflect (op, 0x3333333333333333, 0xCCCCCCCCCCCCCCCC,
> +				   2);
> +  gen_common_operation_to_reflect (op, 0x5555555555555555, 0xAAAAAAAAAAAAAAAA,
> +				   1);
Another class of nits.  In general, if you're wrapping a long line like 
these above, consider "balancing" the length of the lines a bit better. 
so instead of bringing down just the final constant, bring down the 
final two constants to a new line.  I think that's what we generally do 
elsehwere.



> +
> +  gen_reflecting_code (&crc, GET_MODE_BITSIZE (word_mode) - crc_size);
> +  if (crc_mode == SImode && word_mode == DImode)
> +    {
> +      rtx a_low = gen_lowpart_SUBREG (crc_mode, crc);
> +      crc = gen_rtx_SIGN_EXTEND (word_mode, a_low);
> +    }
> +  rtx tgt = simplify_gen_subreg (word_mode, op0, crc_mode, 0);
> +  emit_move_insn (tgt, crc);
Same concerns as in the bit-forward implementation WRT testing SI/DI 
explicitly.  Testing crc_mode < word_mode or testing their bitsizes 
seems safer.

> +/* Expand CRC call STMT.  */
> +
> +static void
> +expand_crc_optab_fn (internal_fn fn, gcall *stmt, convert_optab optab)
> +{
> +    tree lhs = gimple_call_lhs (stmt);
> +    tree rhs1 = gimple_call_arg (stmt, 0); // crc
> +    tree rhs2 = gimple_call_arg (stmt, 1); // data
> +    tree rhs3 = gimple_call_arg (stmt, 2); // polynomial
> +
> +    tree result_type = TREE_TYPE (lhs);
> +    tree data_type = TREE_TYPE (rhs2);
> +
> +    gcc_assert (TYPE_MODE (result_type) >= TYPE_MODE (data_type));
> +    gcc_assert (word_mode >= TYPE_MODE (result_type));
> +
> +    rtx dest = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> +    rtx crc = expand_normal (rhs1);
> +    rtx data = expand_normal (rhs2);
> +    gcc_assert (TREE_CODE (rhs3) == INTEGER_CST);
> +    rtx polynomial = gen_rtx_CONST_INT (TYPE_MODE (result_type),
> +				 TREE_INT_CST_LOW (rhs3));
Nit.  Looks like this code got over-indented.  Just two spaces of 
indention after the open-curly.


In general, most of the concerns with this patch are formatting nits. 
The functional concerns are with those mode checks as mentioned in a 
couple places above.  Repost once you've got an update fixing these issues.

Thanks,
Jeff



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC/RFA] [PATCH 01/12] Implement internal functions for efficient CRC computation
  2024-05-25 17:40 ` Jeff Law
@ 2024-05-27 13:51   ` Mariam Arutunian
  2024-06-08 21:50     ` Jeff Law
  0 siblings, 1 reply; 4+ messages in thread
From: Mariam Arutunian @ 2024-05-27 13:51 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 13999 bytes --]

On Sat, May 25, 2024 at 9:40 PM Jeff Law <jeffreyalaw@gmail.com> wrote:

>
>
> On 5/24/24 2:41 AM, Mariam Arutunian wrote:
> > Add two new internal functions (IFN_CRC, IFN_CRC_REV), to provide faster
> > CRC generation.
> > One performs bit-forward and the other bit-reversed CRC computation.
> > If CRC optabs are supported, they are used for the CRC computation.
> > Otherwise, table-based CRC is generated.
> > The supported data and CRC sizes are 8, 16, 32, and 64 bits.
> > The polynomial is without the leading 1.
> > A table with 256 elements is used to store precomputed CRCs.
> > For the reflection of inputs and the output, a simple algorithm involving
> > SHIFT, AND, and OR operations is used.
> >
> > Co-authored-by: Joern Rennecke <joern.rennecke@embecosm.com
> > <mailto:joern.rennecke@embecosm.com>>
> >
> > gcc/
> >
> >     * doc/md.texi (crc@var{m}@var{n}4,
> >     crc_rev@var{m}@var{n}4): Document.
> >     * expr.cc (generate_crc_table): New function.
> >     (calculate_table_based_CRC): Likewise.
> >     (expand_crc_table_based): Likewise.
> >     (gen_common_operation_to_reflect): Likewise.
> >     (reflect_64_bit_value): Likewise.
> >     (reflect_32_bit_value): Likewise.
> >     (reflect_16_bit_value): Likewise.
> >     (reflect_8_bit_value): Likewise.
> >     (generate_reflecting_code_standard): Likewise.
> >     (expand_reversed_crc_table_based): Likewise.
> >     * expr.h (generate_reflecting_code_standard): New function
> declaration.
> >     (expand_crc_table_based): Likewise.
> >     (expand_reversed_crc_table_based): Likewise.
> >     * internal-fn.cc: (crc_direct): Define.
> >     (direct_crc_optab_supported_p): Likewise.
> >     (expand_crc_optab_fn): New function
> >     * internal-fn.def (CRC, CRC_REV): New internal functions.
> >     * optabs.def (crc_optab, crc_rev_optab): New optabs.
> >
> > Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com
> > <mailto:mariamarutunian@gmail.com>>
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index 5730bda80dc..be68ef860f9 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -8557,6 +8557,20 @@ operand 2, greater than operand 2 or is unordered
> with operand 2.
> >
> >  This pattern is not allowed to @code{FAIL}.
> >
> > +@cindex @code{crc@var{m}@var{n}4} instruction pattern
> > +@item @samp{crc@var{m}@var{n}4}
> > +Calculate a bit-forward CRC using operands 1, 2 and 3,
> > +then store the result in operand 0.
> > +Operands 1 is the initial CRC, operands 2 is the data and operands 3 is
> the
> > +polynomial without leading 1.
> > +Operands 0, 1 and 3 have mode @var{n} and operand 2 has mode @var{m},
> where
> > +both modes are integers.  The size of CRC to be calculated is
> determined by the
> > +mode; for example, if @var{n} is 'hi', a CRC16 is calculated.
> > +
> > +@cindex @code{crc_rev@var{m}@var{n}4} instruction pattern
> > +@item @samp{crc_rev@var{m}@var{n}4}
> > +Similar to @samp{crc@var{m}@var{n}4}, but calculates a bit-reversed
> CRC.
> > +
> So just to be clear, this is a case where the input (operand 2) may have
> a different mode than the output (operand 0).  That scenario is
> generally discouraged, with a few exceptions (the most common being
> shift counts which are often QImode objects while the
> value-to-be-shifted and the output value are potentially any scalar
> integer mode.
>
> So I don't think this is a problem, just wanted to point it out to
> anyone else that may be looking at this code.
>
>
> >  @end table
> >
> >  @end ifset
> > diff --git a/gcc/expr.cc b/gcc/expr.cc
> > index 1baa39b98eb..18368ae6b6c 100644
> > --- a/gcc/expr.cc
> > +++ b/gcc/expr.cc
> > @@ -14091,3 +14091,359 @@ int_expr_size (const_tree exp)
> >
> >    return tree_to_shwi (size);
> >  }
> > +
> > +/* Calculate CRC for the initial CRC and given POLYNOMIAL.
> > +   CRC_BITS is CRC size.  */
> > +
> > +static unsigned HOST_WIDE_INT
> > +calculate_crc (unsigned HOST_WIDE_INT crc,
> > +           unsigned HOST_WIDE_INT polynomial,
> > +           unsigned crc_bits)
> Just a nit.  Line up the polynomial & crc_bits declarations with the crc
> declaration.
>
> I carefully reviewed the indentation of the code using different editors
and viewers, and everything appeared correct.
I double-checked the specific sections mentioned, and they also looked
right.
In this reply message I see that it's not correct. I'll try to fix it.

>
> > +{
> > +  crc = crc << (crc_bits - 8);
> > +  for (int i = 8; i > 0; --i)
> > +    {
> > +      if ((crc >> (crc_bits - 1)) & 1)
> > +     crc = (crc << 1) ^ polynomial;
> > +      else
> > +     crc <<= 1;
> > +    }
> > +
> > +  crc <<=  (sizeof (crc) * BITS_PER_UNIT - crc_bits);
> > +  crc >>=  (sizeof (crc) * BITS_PER_UNIT - crc_bits);
> Another nit.  Just once space after the <<= or >>= operators.
>
> Yes, this one hadn't noticed. Thanks.

> > +
> > +  return crc;
> > +}
> > +
> > +/* Assemble CRC table with 256 elements for the given POLYNOM and
> CRC_BITS with
> > +   given ID.
> > +   ID is the identifier of the table, the name of the table is unique,
> > +   contains CRC size and the polynomial.
> > +   POLYNOM is the polynomial used to calculate the CRC table's elements.
> > +   CRC_BITS is the size of CRC, may be 8, 16, ... . */
> > +
> > +rtx
> > +assemble_crc_table (tree id, unsigned HOST_WIDE_INT polynom, unsigned
> crc_bits)
> > +{
> > +  unsigned table_el_n = 0x100;
> > +  tree ar = build_array_type (make_unsigned_type (crc_bits),
> > +                           build_index_type (size_int (table_el_n -
> 1)));
> Nit.  Line up build_index_type at the same indention as make_unsigned_type.
>
> Note that with TREE_READONLY set, there is at least some chance that the
> linker will find identical tables and merge them.  I haven't tested
> this, but I know it happens for other objects in the constant pools.
>
> Considering that identical tables would likely have the same name, their
merging shouldn't pose a problem in principle.
However, it's worth noting the potential issue if someone intentionally
creates different tables with the same name.

>
> > +  sprintf (buf, "crc_table_for_crc_%u_polynomial_"
> HOST_WIDE_INT_PRINT_HEX,
> > +        crc_bits, polynom);
> Another formatting nit.  Line up the arguments when you need to wrap a
> function call.  ie
>
> foo (arg1, arg2....
>       arg3, arg4....)
>
> > +
> > +/* Generate table-based CRC code for the given CRC, INPUT_DATA and the
> > +   POLYNOMIAL (without leading 1).
> > +
> > +   First, using POLYNOMIAL's value generates CRC table of 256 elements,
> > +   then generates the assembly for the following code,
> > +   where crc_size and data_size may be 8, 16, 32, 64, depending on CRC:
> > +
> > +     for (int i = 0; i < data_size / 8; i++)
> > +       crc = (crc << 8) ^ crc_table[(crc >> (crc_size - 8))
> > +                             ^ (data >> (data_size - (i + 1) * 8) &
> 0xFF))];
> > +
> > +   So to take values from the table, we need 8-bit data.
> > +   If input data size is not 8, then first we extract upper 8 bits,
> > +   then the other 8 bits, and so on.  */
> > +
> > +void
> > +calculate_table_based_CRC (rtx *crc, const rtx &input_data,
> > +                        const rtx &polynomial,
> > +                        machine_mode crc_mode, machine_mode data_mode)
> Same nit as before.  Line up the parameters.   It looks like we need to
> check for that throughout this function when you wrapped function
> argument lists.
>
>

> > +
> > +/* Generate table-based CRC code for the given CRC, INPUT_DATA and the
> > +   POLYNOMIAL (without leading 1).
> > +
> > +   CRC is OP1, data is OP2 and the polynomial is OP3.
> > +   This must generate a CRC table and an assembly for the following
> code,
> > +   where crc_size and data_size may be 8, 16, 32, 64:
> > +   uint_crc_size_t
> > +   crc_crc_size (uint_crc_size_t crc_init, uint_data_size_t data,
> size_t size)
> > +   {
> > +     uint_crc_size_t crc = crc_init;
> > +     for (int i = 0; i < data_size / 8; i++)
> > +       crc = (crc << 8)
> > +       ^ crc_table[(crc >> (crc_size - 8))
> > +                     ^ (data >> (data_size - (i + 1) * 8) & 0xFF))];
> > +     return crc;
> > +   }  */
> > +
> > +void
> > +expand_crc_table_based (rtx op0, rtx op1, rtx op2, rtx op3,
> > +                     machine_mode data_mode)
> > +{
> > +  gcc_assert (!CONST_INT_P (op0));
> > +  gcc_assert (CONST_INT_P (op3));
> > +  machine_mode crc_mode = GET_MODE (op0);
> > +  rtx crc = gen_reg_rtx (word_mode);
> > +  convert_move (crc, op1, 0);
> > +  calculate_table_based_CRC (&crc, op2, op3, crc_mode, data_mode);
> > +  if (crc_mode == SImode && word_mode == DImode)
> > +    {
> > +      rtx a_low = gen_lowpart_SUBREG (crc_mode, crc);
> > +      crc = gen_rtx_SIGN_EXTEND (word_mode, a_low);
> > +    }
> > +  rtx tgt = simplify_gen_subreg (word_mode, op0, crc_mode, 0);
> The explicit checks for SI/DI mode don't look right here.  What I think
> you're looking for is crc_mode < word_mode.  If you wanted to be even
> omre precise you could check the sizes of the two modes.
>

Thanks. I'll check the sizes then, as I need to check that crc_mode size is
32 and word_mode is 64, only this one case.

Along the same lines, I don't think you want/need the call to
> simplify_gen_subreg when word_mode == crc_mode.  ISTM that you only want
> the subreg when crc_mode < word_mode.
>
> Presumably we rejected the optimization earlier if crc_mode > word_mode?
>
> Yes.

>
>
>
> > +  emit_move_insn (tgt, crc);
> > +}
> > +
> > +/* Generate the common operation for reflecting values:
> > +   *OP = (*OP & AND1_VALUE) << SHIFT_VAL | (*OP & AND2_VALUE) >>
> SHIFT_VAL;  */
> > +
> > +void
> > +gen_common_operation_to_reflect (rtx *op,
> > +                              unsigned HOST_WIDE_INT and1_value,
> > +                              unsigned HOST_WIDE_INT and2_value,
> > +                              unsigned shift_val)
> > +{
> > +  rtx op1 = expand_and (word_mode, *op, gen_int_mode (and1_value,
> word_mode),
> > +                     NULL_RTX);
> > +  op1 = expand_shift (LSHIFT_EXPR, word_mode, op1, shift_val, op1, 0);
> > +  rtx op2 = expand_and (word_mode, *op, gen_int_mode (and2_value,
> word_mode),
> > +                     NULL_RTX);
> > +  op2 = expand_shift (RSHIFT_EXPR, word_mode, op2, shift_val, op2, 1);
> > +  *op = expand_binop (word_mode, ior_optab, op1, op2, *op, 0,
> OPTAB_DIRECT);
> > +}
> Same nits as elsewhere.  Line up the parameters & arguments when you
> need to wrap lines.
>
>
>
> > +
> > +/* Reflect 64-bit value for the 64-bit target.  */
> > +
> > +void
> > +reflect_64_bit_value (rtx *op)
> > +{
> > +  gen_common_operation_to_reflect (op, 0x00000000FFFFFFFF,
> 0xFFFFFFFF00000000,
> > +                                32);
> > +  gen_common_operation_to_reflect (op, 0x0000FFFF0000FFFF,
> 0xFFFF0000FFFF0000,
> > +                                16);
> > +  gen_common_operation_to_reflect (op, 0x00FF00FF00FF00FF,
> 0xFF00FF00FF00FF00,
> > +                                8);
> > +  gen_common_operation_to_reflect (op, 0x0F0F0F0F0F0F0F0F,
> 0xF0F0F0F0F0F0F0F0,
> > +                                4);
> > +  gen_common_operation_to_reflect (op, 0x3333333333333333,
> 0xCCCCCCCCCCCCCCCC,
> > +                                2);
> > +  gen_common_operation_to_reflect (op, 0x5555555555555555,
> 0xAAAAAAAAAAAAAAAA,
> > +                                1);
> Another class of nits.  In general, if you're wrapping a long line like
> these above, consider "balancing" the length of the lines a bit better.
> so instead of bringing down just the final constant, bring down the
> final two constants to a new line.  I think that's what we generally do
> elsehwere.
>
> Ok.

>
> > +
> > +  gen_reflecting_code (&crc, GET_MODE_BITSIZE (word_mode) - crc_size);
> > +  if (crc_mode == SImode && word_mode == DImode)
> > +    {
> > +      rtx a_low = gen_lowpart_SUBREG (crc_mode, crc);
> > +      crc = gen_rtx_SIGN_EXTEND (word_mode, a_low);
> > +    }
> > +  rtx tgt = simplify_gen_subreg (word_mode, op0, crc_mode, 0);
> > +  emit_move_insn (tgt, crc);
> Same concerns as in the bit-forward implementation WRT testing SI/DI
> explicitly.  Testing crc_mode < word_mode or testing their bitsizes
> seems safer.
>
> > +/* Expand CRC call STMT.  */
> > +
> > +static void
> > +expand_crc_optab_fn (internal_fn fn, gcall *stmt, convert_optab optab)
> > +{
> > +    tree lhs = gimple_call_lhs (stmt);
> > +    tree rhs1 = gimple_call_arg (stmt, 0); // crc
> > +    tree rhs2 = gimple_call_arg (stmt, 1); // data
> > +    tree rhs3 = gimple_call_arg (stmt, 2); // polynomial
> > +
> > +    tree result_type = TREE_TYPE (lhs);
> > +    tree data_type = TREE_TYPE (rhs2);
> > +
> > +    gcc_assert (TYPE_MODE (result_type) >= TYPE_MODE (data_type));
> > +    gcc_assert (word_mode >= TYPE_MODE (result_type));
> > +
> > +    rtx dest = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> > +    rtx crc = expand_normal (rhs1);
> > +    rtx data = expand_normal (rhs2);
> > +    gcc_assert (TREE_CODE (rhs3) == INTEGER_CST);
> > +    rtx polynomial = gen_rtx_CONST_INT (TYPE_MODE (result_type),
> > +                              TREE_INT_CST_LOW (rhs3));
> Nit.  Looks like this code got over-indented.  Just two spaces of
> indention after the open-curly.
>
>
> In general, most of the concerns with this patch are formatting nits.
> The functional concerns are with those mode checks as mentioned in a
> couple places above.  Repost once you've got an update fixing these issues.


>
Thanks,
> Jeff


I guess the indentation problem is because of the tab size.
I'll fix all the suggestions.
Thank you very much.
Mariam

>
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC/RFA] [PATCH 01/12] Implement internal functions for efficient CRC computation
  2024-05-27 13:51   ` Mariam Arutunian
@ 2024-06-08 21:50     ` Jeff Law
  0 siblings, 0 replies; 4+ messages in thread
From: Jeff Law @ 2024-06-08 21:50 UTC (permalink / raw)
  To: Mariam Arutunian; +Cc: GCC Patches



On 5/27/24 7:51 AM, Mariam Arutunian wrote:

> 
> I carefully reviewed the indentation of the code using different editors 
> and viewers, and everything appeared correct.
> I double-checked the specific sections mentioned, and they also looked 
> right.
> In this reply message I see that it's not correct. I'll try to fix it.
Thanks for double-checking.  It's one of the downsides of email based flows.

Jeff

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-06-08 21:50 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-24  8:41 [RFC/RFA] [PATCH 01/12] Implement internal functions for efficient CRC computation Mariam Arutunian
2024-05-25 17:40 ` Jeff Law
2024-05-27 13:51   ` Mariam Arutunian
2024-06-08 21:50     ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).