* [RFC/RFA] [PATCH 01/12] Implement internal functions for efficient CRC computation @ 2024-05-24 8:41 Mariam Arutunian 2024-05-25 17:40 ` Jeff Law 0 siblings, 1 reply; 4+ messages in thread From: Mariam Arutunian @ 2024-05-24 8:41 UTC (permalink / raw) To: GCC Patches [-- Attachment #1.1: Type: text/plain, Size: 1578 bytes --] Add two new internal functions (IFN_CRC, IFN_CRC_REV), to provide faster CRC generation. One performs bit-forward and the other bit-reversed CRC computation. If CRC optabs are supported, they are used for the CRC computation. Otherwise, table-based CRC is generated. The supported data and CRC sizes are 8, 16, 32, and 64 bits. The polynomial is without the leading 1. A table with 256 elements is used to store precomputed CRCs. For the reflection of inputs and the output, a simple algorithm involving SHIFT, AND, and OR operations is used. Co-authored-by: Joern Rennecke <joern.rennecke@embecosm.com> gcc/ * doc/md.texi (crc@var{m}@var{n}4, crc_rev@var{m}@var{n}4): Document. * expr.cc (generate_crc_table): New function. (calculate_table_based_CRC): Likewise. (expand_crc_table_based): Likewise. (gen_common_operation_to_reflect): Likewise. (reflect_64_bit_value): Likewise. (reflect_32_bit_value): Likewise. (reflect_16_bit_value): Likewise. (reflect_8_bit_value): Likewise. (generate_reflecting_code_standard): Likewise. (expand_reversed_crc_table_based): Likewise. * expr.h (generate_reflecting_code_standard): New function declaration. (expand_crc_table_based): Likewise. (expand_reversed_crc_table_based): Likewise. * internal-fn.cc: (crc_direct): Define. (direct_crc_optab_supported_p): Likewise. (expand_crc_optab_fn): New function * internal-fn.def (CRC, CRC_REV): New internal functions. * optabs.def (crc_optab, crc_rev_optab): New optabs. Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com> [-- Attachment #2: 0001-Implement-internal-functions-for-efficient-CRC-compu.patch --] [-- Type: application/x-patch, Size: 18695 bytes --] ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [RFC/RFA] [PATCH 01/12] Implement internal functions for efficient CRC computation 2024-05-24 8:41 [RFC/RFA] [PATCH 01/12] Implement internal functions for efficient CRC computation Mariam Arutunian @ 2024-05-25 17:40 ` Jeff Law 2024-05-27 13:51 ` Mariam Arutunian 0 siblings, 1 reply; 4+ messages in thread From: Jeff Law @ 2024-05-25 17:40 UTC (permalink / raw) To: Mariam Arutunian, GCC Patches On 5/24/24 2:41 AM, Mariam Arutunian wrote: > Add two new internal functions (IFN_CRC, IFN_CRC_REV), to provide faster > CRC generation. > One performs bit-forward and the other bit-reversed CRC computation. > If CRC optabs are supported, they are used for the CRC computation. > Otherwise, table-based CRC is generated. > The supported data and CRC sizes are 8, 16, 32, and 64 bits. > The polynomial is without the leading 1. > A table with 256 elements is used to store precomputed CRCs. > For the reflection of inputs and the output, a simple algorithm involving > SHIFT, AND, and OR operations is used. > > Co-authored-by: Joern Rennecke <joern.rennecke@embecosm.com > <mailto:joern.rennecke@embecosm.com>> > > gcc/ > > * doc/md.texi (crc@var{m}@var{n}4, > crc_rev@var{m}@var{n}4): Document. > * expr.cc (generate_crc_table): New function. > (calculate_table_based_CRC): Likewise. > (expand_crc_table_based): Likewise. > (gen_common_operation_to_reflect): Likewise. > (reflect_64_bit_value): Likewise. > (reflect_32_bit_value): Likewise. > (reflect_16_bit_value): Likewise. > (reflect_8_bit_value): Likewise. > (generate_reflecting_code_standard): Likewise. > (expand_reversed_crc_table_based): Likewise. > * expr.h (generate_reflecting_code_standard): New function declaration. > (expand_crc_table_based): Likewise. > (expand_reversed_crc_table_based): Likewise. > * internal-fn.cc: (crc_direct): Define. > (direct_crc_optab_supported_p): Likewise. > (expand_crc_optab_fn): New function > * internal-fn.def (CRC, CRC_REV): New internal functions. > * optabs.def (crc_optab, crc_rev_optab): New optabs. > > Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com > <mailto:mariamarutunian@gmail.com>> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi > index 5730bda80dc..be68ef860f9 100644 > --- a/gcc/doc/md.texi > +++ b/gcc/doc/md.texi > @@ -8557,6 +8557,20 @@ operand 2, greater than operand 2 or is unordered with operand 2. > > This pattern is not allowed to @code{FAIL}. > > +@cindex @code{crc@var{m}@var{n}4} instruction pattern > +@item @samp{crc@var{m}@var{n}4} > +Calculate a bit-forward CRC using operands 1, 2 and 3, > +then store the result in operand 0. > +Operands 1 is the initial CRC, operands 2 is the data and operands 3 is the > +polynomial without leading 1. > +Operands 0, 1 and 3 have mode @var{n} and operand 2 has mode @var{m}, where > +both modes are integers. The size of CRC to be calculated is determined by the > +mode; for example, if @var{n} is 'hi', a CRC16 is calculated. > + > +@cindex @code{crc_rev@var{m}@var{n}4} instruction pattern > +@item @samp{crc_rev@var{m}@var{n}4} > +Similar to @samp{crc@var{m}@var{n}4}, but calculates a bit-reversed CRC. > + So just to be clear, this is a case where the input (operand 2) may have a different mode than the output (operand 0). That scenario is generally discouraged, with a few exceptions (the most common being shift counts which are often QImode objects while the value-to-be-shifted and the output value are potentially any scalar integer mode. So I don't think this is a problem, just wanted to point it out to anyone else that may be looking at this code. > @end table > > @end ifset > diff --git a/gcc/expr.cc b/gcc/expr.cc > index 1baa39b98eb..18368ae6b6c 100644 > --- a/gcc/expr.cc > +++ b/gcc/expr.cc > @@ -14091,3 +14091,359 @@ int_expr_size (const_tree exp) > > return tree_to_shwi (size); > } > + > +/* Calculate CRC for the initial CRC and given POLYNOMIAL. > + CRC_BITS is CRC size. */ > + > +static unsigned HOST_WIDE_INT > +calculate_crc (unsigned HOST_WIDE_INT crc, > + unsigned HOST_WIDE_INT polynomial, > + unsigned crc_bits) Just a nit. Line up the polynomial & crc_bits declarations with the crc declaration. > +{ > + crc = crc << (crc_bits - 8); > + for (int i = 8; i > 0; --i) > + { > + if ((crc >> (crc_bits - 1)) & 1) > + crc = (crc << 1) ^ polynomial; > + else > + crc <<= 1; > + } > + > + crc <<= (sizeof (crc) * BITS_PER_UNIT - crc_bits); > + crc >>= (sizeof (crc) * BITS_PER_UNIT - crc_bits); Another nit. Just once space after the <<= or >>= operators. > + > + return crc; > +} > + > +/* Assemble CRC table with 256 elements for the given POLYNOM and CRC_BITS with > + given ID. > + ID is the identifier of the table, the name of the table is unique, > + contains CRC size and the polynomial. > + POLYNOM is the polynomial used to calculate the CRC table's elements. > + CRC_BITS is the size of CRC, may be 8, 16, ... . */ > + > +rtx > +assemble_crc_table (tree id, unsigned HOST_WIDE_INT polynom, unsigned crc_bits) > +{ > + unsigned table_el_n = 0x100; > + tree ar = build_array_type (make_unsigned_type (crc_bits), > + build_index_type (size_int (table_el_n - 1))); Nit. Line up build_index_type at the same indention as make_unsigned_type. Note that with TREE_READONLY set, there is at least some chance that the linker will find identical tables and merge them. I haven't tested this, but I know it happens for other objects in the constant pools. > + sprintf (buf, "crc_table_for_crc_%u_polynomial_" HOST_WIDE_INT_PRINT_HEX, > + crc_bits, polynom); Another formatting nit. Line up the arguments when you need to wrap a function call. ie foo (arg1, arg2.... arg3, arg4....) > + > +/* Generate table-based CRC code for the given CRC, INPUT_DATA and the > + POLYNOMIAL (without leading 1). > + > + First, using POLYNOMIAL's value generates CRC table of 256 elements, > + then generates the assembly for the following code, > + where crc_size and data_size may be 8, 16, 32, 64, depending on CRC: > + > + for (int i = 0; i < data_size / 8; i++) > + crc = (crc << 8) ^ crc_table[(crc >> (crc_size - 8)) > + ^ (data >> (data_size - (i + 1) * 8) & 0xFF))]; > + > + So to take values from the table, we need 8-bit data. > + If input data size is not 8, then first we extract upper 8 bits, > + then the other 8 bits, and so on. */ > + > +void > +calculate_table_based_CRC (rtx *crc, const rtx &input_data, > + const rtx &polynomial, > + machine_mode crc_mode, machine_mode data_mode) Same nit as before. Line up the parameters. It looks like we need to check for that throughout this function when you wrapped function argument lists. > + > +/* Generate table-based CRC code for the given CRC, INPUT_DATA and the > + POLYNOMIAL (without leading 1). > + > + CRC is OP1, data is OP2 and the polynomial is OP3. > + This must generate a CRC table and an assembly for the following code, > + where crc_size and data_size may be 8, 16, 32, 64: > + uint_crc_size_t > + crc_crc_size (uint_crc_size_t crc_init, uint_data_size_t data, size_t size) > + { > + uint_crc_size_t crc = crc_init; > + for (int i = 0; i < data_size / 8; i++) > + crc = (crc << 8) > + ^ crc_table[(crc >> (crc_size - 8)) > + ^ (data >> (data_size - (i + 1) * 8) & 0xFF))]; > + return crc; > + } */ > + > +void > +expand_crc_table_based (rtx op0, rtx op1, rtx op2, rtx op3, > + machine_mode data_mode) > +{ > + gcc_assert (!CONST_INT_P (op0)); > + gcc_assert (CONST_INT_P (op3)); > + machine_mode crc_mode = GET_MODE (op0); > + rtx crc = gen_reg_rtx (word_mode); > + convert_move (crc, op1, 0); > + calculate_table_based_CRC (&crc, op2, op3, crc_mode, data_mode); > + if (crc_mode == SImode && word_mode == DImode) > + { > + rtx a_low = gen_lowpart_SUBREG (crc_mode, crc); > + crc = gen_rtx_SIGN_EXTEND (word_mode, a_low); > + } > + rtx tgt = simplify_gen_subreg (word_mode, op0, crc_mode, 0); The explicit checks for SI/DI mode don't look right here. What I think you're looking for is crc_mode < word_mode. If you wanted to be even omre precise you could check the sizes of the two modes. Along the same lines, I don't think you want/need the call to simplify_gen_subreg when word_mode == crc_mode. ISTM that you only want the subreg when crc_mode < word_mode. Presumably we rejected the optimization earlier if crc_mode > word_mode? > + emit_move_insn (tgt, crc); > +} > + > +/* Generate the common operation for reflecting values: > + *OP = (*OP & AND1_VALUE) << SHIFT_VAL | (*OP & AND2_VALUE) >> SHIFT_VAL; */ > + > +void > +gen_common_operation_to_reflect (rtx *op, > + unsigned HOST_WIDE_INT and1_value, > + unsigned HOST_WIDE_INT and2_value, > + unsigned shift_val) > +{ > + rtx op1 = expand_and (word_mode, *op, gen_int_mode (and1_value, word_mode), > + NULL_RTX); > + op1 = expand_shift (LSHIFT_EXPR, word_mode, op1, shift_val, op1, 0); > + rtx op2 = expand_and (word_mode, *op, gen_int_mode (and2_value, word_mode), > + NULL_RTX); > + op2 = expand_shift (RSHIFT_EXPR, word_mode, op2, shift_val, op2, 1); > + *op = expand_binop (word_mode, ior_optab, op1, op2, *op, 0, OPTAB_DIRECT); > +} Same nits as elsewhere. Line up the parameters & arguments when you need to wrap lines. > + > +/* Reflect 64-bit value for the 64-bit target. */ > + > +void > +reflect_64_bit_value (rtx *op) > +{ > + gen_common_operation_to_reflect (op, 0x00000000FFFFFFFF, 0xFFFFFFFF00000000, > + 32); > + gen_common_operation_to_reflect (op, 0x0000FFFF0000FFFF, 0xFFFF0000FFFF0000, > + 16); > + gen_common_operation_to_reflect (op, 0x00FF00FF00FF00FF, 0xFF00FF00FF00FF00, > + 8); > + gen_common_operation_to_reflect (op, 0x0F0F0F0F0F0F0F0F, 0xF0F0F0F0F0F0F0F0, > + 4); > + gen_common_operation_to_reflect (op, 0x3333333333333333, 0xCCCCCCCCCCCCCCCC, > + 2); > + gen_common_operation_to_reflect (op, 0x5555555555555555, 0xAAAAAAAAAAAAAAAA, > + 1); Another class of nits. In general, if you're wrapping a long line like these above, consider "balancing" the length of the lines a bit better. so instead of bringing down just the final constant, bring down the final two constants to a new line. I think that's what we generally do elsehwere. > + > + gen_reflecting_code (&crc, GET_MODE_BITSIZE (word_mode) - crc_size); > + if (crc_mode == SImode && word_mode == DImode) > + { > + rtx a_low = gen_lowpart_SUBREG (crc_mode, crc); > + crc = gen_rtx_SIGN_EXTEND (word_mode, a_low); > + } > + rtx tgt = simplify_gen_subreg (word_mode, op0, crc_mode, 0); > + emit_move_insn (tgt, crc); Same concerns as in the bit-forward implementation WRT testing SI/DI explicitly. Testing crc_mode < word_mode or testing their bitsizes seems safer. > +/* Expand CRC call STMT. */ > + > +static void > +expand_crc_optab_fn (internal_fn fn, gcall *stmt, convert_optab optab) > +{ > + tree lhs = gimple_call_lhs (stmt); > + tree rhs1 = gimple_call_arg (stmt, 0); // crc > + tree rhs2 = gimple_call_arg (stmt, 1); // data > + tree rhs3 = gimple_call_arg (stmt, 2); // polynomial > + > + tree result_type = TREE_TYPE (lhs); > + tree data_type = TREE_TYPE (rhs2); > + > + gcc_assert (TYPE_MODE (result_type) >= TYPE_MODE (data_type)); > + gcc_assert (word_mode >= TYPE_MODE (result_type)); > + > + rtx dest = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); > + rtx crc = expand_normal (rhs1); > + rtx data = expand_normal (rhs2); > + gcc_assert (TREE_CODE (rhs3) == INTEGER_CST); > + rtx polynomial = gen_rtx_CONST_INT (TYPE_MODE (result_type), > + TREE_INT_CST_LOW (rhs3)); Nit. Looks like this code got over-indented. Just two spaces of indention after the open-curly. In general, most of the concerns with this patch are formatting nits. The functional concerns are with those mode checks as mentioned in a couple places above. Repost once you've got an update fixing these issues. Thanks, Jeff ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [RFC/RFA] [PATCH 01/12] Implement internal functions for efficient CRC computation 2024-05-25 17:40 ` Jeff Law @ 2024-05-27 13:51 ` Mariam Arutunian 2024-06-08 21:50 ` Jeff Law 0 siblings, 1 reply; 4+ messages in thread From: Mariam Arutunian @ 2024-05-27 13:51 UTC (permalink / raw) To: Jeff Law; +Cc: GCC Patches [-- Attachment #1: Type: text/plain, Size: 13999 bytes --] On Sat, May 25, 2024 at 9:40 PM Jeff Law <jeffreyalaw@gmail.com> wrote: > > > On 5/24/24 2:41 AM, Mariam Arutunian wrote: > > Add two new internal functions (IFN_CRC, IFN_CRC_REV), to provide faster > > CRC generation. > > One performs bit-forward and the other bit-reversed CRC computation. > > If CRC optabs are supported, they are used for the CRC computation. > > Otherwise, table-based CRC is generated. > > The supported data and CRC sizes are 8, 16, 32, and 64 bits. > > The polynomial is without the leading 1. > > A table with 256 elements is used to store precomputed CRCs. > > For the reflection of inputs and the output, a simple algorithm involving > > SHIFT, AND, and OR operations is used. > > > > Co-authored-by: Joern Rennecke <joern.rennecke@embecosm.com > > <mailto:joern.rennecke@embecosm.com>> > > > > gcc/ > > > > * doc/md.texi (crc@var{m}@var{n}4, > > crc_rev@var{m}@var{n}4): Document. > > * expr.cc (generate_crc_table): New function. > > (calculate_table_based_CRC): Likewise. > > (expand_crc_table_based): Likewise. > > (gen_common_operation_to_reflect): Likewise. > > (reflect_64_bit_value): Likewise. > > (reflect_32_bit_value): Likewise. > > (reflect_16_bit_value): Likewise. > > (reflect_8_bit_value): Likewise. > > (generate_reflecting_code_standard): Likewise. > > (expand_reversed_crc_table_based): Likewise. > > * expr.h (generate_reflecting_code_standard): New function > declaration. > > (expand_crc_table_based): Likewise. > > (expand_reversed_crc_table_based): Likewise. > > * internal-fn.cc: (crc_direct): Define. > > (direct_crc_optab_supported_p): Likewise. > > (expand_crc_optab_fn): New function > > * internal-fn.def (CRC, CRC_REV): New internal functions. > > * optabs.def (crc_optab, crc_rev_optab): New optabs. > > > > Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com > > <mailto:mariamarutunian@gmail.com>> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi > > index 5730bda80dc..be68ef860f9 100644 > > --- a/gcc/doc/md.texi > > +++ b/gcc/doc/md.texi > > @@ -8557,6 +8557,20 @@ operand 2, greater than operand 2 or is unordered > with operand 2. > > > > This pattern is not allowed to @code{FAIL}. > > > > +@cindex @code{crc@var{m}@var{n}4} instruction pattern > > +@item @samp{crc@var{m}@var{n}4} > > +Calculate a bit-forward CRC using operands 1, 2 and 3, > > +then store the result in operand 0. > > +Operands 1 is the initial CRC, operands 2 is the data and operands 3 is > the > > +polynomial without leading 1. > > +Operands 0, 1 and 3 have mode @var{n} and operand 2 has mode @var{m}, > where > > +both modes are integers. The size of CRC to be calculated is > determined by the > > +mode; for example, if @var{n} is 'hi', a CRC16 is calculated. > > + > > +@cindex @code{crc_rev@var{m}@var{n}4} instruction pattern > > +@item @samp{crc_rev@var{m}@var{n}4} > > +Similar to @samp{crc@var{m}@var{n}4}, but calculates a bit-reversed > CRC. > > + > So just to be clear, this is a case where the input (operand 2) may have > a different mode than the output (operand 0). That scenario is > generally discouraged, with a few exceptions (the most common being > shift counts which are often QImode objects while the > value-to-be-shifted and the output value are potentially any scalar > integer mode. > > So I don't think this is a problem, just wanted to point it out to > anyone else that may be looking at this code. > > > > @end table > > > > @end ifset > > diff --git a/gcc/expr.cc b/gcc/expr.cc > > index 1baa39b98eb..18368ae6b6c 100644 > > --- a/gcc/expr.cc > > +++ b/gcc/expr.cc > > @@ -14091,3 +14091,359 @@ int_expr_size (const_tree exp) > > > > return tree_to_shwi (size); > > } > > + > > +/* Calculate CRC for the initial CRC and given POLYNOMIAL. > > + CRC_BITS is CRC size. */ > > + > > +static unsigned HOST_WIDE_INT > > +calculate_crc (unsigned HOST_WIDE_INT crc, > > + unsigned HOST_WIDE_INT polynomial, > > + unsigned crc_bits) > Just a nit. Line up the polynomial & crc_bits declarations with the crc > declaration. > > I carefully reviewed the indentation of the code using different editors and viewers, and everything appeared correct. I double-checked the specific sections mentioned, and they also looked right. In this reply message I see that it's not correct. I'll try to fix it. > > > +{ > > + crc = crc << (crc_bits - 8); > > + for (int i = 8; i > 0; --i) > > + { > > + if ((crc >> (crc_bits - 1)) & 1) > > + crc = (crc << 1) ^ polynomial; > > + else > > + crc <<= 1; > > + } > > + > > + crc <<= (sizeof (crc) * BITS_PER_UNIT - crc_bits); > > + crc >>= (sizeof (crc) * BITS_PER_UNIT - crc_bits); > Another nit. Just once space after the <<= or >>= operators. > > Yes, this one hadn't noticed. Thanks. > > + > > + return crc; > > +} > > + > > +/* Assemble CRC table with 256 elements for the given POLYNOM and > CRC_BITS with > > + given ID. > > + ID is the identifier of the table, the name of the table is unique, > > + contains CRC size and the polynomial. > > + POLYNOM is the polynomial used to calculate the CRC table's elements. > > + CRC_BITS is the size of CRC, may be 8, 16, ... . */ > > + > > +rtx > > +assemble_crc_table (tree id, unsigned HOST_WIDE_INT polynom, unsigned > crc_bits) > > +{ > > + unsigned table_el_n = 0x100; > > + tree ar = build_array_type (make_unsigned_type (crc_bits), > > + build_index_type (size_int (table_el_n - > 1))); > Nit. Line up build_index_type at the same indention as make_unsigned_type. > > Note that with TREE_READONLY set, there is at least some chance that the > linker will find identical tables and merge them. I haven't tested > this, but I know it happens for other objects in the constant pools. > > Considering that identical tables would likely have the same name, their merging shouldn't pose a problem in principle. However, it's worth noting the potential issue if someone intentionally creates different tables with the same name. > > > + sprintf (buf, "crc_table_for_crc_%u_polynomial_" > HOST_WIDE_INT_PRINT_HEX, > > + crc_bits, polynom); > Another formatting nit. Line up the arguments when you need to wrap a > function call. ie > > foo (arg1, arg2.... > arg3, arg4....) > > > + > > +/* Generate table-based CRC code for the given CRC, INPUT_DATA and the > > + POLYNOMIAL (without leading 1). > > + > > + First, using POLYNOMIAL's value generates CRC table of 256 elements, > > + then generates the assembly for the following code, > > + where crc_size and data_size may be 8, 16, 32, 64, depending on CRC: > > + > > + for (int i = 0; i < data_size / 8; i++) > > + crc = (crc << 8) ^ crc_table[(crc >> (crc_size - 8)) > > + ^ (data >> (data_size - (i + 1) * 8) & > 0xFF))]; > > + > > + So to take values from the table, we need 8-bit data. > > + If input data size is not 8, then first we extract upper 8 bits, > > + then the other 8 bits, and so on. */ > > + > > +void > > +calculate_table_based_CRC (rtx *crc, const rtx &input_data, > > + const rtx &polynomial, > > + machine_mode crc_mode, machine_mode data_mode) > Same nit as before. Line up the parameters. It looks like we need to > check for that throughout this function when you wrapped function > argument lists. > > > > + > > +/* Generate table-based CRC code for the given CRC, INPUT_DATA and the > > + POLYNOMIAL (without leading 1). > > + > > + CRC is OP1, data is OP2 and the polynomial is OP3. > > + This must generate a CRC table and an assembly for the following > code, > > + where crc_size and data_size may be 8, 16, 32, 64: > > + uint_crc_size_t > > + crc_crc_size (uint_crc_size_t crc_init, uint_data_size_t data, > size_t size) > > + { > > + uint_crc_size_t crc = crc_init; > > + for (int i = 0; i < data_size / 8; i++) > > + crc = (crc << 8) > > + ^ crc_table[(crc >> (crc_size - 8)) > > + ^ (data >> (data_size - (i + 1) * 8) & 0xFF))]; > > + return crc; > > + } */ > > + > > +void > > +expand_crc_table_based (rtx op0, rtx op1, rtx op2, rtx op3, > > + machine_mode data_mode) > > +{ > > + gcc_assert (!CONST_INT_P (op0)); > > + gcc_assert (CONST_INT_P (op3)); > > + machine_mode crc_mode = GET_MODE (op0); > > + rtx crc = gen_reg_rtx (word_mode); > > + convert_move (crc, op1, 0); > > + calculate_table_based_CRC (&crc, op2, op3, crc_mode, data_mode); > > + if (crc_mode == SImode && word_mode == DImode) > > + { > > + rtx a_low = gen_lowpart_SUBREG (crc_mode, crc); > > + crc = gen_rtx_SIGN_EXTEND (word_mode, a_low); > > + } > > + rtx tgt = simplify_gen_subreg (word_mode, op0, crc_mode, 0); > The explicit checks for SI/DI mode don't look right here. What I think > you're looking for is crc_mode < word_mode. If you wanted to be even > omre precise you could check the sizes of the two modes. > Thanks. I'll check the sizes then, as I need to check that crc_mode size is 32 and word_mode is 64, only this one case. Along the same lines, I don't think you want/need the call to > simplify_gen_subreg when word_mode == crc_mode. ISTM that you only want > the subreg when crc_mode < word_mode. > > Presumably we rejected the optimization earlier if crc_mode > word_mode? > > Yes. > > > > > + emit_move_insn (tgt, crc); > > +} > > + > > +/* Generate the common operation for reflecting values: > > + *OP = (*OP & AND1_VALUE) << SHIFT_VAL | (*OP & AND2_VALUE) >> > SHIFT_VAL; */ > > + > > +void > > +gen_common_operation_to_reflect (rtx *op, > > + unsigned HOST_WIDE_INT and1_value, > > + unsigned HOST_WIDE_INT and2_value, > > + unsigned shift_val) > > +{ > > + rtx op1 = expand_and (word_mode, *op, gen_int_mode (and1_value, > word_mode), > > + NULL_RTX); > > + op1 = expand_shift (LSHIFT_EXPR, word_mode, op1, shift_val, op1, 0); > > + rtx op2 = expand_and (word_mode, *op, gen_int_mode (and2_value, > word_mode), > > + NULL_RTX); > > + op2 = expand_shift (RSHIFT_EXPR, word_mode, op2, shift_val, op2, 1); > > + *op = expand_binop (word_mode, ior_optab, op1, op2, *op, 0, > OPTAB_DIRECT); > > +} > Same nits as elsewhere. Line up the parameters & arguments when you > need to wrap lines. > > > > > + > > +/* Reflect 64-bit value for the 64-bit target. */ > > + > > +void > > +reflect_64_bit_value (rtx *op) > > +{ > > + gen_common_operation_to_reflect (op, 0x00000000FFFFFFFF, > 0xFFFFFFFF00000000, > > + 32); > > + gen_common_operation_to_reflect (op, 0x0000FFFF0000FFFF, > 0xFFFF0000FFFF0000, > > + 16); > > + gen_common_operation_to_reflect (op, 0x00FF00FF00FF00FF, > 0xFF00FF00FF00FF00, > > + 8); > > + gen_common_operation_to_reflect (op, 0x0F0F0F0F0F0F0F0F, > 0xF0F0F0F0F0F0F0F0, > > + 4); > > + gen_common_operation_to_reflect (op, 0x3333333333333333, > 0xCCCCCCCCCCCCCCCC, > > + 2); > > + gen_common_operation_to_reflect (op, 0x5555555555555555, > 0xAAAAAAAAAAAAAAAA, > > + 1); > Another class of nits. In general, if you're wrapping a long line like > these above, consider "balancing" the length of the lines a bit better. > so instead of bringing down just the final constant, bring down the > final two constants to a new line. I think that's what we generally do > elsehwere. > > Ok. > > > + > > + gen_reflecting_code (&crc, GET_MODE_BITSIZE (word_mode) - crc_size); > > + if (crc_mode == SImode && word_mode == DImode) > > + { > > + rtx a_low = gen_lowpart_SUBREG (crc_mode, crc); > > + crc = gen_rtx_SIGN_EXTEND (word_mode, a_low); > > + } > > + rtx tgt = simplify_gen_subreg (word_mode, op0, crc_mode, 0); > > + emit_move_insn (tgt, crc); > Same concerns as in the bit-forward implementation WRT testing SI/DI > explicitly. Testing crc_mode < word_mode or testing their bitsizes > seems safer. > > > +/* Expand CRC call STMT. */ > > + > > +static void > > +expand_crc_optab_fn (internal_fn fn, gcall *stmt, convert_optab optab) > > +{ > > + tree lhs = gimple_call_lhs (stmt); > > + tree rhs1 = gimple_call_arg (stmt, 0); // crc > > + tree rhs2 = gimple_call_arg (stmt, 1); // data > > + tree rhs3 = gimple_call_arg (stmt, 2); // polynomial > > + > > + tree result_type = TREE_TYPE (lhs); > > + tree data_type = TREE_TYPE (rhs2); > > + > > + gcc_assert (TYPE_MODE (result_type) >= TYPE_MODE (data_type)); > > + gcc_assert (word_mode >= TYPE_MODE (result_type)); > > + > > + rtx dest = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); > > + rtx crc = expand_normal (rhs1); > > + rtx data = expand_normal (rhs2); > > + gcc_assert (TREE_CODE (rhs3) == INTEGER_CST); > > + rtx polynomial = gen_rtx_CONST_INT (TYPE_MODE (result_type), > > + TREE_INT_CST_LOW (rhs3)); > Nit. Looks like this code got over-indented. Just two spaces of > indention after the open-curly. > > > In general, most of the concerns with this patch are formatting nits. > The functional concerns are with those mode checks as mentioned in a > couple places above. Repost once you've got an update fixing these issues. > Thanks, > Jeff I guess the indentation problem is because of the tab size. I'll fix all the suggestions. Thank you very much. Mariam > > ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [RFC/RFA] [PATCH 01/12] Implement internal functions for efficient CRC computation 2024-05-27 13:51 ` Mariam Arutunian @ 2024-06-08 21:50 ` Jeff Law 0 siblings, 0 replies; 4+ messages in thread From: Jeff Law @ 2024-06-08 21:50 UTC (permalink / raw) To: Mariam Arutunian; +Cc: GCC Patches On 5/27/24 7:51 AM, Mariam Arutunian wrote: > > I carefully reviewed the indentation of the code using different editors > and viewers, and everything appeared correct. > I double-checked the specific sections mentioned, and they also looked > right. > In this reply message I see that it's not correct. I'll try to fix it. Thanks for double-checking. It's one of the downsides of email based flows. Jeff ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-06-08 21:50 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2024-05-24 8:41 [RFC/RFA] [PATCH 01/12] Implement internal functions for efficient CRC computation Mariam Arutunian 2024-05-25 17:40 ` Jeff Law 2024-05-27 13:51 ` Mariam Arutunian 2024-06-08 21:50 ` Jeff Law
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).