[RFC/RFA] [PATCH 03/12] RISC-V: Add CRC expander to generate faster CRC.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [RFC/RFA] [PATCH 03/12] RISC-V: Add CRC expander to generate faster CRC.
@ 2024-05-24  8:41 Mariam Arutunian
  2024-05-25 18:32 ` Jeff Law
  2024-06-08  7:53 ` Richard Sandiford
  0 siblings, 2 replies; 6+ messages in thread
From: Mariam Arutunian @ 2024-05-24  8:41 UTC (permalink / raw)
  To: GCC Patches


[-- Attachment #1.1: Type: text/plain, Size: 1967 bytes --]

If the target is ZBC or ZBKC, it uses clmul instruction for the CRC
calculation.
Otherwise, if the target is ZBKB, generates table-based CRC,
but for reversing inputs and the output uses bswap and brev8 instructions.
Add new tests to check CRC generation for ZBC, ZBKC and ZBKB targets.

  gcc/

     * expr.cc (gf2n_poly_long_div_quotient): New function.
     (reflect): Likewise.
     * expr.h (gf2n_poly_long_div_quotient): New function declaration.
     (reflect): Likewise.

  gcc/config/riscv/

     * bitmanip.md (crc_rev<ANYI1:mode><ANYI:mode>4): New expander for
reversed CRC.
     (crc<SUBX1:mode><SUBX:mode>4): New expander for bit-forward CRC.
     (SUBX1, ANYI1): New iterators.
     * riscv-protos.h (generate_reflecting_code_using_brev): New function
declaration.
     (expand_crc_using_clmul): Likewise.
     (expand_reversed_crc_using_clmul): Likewise.
     * riscv.cc (generate_reflecting_code_using_brev): New function.
     (expand_crc_using_clmul): Likewise.
     (expand_reversed_crc_using_clmul): Likewise.
     * riscv.md (UNSPEC_CRC, UNSPEC_CRC_REV):  New unspecs.

  gcc/testsuite/gcc.target/riscv/

        * crc-1-zbc.c: New test.
        * crc-10-zbc.c: Likewise.
        * crc-12-zbc.c: Likewise.
        * crc-13-zbc.c: Likewise.
        * crc-14-zbc.c: Likewise.
        * crc-17-zbc.c: Likewise.
        * crc-18-zbc.c: Likewise.
        * crc-21-zbc.c: Likewise.
        * crc-22-rv64-zbc.c: Likewise.
        * crc-22-zbkb.c: Likewise.
        * crc-23-zbc.c: Likewise.
        * crc-4-zbc.c: Likewise.
        * crc-5-zbc.c: Likewise.
        * crc-5-zbkb.c: Likewise.
        * crc-6-zbc.c: Likewise.
        * crc-7-zbc.c: Likewise.
        * crc-8-zbc.c: Likewise.
        * crc-8-zbkb.c: Likewise.
        * crc-9-zbc.c: Likewise.
        * crc-CCIT-data16-zbc.c: Likewise.
        * crc-CCIT-data8-zbc.c: Likewise.
        * crc-coremark-16bitdata-zbc.c: Likewise.

Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com>

[-- Attachment #2: 0003-RISC-V-Add-CRC-expander-to-generate-faster-CRC.patch --]
[-- Type: application/x-patch, Size: 48144 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC/RFA] [PATCH 03/12] RISC-V: Add CRC expander to generate faster CRC.
  2024-05-24  8:41 [RFC/RFA] [PATCH 03/12] RISC-V: Add CRC expander to generate faster CRC Mariam Arutunian
@ 2024-05-25 18:32 ` Jeff Law
  2024-05-27 15:48   ` Mariam Arutunian
  2024-06-08  7:53 ` Richard Sandiford
  1 sibling, 1 reply; 6+ messages in thread
From: Jeff Law @ 2024-05-25 18:32 UTC (permalink / raw)
  To: Mariam Arutunian, GCC Patches



On 5/24/24 2:41 AM, Mariam Arutunian wrote:
> If the target is ZBC or ZBKC, it uses clmul instruction for the CRC 
> calculation.
> Otherwise, if the target is ZBKB, generates table-based CRC,
> but for reversing inputs and the output uses bswap and brev8 instructions.
> Add new tests to check CRC generation for ZBC, ZBKC and ZBKB targets.
> 
>    gcc/
> 
>       * expr.cc (gf2n_poly_long_div_quotient): New function.
>       (reflect): Likewise.
>       * expr.h (gf2n_poly_long_div_quotient): New function declaration.
>       (reflect): Likewise.
> 
>    gcc/config/riscv/
> 
>       * bitmanip.md (crc_rev<ANYI1:mode><ANYI:mode>4): New expander for 
> reversed CRC.
>       (crc<SUBX1:mode><SUBX:mode>4): New expander for bit-forward CRC.
>       (SUBX1, ANYI1): New iterators.
>       * riscv-protos.h (generate_reflecting_code_using_brev): New 
> function declaration.
>       (expand_crc_using_clmul): Likewise.
>       (expand_reversed_crc_using_clmul): Likewise.
>       * riscv.cc (generate_reflecting_code_using_brev): New function.
>       (expand_crc_using_clmul): Likewise.
>       (expand_reversed_crc_using_clmul): Likewise.
>       * riscv.md (UNSPEC_CRC, UNSPEC_CRC_REV):  New unspecs.
> 
>    gcc/testsuite/gcc.target/riscv/
> 
>          * crc-1-zbc.c: New test.
>          * crc-10-zbc.c: Likewise.
>          * crc-12-zbc.c: Likewise.
>          * crc-13-zbc.c: Likewise.
>          * crc-14-zbc.c: Likewise.
>          * crc-17-zbc.c: Likewise.
>          * crc-18-zbc.c: Likewise.
>          * crc-21-zbc.c: Likewise.
>          * crc-22-rv64-zbc.c: Likewise.
>          * crc-22-zbkb.c: Likewise.
>          * crc-23-zbc.c: Likewise.
>          * crc-4-zbc.c: Likewise.
>          * crc-5-zbc.c: Likewise.
>          * crc-5-zbkb.c: Likewise.
>          * crc-6-zbc.c: Likewise.
>          * crc-7-zbc.c: Likewise.
>          * crc-8-zbc.c: Likewise.
>          * crc-8-zbkb.c: Likewise.
>          * crc-9-zbc.c: Likewise.
>          * crc-CCIT-data16-zbc.c: Likewise.
>          * crc-CCIT-data8-zbc.c: Likewise.
>          * crc-coremark-16bitdata-zbc.c: Likewise.
> 
> Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com 
> <mailto:mariamarutunian@gmail.com>>
> diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> index 8769a6b818b..c98d451f404 100644
> --- a/gcc/config/riscv/bitmanip.md
> +++ b/gcc/config/riscv/bitmanip.md
> @@ -973,3 +973,66 @@
>    "TARGET_ZBC"
>    "clmulr\t%0,%1,%2"
>    [(set_attr "type" "clmul")])
> +
> +
> +;; Iterator for hardware integer modes narrower than XLEN, same as SUBX
> +(define_mode_iterator SUBX1 [QI HI (SI "TARGET_64BIT")])
> +
> +;; Iterator for hardware integer modes narrower than XLEN, same as ANYI
> +(define_mode_iterator ANYI1 [QI HI SI (DI "TARGET_64BIT")])
If these iterators are the same as existing ones, let's just using the 
existing ones. unless we need both SUBX and SUBX1 in the same pattern or 
ANYI/ANYI1.



> +
> +;; Reversed CRC 8, 16, 32 for TARGET_64
> +(define_expand "crc_rev<ANYI1:mode><ANYI:mode>4"
> +	;; return value (calculated CRC)
> +  [(set (match_operand:ANYI 0 "register_operand" "=r")
> +		      ;; initial CRC
> +	(unspec:ANYI [(match_operand:ANYI 1 "register_operand" "r")
> +		      ;; data
> +		      (match_operand:ANYI1 2 "register_operand" "r")
> +		      ;; polynomial without leading 1
> +		      (match_operand:ANYI 3)]
> +		      UNSPEC_CRC_REV))]
So the preferred formatting for .md files has operands of a given 
operator at the same indention level.  So in this case SET is the 
operator, with two operands (destination/source).  Indent the source and 
destination at the same level.   so

   [(set (match_operand:ANYI 0 ...0)
         (unspec: ANYI ...)

Similarly for the reversed expander.


> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 85df5b7ab49..123695033a6 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -11394,7 +11394,7 @@ riscv_expand_usadd (rtx dest, rtx x, rtx y)
>    if (mode == HImode || mode == QImode)
>      {
>        int shift_bits = GET_MODE_BITSIZE (Xmode)
> -	- GET_MODE_BITSIZE (mode).to_constant ();
> +		       - GET_MODE_BITSIZE (mode).to_constant ();
>  
>        gcc_assert (shift_bits > 0);
Looks like an unrelated spurious change.  Drop.


>  
> @@ -11415,6 +11415,188 @@ riscv_expand_usadd (rtx dest, rtx x, rtx y)
>    emit_move_insn (dest, gen_lowpart (mode, xmode_dest));
>  }
>  
> +/* Generate instruction sequence
> +   which reflects the value of the OP using bswap and brev8 instructions.
> +   OP's mode may be less than word_mode, to get the correct number,
> +   after reflecting we shift right the value by SHIFT_VAL.
> +   E.g. we have 1111 0001, after reflection (target 32-bit) we will get
> +   1000 1111 0000 0000, if we shift-out 16 bits,
> +   we will get the desired one: 1000 1111.  */
> +
> +void
> +generate_reflecting_code_using_brev (rtx *op, int shift_val)
> +{
> +
> +  riscv_expand_op (BSWAP, word_mode, *op, *op, *op);
> +  riscv_expand_op (LSHIFTRT, word_mode, *op, *op,
> +		   gen_int_mode (shift_val, word_mode));
Formatting nit with the gen_int_mode (...) argument.  It should line up 
with the LSHIFTRT argument.



> +
> +void
> +expand_crc_using_clmul (rtx *operands)
> +{
> +  /* Check and keep arguments.  */
> +  gcc_assert (!CONST_INT_P (operands[0]));
> +  gcc_assert (CONST_INT_P (operands[3]));
> +  unsigned HOST_WIDE_INT
> +      crc_size = GET_MODE_BITSIZE (GET_MODE (operands[0])).to_constant ();
> +  gcc_assert (crc_size <= 32);
> +  unsigned HOST_WIDE_INT
> +      data_size = GET_MODE_BITSIZE (GET_MODE (operands[2])).to_constant ();
> +
> +  /* Calculate the quotient.  */
> +  unsigned HOST_WIDE_INT
> +      q = gf2n_poly_long_div_quotient (UINTVAL (operands[3]), crc_size + 1);
> +
> +  rtx crc_extended = gen_rtx_ZERO_EXTEND (word_mode, operands[1]);
> +  rtx crc = gen_reg_rtx (word_mode);
> +  if (crc_size > data_size)
> +    riscv_expand_op (LSHIFTRT, word_mode, crc, crc_extended,
> +		     gen_int_mode (crc_size - data_size, word_mode));
> +  else
> +    crc = gen_rtx_ZERO_EXTEND (word_mode, operands[1]);
> +  rtx t0 = gen_reg_rtx (word_mode);
> +  riscv_emit_move (t0, gen_int_mode (q, word_mode));
> +  rtx t1 = gen_reg_rtx (word_mode);
> +  riscv_emit_move (t1, operands[3]);
> +
> +  rtx a0 = gen_reg_rtx (word_mode);
> +  rtx data = gen_rtx_ZERO_EXTEND (word_mode, operands[2]);
> +  riscv_expand_op (XOR, word_mode, a0, crc, data);
> +
> +  if (TARGET_64BIT)
> +    emit_insn (gen_riscv_clmul_di (a0, a0, t0));
> +  else
> +    emit_insn (gen_riscv_clmul_si (a0, a0, t0));
> +
> +  riscv_expand_op (LSHIFTRT, word_mode, a0, a0, gen_int_mode (crc_size,
> +							      word_mode));
Formatting nit on word_size argument.


> +
> +  if (crc_size == 32 && word_mode == DImode)
Rather than hard coding these cases, would it be better to check if 
crc_size < GET_MODE_BITSIZE (word_mode)?  Similarly for the reversed case.

Overall it looks pretty good as well.  Please repost after fixing the 
minor issues noted above.

Thanks,
jeff



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC/RFA] [PATCH 03/12] RISC-V: Add CRC expander to generate faster CRC.
  2024-05-25 18:32 ` Jeff Law
@ 2024-05-27 15:48   ` Mariam Arutunian
  0 siblings, 0 replies; 6+ messages in thread
From: Mariam Arutunian @ 2024-05-27 15:48 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 4841 bytes --]

On Sat, May 25, 2024 at 10:32 PM Jeff Law <jeffreyalaw@gmail.com> wrote:

>
>
> On 5/24/24 2:41 AM, Mariam Arutunian wrote:
> > If the target is ZBC or ZBKC, it uses clmul instruction for the CRC
> > calculation.
> > Otherwise, if the target is ZBKB, generates table-based CRC,
> > but for reversing inputs and the output uses bswap and brev8
> instructions.
> > Add new tests to check CRC generation for ZBC, ZBKC and ZBKB targets.
> >
> >    gcc/
> >
> >       * expr.cc (gf2n_poly_long_div_quotient): New function.
> >       (reflect): Likewise.
> >       * expr.h (gf2n_poly_long_div_quotient): New function declaration.
> >       (reflect): Likewise.
> >
> >    gcc/config/riscv/
> >
> >       * bitmanip.md (crc_rev<ANYI1:mode><ANYI:mode>4): New expander for
> > reversed CRC.
> >       (crc<SUBX1:mode><SUBX:mode>4): New expander for bit-forward CRC.
> >       (SUBX1, ANYI1): New iterators.
> >       * riscv-protos.h (generate_reflecting_code_using_brev): New
> > function declaration.
> >       (expand_crc_using_clmul): Likewise.
> >       (expand_reversed_crc_using_clmul): Likewise.
> >       * riscv.cc (generate_reflecting_code_using_brev): New function.
> >       (expand_crc_using_clmul): Likewise.
> >       (expand_reversed_crc_using_clmul): Likewise.
> >       * riscv.md (UNSPEC_CRC, UNSPEC_CRC_REV):  New unspecs.
> >
> >    gcc/testsuite/gcc.target/riscv/
> >
> >          * crc-1-zbc.c: New test.
> >          * crc-10-zbc.c: Likewise.
> >          * crc-12-zbc.c: Likewise.
> >          * crc-13-zbc.c: Likewise.
> >          * crc-14-zbc.c: Likewise.
> >          * crc-17-zbc.c: Likewise.
> >          * crc-18-zbc.c: Likewise.
> >          * crc-21-zbc.c: Likewise.
> >          * crc-22-rv64-zbc.c: Likewise.
> >          * crc-22-zbkb.c: Likewise.
> >          * crc-23-zbc.c: Likewise.
> >          * crc-4-zbc.c: Likewise.
> >          * crc-5-zbc.c: Likewise.
> >          * crc-5-zbkb.c: Likewise.
> >          * crc-6-zbc.c: Likewise.
> >          * crc-7-zbc.c: Likewise.
> >          * crc-8-zbc.c: Likewise.
> >          * crc-8-zbkb.c: Likewise.
> >          * crc-9-zbc.c: Likewise.
> >          * crc-CCIT-data16-zbc.c: Likewise.
> >          * crc-CCIT-data8-zbc.c: Likewise.
> >          * crc-coremark-16bitdata-zbc.c: Likewise.
> >
> > Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com
> > <mailto:mariamarutunian@gmail.com>>
> > diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> > index 8769a6b818b..c98d451f404 100644
> > --- a/gcc/config/riscv/bitmanip.md
> > +++ b/gcc/config/riscv/bitmanip.md
> > @@ -973,3 +973,66 @@
> >    "TARGET_ZBC"
> >    "clmulr\t%0,%1,%2"
> >    [(set_attr "type" "clmul")])
> > +
> > +
> > +;; Iterator for hardware integer modes narrower than XLEN, same as SUBX
> > +(define_mode_iterator SUBX1 [QI HI (SI "TARGET_64BIT")])
> > +
> > +;; Iterator for hardware integer modes narrower than XLEN, same as ANYI
> > +(define_mode_iterator ANYI1 [QI HI SI (DI "TARGET_64BIT")])
> If these iterators are the same as existing ones, let's just using the
> existing ones. unless we need both SUBX and SUBX1 in the same pattern or
> ANYI/ANYI1.
>

I need both, as I need to get all the combinations.
For example crc_revsihi4, crc_revsiqi4, etc.
If I write "crc_rev<*ANYI*:mode><ANYI:mode>4" instead of  "crc_rev<*ANYI1*
:mode><ANYI:mode>4",
it will generate only crc_revsisi4, crc_revqiqi4, etc.



> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index 85df5b7ab49..123695033a6 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -11394,7 +11394,7 @@ riscv_expand_usadd (rtx dest, rtx x, rtx y)
> >    if (mode == HImode || mode == QImode)
> >      {
> >        int shift_bits = GET_MODE_BITSIZE (Xmode)
> > -     - GET_MODE_BITSIZE (mode).to_constant ();
> > +                    - GET_MODE_BITSIZE (mode).to_constant ();
> >
> >        gcc_assert (shift_bits > 0);
> Looks like an unrelated spurious change.  Drop.
>
>
>

Oh! Thanks for catching that.


> +  riscv_expand_op (LSHIFTRT, word_mode, a0, a0, gen_int_mode (crc_size,
> > +                                                           word_mode));
> Formatting nit on word_size argument.
>
>
> > +
> > +  if (crc_size == 32 && word_mode == DImode)
> Rather than hard coding these cases, would it be better to check if
> crc_size < GET_MODE_BITSIZE (word_mode)?  Similarly for the reversed
> case.

Overall it looks pretty good as well.  Please repost after fixing the
> minor issues noted above.
>
> Thanks,
> jeff
>


The misalignment was due to incorrect settings on my side, which I have now
corrected.
I will apply all the suggestions and repost. Thank you.

Best regards,
Mariam

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC/RFA] [PATCH 03/12] RISC-V: Add CRC expander to generate faster CRC.
  2024-05-24  8:41 [RFC/RFA] [PATCH 03/12] RISC-V: Add CRC expander to generate faster CRC Mariam Arutunian
  2024-05-25 18:32 ` Jeff Law
@ 2024-06-08  7:53 ` Richard Sandiford
  2024-06-08 13:49   ` Jeff Law
  2024-06-09  7:14   ` Mariam Arutunian
  1 sibling, 2 replies; 6+ messages in thread
From: Richard Sandiford @ 2024-06-08  7:53 UTC (permalink / raw)
  To: Mariam Arutunian; +Cc: GCC Patches

Thanks a lot for doing this!  It's a really nice series.

Just had a comment on the long division helper:

Mariam Arutunian <mariamarutunian@gmail.com> writes:
> +/* Return the quotient of polynomial long division of x^2N by POLYNOMIAL
> +   in GF (2^N).  */

It looks like there might be an off-by-one discrepancy between the comment
and the code.  The comment suggests that N is the degree of the polynomial
(crc_size), whereas the callers seem to pass crc_size + 1.  This doesn't
matter in practice since...

> +
> +unsigned HOST_WIDE_INT
> +gf2n_poly_long_div_quotient (unsigned HOST_WIDE_INT polynomial, size_t n)
> +{
> +  vec<short> x2n;
> +  vec<bool> pol, q;
> +  /* Create vector of bits, for the polynomial.  */
> +  pol.create (n + 1);
> +  for (size_t i = 0; i < n; i++)
> +    {
> +      pol.quick_push (polynomial & 1);
> +      polynomial >>= 1;
> +    }
> +  pol.quick_push (1);
> +
> +  /* Create vector for x^2n polynomial.  */
> +  x2n.create (2 * n - 1);
> +  for (size_t i = 0; i < 2 * (n - 1); i++)
> +    x2n.safe_push (0);
> +  x2n.safe_push (1);

...this compensates by setting the dividend to x^(2N-2).  And although
the first loop reads crc_size+1 bits from polynomial before adding the
implicit leading 1, only the low crc_size elements of poly affect the
result.

If we do pass crc_size as N, a simpler way of writing the routine might be:

{
  /* The result has degree N, so needs N + 1 bits.  */
  gcc_assert (n < 64);

  /* Perform a division step for the x^2N coefficient.  At this point the
     quotient and remainder have N implicit trailing zeros.  */
  unsigned HOST_WIDE_INT quotient = 1;
  unsigned HOST_WIDE_INT remainder = polynomial;

  /* Process the coefficients for x^(2N-1) down to x^N, with each step
     reducing the number of implicit trailing zeros by one.  */
  for (unsigned int i = 0; i < n; ++i)
    {
      bool coeff = remainder & (HOST_WIDE_INT_1U << (n - 1));
      quotient = (quotient << 1) | coeff;
      remainder = (remainder << 1) ^ (coeff ? polynomial : 0);
    }
  return quotient;
}

I realise there are many ways of writing this out there though,
so that's just a suggestion.  (And only lightly tested.)

FWIW, we could easily extend the interface to work on wide_ints if we
ever need it for N>63.  

Thanks,
Richard

> +
> +  q.create (n);
> +  for (size_t i = 0; i < n; i++)
> +    q.quick_push (0);
> +
> +  /* Calculate the quotient of x^2n/polynomial.  */
> +  for (int i = n - 1; i >= 0; i--)
> +    {
> +      int d = x2n[i + n - 1];
> +      if (d == 0)
> +	continue;
> +      for (int j = i + n - 1; j >= i; j--)
> +	x2n[j] ^= (pol[j - i]);
> +      q[i] = 1;
> +    }
> +
> +  /* Get the number from the vector of 0/1s.  */
> +  unsigned HOST_WIDE_INT quotient = 0;
> +  for (size_t i = 0; i < q.length (); i++)
> +    {
> +      quotient <<= 1;
> +      quotient = quotient | q[q.length () - i - 1];
> +    }
> +  return quotient;
> +}

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC/RFA] [PATCH 03/12] RISC-V: Add CRC expander to generate faster CRC.
  2024-06-08  7:53 ` Richard Sandiford
@ 2024-06-08 13:49   ` Jeff Law
  2024-06-09  7:14   ` Mariam Arutunian
  1 sibling, 0 replies; 6+ messages in thread
From: Jeff Law @ 2024-06-08 13:49 UTC (permalink / raw)
  To: Mariam Arutunian, GCC Patches, richard.sandiford



On 6/8/24 1:53 AM, Richard Sandiford wrote:

> 
> I realise there are many ways of writing this out there though,
> so that's just a suggestion.  (And only lightly tested.)
> 
> FWIW, we could easily extend the interface to work on wide_ints if we
> ever need it for N>63.
I think there's constraints elsewhere that keep us in the N<=63 range. 
If we extended things elsewhere to include TI then we could fully 
support 64bit CRCs.

I don't *think* it's that hard, but we haven't actually tried.

Jeff



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC/RFA] [PATCH 03/12] RISC-V: Add CRC expander to generate faster CRC.
  2024-06-08  7:53 ` Richard Sandiford
  2024-06-08 13:49   ` Jeff Law
@ 2024-06-09  7:14   ` Mariam Arutunian
  1 sibling, 0 replies; 6+ messages in thread
From: Mariam Arutunian @ 2024-06-09  7:14 UTC (permalink / raw)
  To: Mariam Arutunian, GCC Patches, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 3799 bytes --]

On Sat, Jun 8, 2024, 09:53 Richard Sandiford <richard.sandiford@arm.com>
wrote:

> Thanks a lot for doing this!  It's a really nice series.
>


Thank you for your positive feedback and for your review and suggestions on
the patch series.

Just had a comment on the long division helper:
>
> Mariam Arutunian <mariamarutunian@gmail.com> writes:
> > +/* Return the quotient of polynomial long division of x^2N by POLYNOMIAL
> > +   in GF (2^N).  */
>
> It looks like there might be an off-by-one discrepancy between the comment
> and the code.  The comment suggests that N is the degree of the polynomial
> (crc_size), whereas the callers seem to pass crc_size + 1.  This doesn't
> matter in practice since...
>
> > +
> > +unsigned HOST_WIDE_INT
> > +gf2n_poly_long_div_quotient (unsigned HOST_WIDE_INT polynomial, size_t
> n)
> > +{
> > +  vec<short> x2n;
> > +  vec<bool> pol, q;
> > +  /* Create vector of bits, for the polynomial.  */
> > +  pol.create (n + 1);
> > +  for (size_t i = 0; i < n; i++)
> > +    {
> > +      pol.quick_push (polynomial & 1);
> > +      polynomial >>= 1;
> > +    }
> > +  pol.quick_push (1);
> > +
> > +  /* Create vector for x^2n polynomial.  */
> > +  x2n.create (2 * n - 1);
> > +  for (size_t i = 0; i < 2 * (n - 1); i++)
> > +    x2n.safe_push (0);
> > +  x2n.safe_push (1);
>
> ...this compensates by setting the dividend to x^(2N-2).  And although
> the first loop reads crc_size+1 bits from polynomial before adding the
> implicit leading 1, only the low crc_size elements of poly affect the
> result.
>


Yes. Initially, I implemented it quickly using an implementation I found
with the intention of refining it later.


> If we do pass crc_size as N, a simpler way of writing the routine might be:
>
> {
>   /* The result has degree N, so needs N + 1 bits.  */
>   gcc_assert (n < 64);
>
>   /* Perform a division step for the x^2N coefficient.  At this point the
>      quotient and remainder have N implicit trailing zeros.  */
>   unsigned HOST_WIDE_INT quotient = 1;
>   unsigned HOST_WIDE_INT remainder = polynomial;
>
>   /* Process the coefficients for x^(2N-1) down to x^N, with each step
>      reducing the number of implicit trailing zeros by one.  */
>   for (unsigned int i = 0; i < n; ++i)
>     {
>       bool coeff = remainder & (HOST_WIDE_INT_1U << (n - 1));
>       quotient = (quotient << 1) | coeff;
>       remainder = (remainder << 1) ^ (coeff ? polynomial : 0);
>     }
>   return quotient;
> }
>
> I realise there are many ways of writing this out there though,
> so that's just a suggestion.  (And only lightly tested.)
>

Thanks, I appreciate your input.
I'm currently on vacation, but after I return, I'll apply all the changes
and send a new version.


> FWIW, we could easily extend the interface to work on wide_ints if we
> ever need it for N>63.



 The problem is keeping the whole quotient. For 64 degree, it may need 65
bits. Jeff already answered this. Alternatively, there might be a method to
perform the calculation without retaining that extra bit, but I haven't
explored it yet.


Thank you again,
Mariam


Thanks,
> Richard
>
> > +
> > +  q.create (n);
> > +  for (size_t i = 0; i < n; i++)
> > +    q.quick_push (0);
> > +
> > +  /* Calculate the quotient of x^2n/polynomial.  */
> > +  for (int i = n - 1; i >= 0; i--)
> > +    {
> > +      int d = x2n[i + n - 1];
> > +      if (d == 0)
> > +     continue;
> > +      for (int j = i + n - 1; j >= i; j--)
> > +     x2n[j] ^= (pol[j - i]);
> > +      q[i] = 1;
> > +    }
> > +
> > +  /* Get the number from the vector of 0/1s.  */
> > +  unsigned HOST_WIDE_INT quotient = 0;
> > +  for (size_t i = 0; i < q.length (); i++)
> > +    {
> > +      quotient <<= 1;
> > +      quotient = quotient | q[q.length () - i - 1];
> > +    }
> > +  return quotient;
> > +}
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-06-09  7:14 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-24  8:41 [RFC/RFA] [PATCH 03/12] RISC-V: Add CRC expander to generate faster CRC Mariam Arutunian
2024-05-25 18:32 ` Jeff Law
2024-05-27 15:48   ` Mariam Arutunian
2024-06-08  7:53 ` Richard Sandiford
2024-06-08 13:49   ` Jeff Law
2024-06-09  7:14   ` Mariam Arutunian

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).