public inbox for gcc-patches@gcc.gnu.org
* [RFC/RFA] [PATCH 06/12] aarch64: Implement new expander for efficient CRC computation
@ 2024-05-24  8:42 Mariam Arutunian
  2024-06-08 11:41 ` Richard Sandiford
  0 siblings, 1 reply; 3+ messages in thread
From: Mariam Arutunian @ 2024-05-24  8:42 UTC
  To: GCC Patches



This patch introduces two new expanders for the aarch64 backend,
dedicated to generating optimized code for CRC computations.
The new expanders are designed to leverage specific hardware capabilities
to achieve faster CRC calculations,
particularly using the pmul or crc32 instructions when supported by the
target architecture.

Expander 1: Bit-Forward CRC (crc<ALLI:mode><ALLX:mode>4)
For targets that support the pmul instruction (TARGET_AES),
the expander will generate code that uses the pmul (crypto_pmulldi)
instruction for CRC computation.
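A minimal C sketch of the arithmetic behind the bit-forward pmul sequence, assuming it is a Barrett reduction: XOR the top CRC bits with the data, multiply by q = floor(x^(2n)/P), shift right by n, multiply by the polynomial, and keep the low n bits. `clmul64` is a software stand-in for the low 64 bits of a PMULL result, the quotient is computed directly rather than through the patch's gf2n_poly_long_div_quotient, and every function name here is hypothetical. It assumes a data width m no larger than the CRC width n, with n <= 32.

```c
#include <assert.h>
#include <stdint.h>

/* Low 64 bits of the carry-less (GF(2)) product of A and B; a software
   stand-in for the low half of a PMULL result.  */
uint64_t clmul64 (uint64_t a, uint64_t b)
{
  uint64_t r = 0;
  for (int i = 0; i < 64; i++)
    if ((b >> i) & 1)
      r ^= a << i;
  return r;
}

/* Quotient floor (x^(2N) / P) with P = x^N + POLY, for 1 <= N <= 32.
   WINDOW holds the N+1 remainder coefficients currently being reduced.  */
uint64_t gf2_quotient_x2n (unsigned n, uint64_t poly)
{
  uint64_t p_full = (1ull << n) | poly;  /* P including its x^N term */
  uint64_t window = 1ull << n;           /* starts as x^(2N) */
  uint64_t q = 0;
  for (int i = (int) n; i >= 0; i--)     /* decide quotient bit I */
    {
      if (window & (1ull << n))
	{
	  q |= 1ull << i;
	  window ^= p_full;
	}
      window <<= 1;                      /* next dividend bit is 0 */
    }
  return q;
}

/* Reference: feed the M data bits MSB-first into an N-bit forward CRC.  */
uint64_t crc_forward_bitwise (uint64_t crc, uint64_t data,
			      unsigned n, unsigned m, uint64_t poly)
{
  uint64_t mask = (1ull << n) - 1;
  crc ^= data << (n - m);
  for (unsigned i = 0; i < m; i++)
    crc = (crc & (1ull << (n - 1))) ? ((crc << 1) ^ poly) & mask
				    : (crc << 1) & mask;
  return crc;
}

/* Same result with two carry-less multiplies, mirroring the expander.  */
uint64_t crc_forward_clmul (uint64_t crc, uint64_t data,
			    unsigned n, unsigned m, uint64_t poly)
{
  uint64_t q = gf2_quotient_x2n (n, poly);
  uint64_t a0 = (crc >> (n - m)) ^ data;  /* top M CRC bits xor data */
  uint64_t t = clmul64 (a0, q) >> n;      /* floor (a0 * x^N / P) */
  uint64_t r = clmul64 (t, poly);         /* low N bits of t * P */
  if (n > m)
    r ^= crc << m;                        /* low CRC bits just move up */
  return r & ((1ull << n) - 1);
}
```

Feeding the bytes of "123456789" through `crc_forward_clmul` with n = 16, m = 8, polynomial 0x1021 and an initial CRC of 0 should reproduce the CRC-16/XMODEM check value 0x31C3, the same result as the bitwise loop.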

Expander 2: Bit-Reversed CRC (crc_rev<ALLI:mode><ALLX:mode>4)
The expander first checks whether the target supports the CRC32 instructions
(TARGET_CRC32) and whether the polynomial in use is 0x1EDC6F41 (iSCSI, i.e.
CRC-32C). If both conditions are met, it emits the corresponding crc32c
instruction (crc32cb, crc32ch, crc32cw, or crc32cx, depending on the data
size). Otherwise, if the target supports pmul (TARGET_AES), it uses the
pmul (crypto_pmulldi) instruction for bit-reversed CRC computation.

Otherwise, a table-based CRC is generated.
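For the bit-reversed path, the tests in this patch use 0x82F63B78, which is 0x1EDC6F41 with its 32 bits reversed, in an LSB-first loop. The sketch below (all helper names hypothetical) shows that relationship, a bitwise reflected update, and a 256-entry byte table of the kind a table-based fallback could use; it is an illustration, not the code expand_reversed_crc_table_based emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the order of the 32 bits of X.  */
uint32_t reflect32 (uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i++)
    if (x & (1u << i))
      r |= 1u << (31 - i);
  return r;
}

/* Bit-reversed (LSB-first) CRC update with one data byte; POLY_REV is
   the reflected polynomial, e.g. 0x82F63B78 for CRC-32C.  */
uint32_t crc_rev_bitwise_u8 (uint32_t crc, uint8_t data, uint32_t poly_rev)
{
  crc ^= data;
  for (int i = 0; i < 8; i++)
    crc = (crc & 1) ? (crc >> 1) ^ poly_rev : crc >> 1;
  return crc;
}

/* TABLE[B] is the bitwise update of a zero CRC with byte B.  */
void crc_rev_make_table (uint32_t table[256], uint32_t poly_rev)
{
  for (unsigned b = 0; b < 256; b++)
    table[b] = crc_rev_bitwise_u8 (0, (uint8_t) b, poly_rev);
}

/* Same update as crc_rev_bitwise_u8, using one table lookup per byte.  */
uint32_t crc_rev_table_u8 (const uint32_t table[256], uint32_t crc,
			   uint8_t data)
{
  return (crc >> 8) ^ table[(crc ^ data) & 0xff];
}

/* Conventional CRC-32C of a buffer: initial value and final XOR of
   0xFFFFFFFF, reflected polynomial derived from 0x1EDC6F41.  */
uint32_t crc32c_buf (const uint8_t *buf, size_t len)
{
  uint32_t table[256];
  crc_rev_make_table (table, reflect32 (0x1EDC6F41u));
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++)
    crc = crc_rev_table_u8 (table, crc, buf[i]);
  return crc ^ 0xFFFFFFFFu;
}
```

The table form works because the reflected update is linear: the bits of the CRC above the low byte are only shifted during the eight steps, so their contribution is just `crc >> 8`.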

  gcc/config/aarch64/

    * aarch64-protos.h (aarch64_expand_crc_using_clmul): New extern
function declaration.
    (aarch64_expand_reversed_crc_using_clmul): Likewise.
    * aarch64.cc (aarch64_expand_crc_using_clmul): New function.
    (aarch64_expand_reversed_crc_using_clmul): Likewise.
    * aarch64.md (UNSPEC_CRC, UNSPEC_CRC_REV): New unspecs.
    (crc_rev<ALLI:mode><ALLX:mode>4): New expander for reversed CRC.
    (crc<ALLI:mode><ALLX:mode>4): New expander for bit-forward CRC.
    * iterators.md (crc_data_type): New mode attribute.

  gcc/testsuite/gcc.target/aarch64/

    * crc-1-pmul.c: New test.
    * crc-10-pmul.c: Likewise.
    * crc-12-pmul.c: Likewise.
    * crc-13-pmul.c: Likewise.
    * crc-14-pmul.c: Likewise.
    * crc-17-pmul.c: Likewise.
    * crc-18-pmul.c: Likewise.
    * crc-21-pmul.c: Likewise.
    * crc-22-pmul.c: Likewise.
    * crc-23-pmul.c: Likewise.
    * crc-4-pmul.c: Likewise.
    * crc-5-pmul.c: Likewise.
    * crc-6-pmul.c: Likewise.
    * crc-7-pmul.c: Likewise.
    * crc-8-pmul.c: Likewise.
    * crc-9-pmul.c: Likewise.
    * crc-CCIT-data16-pmul.c: Likewise.
    * crc-CCIT-data8-pmul.c: Likewise.
    * crc-coremark-16bitdata-pmul.c: Likewise.
    * crc-crc32-data16.c: New test.
    * crc-crc32-data32.c: Likewise.
    * crc-crc32-data8.c: Likewise.

Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com>

[-- Attachment #2: 0006-aarch64-Implement-new-expander-for-efficient-CRC-com.patch --]
[-- Type: application/x-patch, Size: 25626 bytes --]


* Re: [RFC/RFA] [PATCH 06/12] aarch64: Implement new expander for efficient CRC computation
  2024-05-24  8:42 [RFC/RFA] [PATCH 06/12] aarch64: Implement new expander for efficient CRC computation Mariam Arutunian
@ 2024-06-08 11:41 ` Richard Sandiford
  2024-06-19 15:20   ` Mariam Arutunian
  0 siblings, 1 reply; 3+ messages in thread
From: Richard Sandiford @ 2024-06-08 11:41 UTC
  To: Mariam Arutunian; +Cc: GCC Patches

Mariam Arutunian <mariamarutunian@gmail.com> writes:
> This patch introduces two new expanders for the aarch64 backend,
> dedicated to generating optimized code for CRC computations.
> The new expanders are designed to leverage specific hardware capabilities
> to achieve faster CRC calculations,
> particularly using the pmul or crc32 instructions when supported by the
> target architecture.

Thanks for porting this to aarch64!

> Expander 1: Bit-Forward CRC (crc<ALLI:mode><ALLX:mode>4)
> For targets that support the pmul instruction (TARGET_AES),
> the expander will generate code that uses the pmul (crypto_pmulldi)
> instruction for CRC computation.
>
> Expander 2: Bit-Reversed CRC (crc_rev<ALLI:mode><ALLX:mode>4)
> The expander first checks whether the target supports the CRC32 instructions
> (TARGET_CRC32) and whether the polynomial in use is 0x1EDC6F41 (iSCSI, i.e.
> CRC-32C). If both conditions are met, it emits the corresponding crc32c
> instruction (crc32cb, crc32ch, crc32cw, or crc32cx, depending on the data
> size). Otherwise, if the target supports pmul (TARGET_AES), it uses the
> pmul (crypto_pmulldi) instruction for bit-reversed CRC computation.
>
> Otherwise, a table-based CRC is generated.
>
>   gcc/config/aarch64/
>
>     * aarch64-protos.h (aarch64_expand_crc_using_clmul): New extern
> function declaration.
>     (aarch64_expand_reversed_crc_using_clmul): Likewise.
>     * aarch64.cc (aarch64_expand_crc_using_clmul): New function.
>     (aarch64_expand_reversed_crc_using_clmul): Likewise.
>     * aarch64.md (UNSPEC_CRC, UNSPEC_CRC_REV): New unspecs.
>     (crc_rev<ALLI:mode><ALLX:mode>4): New expander for reversed CRC.
>     (crc<ALLI:mode><ALLX:mode>4): New expander for bit-forward CRC.
>     * iterators.md (crc_data_type): New mode attribute.
>
>   gcc/testsuite/gcc.target/aarch64/
>
>     * crc-1-pmul.c: New test.
>     * crc-10-pmul.c: Likewise.
>     * crc-12-pmul.c: Likewise.
>     * crc-13-pmul.c: Likewise.
>     * crc-14-pmul.c: Likewise.
>     * crc-17-pmul.c: Likewise.
>     * crc-18-pmul.c: Likewise.
>     * crc-21-pmul.c: Likewise.
>     * crc-22-pmul.c: Likewise.
>     * crc-23-pmul.c: Likewise.
>     * crc-4-pmul.c: Likewise.
>     * crc-5-pmul.c: Likewise.
>     * crc-6-pmul.c: Likewise.
>     * crc-7-pmul.c: Likewise.
>     * crc-8-pmul.c: Likewise.
>     * crc-9-pmul.c: Likewise.
>     * crc-CCIT-data16-pmul.c: Likewise.
>     * crc-CCIT-data8-pmul.c: Likewise.
>     * crc-coremark-16bitdata-pmul.c: Likewise.
>     * crc-crc32-data16.c: New test.
>     * crc-crc32-data32.c: Likewise.
>     * crc-crc32-data8.c: Likewise.
>
> Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com>
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 1d3f94c813e..167e1140f0d 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -1117,5 +1117,8 @@ extern void mingw_pe_encode_section_info (tree, rtx, int);
>  
>  bool aarch64_optimize_mode_switching (aarch64_mode_entity);
>  void aarch64_restore_za (rtx);
> +void aarch64_expand_crc_using_clmul (rtx *);
> +void aarch64_expand_reversed_crc_using_clmul (rtx *);
> +
>  
>  #endif /* GCC_AARCH64_PROTOS_H */
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index ee12d8897a8..05cd0296d38 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -30265,6 +30265,135 @@ aarch64_retrieve_sysreg (const char *regname, bool write_p, bool is128op)
>    return sysreg->encoding;
>  }
>  
> +/* Generate assembly to calculate CRC
> +   using carry-less multiplication instruction.
> +   OPERANDS[1] is input CRC,
> +   OPERANDS[2] is data (message),
> +   OPERANDS[3] is the polynomial without the leading 1.  */
> +
> +void
> +aarch64_expand_crc_using_clmul (rtx *operands)

This should probably be pmul rather than clmul.

> +{
> +  /* Check and keep arguments.  */
> +  gcc_assert (!CONST_INT_P (operands[0]));
> +  gcc_assert (CONST_INT_P (operands[3]));
> +  rtx crc = operands[1];
> +  rtx data = operands[2];
> +  rtx polynomial = operands[3];
> +
> +  unsigned HOST_WIDE_INT
> +      crc_size = GET_MODE_BITSIZE (GET_MODE (operands[0])).to_constant ();
> +  gcc_assert (crc_size <= 32);
> +  unsigned HOST_WIDE_INT
> +      data_size = GET_MODE_BITSIZE (GET_MODE (data)).to_constant ();

We could instead make the interface:

void
aarch64_expand_crc_using_pmul (scalar_mode crc_mode, scalar_mode data_mode,
			       rtx *operands)

so that the lines above don't need the to_constant.  This should "just
work" on the .md file side, since the modes being passed are naturally
scalar_mode.

I think it'd be worth asserting also that data_size <= crc_size.
(Although we could handle any MAX (data_size, crc_size) <= 32
with some adjustment.)

> +
> +  /* Calculate the quotient.  */
> +  unsigned HOST_WIDE_INT
> +      q = gf2n_poly_long_div_quotient (UINTVAL (polynomial), crc_size + 1);
> +
> +  /* CRC calculation's main part.  */
> +  if (crc_size > data_size)
> +    crc = expand_shift (RSHIFT_EXPR, DImode, crc, crc_size - data_size,
> +			NULL_RTX, 1);
> +
> +  rtx t0 = gen_reg_rtx (DImode);
> +  aarch64_emit_move (t0, gen_int_mode (q, DImode));

It's only a minor simplification, but this could instead be:

  rtx t0 = force_reg (DImode, gen_int_mode (q, DImode));

> +  rtx t1 = gen_reg_rtx (DImode);
> +  aarch64_emit_move (t1, polynomial);

If polynomial is a constant operand of mode crc_mode, GCC's standard
CONST_INT representation is to sign-extend the constant to 64 bits.
E.g. a QImode value of 0b1000_0000 would be represented as -128.

I think here we want the zero-extended form, so it might be safer to do:

  polynomial = simplify_gen_unary (ZERO_EXTEND, DImode, polynomial, crc_mode);
  rtx t1 = force_reg (DImode, polynomial);
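To make the representation point concrete, here is a small C model (not GCC's API; the helper names are hypothetical) of how a mode-sized constant is canonicalised into a sign-extended 64-bit value, and of the zero-extended value that a ZERO_EXTEND to DImode would give back.

```c
#include <assert.h>
#include <stdint.h>

/* Canonical CONST_INT-style value of the low BITS bits of V, i.e. those
   bits sign-extended to 64 bits.  Valid for 1 <= BITS <= 63.  */
int64_t canonical_const (uint64_t v, unsigned bits)
{
  uint64_t sign = 1ull << (bits - 1);
  v &= (1ull << bits) - 1;
  /* Flip and subtract the sign bit: portable sign extension.  */
  return (int64_t) (v ^ sign) - (int64_t) sign;
}

/* Zero-extended view of a canonical constant of width BITS.  */
uint64_t zero_extend_const (int64_t c, unsigned bits)
{
  return (uint64_t) c & ((1ull << bits) - 1);
}
```

For example, the 16-bit polynomial 0x8005 has its top bit set, so its canonical form is -32763; the zero-extended value 0x8005 is the polynomial's GF(2) coefficient vector, whereas the canonical form carries 48 extra set bits above bit 15.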

> +
> +  rtx a0 = expand_binop (DImode, xor_optab, crc, data, NULL_RTX, 1,
> +			 OPTAB_WIDEN);
> +
> +  rtx clmul_res = gen_reg_rtx (TImode);
> +  emit_insn (gen_aarch64_crypto_pmulldi (clmul_res, a0, t0));
> +  a0 = gen_lowpart (DImode, clmul_res);
> +
> +  a0 = expand_shift (RSHIFT_EXPR, DImode, a0, crc_size, NULL_RTX, 1);
> +
> +  emit_insn (gen_aarch64_crypto_pmulldi (clmul_res, a0, t1));
> +  a0 = gen_lowpart (DImode, clmul_res);
> +
> +  if (crc_size > data_size)
> +    {
> +      rtx crc_part = expand_shift (LSHIFT_EXPR, DImode, operands[1], data_size,
> +				   NULL_RTX, 0);
> +      a0 =  expand_binop (DImode, xor_optab, a0, crc_part, NULL_RTX, 1,
> +			  OPTAB_DIRECT);

Formatting nit: extra space after "a0 = "

> +    }
> +  /* Zero upper bits beyond crc_size.  */
> +  rtx num_shift = gen_int_mode (64 - crc_size, DImode);
> +  a0 = expand_shift (LSHIFT_EXPR, DImode, a0, 64 - crc_size,  NULL_RTX, 0);
> +  a0 = expand_shift (RSHIFT_EXPR, DImode, a0, 64 - crc_size,  NULL_RTX, 1);

Rather than shift left and then right, I think we should just AND:

  rtx mask = gen_int_mode (GET_MODE_MASK (crc_mode), DImode);
  a0 = expand_binop (DImode, and_optab, a0, mask, NULL_RTX, 1, OPTAB_DIRECT);
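The equivalence behind this suggestion can be checked in plain C: for a width n below 64, a left shift by 64 - n followed by a logical right shift clears exactly the bits that an AND with the n-bit mode mask clears. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Clear bits N..63 with a shift pair, as in the patch (1 <= N <= 63).  */
uint64_t clear_upper_by_shifts (uint64_t x, unsigned n)
{
  return (x << (64 - n)) >> (64 - n);
}

/* Clear bits N..63 with a single AND against an N-bit mask.  */
uint64_t clear_upper_by_and (uint64_t x, unsigned n)
{
  return x & ((1ull << n) - 1);
}
```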

That said, it looks like operands[0] has crc_mode.  The register bits
above crc_size therefore shouldn't matter, since they're undefined on read.
E.g. even though (reg:SI R) is stored in an X register, only the low 32
bits are defined; the upper 32 bits can be any value.

So I'd expect we could replace this and...

> +
> +  rtx tgt = simplify_gen_subreg (DImode, operands[0],
> +				 GET_MODE (operands[0]), 0);
> +  aarch64_emit_move (tgt, a0);

...this with just:

  aarch64_emit_move (operands[0], gen_lowpart (crc_mode, a0));

Perhaps that would break down if operands[0] is a subreg with
SUBREG_PROMOTED_VAR_P set, but I think it's up to target-independent
code to handle that case.

> @@ -4543,6 +4545,63 @@
>    [(set_attr "type" "crc")]
>  )
>  
> +;; Reversed CRC
> +(define_expand "crc_rev<ALLI:mode><ALLX:mode>4"
> +	 ;; return value (calculated CRC)
> +  [(set (match_operand:ALLX 0 "register_operand" "=r")
> +		      ;; initial CRC
> +	(unspec:ALLX [(match_operand:ALLX 1 "register_operand" "r")
> +		      ;; data
> +		      (match_operand:ALLI 2 "register_operand" "r")
> +		      ;; polynomial without leading 1
> +		      (match_operand:ALLX 3)]
> +	UNSPEC_CRC_REV))]

Since we (rightly) never generate the RTL above, I think this can just be:

(define_expand "crc_rev<ALLI:mode><ALLX:mode>4"
  [;; return value (calculated CRC)
   (match_operand:ALLX 0 "register_operand")
   ;; initial CRC
   (match_operand:ALLX 1 "register_operand")
   ;; data
   (match_operand:ALLI 2 "register_operand")
   ;; polynomial without leading 1
   (match_operand:ALLX 3)]

without the unspec and constraints.

> +  ""
> +  {
> +    /* If the polynomial is the same as the polynomial of crc32 instruction,
> +       put that instruction.  crc32 uses iSCSI polynomial (0x1EDC6F41).  */
> +    if (TARGET_CRC32 && INTVAL (operands[3]) == 517762881)

The hex constant feels a little easier to read.  I think it'd also
be worth checking <ALLX:MODE>mode == SImode, even though it's currently
redundant (given that no other choice would allow that polynomial).
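As a quick illustration of both points (the helper name is hypothetical): the decimal literal is the same value as 0x1EDC6F41, and the constant needs 29 bits, so, assuming ALLX covers QI, HI and SI, only SImode can hold it.

```c
#include <assert.h>
#include <stdint.h>

/* 0x1EDC6F41 is the CRC-32C (iSCSI) generator without its x^32 term.  */
static_assert (0x1EDC6F41 == 517762881, "hex and decimal spellings match");

/* Number of significant bits in V (0 for V == 0).  */
unsigned bit_width (uint64_t v)
{
  unsigned w = 0;
  for (; v != 0; v >>= 1)
    w++;
  return w;
}
```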

> +      {
> +	rtx crc_result = gen_reg_rtx (SImode);
> +	rtx crc = operands[1];
> +	rtx data = operands[2];
> +	emit_insn (gen_aarch64_crc32c<ALLI:crc_data_type> (crc_result, crc,
> +							   data));
> +	emit_move_insn (operands[0],
> +			gen_lowpart (GET_MODE (operands[0]), crc_result));

If operands[0] has ALLX mode (== SImode), it looks like we should be
able to use operands[0] directly as the result of the CRC32C.

FWIW, there's also CRC32 for the HDLC etc. polynomial 0x04C11DB7.
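For reference, a minimal sketch (hypothetical names) of the conventional reflected CRC-32 built from that polynomial: 0x04C11DB7 reflected is 0xEDB88320, and the same loop with 0x1EDC6F41 gives CRC-32C. The asserted values are the standard check values for the string "123456789".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the order of the 32 bits of X.  */
uint32_t reflect32 (uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i++)
    if (x & (1u << i))
      r |= 1u << (31 - i);
  return r;
}

/* Reflected CRC of BUF with initial value and final XOR of 0xFFFFFFFF.
   POLY is given in normal (MSB-first) form and reflected here.  */
uint32_t crc32_reflected_buf (const uint8_t *buf, size_t len, uint32_t poly)
{
  uint32_t poly_rev = reflect32 (poly);
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++)
    {
      crc ^= buf[i];
      for (int b = 0; b < 8; b++)
	crc = (crc & 1) ? (crc >> 1) ^ poly_rev : crc >> 1;
    }
  return crc ^ 0xFFFFFFFFu;
}
```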

> +      }
> +    else if (TARGET_AES)

I think we also need to check <ALLI:sizen> <= <ALLX:sizen> for this.
Similarly for the unreversed CRC pattern.

Thanks again for doing this.  I realise RISC-V is the lead target for
this work, so you've gone above and beyond by doing a full AArch64
port too.  It'd be perfectly valid to ask Arm developers to deal
with the comments above, so please let me know if you'd prefer that.
The patch looks close to ready to me though.

Richard

> +      aarch64_expand_reversed_crc_using_clmul (operands);
> +    else
> +      {
> +	/* Otherwise, generate table-based CRC.  */
> +	expand_reversed_crc_table_based (operands[0], operands[1], operands[2],
> +					 operands[3], GET_MODE (operands[2]),
> +					 generate_reflecting_code_standard);
> +      }
> +    DONE;
> +  }
> +)
> +
> +;; Bit-forward CRC
> +(define_expand "crc<ALLI:mode><ALLX:mode>4"
> +	 ;; return value (calculated CRC)
> +  [(set (match_operand:ALLX 0 "register_operand" "=r")
> +		      ;; initial CRC
> +	(unspec:ALLX [(match_operand:ALLX 1 "register_operand" "r")
> +		      ; data
> +		      (match_operand:ALLI 2 "register_operand" "r")
> +		      ;; polynomial without leading 1
> +		      (match_operand:ALLX 3)]
> +	UNSPEC_CRC))]
> +  "TARGET_AES"
> +  {
> +    aarch64_expand_crc_using_clmul (operands);
> +    DONE;
> +  }
> +)
> +
> +
>  (define_insn "*csinc2<mode>_insn"
>    [(set (match_operand:GPI 0 "register_operand" "=r")
>          (plus:GPI (match_operand 2 "aarch64_comparison_operation" "")
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 99cde46f1ba..86e4863d684 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -1276,6 +1276,10 @@
>  ;; Map a mode to a specific constraint character.
>  (define_mode_attr cmode [(QI "q") (HI "h") (SI "s") (DI "d")])
>  
> +;; Map a mode to a specific constraint character for calling
> +;; appropriate version of crc.
> +(define_mode_attr crc_data_type [(QI "b") (HI "h") (SI "w") (DI "x")])
> +
>  ;; Map modes to Usg and Usj constraints for SISD right shifts
>  (define_mode_attr cmode_simd [(SI "g") (DI "j")])
>  
> diff --git a/gcc/testsuite/gcc.target/aarch64/crc-1-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-1-pmul.c
> new file mode 100644
> index 00000000000..2bea6280762
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/crc-1-pmul.c
> @@ -0,0 +1,8 @@
> +/* { dg-do run } */
> +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details -fdisable-tree-phiopt2 -fdisable-tree-phiopt3" } */
> +
> +#include "../../gcc.c-torture/execute/crc-1.c"
> +
> +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/aarch64/crc-10-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-10-pmul.c
> new file mode 100644
> index 00000000000..846eecbaa85
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/crc-10-pmul.c
> @@ -0,0 +1,9 @@
> +/* { dg-do run } */
> +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> +
> +#include "../../gcc.c-torture/execute/crc-10.c"
> +
> +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/crc-12-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-12-pmul.c
> new file mode 100644
> index 00000000000..0eea6aa6741
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/crc-12-pmul.c
> @@ -0,0 +1,9 @@
> +/* { dg-do run } */
> +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details -fdisable-tree-phiopt2 -fdisable-tree-phiopt3" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> +
> +#include "../../gcc.c-torture/execute/crc-12.c"
> +
> +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/crc-13-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-13-pmul.c
> new file mode 100644
> index 00000000000..7ff8fbcb665
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/crc-13-pmul.c
> @@ -0,0 +1,9 @@
> +/* { dg-do run } */
> +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> +
> +#include "../../gcc.c-torture/execute/crc-13.c"
> +
> +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/crc-14-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-14-pmul.c
> new file mode 100644
> index 00000000000..80766daf487
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/crc-14-pmul.c
> @@ -0,0 +1,9 @@
> +/* { dg-do run } */
> +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> +
> +#include "../../gcc.c-torture/execute/crc-14.c"
> +
> +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/crc-17-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-17-pmul.c
> new file mode 100644
> index 00000000000..0e32fffa0b6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/crc-17-pmul.c
> @@ -0,0 +1,9 @@
> +/* { dg-do run } */
> +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> +
> +#include "../../gcc.c-torture/execute/crc-17.c"
> +
> +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/crc-18-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-18-pmul.c
> new file mode 100644
> index 00000000000..87f4c63b5ea
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/crc-18-pmul.c
> @@ -0,0 +1,9 @@
> +/* { dg-do run } */
> +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> +
> +#include "../../gcc.c-torture/execute/crc-18.c"
> +
> +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/crc-21-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-21-pmul.c
> new file mode 100644
> index 00000000000..6eeac8cf97f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/crc-21-pmul.c
> @@ -0,0 +1,9 @@
> +/* { dg-do run } */
> +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> +
> +#include "../../gcc.c-torture/execute/crc-21.c"
> +
> +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/crc-22-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-22-pmul.c
> new file mode 100644
> index 00000000000..76e3c00ce9f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/crc-22-pmul.c
> @@ -0,0 +1,9 @@
> +/* { dg-do run } */
> +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> +
> +#include "../../gcc.c-torture/execute/crc-22.c"
> +
> +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/crc-23-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-23-pmul.c
> new file mode 100644
> index 00000000000..e3a5e99ffba
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/crc-23-pmul.c
> @@ -0,0 +1,9 @@
> +/* { dg-do run } */
> +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> +
> +#include "../../gcc.c-torture/execute/crc-23.c"
> +
> +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/crc-4-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-4-pmul.c
> new file mode 100644
> index 00000000000..528006c0099
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/crc-4-pmul.c
> @@ -0,0 +1,9 @@
> +/* { dg-do run } */
> +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> +
> +#include "../../gcc.c-torture/execute/crc-4.c"
> +
> +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/crc-5-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-5-pmul.c
> new file mode 100644
> index 00000000000..41e1f8202bc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/crc-5-pmul.c
> @@ -0,0 +1,9 @@
> +/* { dg-do run } */
> +/* { dg-options "-march=armv8-a+crypto -O2 -w -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> +
> +#include "../../gcc.c-torture/execute/crc-5.c"
> +
> +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/aarch64/crc-6-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-6-pmul.c
> new file mode 100644
> index 00000000000..83db99ccb8b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/crc-6-pmul.c
> @@ -0,0 +1,9 @@
> +/* { dg-do run } */
> +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> +
> +#include "../../gcc.c-torture/execute/crc-6.c"
> +
> +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/aarch64/crc-7-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-7-pmul.c
> new file mode 100644
> index 00000000000..7ad777aac8c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/crc-7-pmul.c
> @@ -0,0 +1,9 @@
> +/* { dg-do run } */
> +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> +
> +#include "../../gcc.c-torture/execute/crc-7.c"
> +
> +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/crc-8-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-8-pmul.c
> new file mode 100644
> index 00000000000..da1b619c418
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/crc-8-pmul.c
> @@ -0,0 +1,9 @@
> +/* { dg-do run } */
> +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> +
> +#include "../../gcc.c-torture/execute/crc-8.c"
> +
> +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/crc-9-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-9-pmul.c
> new file mode 100644
> index 00000000000..33bbe0bfb26
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/crc-9-pmul.c
> @@ -0,0 +1,9 @@
> +/* { dg-do run } */
> +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> +
> +#include "../../gcc.c-torture/execute/crc-9.c"
> +
> +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/crc-CCIT-data16-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-CCIT-data16-pmul.c
> new file mode 100644
> index 00000000000..0c452c1c0f4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/crc-CCIT-data16-pmul.c
> @@ -0,0 +1,9 @@
> +/* { dg-do run } */
> +/* { dg-options "-w -march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> +
> +#include "../../gcc.c-torture/execute/crc-CCIT-data16.c"
> +
> +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/aarch64/crc-CCIT-data8-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-CCIT-data8-pmul.c
> new file mode 100644
> index 00000000000..87a0b4489a2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/crc-CCIT-data8-pmul.c
> @@ -0,0 +1,9 @@
> +/* { dg-do run } */
> +/* { dg-options "-w -march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> +
> +#include "../../gcc.c-torture/execute/crc-CCIT-data8.c"
> +
> +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/aarch64/crc-coremark-16bitdata-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-coremark-16bitdata-pmul.c
> new file mode 100644
> index 00000000000..75ed5aff80b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/crc-coremark-16bitdata-pmul.c
> @@ -0,0 +1,9 @@
> +/* { dg-do run } */
> +/* { dg-options "-w -march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> +
> +#include "../../gcc.c-torture/execute/crc-coremark16-data16.c"
> +
> +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/aarch64/crc-crc32-data16.c b/gcc/testsuite/gcc.target/aarch64/crc-crc32-data16.c
> new file mode 100644
> index 00000000000..d5aeee7c0c4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/crc-crc32-data16.c
> @@ -0,0 +1,53 @@
> +/* { dg-do run } */
> +/* { dg-options "-march=armv8-a+crc -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> +
> +#include <stdint.h>
> +#include <stdlib.h>
> +
> +__attribute__ ((noinline,optimize(0)))
> +uint32_t _crc32_O0 (uint32_t crc, uint16_t data) {
> +  int i;
> +  crc = crc ^ data;
> +
> +  for (i = 0; i < 8; i++) {
> +      if (crc & 1)
> +	crc = (crc >> 1) ^ 0x82F63B78;
> +      else
> +	crc = (crc >> 1);
> +    }
> +
> +  return crc;
> +}
> +
> +uint32_t _crc32 (uint32_t crc, uint16_t data) {
> +  int i;
> +  crc = crc ^ data;
> +
> +  for (i = 0; i < 8; i++) {
> +      if (crc & 1)
> +	crc = (crc >> 1) ^ 0x82F63B78;
> +      else
> +	crc = (crc >> 1);
> +    }
> +
> +  return crc;
> +}
> +
> +int main ()
> +{
> +  uint32_t crc = 0x0D800D80;
> +  for (uint16_t i = 0; i < 0xffff; i++)
> +    {
> +      uint32_t res1 = _crc32_O0 (crc, i);
> +      uint32_t res2 = _crc32 (crc, i);
> +      if (res1 != res2)
> +	abort ();
> +      crc = res1;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> +/* { dg-final { scan-rtl-dump "UNSPEC_CRC32" "dfinish"} } */
> +/* { dg-final { scan-rtl-dump-times "pmull" 0 "dfinish"} } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/crc-crc32-data32.c b/gcc/testsuite/gcc.target/aarch64/crc-crc32-data32.c
> new file mode 100644
> index 00000000000..f0e319b3ab8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/crc-crc32-data32.c
> @@ -0,0 +1,52 @@
> +/* { dg-do run } */
> +/* { dg-options "-march=armv8-a+crc -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> +
> +#include <stdint.h>
> +#include <stdlib.h>
> +__attribute__ ((noinline,optimize(0)))
> +uint32_t _crc32_O0 (uint32_t crc, uint32_t data) {
> +  int i;
> +  crc = crc ^ data;
> +
> +  for (i = 0; i < 32; i++) {
> +      if (crc & 1)
> +	crc = (crc >> 1) ^ 0x82F63B78;
> +      else
> +	crc = (crc >> 1);
> +    }
> +
> +  return crc;
> +}
> +
> +uint32_t _crc32 (uint32_t crc, uint32_t data) {
> +  int i;
> +  crc = crc ^ data;
> +
> +  for (i = 0; i < 32; i++) {
> +      if (crc & 1)
> +	crc = (crc >> 1) ^ 0x82F63B78;
> +      else
> +	crc = (crc >> 1);
> +    }
> +
> +  return crc;
> +}
> +
> +int main ()
> +{
> +  uint32_t crc = 0x0D800D80;
> +  for (uint8_t i = 0; i < 0xff; i++)
> +    {
> +      uint32_t res1 = _crc32_O0 (crc, i);
> +      uint32_t res2 = _crc32 (crc, i);
> +      if (res1 != res2)
> +	abort ();
> +      crc = res1;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> +/* { dg-final { scan-rtl-dump "UNSPEC_CRC32" "dfinish"} } */
> +/* { dg-final { scan-rtl-dump-times "pmull" 0 "dfinish"} } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/crc-crc32-data8.c b/gcc/testsuite/gcc.target/aarch64/crc-crc32-data8.c
> new file mode 100644
> index 00000000000..95ffde6a9d2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/crc-crc32-data8.c
> @@ -0,0 +1,53 @@
> +/* { dg-do run } */
> +/* { dg-options "-march=armv8-a+crc -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> +
> +#include <stdint.h>
> +#include <stdlib.h>
> +
> +__attribute__ ((noinline,optimize(0)))
> +uint32_t _crc32_O0 (uint32_t crc, uint8_t data) {
> +  int i;
> +  crc = crc ^ data;
> +
> +  for (i = 0; i < 8; i++) {
> +      if (crc & 1)
> +	crc = (crc >> 1) ^ 0x82F63B78;
> +      else
> +	crc = (crc >> 1);
> +    }
> +
> +  return crc;
> +}
> +
> +uint32_t _crc32 (uint32_t crc, uint8_t data) {
> +  int i;
> +  crc = crc ^ data;
> +
> +  for (i = 0; i < 8; i++) {
> +      if (crc & 1)
> +	crc = (crc >> 1) ^ 0x82F63B78;
> +      else
> +	crc = (crc >> 1);
> +    }
> +
> +  return crc;
> +}
> +
> +int main ()
> +{
> +  uint32_t crc = 0x0D800D80;
> +  for (uint8_t i = 0; i < 0xff; i++)
> +    {
> +      uint32_t res1 = _crc32_O0 (crc, i);
> +      uint32_t res2 = _crc32 (crc, i);
> +      if (res1 != res2)
> +	abort ();
> +      crc = res1;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> +/* { dg-final { scan-rtl-dump "UNSPEC_CRC32" "dfinish"} } */
> +/* { dg-final { scan-rtl-dump-times "pmull" 0 "dfinish"} } */
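The reference loops in these tests XOR with 0x82F63B78, while the crc_rev expander matches 0x1EDC6F41. The first constant is the bit-reversed form of the second, which is the form a bit-reversed (LSB-first) CRC loop uses. Below is a minimal C sketch that checks this relationship. reverse32 is a hypothetical helper and is not part of GCC.

```c
#include <stdint.h>

/* Reverse the bit order of a 32-bit value (bit 0 <-> bit 31, and so on).
   Hypothetical helper for illustration only.  */
static uint32_t
reverse32 (uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i++)
    {
      r = (r << 1) | (x & 1);  /* Append the current low bit of X.  */
      x >>= 1;
    }
  return r;
}
```

The same mapping takes the HDLC/Ethernet polynomial 0x04C11DB7 to 0xEDB88320.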


* Re: [RFC/RFA] [PATCH 06/12] aarch64: Implement new expander for efficient CRC computation
  2024-06-08 11:41 ` Richard Sandiford
@ 2024-06-19 15:20   ` Mariam Arutunian
  0 siblings, 0 replies; 3+ messages in thread
From: Mariam Arutunian @ 2024-06-19 15:20 UTC
  To: Mariam Arutunian, GCC Patches, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 33258 bytes --]

On Sat, Jun 8, 2024 at 3:41 PM Richard Sandiford <richard.sandiford@arm.com> wrote:

> Mariam Arutunian <mariamarutunian@gmail.com> writes:
> > This patch introduces two new expanders for the aarch64 backend,
> > dedicated to generating optimized code for CRC computations.
> > The new expanders are designed to leverage specific hardware capabilities
> > to achieve faster CRC calculations,
> > particularly using the pmul or crc32 instructions when supported by the
> > target architecture.
>
> Thanks for porting this to aarch64!
>
> > Expander 1: Bit-Forward CRC (crc<ALLI:mode><ALLX:mode>4)
> > For targets that support pmul instruction (TARGET_AES),
> > the expander will generate code that uses the pmul (crypto_pmulldi)
> > instruction for CRC computation.
> >
> > Expander 2: Bit-Reversed CRC (crc_rev<ALLI:mode><ALLX:mode>4)
> > The expander first checks if the target supports the CRC32 instruction set
> > (TARGET_CRC32) and the polynomial in use is 0x1EDC6F41 (iSCSI). If both
> > conditions are met, it emits the corresponding crc32c instruction (crc32cb,
> > crc32ch, crc32cw, or crc32cx, depending on the data size).
> > If that path does not apply but the target supports pmul (TARGET_AES), it
> > uses the pmul (crypto_pmulldi) instruction for bit-reversed CRC computation.
> >
> > Otherwise table-based CRC is generated.
> >
> >   gcc/config/aarch64/
> >
> >     * aarch64-protos.h (aarch64_expand_crc_using_clmul): New extern
> > function declaration.
> >     (aarch64_expand_reversed_crc_using_clmul):  Likewise.
> >     * aarch64.cc (aarch64_expand_crc_using_clmul): New function.
> >     (aarch64_expand_reversed_crc_using_clmul):  Likewise.
> >     * aarch64.md (UNSPEC_CRC, UNSPEC_CRC_REV):  New unspecs.
> >     (crc_rev<ALLI:mode><ALLX:mode>4): New expander for reversed CRC.
> >     (crc<ALLI:mode><ALLX:mode>4): New expander for bit-forward CRC.
> >     * iterators.md (crc_data_type): New mode attribute.
> >
> >   gcc/testsuite/gcc.target/aarch64/
> >
> >     * crc-1-pmul.c: New test.
> >     * crc-10-pmul.c: Likewise.
> >     * crc-12-pmul.c: Likewise.
> >     * crc-13-pmul.c: Likewise.
> >     * crc-14-pmul.c: Likewise.
> >     * crc-17-pmul.c: Likewise.
> >     * crc-18-pmul.c: Likewise.
> >     * crc-21-pmul.c: Likewise.
> >     * crc-22-pmul.c: Likewise.
> >     * crc-23-pmul.c: Likewise.
> >     * crc-4-pmul.c: Likewise.
> >     * crc-5-pmul.c: Likewise.
> >     * crc-6-pmul.c: Likewise.
> >     * crc-7-pmul.c: Likewise.
> >     * crc-8-pmul.c: Likewise.
> >     * crc-9-pmul.c: Likewise.
> >     * crc-CCIT-data16-pmul.c: Likewise.
> >     * crc-CCIT-data8-pmul.c: Likewise.
> >     * crc-coremark-16bitdata-pmul.c: Likewise.
> >     * crc-crc32-data16.c: New test.
> >     * crc-crc32-data32.c: Likewise.
> >     * crc-crc32-data8.c: Likewise.
> >
> > Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com>
> > diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> > index 1d3f94c813e..167e1140f0d 100644
> > --- a/gcc/config/aarch64/aarch64-protos.h
> > +++ b/gcc/config/aarch64/aarch64-protos.h
> > @@ -1117,5 +1117,8 @@ extern void mingw_pe_encode_section_info (tree, rtx, int);
> >
> >  bool aarch64_optimize_mode_switching (aarch64_mode_entity);
> >  void aarch64_restore_za (rtx);
> > +void aarch64_expand_crc_using_clmul (rtx *);
> > +void aarch64_expand_reversed_crc_using_clmul (rtx *);
> > +
> >
> >  #endif /* GCC_AARCH64_PROTOS_H */
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index ee12d8897a8..05cd0296d38 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -30265,6 +30265,135 @@ aarch64_retrieve_sysreg (const char *regname, bool write_p, bool is128op)
> >    return sysreg->encoding;
> >  }
> >
> > +/* Generate assembly to calculate CRC
> > +   using carry-less multiplication instruction.
> > +   OPERANDS[1] is input CRC,
> > +   OPERANDS[2] is data (message),
> > +   OPERANDS[3] is the polynomial without the leading 1.  */
> > +
> > +void
> > +aarch64_expand_crc_using_clmul (rtx *operands)
>
> This should probably be pmul rather than clmul.
>
> > +{
> > +  /* Check and keep arguments.  */
> > +  gcc_assert (!CONST_INT_P (operands[0]));
> > +  gcc_assert (CONST_INT_P (operands[3]));
> > +  rtx crc = operands[1];
> > +  rtx data = operands[2];
> > +  rtx polynomial = operands[3];
> > +
> > +  unsigned HOST_WIDE_INT
> > +      crc_size = GET_MODE_BITSIZE (GET_MODE (operands[0])).to_constant ();
> > +  gcc_assert (crc_size <= 32);
> > +  unsigned HOST_WIDE_INT
> > +      data_size = GET_MODE_BITSIZE (GET_MODE (data)).to_constant ();
>
> We could instead make the interface:
>
> void
> aarch64_expand_crc_using_pmul (scalar_mode crc_mode, scalar_mode data_mode,
>                                rtx *operands)
>
> so that the lines above don't need the to_constant.  This should "just
> work" on the .md file side, since the modes being passed are naturally
> scalar_mode.
>
> I think it'd be worth asserting also that data_size <= crc_size.
> (Although we could handle any MAX (data_size, crc_size) <= 32
> with some adjustment.)
>
> > +
> > +  /* Calculate the quotient.  */
> > +  unsigned HOST_WIDE_INT
> > +      q = gf2n_poly_long_div_quotient (UINTVAL (polynomial), crc_size + 1);
> > +
> > +  /* CRC calculation's main part.  */
> > +  if (crc_size > data_size)
> > +    crc = expand_shift (RSHIFT_EXPR, DImode, crc, crc_size - data_size,
> > +                     NULL_RTX, 1);
> > +
> > +  rtx t0 = gen_reg_rtx (DImode);
> > +  aarch64_emit_move (t0, gen_int_mode (q, DImode));
>
> It's only a minor simplification, but this could instead be:
>
>   rtx t0 = force_reg (DImode, gen_int_mode (q, DImode));
>
> > +  rtx t1 = gen_reg_rtx (DImode);
> > +  aarch64_emit_move (t1, polynomial);
>
> If polynomial is a constant operand of mode crc_mode, GCC's standard
> CONST_INT representation is to sign-extend the constant to 64 bits.
> E.g. a QImode value of 0b1000_0000 would be represented as -128.
>
> I think here we want the zero-extended form, so it might be safer to do:
>
>   polynomial = simplify_gen_unary (ZERO_EXTEND, DImode, polynomial, crc_mode);
>   rtx t1 = force_reg (DImode, polynomial);
>
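A concrete view of the representation point: held sign-extended in a 64-bit slot, as a QImode CONST_INT is, the 8-bit pattern 0b1000_0000 reads as -128 (0xFFFFFFFFFFFFFF80). The PMULL operand, however, needs 0x80. The C sketch below shows the two extensions side by side. sext and zext are hypothetical helpers that mimic GCC's CONST_INT convention without using it, and BITS is assumed to be in 1..32.

```c
#include <stdint.h>

/* Sign-extend the low BITS bits of X, the way a CONST_INT of a
   BITS-wide mode is stored.  Assumes 1 <= BITS <= 32.  */
static int64_t
sext (uint64_t x, unsigned bits)
{
  x &= (1ULL << bits) - 1;
  if (x >> (bits - 1))                          /* Sign bit set.  */
    return (int64_t) x - (int64_t) (1ULL << bits);
  return (int64_t) x;
}

/* Zero-extend the low BITS bits of X, i.e. the value a DImode
   PMULL operand should hold.  Assumes 1 <= BITS <= 32.  */
static uint64_t
zext (uint64_t x, unsigned bits)
{
  return x & ((1ULL << bits) - 1);
}
```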
> > +
> > +  rtx a0 = expand_binop (DImode, xor_optab, crc, data, NULL_RTX, 1,
> > +                      OPTAB_WIDEN);
> > +
> > +  rtx clmul_res = gen_reg_rtx (TImode);
> > +  emit_insn (gen_aarch64_crypto_pmulldi (clmul_res, a0, t0));
> > +  a0 = gen_lowpart (DImode, clmul_res);
> > +
> > +  a0 = expand_shift (RSHIFT_EXPR, DImode, a0, crc_size, NULL_RTX, 1);
> > +
> > +  emit_insn (gen_aarch64_crypto_pmulldi (clmul_res, a0, t1));
> > +  a0 = gen_lowpart (DImode, clmul_res);
> > +
> > +  if (crc_size > data_size)
> > +    {
> > +      rtx crc_part = expand_shift (LSHIFT_EXPR, DImode, operands[1], data_size,
> > +                                NULL_RTX, 0);
> > +      a0 =  expand_binop (DImode, xor_optab, a0, crc_part, NULL_RTX, 1,
> > +                       OPTAB_DIRECT);
>
> Formatting nit: extra space after "a0 = "
>
> > +    }
> > +  /* Zero upper bits beyond crc_size.  */
> > +  rtx num_shift = gen_int_mode (64 - crc_size, DImode);
> > +  a0 = expand_shift (LSHIFT_EXPR, DImode, a0, 64 - crc_size,  NULL_RTX, 0);
> > +  a0 = expand_shift (RSHIFT_EXPR, DImode, a0, 64 - crc_size,  NULL_RTX, 1);
>
> Rather than shift left and then right, I think we should just AND:
>
>   rtx mask = gen_int_mode (GET_MODE_MASK (crc_mode), DImode);
>   a0 = expand_binop (DImode, and_optab, a0, mask, NULL_RTX, 1, OPTAB_DIRECT);
>
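For any crc_size n with 0 < n < 64, the left/right shift pair in the patch and the single AND with the mode mask clear exactly the same bits. Here is a small C sketch comparing the two on a 64-bit value. Both function names are hypothetical.

```c
#include <stdint.h>

/* The patch's approach: clear bits >= N with a shift pair.
   Assumes 0 < N < 64.  */
static uint64_t
clear_high_shifts (uint64_t x, unsigned n)
{
  return (x << (64 - n)) >> (64 - n);
}

/* The suggested replacement: one AND with the low-N-bit mask.
   Assumes 0 < N < 64.  */
static uint64_t
clear_high_and (uint64_t x, unsigned n)
{
  return x & ((1ULL << n) - 1);
}
```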
> That said, it looks like operands[0] has crc_mode.  The register bits
> above crc_size therefore shouldn't matter, since they're undefined on read.
> E.g. even though (reg:SI R) is stored in an X register, only the low 32
> bits are defined; the upper 32 bits can be any value.
>
> So I'd expect we could replace this and...
>
> > +
> > +  rtx tgt = simplify_gen_subreg (DImode, operands[0],
> > +                              GET_MODE (operands[0]), 0);
> > +  aarch64_emit_move (tgt, a0);
>
> ...this with just:
>
>   aarch64_emit_move (operands[0], gen_lowpart (crc_mode, a0));
>
> Perhaps that would break down if operands[0] is a subreg with
> SUBREG_PROMOTED_VAR_P set, but I think it's up to target-independent
> code to handle that case.
>
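The bit-forward expansion discussed above is a Barrett-style reduction. Take P = x^n + poly and q = floor(x^(2n) / P). When deg a < n, floor(a*x^n / P) equals floor(a*q / x^n). The reduced CRC is then the low n bits of that quotient times poly, so two carry-less multiplies replace the bit loop. The portable C sketch below follows the quoted steps: shift the CRC right by n - m, XOR with the data, multiply by q, shift right by n, multiply by the polynomial, XOR in crc << m, and mask. Its assumptions are that gf2n_poly_long_div_quotient computes the same q, that crc_size n <= 32, and that data_size m <= n. It uses a software clmul in place of PMULL and checks the result against the usual MSB-first loop. All function names are hypothetical.

```c
#include <stdint.h>

/* Carry-less (GF(2)) multiply, low 64 bits only; a software stand-in
   for PMULL.  */
static uint64_t
clmul_lo (uint64_t a, uint64_t b)
{
  uint64_t r = 0;
  for (int i = 0; i < 64; i++)
    if ((b >> i) & 1)
      r ^= a << i;
  return r;
}

/* q = floor (x^(2n) / P) with P = x^n + POLY, by GF(2) long division.
   Assumes 1 <= N <= 32.  */
static uint64_t
gf2_quotient (uint64_t poly, unsigned n)
{
  uint64_t p = (1ULL << n) | poly;
  uint64_t r = 1ULL << n;       /* Top n+1 bits of the dividend x^(2n).  */
  uint64_t q = 0;
  for (int i = (int) n; i >= 0; i--)
    {
      if (r & (1ULL << n))      /* Leading term present: subtract P.  */
        {
          q |= 1ULL << i;
          r ^= p;
        }
      r <<= 1;                  /* Bring down the next (zero) bit.  */
    }
  return q;
}

/* Reference: MSB-first CRC of M data bits into an N-bit CRC.
   Assumes 1 <= M <= N <= 32, CRC < 2^N and DATA < 2^M.  */
static uint32_t
crc_bitwise (uint32_t crc, uint32_t data, uint32_t poly,
             unsigned n, unsigned m)
{
  uint64_t mask = (1ULL << n) - 1;
  uint64_t c = (crc ^ ((uint64_t) data << (n - m))) & mask;
  for (unsigned i = 0; i < m; i++)
    c = ((c >> (n - 1)) & 1) ? ((c << 1) ^ poly) & mask : (c << 1) & mask;
  return (uint32_t) c;
}

/* The same CRC via two carry-less multiplies, mirroring the quoted
   expander.  Same assumptions as crc_bitwise.  */
static uint32_t
crc_pmul (uint32_t crc, uint32_t data, uint32_t poly,
          unsigned n, unsigned m)
{
  uint64_t mask = (1ULL << n) - 1;
  uint64_t q = gf2_quotient (poly, n);
  uint64_t a = ((uint64_t) crc >> (n - m)) ^ data; /* Top M CRC bits ^ data.  */
  uint64_t t = clmul_lo (a, q) >> n;               /* floor (a*x^n / P).  */
  uint64_t res = clmul_lo (t, poly);               /* Low N bits of t*P.  */
  if (n > m)
    res ^= (uint64_t) crc << m;                    /* Remaining CRC bits.  */
  return (uint32_t) (res & mask);
}
```

In the worst case (n = m = 32), both products have degree at most 63. The low 64 bits of the multiply are therefore enough, which matches taking the DImode lowpart of the TImode PMULL result.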
> > @@ -4543,6 +4545,63 @@
> >    [(set_attr "type" "crc")]
> >  )
> >
> > +;; Reversed CRC
> > +(define_expand "crc_rev<ALLI:mode><ALLX:mode>4"
> > +      ;; return value (calculated CRC)
> > +  [(set (match_operand:ALLX 0 "register_operand" "=r")
> > +                   ;; initial CRC
> > +     (unspec:ALLX [(match_operand:ALLX 1 "register_operand" "r")
> > +                   ;; data
> > +                   (match_operand:ALLI 2 "register_operand" "r")
> > +                   ;; polynomial without leading 1
> > +                   (match_operand:ALLX 3)]
> > +     UNSPEC_CRC_REV))]
>
> Since we (rightly) never generate the RTL above, I think this can just be:
>
> (define_expand "crc_rev<ALLI:mode><ALLX:mode>4"
>   [;; return value (calculated CRC)
>    (match_operand:ALLX 0 "register_operand")
>    ;; initial CRC
>    (match_operand:ALLX 1 "register_operand")
>    ;; data
>    (match_operand:ALLI 2 "register_operand")
>    ;; polynomial without leading 1
>    (match_operand:ALLX 3)]
>
> without the unspec and constraints.
>
> > +  ""
> > +  {
> > +    /* If the polynomial is the same as the polynomial of crc32 instruction,
> > +       put that instruction.  crc32 uses iSCSI polynomial (0x1EDC6F41).  */
> > +    if (TARGET_CRC32 && INTVAL (operands[3]) == 517762881)
>
> The hex constant feels a little easier to read.  I think it'd also
> be worth checking <ALLX:MODE>mode == SImode, even though it's currently
> redundant (given that no other choice would allow that polynomial).
>
> > +      {
> > +     rtx crc_result = gen_reg_rtx (SImode);
> > +     rtx crc = operands[1];
> > +     rtx data = operands[2];
> > +     emit_insn (gen_aarch64_crc32c<ALLI:crc_data_type> (crc_result, crc,
> > +                                                        data));
> > +     emit_move_insn (operands[0],
> > +                     gen_lowpart (GET_MODE (operands[0]), crc_result));
>
> If operands[0] has ALLX mode (== SImode), it looks like we should be
> able to use operands[0] directly as the result of the CRC32C.
>
> FWIW, there's also CRC32 for the HDLC etc. polynomial 0x04C11DB7.
>
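For reference, AArch64's CRC32B/H/W/X instructions use the 0x04C11DB7 polynomial, and CRC32CB/H/W/X use 0x1EDC6F41. Both consume data LSB-first, so they implement the bit-reversed loop with the reversed constants 0xEDB88320 and 0x82F63B78. The portable C sketch below implements that per-byte update. The function names are hypothetical. The checks use the standard CRC-32 and CRC-32C check values for "123456789", which assume a 0xFFFFFFFF initial value and final XOR applied in software around the raw update.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-reflected (LSB-first) CRC-32 update over one byte.  REV_POLY is
   the bit-reversed polynomial: 0xEDB88320 for 0x04C11DB7, or
   0x82F63B78 for 0x1EDC6F41.  */
static uint32_t
crc32_rev_byte (uint32_t crc, uint8_t byte, uint32_t rev_poly)
{
  crc ^= byte;
  for (int i = 0; i < 8; i++)
    crc = (crc & 1) ? (crc >> 1) ^ rev_poly : crc >> 1;
  return crc;
}

/* Whole-buffer CRC with the conventional initial value and final XOR.  */
static uint32_t
crc32_rev_buf (const uint8_t *p, size_t len, uint32_t rev_poly)
{
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++)
    crc = crc32_rev_byte (crc, p[i], rev_poly);
  return crc ^ 0xFFFFFFFFu;
}
```

crc32_rev_byte with 0x82F63B78 performs the same computation as the _crc32 functions in the crc-crc32-data8.c test.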
> > +      }
> > +    else if (TARGET_AES)
>
> I think we also need to check <ALLI:sizen> <= <ALLX:sizen> for this.
> Similarly for the unreversed CRC pattern.
>
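Here are the review points for crc_rev taken together. First, restrict the CRC32/CRC32C paths to a 32-bit CRC. Second, optionally also match 0x04C11DB7. Third, require the data to be no wider than the CRC before using PMULL. The resulting selection logic might look like the C model below. This is a hypothetical sketch of the decision only, not the .md expander itself, and the enum and function names are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the crc_rev strategy choice; not GCC code.  */
enum crc_rev_strategy { USE_CRC32C_INSN, USE_CRC32_INSN, USE_PMUL, USE_TABLE };

static enum crc_rev_strategy
choose_crc_rev_strategy (bool target_crc32, bool target_aes, uint64_t poly,
                         unsigned crc_bits, unsigned data_bits)
{
  /* CRC32C{B,H,W,X}: iSCSI polynomial, 32-bit CRC only; the data may
     be 8, 16, 32 or 64 bits.  */
  if (target_crc32 && crc_bits == 32 && poly == 0x1EDC6F41)
    return USE_CRC32C_INSN;
  /* CRC32{B,H,W,X}: HDLC/Ethernet polynomial, also 32-bit CRC only.  */
  if (target_crc32 && crc_bits == 32 && poly == 0x04C11DB7)
    return USE_CRC32_INSN;
  /* The PMULL expansion assumes the data is no wider than the CRC.  */
  if (target_aes && data_bits <= crc_bits)
    return USE_PMUL;
  return USE_TABLE;
}
```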
> Thanks again for doing this.  I realise RISC-V is the lead target for
> this work, so you've gone above and beyond by doing a full AArch64
> port too.  It'd be perfectly valid to ask Arm developers to deal
> with the comments above, so please let me know if you'd prefer that.
> The patch looks close to ready to me though.
>

Thanks for your suggestions and explanations, and thank you for recognizing
my work. I'll resolve all the comments.


Best regards,
Mariam


> Richard
>
> > +      aarch64_expand_reversed_crc_using_clmul (operands);
> > +    else
> > +      {
> > +     /* Otherwise, generate table-based CRC.  */
> > +     expand_reversed_crc_table_based (operands[0], operands[1], operands[2],
> > +                                      operands[3], GET_MODE (operands[2]),
> > +                                      generate_reflecting_code_standard);
> > +      }
> > +    DONE;
> > +  }
> > +)
> > +
> > +;; Bit-forward CRC
> > +(define_expand "crc<ALLI:mode><ALLX:mode>4"
> > +      ;; return value (calculated CRC)
> > +  [(set (match_operand:ALLX 0 "register_operand" "=r")
> > +                   ;; initial CRC
> > +     (unspec:ALLX [(match_operand:ALLX 1 "register_operand" "r")
> > +                   ; data
> > +                   (match_operand:ALLI 2 "register_operand" "r")
> > +                   ;; polynomial without leading 1
> > +                   (match_operand:ALLX 3)]
> > +     UNSPEC_CRC))]
> > +  "TARGET_AES"
> > +  {
> > +    aarch64_expand_crc_using_clmul (operands);
> > +    DONE;
> > +  }
> > +)
> > +
> > +
> >  (define_insn "*csinc2<mode>_insn"
> >    [(set (match_operand:GPI 0 "register_operand" "=r")
> >          (plus:GPI (match_operand 2 "aarch64_comparison_operation" "")
> > diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> > index 99cde46f1ba..86e4863d684 100644
> > --- a/gcc/config/aarch64/iterators.md
> > +++ b/gcc/config/aarch64/iterators.md
> > @@ -1276,6 +1276,10 @@
> >  ;; Map a mode to a specific constraint character.
> >  (define_mode_attr cmode [(QI "q") (HI "h") (SI "s") (DI "d")])
> >
> > +;; Map a mode to a specific constraint character for calling
> > +;; appropriate version of crc.
> > +(define_mode_attr crc_data_type [(QI "b") (HI "h") (SI "w") (DI "x")])
> > +
> >  ;; Map modes to Usg and Usj constraints for SISD right shifts
> >  (define_mode_attr cmode_simd [(SI "g") (DI "j")])
> >
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-1-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-1-pmul.c
> > new file mode 100644
> > index 00000000000..2bea6280762
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-1-pmul.c
> > @@ -0,0 +1,8 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details -fdisable-tree-phiopt2 -fdisable-tree-phiopt3" } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-1.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-10-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-10-pmul.c
> > new file mode 100644
> > index 00000000000..846eecbaa85
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-10-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-10.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-12-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-12-pmul.c
> > new file mode 100644
> > index 00000000000..0eea6aa6741
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-12-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details -fdisable-tree-phiopt2 -fdisable-tree-phiopt3" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-12.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-13-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-13-pmul.c
> > new file mode 100644
> > index 00000000000..7ff8fbcb665
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-13-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-13.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-14-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-14-pmul.c
> > new file mode 100644
> > index 00000000000..80766daf487
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-14-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-14.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-17-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-17-pmul.c
> > new file mode 100644
> > index 00000000000..0e32fffa0b6
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-17-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-17.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-18-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-18-pmul.c
> > new file mode 100644
> > index 00000000000..87f4c63b5ea
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-18-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-18.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-21-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-21-pmul.c
> > new file mode 100644
> > index 00000000000..6eeac8cf97f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-21-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-21.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-22-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-22-pmul.c
> > new file mode 100644
> > index 00000000000..76e3c00ce9f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-22-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-22.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-23-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-23-pmul.c
> > new file mode 100644
> > index 00000000000..e3a5e99ffba
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-23-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-23.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-4-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-4-pmul.c
> > new file mode 100644
> > index 00000000000..528006c0099
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-4-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-4.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-5-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-5-pmul.c
> > new file mode 100644
> > index 00000000000..41e1f8202bc
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-5-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -w -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-5.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-6-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-6-pmul.c
> > new file mode 100644
> > index 00000000000..83db99ccb8b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-6-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-6.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-7-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-7-pmul.c
> > new file mode 100644
> > index 00000000000..7ad777aac8c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-7-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-7.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-8-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-8-pmul.c
> > new file mode 100644
> > index 00000000000..da1b619c418
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-8-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-8.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-9-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-9-pmul.c
> > new file mode 100644
> > index 00000000000..33bbe0bfb26
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-9-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-9.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-CCIT-data16-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-CCIT-data16-pmul.c
> > new file mode 100644
> > index 00000000000..0c452c1c0f4
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-CCIT-data16-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-w -march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-CCIT-data16.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-CCIT-data8-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-CCIT-data8-pmul.c
> > new file mode 100644
> > index 00000000000..87a0b4489a2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-CCIT-data8-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-w -march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-CCIT-data8.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-coremark-16bitdata-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-coremark-16bitdata-pmul.c
> > new file mode 100644
> > index 00000000000..75ed5aff80b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-coremark-16bitdata-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-w -march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-coremark16-data16.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-crc32-data16.c b/gcc/testsuite/gcc.target/aarch64/crc-crc32-data16.c
> > new file mode 100644
> > index 00000000000..d5aeee7c0c4
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-crc32-data16.c
> > @@ -0,0 +1,53 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crc -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include <stdint.h>
> > +#include <stdlib.h>
> > +
> > +__attribute__ ((noinline,optimize(0)))
> > +uint32_t _crc32_O0 (uint32_t crc, uint16_t data) {
> > +  int i;
> > +  crc = crc ^ data;
> > +
> > +  for (i = 0; i < 8; i++) {
> > +      if (crc & 1)
> > +     crc = (crc >> 1) ^ 0x82F63B78;
> > +      else
> > +     crc = (crc >> 1);
> > +    }
> > +
> > +  return crc;
> > +}
> > +
> > +uint32_t _crc32 (uint32_t crc, uint16_t data) {
> > +  int i;
> > +  crc = crc ^ data;
> > +
> > +  for (i = 0; i < 8; i++) {
> > +      if (crc & 1)
> > +     crc = (crc >> 1) ^ 0x82F63B78;
> > +      else
> > +     crc = (crc >> 1);
> > +    }
> > +
> > +  return crc;
> > +}
> > +
> > +int main ()
> > +{
> > +  uint32_t crc = 0x0D800D80;
> > +  for (uint16_t i = 0; i < 0xffff; i++)
> > +    {
> > +      uint32_t res1 = _crc32_O0 (crc, i);
> > +      uint32_t res2 = _crc32 (crc, i);
> > +      if (res1 != res2)
> > +     abort ();
> > +      crc = res1;
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "UNSPEC_CRC32" "dfinish"} } */
> > +/* { dg-final { scan-rtl-dump-times "pmull" 0 "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-crc32-data32.c b/gcc/testsuite/gcc.target/aarch64/crc-crc32-data32.c
> > new file mode 100644
> > index 00000000000..f0e319b3ab8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-crc32-data32.c
> > @@ -0,0 +1,52 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crc -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include <stdint.h>
> > +#include <stdlib.h>
> > +__attribute__ ((noinline,optimize(0)))
> > +uint32_t _crc32_O0 (uint32_t crc, uint32_t data) {
> > +  int i;
> > +  crc = crc ^ data;
> > +
> > +  for (i = 0; i < 32; i++) {
> > +      if (crc & 1)
> > +     crc = (crc >> 1) ^ 0x82F63B78;
> > +      else
> > +     crc = (crc >> 1);
> > +    }
> > +
> > +  return crc;
> > +}
> > +
> > +uint32_t _crc32 (uint32_t crc, uint32_t data) {
> > +  int i;
> > +  crc = crc ^ data;
> > +
> > +  for (i = 0; i < 32; i++) {
> > +      if (crc & 1)
> > +     crc = (crc >> 1) ^ 0x82F63B78;
> > +      else
> > +     crc = (crc >> 1);
> > +    }
> > +
> > +  return crc;
> > +}
> > +
> > +int main ()
> > +{
> > +  uint32_t crc = 0x0D800D80;
> > +  for (uint8_t i = 0; i < 0xff; i++)
> > +    {
> > +      uint32_t res1 = _crc32_O0 (crc, i);
> > +      uint32_t res2 = _crc32 (crc, i);
> > +      if (res1 != res2)
> > +     abort ();
> > +      crc = res1;
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "UNSPEC_CRC32" "dfinish"} } */
> > +/* { dg-final { scan-rtl-dump-times "pmull" 0 "dfinish"} } */
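The data32 and data8 tests use the same reflected update and differ only in how many data bits each call consumes. The update is linear over GF(2), and during the first 8 shifts no bit from positions 8-31 reaches bit 0. It follows that one 32-bit step should equal four 8-bit steps applied to the bytes of `data`, least significant byte first. The C sketch below checks this. The names `crc32c_bits` and `crc32c_word_by_bytes` are hypothetical and are not functions from the patch.

```c
#include <stdint.h>

#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Hypothetical helper: bitwise reflected CRC-32C update consuming NBITS
   bits of DATA, mirroring the test loops (no initial or final XOR).
   DATA must fit in NBITS bits.  */
static uint32_t
crc32c_bits (uint32_t crc, uint32_t data, int nbits)
{
  crc ^= data;
  for (int i = 0; i < nbits; i++)
    crc = (crc & 1) ? (crc >> 1) ^ CRC32C_POLY_REFLECTED : crc >> 1;
  return crc;
}

/* Hypothetical helper: feed the four bytes of DATA one at a time,
   least significant byte first, using the 8-bit step.  */
static uint32_t
crc32c_word_by_bytes (uint32_t crc, uint32_t data)
{
  for (int b = 0; b < 4; b++)
    crc = crc32c_bits (crc, (data >> (8 * b)) & 0xFFu, 8);
  return crc;
}
```

For any crc and data, crc32c_bits (crc, data, 32) and crc32c_word_by_bytes (crc, data) should agree. This is also why the same CRC can be computed with 8-, 16-, 32- or 64-bit chunks.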
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-crc32-data8.c b/gcc/testsuite/gcc.target/aarch64/crc-crc32-data8.c
> > new file mode 100644
> > index 00000000000..95ffde6a9d2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-crc32-data8.c
> > @@ -0,0 +1,53 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crc -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include <stdint.h>
> > +#include <stdlib.h>
> > +
> > +__attribute__ ((noinline,optimize(0)))
> > +uint32_t _crc32_O0 (uint32_t crc, uint8_t data) {
> > +  int i;
> > +  crc = crc ^ data;
> > +
> > +  for (i = 0; i < 8; i++) {
> > +      if (crc & 1)
> > +     crc = (crc >> 1) ^ 0x82F63B78;
> > +      else
> > +     crc = (crc >> 1);
> > +    }
> > +
> > +  return crc;
> > +}
> > +
> > +uint32_t _crc32 (uint32_t crc, uint8_t data) {
> > +  int i;
> > +  crc = crc ^ data;
> > +
> > +  for (i = 0; i < 8; i++) {
> > +      if (crc & 1)
> > +     crc = (crc >> 1) ^ 0x82F63B78;
> > +      else
> > +     crc = (crc >> 1);
> > +    }
> > +
> > +  return crc;
> > +}
> > +
> > +int main ()
> > +{
> > +  uint32_t crc = 0x0D800D80;
> > +  for (uint8_t i = 0; i < 0xff; i++)
> > +    {
> > +      uint32_t res1 = _crc32_O0 (crc, i);
> > +      uint32_t res2 = _crc32 (crc, i);
> > +      if (res1 != res2)
> > +     abort ();
> > +      crc = res1;
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "UNSPEC_CRC32" "dfinish"} } */
> > +/* { dg-final { scan-rtl-dump-times "pmull" 0 "dfinish"} } */
>
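Each test compares the optimized loop with an -O0 copy of itself. That checks the expansion against the source semantics, but it does not show that the loop is standard CRC-32C. An independent sanity check is the published CRC-32C (Castagnoli) check value: with initial value 0xFFFFFFFF and final XOR 0xFFFFFFFF, the CRC of the ASCII string "123456789" is 0xE3069283.

The C sketch below computes that value twice. The first version uses the same per-byte step as _crc32 in crc-crc32-data8.c. The second uses a conventional 256-entry lookup table. The table layout is the textbook one and is only an assumption about what a table-based fallback might look like. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32C_POLY_REFLECTED 0x82F63B78u

static uint32_t crc32c_table[256];

/* Build the textbook 256-entry table for the reflected polynomial.  */
static void
crc32c_init_table (void)
{
  for (uint32_t n = 0; n < 256; n++)
    {
      uint32_t c = n;
      for (int k = 0; k < 8; k++)
        c = (c & 1) ? (c >> 1) ^ CRC32C_POLY_REFLECTED : c >> 1;
      crc32c_table[n] = c;
    }
}

/* Bitwise CRC-32C over a buffer: the per-byte step from the data8 test,
   wrapped with the standard initial value and final XOR.  */
static uint32_t
crc32c_bitwise (const uint8_t *buf, size_t len)
{
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++)
    {
      crc ^= buf[i];
      for (int k = 0; k < 8; k++)
        crc = (crc & 1) ? (crc >> 1) ^ CRC32C_POLY_REFLECTED : crc >> 1;
    }
  return crc ^ 0xFFFFFFFFu;
}

/* Table-driven equivalent: one lookup per byte.  */
static uint32_t
crc32c_table_driven (const uint8_t *buf, size_t len)
{
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++)
    crc = (crc >> 8) ^ crc32c_table[(crc ^ buf[i]) & 0xFFu];
  return crc ^ 0xFFFFFFFFu;
}
```

Call crc32c_init_table () once before using crc32c_table_driven. Both functions should then return 0xE3069283 for the nine bytes of "123456789".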

Thread overview: 3+ messages
2024-05-24  8:42 [RFC/RFA] [PATCH 06/12] aarch64: Implement new expander for efficient CRC computation Mariam Arutunian
2024-06-08 11:41 ` Richard Sandiford
2024-06-19 15:20   ` Mariam Arutunian
