On Sat, Jun 8, 2024 at 3:41 PM Richard Sandiford wrote:

> Mariam Arutunian writes:
> > This patch introduces two new expanders for the aarch64 backend,
> > dedicated to generate optimized code for CRC computations.
> > The new expanders are designed to leverage specific hardware capabilities
> > to achieve faster CRC calculations,
> > particularly using the pmul or crc32 instructions when supported by the
> > target architecture.
>
> Thanks for porting this to aarch64!
>
> > Expander 1: Bit-Forward CRC (crc4)
> > For targets that support pmul instruction (TARGET_AES),
> > the expander will generate code that uses the pmul (crypto_pmulldi)
> > instruction for CRC computation.
> >
> > Expander 2: Bit-Reversed CRC (crc_rev4)
> > The expander first checks if the target supports the CRC32 instruction set
> > (TARGET_CRC32)
> > and the polynomial in use is 0x1EDC6F41 (iSCSI). If the conditions are met,
> > it emits calls to the corresponding crc32 instruction (crc32b, crc32h,
> > crc32w, or crc32x depending on the data size).
> > If the target does not support crc32 but supports pmul, it then uses the
> > pmul (crypto_pmulldi) instruction for bit-reversed CRC computation.
> >
> > Otherwise table-based CRC is generated.
> >
> > gcc/config/aarch64/
> >
> > * aarch64-protos.h (aarch64_expand_crc_using_clmul): New extern
> > function declaration.
> > (aarch64_expand_reversed_crc_using_clmul): Likewise.
> > * aarch64.cc (aarch64_expand_crc_using_clmul): New function.
> > (aarch64_expand_reversed_crc_using_clmul): Likewise.
> > * aarch64.md (UNSPEC_CRC, UNSPEC_CRC_REV): New unspecs.
> > (crc_rev4): New expander for reversed CRC.
> > (crc4): New expander for reversed CRC.
> > * iterators.md (crc_data_type): New mode attribute.
> >
> > gcc/testsuite/gcc.target/aarch64/
> >
> > * crc-1-pmul.c: Likewise.
> > * crc-10-pmul.c: Likewise.
> > * crc-12-pmul.c: Likewise.
> > * crc-13-pmul.c: Likewise.
> > * crc-14-pmul.c: Likewise.
> > * crc-17-pmul.c: Likewise.
> > * crc-18-pmul.c: Likewise.
> > * crc-21-pmul.c: Likewise.
> > * crc-22-pmul.c: Likewise.
> > * crc-23-pmul.c: Likewise.
> > * crc-4-pmul.c: Likewise.
> > * crc-5-pmul.c: Likewise.
> > * crc-6-pmul.c: Likewise.
> > * crc-7-pmul.c: Likewise.
> > * crc-8-pmul.c: Likewise.
> > * crc-9-pmul.c: Likewise.
> > * crc-CCIT-data16-pmul.c: Likewise.
> > * crc-CCIT-data8-pmul.c: Likewise.
> > * crc-coremark-16bitdata-pmul.c: Likewise.
> > * crc-crc32-data16.c: New test.
> > * crc-crc32-data32.c: Likewise.
> > * crc-crc32-data8.c: Likewise.
> >
> > Signed-off-by: Mariam Arutunian
> > diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> > index 1d3f94c813e..167e1140f0d 100644
> > --- a/gcc/config/aarch64/aarch64-protos.h
> > +++ b/gcc/config/aarch64/aarch64-protos.h
> > @@ -1117,5 +1117,8 @@ extern void mingw_pe_encode_section_info (tree, rtx, int);
> >
> >  bool aarch64_optimize_mode_switching (aarch64_mode_entity);
> >  void aarch64_restore_za (rtx);
> > +void aarch64_expand_crc_using_clmul (rtx *);
> > +void aarch64_expand_reversed_crc_using_clmul (rtx *);
> > +
> >
> >  #endif /* GCC_AARCH64_PROTOS_H */
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index ee12d8897a8..05cd0296d38 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -30265,6 +30265,135 @@ aarch64_retrieve_sysreg (const char *regname, bool write_p, bool is128op)
> >    return sysreg->encoding;
> >  }
> >
> > +/* Generate assembly to calculate CRC
> > +   using carry-less multiplication instruction.
> > +   OPERANDS[1] is input CRC,
> > +   OPERANDS[2] is data (message),
> > +   OPERANDS[3] is the polynomial without the leading 1.  */
> > +
> > +void
> > +aarch64_expand_crc_using_clmul (rtx *operands)
>
> This should probably be pmul rather than clmul.
>
> > +{
> > +  /* Check and keep arguments.  */
> > +  gcc_assert (!CONST_INT_P (operands[0]));
> > +  gcc_assert (CONST_INT_P (operands[3]));
> > +  rtx crc = operands[1];
> > +  rtx data = operands[2];
> > +  rtx polynomial = operands[3];
> > +
> > +  unsigned HOST_WIDE_INT
> > +    crc_size = GET_MODE_BITSIZE (GET_MODE (operands[0])).to_constant ();
> > +  gcc_assert (crc_size <= 32);
> > +  unsigned HOST_WIDE_INT
> > +    data_size = GET_MODE_BITSIZE (GET_MODE (data)).to_constant ();
>
> We could instead make the interface:
>
>   void
>   aarch64_expand_crc_using_pmul (scalar_mode crc_mode, scalar_mode data_mode,
>                                  rtx *operands)
>
> so that the lines above don't need the to_constant. This should "just
> work" on the .md file side, since the modes being passed are naturally
> scalar_mode.
>
> I think it'd be worth asserting also that data_size <= crc_size.
> (Although we could handle any MAX (data_size, crc_size) <= 32
> with some adjustment.)
>
> > +
> > +  /* Calculate the quotient.  */
> > +  unsigned HOST_WIDE_INT
> > +    q = gf2n_poly_long_div_quotient (UINTVAL (polynomial), crc_size + 1);
> > +
> > +  /* CRC calculation's main part.  */
> > +  if (crc_size > data_size)
> > +    crc = expand_shift (RSHIFT_EXPR, DImode, crc, crc_size - data_size,
> > +                        NULL_RTX, 1);
> > +
> > +  rtx t0 = gen_reg_rtx (DImode);
> > +  aarch64_emit_move (t0, gen_int_mode (q, DImode));
>
> It's only a minor simplification, but this could instead be:
>
>   rtx t0 = force_reg (DImode, gen_int_mode (q, DImode));
>
> > +  rtx t1 = gen_reg_rtx (DImode);
> > +  aarch64_emit_move (t1, polynomial);
>
> If polynomial is a constant operand of mode crc_mode, GCC's standard
> CONST_INT representation is to sign-extend the constant to 64 bits.
> E.g. a QImode value of 0b1000_0000 would be represented as -128.
>
> I think here we want the zero-extended form, so it might be safer to do:
>
>   polynomial = simplify_gen_unary (ZERO_EXTEND, DImode, polynomial, crc_mode);
>   rtx t1 = force_reg (DImode, polynomial);
>
> > +
> > +  rtx a0 = expand_binop (DImode, xor_optab, crc, data, NULL_RTX, 1,
> > +                         OPTAB_WIDEN);
> > +
> > +  rtx clmul_res = gen_reg_rtx (TImode);
> > +  emit_insn (gen_aarch64_crypto_pmulldi (clmul_res, a0, t0));
> > +  a0 = gen_lowpart (DImode, clmul_res);
> > +
> > +  a0 = expand_shift (RSHIFT_EXPR, DImode, a0, crc_size, NULL_RTX, 1);
> > +
> > +  emit_insn (gen_aarch64_crypto_pmulldi (clmul_res, a0, t1));
> > +  a0 = gen_lowpart (DImode, clmul_res);
> > +
> > +  if (crc_size > data_size)
> > +    {
> > +      rtx crc_part = expand_shift (LSHIFT_EXPR, DImode, operands[1], data_size,
> > +                                   NULL_RTX, 0);
> > +      a0 =  expand_binop (DImode, xor_optab, a0, crc_part, NULL_RTX, 1,
> > +                          OPTAB_DIRECT);
>
> Formatting nit: extra space after "a0 = "
>
> > +    }
> > +  /* Zero upper bits beyond crc_size.  */
> > +  rtx num_shift = gen_int_mode (64 - crc_size, DImode);
> > +  a0 = expand_shift (LSHIFT_EXPR, DImode, a0, 64 - crc_size, NULL_RTX, 0);
> > +  a0 = expand_shift (RSHIFT_EXPR, DImode, a0, 64 - crc_size, NULL_RTX, 1);
>
> Rather than shift left and then right, I think we should just AND:
>
>   rtx mask = gen_int_mode (GET_MODE_MASK (crc_mode), DImode);
>   a0 = expand_binop (DImode, and_optab, a0, mask, NULL_RTX, 1,
>                      OPTAB_DIRECT);
>
> That said, it looks like operands[0] has crc_mode. The register bits
> above crc_size therefore shouldn't matter, since they're undefined on read.
> E.g. even though (reg:SI R) is stored in an X register, only the low 32
> bits are defined; the upper 32 bits can be any value.
>
> So I'd expect we could replace this and...
>
> > +
> > +  rtx tgt = simplify_gen_subreg (DImode, operands[0],
> > +                                 GET_MODE (operands[0]), 0);
> > +  aarch64_emit_move (tgt, a0);
>
> ...this with just:
>
>   aarch64_emit_move (operands[0], gen_lowpart (crc_mode, a0));
>
> Perhaps that would break down if operands[0] is a subreg with
> SUBREG_PROMOTED_VAR_P set, but I think it's up to target-independent
> code to handle that case.
>
> > @@ -4543,6 +4545,63 @@
> >    [(set_attr "type" "crc")]
> >  )
> >
> > +;; Reversed CRC
> > +(define_expand "crc_rev4"
> > +  ;; return value (calculated CRC)
> > +  [(set (match_operand:ALLX 0 "register_operand" "=r")
> > +        ;; initial CRC
> > +        (unspec:ALLX [(match_operand:ALLX 1 "register_operand" "r")
> > +                      ;; data
> > +                      (match_operand:ALLI 2 "register_operand" "r")
> > +                      ;; polynomial without leading 1
> > +                      (match_operand:ALLX 3)]
> > +                     UNSPEC_CRC_REV))]
>
> Since we (rightly) never generate the RTL above, I think this can just be:
>
>   (define_expand "crc_rev4"
>     [;; return value (calculated CRC)
>      (match_operand:ALLX 0 "register_operand")
>      ;; initial CRC
>      (match_operand:ALLX 1 "register_operand")
>      ;; data
>      (match_operand:ALLI 2 "register_operand")
>      ;; polynomial without leading 1
>      (match_operand:ALLX 3)]
>
> without the unspec and constraints.
>
> > +  ""
> > +  {
> > +    /* If the polynomial is the same as the polynomial of crc32 instruction,
> > +       put that instruction.  crc32 uses iSCSI polynomial (0x1EDC6F41).  */
> > +    if (TARGET_CRC32 && INTVAL (operands[3]) == 517762881)
>
> The hex constant feels a little easier to read. I think it'd also
> be worth checking mode == SImode, even though it's currently
> redundant (given that no other choice would allow that polynomial).
>
> > +      {
> > +        rtx crc_result = gen_reg_rtx (SImode);
> > +        rtx crc = operands[1];
> > +        rtx data = operands[2];
> > +        emit_insn (gen_aarch64_crc32c (crc_result, crc,
> > +                                       data));
> > +        emit_move_insn (operands[0],
> > +                        gen_lowpart (GET_MODE (operands[0]), crc_result));
>
> If operands[0] has ALLX mode (== SImode), it looks like we should be
> able to use operands[0] directly as the result of the CRC32C.
>
> FWIW, there's also CRC32 for the HDLC etc. polynomial 0x04C11DB7.
>
> > +      }
> > +    else if (TARGET_AES)
>
> I think we also need to check <= for this.
> Similarly for the unreversed CRC pattern.
>
> Thanks again for doing this. I realise RISC-V is the lead target for
> this work, so you've gone above and beyond by doing a full AArch64
> port too. It'd be perfectly valid to ask Arm developers to deal
> with the comments above, so please let me know if you'd prefer that.
> The patch looks close to ready to me though.
>

Thanks for your suggestions and explanations, and thank you for
recognizing my work.
I'll resolve all the comments.

Best regards,
Mariam

> Richard
>
> > +      aarch64_expand_reversed_crc_using_clmul (operands);
> > +    else
> > +      {
> > +        /* Otherwise, generate table-based CRC.  */
> > +        expand_reversed_crc_table_based (operands[0], operands[1], operands[2],
> > +                                         operands[3], GET_MODE (operands[2]),
> > +                                         generate_reflecting_code_standard);
> > +      }
> > +    DONE;
> > +  }
> > +)
> > +
> > +;; Bit-forward CRC
> > +(define_expand "crc4"
> > +  ;; return value (calculated CRC)
> > +  [(set (match_operand:ALLX 0 "register_operand" "=r")
> > +        ;; initial CRC
> > +        (unspec:ALLX [(match_operand:ALLX 1 "register_operand" "r")
> > +                      ; data
> > +                      (match_operand:ALLI 2 "register_operand" "r")
> > +                      ;; polynomial without leading 1
> > +                      (match_operand:ALLX 3)]
> > +                     UNSPEC_CRC))]
> > +  "TARGET_AES"
> > +  {
> > +    aarch64_expand_crc_using_clmul (operands);
> > +    DONE;
> > +  }
> > +)
> > +
> > +
> >  (define_insn "*csinc2_insn"
> >    [(set (match_operand:GPI 0 "register_operand" "=r")
> >          (plus:GPI (match_operand 2 "aarch64_comparison_operation" "")
> > diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> > index 99cde46f1ba..86e4863d684 100644
> > --- a/gcc/config/aarch64/iterators.md
> > +++ b/gcc/config/aarch64/iterators.md
> > @@ -1276,6 +1276,10 @@
> >  ;; Map a mode to a specific constraint character.
> >  (define_mode_attr cmode [(QI "q") (HI "h") (SI "s") (DI "d")])
> >
> > +;; Map a mode to a specific constraint character for calling
> > +;; appropriate version of crc.
> > +(define_mode_attr crc_data_type [(QI "b") (HI "h") (SI "w") (DI "x")])
> > +
> >  ;; Map modes to Usg and Usj constraints for SISD right shifts
> >  (define_mode_attr cmode_simd [(SI "g") (DI "j")])
> >
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-1-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-1-pmul.c
> > new file mode 100644
> > index 00000000000..2bea6280762
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-1-pmul.c
> > @@ -0,0 +1,8 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details -fdisable-tree-phiopt2 -fdisable-tree-phiopt3" } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-1.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-10-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-10-pmul.c
> > new file mode 100644
> > index 00000000000..846eecbaa85
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-10-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-10.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-12-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-12-pmul.c
> > new file mode 100644
> > index 00000000000..0eea6aa6741
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-12-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details -fdisable-tree-phiopt2 -fdisable-tree-phiopt3" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-12.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-13-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-13-pmul.c
> > new file mode 100644
> > index 00000000000..7ff8fbcb665
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-13-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-13.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-14-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-14-pmul.c
> > new file mode 100644
> > index 00000000000..80766daf487
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-14-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-14.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-17-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-17-pmul.c
> > new file mode 100644
> > index 00000000000..0e32fffa0b6
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-17-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-17.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-18-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-18-pmul.c
> > new file mode 100644
> > index 00000000000..87f4c63b5ea
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-18-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-18.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-21-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-21-pmul.c
> > new file mode 100644
> > index 00000000000..6eeac8cf97f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-21-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-21.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-22-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-22-pmul.c
> > new file mode 100644
> > index 00000000000..76e3c00ce9f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-22-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-22.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-23-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-23-pmul.c
> > new file mode 100644
> > index 00000000000..e3a5e99ffba
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-23-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-23.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-4-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-4-pmul.c
> > new file mode 100644
> > index 00000000000..528006c0099
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-4-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-4.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-5-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-5-pmul.c
> > new file mode 100644
> > index 00000000000..41e1f8202bc
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-5-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -w -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-5.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-6-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-6-pmul.c
> > new file mode 100644
> > index 00000000000..83db99ccb8b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-6-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-6.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-7-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-7-pmul.c
> > new file mode 100644
> > index 00000000000..7ad777aac8c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-7-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-7.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-8-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-8-pmul.c
> > new file mode 100644
> > index 00000000000..da1b619c418
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-8-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-8.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-9-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-9-pmul.c
> > new file mode 100644
> > index 00000000000..33bbe0bfb26
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-9-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-9.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-CCIT-data16-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-CCIT-data16-pmul.c
> > new file mode 100644
> > index 00000000000..0c452c1c0f4
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-CCIT-data16-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-w -march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-CCIT-data16.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-CCIT-data8-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-CCIT-data8-pmul.c
> > new file mode 100644
> > index 00000000000..87a0b4489a2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-CCIT-data8-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-w -march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-CCIT-data8.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-coremark-16bitdata-pmul.c b/gcc/testsuite/gcc.target/aarch64/crc-coremark-16bitdata-pmul.c
> > new file mode 100644
> > index 00000000000..75ed5aff80b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-coremark-16bitdata-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-w -march=armv8-a+crypto -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-coremark16-data16.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-crc32-data16.c b/gcc/testsuite/gcc.target/aarch64/crc-crc32-data16.c
> > new file mode 100644
> > index 00000000000..d5aeee7c0c4
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-crc32-data16.c
> > @@ -0,0 +1,53 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crc -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include <stdint.h>
> > +#include <stdlib.h>
> > +
> > +__attribute__ ((noinline,optimize(0)))
> > +uint32_t _crc32_O0 (uint32_t crc, uint16_t data) {
> > +  int i;
> > +  crc = crc ^ data;
> > +
> > +  for (i = 0; i < 8; i++) {
> > +      if (crc & 1)
> > +        crc = (crc >> 1) ^ 0x82F63B78;
> > +      else
> > +        crc = (crc >> 1);
> > +  }
> > +
> > +  return crc;
> > +}
> > +
> > +uint32_t _crc32 (uint32_t crc, uint16_t data) {
> > +  int i;
> > +  crc = crc ^ data;
> > +
> > +  for (i = 0; i < 8; i++) {
> > +      if (crc & 1)
> > +        crc = (crc >> 1) ^ 0x82F63B78;
> > +      else
> > +        crc = (crc >> 1);
> > +  }
> > +
> > +  return crc;
> > +}
> > +
> > +int main ()
> > +{
> > +  uint32_t crc = 0x0D800D80;
> > +  for (uint16_t i = 0; i < 0xffff; i++)
> > +    {
> > +      uint32_t res1 = _crc32_O0 (crc, i);
> > +      uint32_t res2 = _crc32 (crc, i);
> > +      if (res1 != res2)
> > +        abort ();
> > +      crc = res1;
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "UNSPEC_CRC32" "dfinish"} } */
> > +/* { dg-final { scan-rtl-dump-times "pmull" 0 "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-crc32-data32.c b/gcc/testsuite/gcc.target/aarch64/crc-crc32-data32.c
> > new file mode 100644
> > index 00000000000..f0e319b3ab8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-crc32-data32.c
> > @@ -0,0 +1,52 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crc -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include <stdint.h>
> > +#include <stdlib.h>
> > +__attribute__ ((noinline,optimize(0)))
> > +uint32_t _crc32_O0 (uint32_t crc, uint32_t data) {
> > +  int i;
> > +  crc = crc ^ data;
> > +
> > +  for (i = 0; i < 32; i++) {
> > +      if (crc & 1)
> > +        crc = (crc >> 1) ^ 0x82F63B78;
> > +      else
> > +        crc = (crc >> 1);
> > +  }
> > +
> > +  return crc;
> > +}
> > +
> > +uint32_t _crc32 (uint32_t crc, uint32_t data) {
> > +  int i;
> > +  crc = crc ^ data;
> > +
> > +  for (i = 0; i < 32; i++) {
> > +      if (crc & 1)
> > +        crc = (crc >> 1) ^ 0x82F63B78;
> > +      else
> > +        crc = (crc >> 1);
> > +  }
> > +
> > +  return crc;
> > +}
> > +
> > +int main ()
> > +{
> > +  uint32_t crc = 0x0D800D80;
> > +  for (uint8_t i = 0; i < 0xff; i++)
> > +    {
> > +      uint32_t res1 = _crc32_O0 (crc, i);
> > +      uint32_t res2 = _crc32 (crc, i);
> > +      if (res1 != res2)
> > +        abort ();
> > +      crc = res1;
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "UNSPEC_CRC32" "dfinish"} } */
> > +/* { dg-final { scan-rtl-dump-times "pmull" 0 "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-crc32-data8.c b/gcc/testsuite/gcc.target/aarch64/crc-crc32-data8.c
> > new file mode 100644
> > index 00000000000..95ffde6a9d2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-crc32-data8.c
> > @@ -0,0 +1,53 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crc -O2 -fdump-rtl-dfinish -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include <stdint.h>
> > +#include <stdlib.h>
> > +
> > +__attribute__ ((noinline,optimize(0)))
> > +uint32_t _crc32_O0 (uint32_t crc, uint8_t data) {
> > +  int i;
> > +  crc = crc ^ data;
> > +
> > +  for (i = 0; i < 8; i++) {
> > +      if (crc & 1)
> > +        crc = (crc >> 1) ^ 0x82F63B78;
> > +      else
> > +        crc = (crc >> 1);
> > +  }
> > +
> > +  return crc;
> > +}
> > +
> > +uint32_t _crc32 (uint32_t crc, uint8_t data) {
> > +  int i;
> > +  crc = crc ^ data;
> > +
> > +  for (i = 0; i < 8; i++) {
> > +      if (crc & 1)
> > +        crc = (crc >> 1) ^ 0x82F63B78;
> > +      else
> > +        crc = (crc >> 1);
> > +  }
> > +
> > +  return crc;
> > +}
> > +
> > +int main ()
> > +{
> > +  uint32_t crc = 0x0D800D80;
> > +  for (uint8_t i = 0; i < 0xff; i++)
> > +    {
> > +      uint32_t res1 = _crc32_O0 (crc, i);
> > +      uint32_t res2 = _crc32 (crc, i);
> > +      if (res1 != res2)
> > +        abort ();
> > +      crc = res1;
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "UNSPEC_CRC32" "dfinish"} } */
> > +/* { dg-final { scan-rtl-dump-times "pmull" 0 "dfinish"} } */
>