Subject: Re: [PATCH] [arm] Implement Armv8.1-M low overhead loops
To: Andrea Corallo, "gcc-patches@gcc.gnu.org"
Cc: Richard Earnshaw, Roman Zhuykov, nd
References: <8468875e-934e-0bee-763d-91dd5ddbe7c9@arm.com> <4a28de0a-6790-732f-31bd-0e5bdfc12246@ispras.ru>
From: Kyrill Tkachov
Date: Fri, 21 Feb 2020 15:31:00 -0000

Hi Andrea,

On 2/19/20 1:01 PM, Andrea Corallo wrote:
> Hi all,
>
> Second version of the patch here addressing comments.
>
> This patch enables the Armv8.1-M Mainline LOB (low overhead branch) extension
> low overhead loops (LOL) feature by using the 'loop-doloop' pass.
>
> Given the following function:
>
> void
> loop (int *a)
> {
>   for (int i = 0; i < 1000; i++)
>     a[i] = i;
> }
>
> 'doloop_begin' and 'doloop_end' patterns translates into 'dls' and 'le'
> giving:
>
>  loop:
>          movw    r2, #10000
>          movs    r3, #0
>          subs    r0, r0, #4
>          push    {lr}
>          dls     lr, r2
>  .L2:
>          str     r3, [r0, #4]!
>          adds    r3, r3, #1
>          le      lr, .L2
>          ldr     pc, [sp], #4
>
> SMS is disabled in tests not to break them when SMS does loop versioning.
>
> bootstrapped arm-none-linux-gnueabihf, do not introduce testsuite
> regressions.

This should be aimed at GCC 11 at this point.

Some comments inline...

>
> Andrea
>
> gcc/ChangeLog:
>
> 2020-??-??  Andrea Corallo
>             Mihail-Calin Ionescu
>             Iain Apreotesei
>
>         * config/arm/arm.c (TARGET_INVALID_WITHIN_DOLOOP):
>         (arm_invalid_within_doloop): Implement invalid_within_doloop hook.
>         * config/arm/arm.h (TARGET_HAVE_LOB): Add new macro.
>         * config/arm/thumb2.md (*doloop_end, doloop_begin, dls_insn):
>         Add new patterns.
>         * config/arm/unspecs.md: Add new unspec.
>
> gcc/testsuite/ChangeLog:
>
> 2020-??-??  Andrea Corallo
>             Mihail-Calin Ionescu
>             Iain Apreotesei
>
>         * gcc.target/arm/lob.h: New header.
>         * gcc.target/arm/lob1.c: New testcase.
>         * gcc.target/arm/lob2.c: Likewise.
>         * gcc.target/arm/lob3.c: Likewise.
>         * gcc.target/arm/lob4.c: Likewise.
>         * gcc.target/arm/lob5.c: Likewise.
>         * gcc.target/arm/lob6.c: Likewise.
>

lol.patch

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index e07cf03538c5..1269f40bd77c 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -586,6 +586,9 @@ extern int arm_arch_bf16;
 
 /* Target machine storage Layout.  */
 
+/* Nonzero if this chip provides Armv8.1-M Mainline
+   LOB (low overhead branch features) extension instructions.  */
+#define TARGET_HAVE_LOB (arm_arch8_1m_main)
 
 /* Define this macro if it is advisable to hold scalars in registers
    in a wider mode than that declared by the program.  In such cases,
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 9cc7bc0e5621..7c2a7b7e9e97 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -833,6 +833,9 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT arm_constant_alignment
 
+#undef TARGET_INVALID_WITHIN_DOLOOP
+#define TARGET_INVALID_WITHIN_DOLOOP arm_invalid_within_doloop
+
 #undef TARGET_MD_ASM_ADJUST
 #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust
 
@@ -32937,6 +32940,27 @@ arm_ge_bits_access (void)
   return true;
 }
 
+/* NULL if INSN insn is valid within a low-overhead loop.
+   Otherwise return why doloop cannot be applied.  */
+
+static const char *
+arm_invalid_within_doloop (const rtx_insn *insn)
+{
+  if (!TARGET_HAVE_LOB)
+    return default_invalid_within_doloop (insn);
+
+  if (CALL_P (insn))
+    return "Function call in the loop.";
+
+  if (tablejump_p (insn, NULL, NULL) || computed_jump_p (insn))
+    return "Computed branch in the loop.";
+
+  if (reg_mentioned_p (gen_rtx_REG (SImode, LR_REGNUM), insn))
+    return "LR is used inside loop.";
+
+  return NULL;
+}
+
 #if CHECKING_P
 namespace selftest {
 
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index b0d3bd1cf1c4..4aff1a0838d8 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -1555,8 +1555,11 @@
      using a certain 'count' register and (2) the loop count can be
      adjusted by modifying this register prior to the loop.
      ??? The possible introduction of a new block to initialize the
-     new IV can potentially affect branch optimizations.  */
-   if (optimize > 0 && flag_modulo_sched)
+     new IV can potentially affect branch optimizations.
+
+     Also used to implement the low over head loops feature, which is part of
+     the Armv8.1-M Mainline Low Overhead Branch (LOB) extension.  */
+   if (optimize > 0 && (flag_modulo_sched || TARGET_HAVE_LOB))
    {
      rtx s0;
      rtx bcomp;
@@ -1569,6 +1572,11 @@
        FAIL;
 
      s0 = operands [0];
+
+     /* Low over head loop instructions require the first operand to be LR.  */
+     if (TARGET_HAVE_LOB)
+       s0 = gen_rtx_REG (SImode, LR_REGNUM);
+
      if (TARGET_THUMB2)
        insn = emit_insn (gen_thumb2_addsi3_compare0 (s0, s0, GEN_INT (-1)));
      else
@@ -1650,3 +1658,30 @@
   "TARGET_HAVE_MVE"
   "lsrl%?\\t%Q0, %R0, %1"
   [(set_attr "predicable" "yes")])
+
+;; Originally expanded by 'doloop_end'.
+(define_insn "doloop_end_internal"

We usually prefer to name these patterns with a '*' in front. That
prevents the gen* machinery from generating unneeded gen_* expanders
for them if they're not used.

+  [(parallel [(set (pc)
+                   (if_then_else
+                       (ne (reg:SI LR_REGNUM) (const_int 1))
+                     (label_ref (match_operand 0 "" ""))
+                     (pc)))
+              (set (reg:SI LR_REGNUM)
+                   (plus:SI (reg:SI LR_REGNUM) (const_int -1)))])]
+  "TARGET_32BIT && TARGET_HAVE_LOB"
+  "le\t%|lr, %l0")
+
+(define_expand "doloop_begin"
+  [(match_operand 0 "" "")
+   (match_operand 1 "" "")]
+  "TARGET_32BIT && TARGET_HAVE_LOB"
+  {
+    emit_insn (gen_dls_insn (operands[0]));
+    DONE;
+  })
+
+(define_insn "dls_insn"
+  [(set (reg:SI LR_REGNUM)
+        (unspec:SI [(match_operand:SI 0 "s_register_operand" "r")] UNSPEC_DLS))]
+  "TARGET_32BIT && TARGET_HAVE_LOB"
+  "dls\t%|lr, %0")
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 8f4a705f43ef..df5ecb731925 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -154,6 +154,7 @@
   UNSPEC_SMUADX         ; Represent the SMUADX operation.
   UNSPEC_SSAT16         ; Represent the SSAT16 operation.
   UNSPEC_USAT16         ; Represent the USAT16 operation.
+  UNSPEC_DLS            ; Used for DLS (Do Loop Start), Armv8.1-M Mainline instruction
 ])
 
diff --git a/gcc/testsuite/gcc.target/arm/lob.h b/gcc/testsuite/gcc.target/arm/lob.h
new file mode 100644
index 000000000000..feaae7cc8995
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/lob.h
@@ -0,0 +1,15 @@
+#include
+
+/* Common code for lob tests.  */
+
+#define NO_LOB asm volatile ("@ clobber lr" : : : "lr" )
+
+#define N 10000
+
+static void
+reset_data (int *a, int *b, int *c)
+{
+  memset (a, -1, N * sizeof (*a));
+  memset (b, -1, N * sizeof (*b));
+  memset (c, -1, N * sizeof (*c));
+}
diff --git a/gcc/testsuite/gcc.target/arm/lob1.c b/gcc/testsuite/gcc.target/arm/lob1.c
new file mode 100644
index 000000000000..e4913519942f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/lob1.c
@@ -0,0 +1,85 @@
+/* Check that GCC generates Armv8.1-M low over head loop instructions
+   for some simple loops.  */
+/* { dg-do run } */
+/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" "-mcpu=*" } } */
+/* { dg-skip-if "do not run SMS to prevent loop versioning" { *-*-* } { "-fmodulo-sched" } } */
+/* { dg-options "-march=armv8.1-m.main -O3 --save-temps" } */

We need to avoid running this test on targets that don't support LOB.
This needs an appropriate effective target check (see the existing *_hw
ones in lib/target-supports.exp).

Thanks,

Kyrill

+#include
+#include "lob.h"
+
+int a[N];
+int b[N];
+int c[N];
+
+int
+foo (int a, int b)
+{
+  return a + b;
+}
+
+void __attribute__((noinline))
+loop1 (int *a, int *b, int *c)
+{
+  for (int i = 0; i < N; i++)
+    {
+      a[i] = i;
+      b[i] = i * 2;
+      c[i] = a[i] + b[i];
+    }
+}
+
+void __attribute__((noinline))
+loop2 (int *a, int *b, int *c)
+{
+  int i = 0;
+  while (i < N)
+    {
+      a[i] = i - 2;
+      b[i] = i * 5;
+      c[i] = a[i] + b[i];
+      i++;
+    }
+}
+
+void __attribute__((noinline))
+loop3 (int *a, int *b, int *c)
+{
+  int i = 0;
+  do
+    {
+      a[i] = i - 4;
+      b[i] = i * 3;
+      c[i] = a[i] + b[i];
+      i++;
+    } while (i < N);
+}
+
+void
+check (int *a, int *b, int *c)
+{
+  for (int i = 0; i < N; i++)
+    {
+      NO_LOB;
+      if (c[i] != a[i] + b[i])
+        abort ();
+    }
+}
+
+int
+main (void)
+{
+  reset_data (a, b, c);
+  loop1 (a, b ,c);
+  check (a, b ,c);
+  reset_data (a, b, c);
+  loop2 (a, b ,c);
+  check (a, b ,c);
+  reset_data (a, b, c);
+  loop3 (a, b ,c);
+  check (a, b ,c);
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {dls\s\S*,\s\S*} 3 } } */
+/* { dg-final { scan-assembler-times {le\slr,\s\S*} 3 } } */
diff --git a/gcc/testsuite/gcc.target/arm/lob2.c b/gcc/testsuite/gcc.target/arm/lob2.c
new file mode 100644
index 000000000000..e81286694804
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/lob2.c
@@ -0,0 +1,33 @@
+/* Check that GCC does not generate Armv8.1-M low over head loop instructions
+   if a non-inlineable function call takes place inside the loop.  */
+/* { dg-do compile } */
+/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" "-mcpu=*" } } */
+/* { dg-skip-if "do not run SMS to prevent loop versioning" { *-*-* } { "-fmodulo-sched" } } */
+/* { dg-options "-march=armv8.1-m.main -O3 --save-temps" } */
+#include
+#include "lob.h"
+
+int a[N];
+int b[N];
+int c[N];
+
+int __attribute__ ((noinline))
+foo (int a, int b)
+{
+  return a + b;
+}
+
+int
+main (void)
+{
+  for (int i = 0; i < N; i++)
+    {
+      a[i] = i;
+      b[i] = i * 2;
+      c[i] = foo (a[i], b[i]);
+    }
+
+  return 0;
+}
+/* { dg-final { scan-assembler-not {dls\s\S*,\s\S*} } } */
+/* { dg-final { scan-assembler-not {le\slr,\s\S*} } } */
diff --git a/gcc/testsuite/gcc.target/arm/lob3.c b/gcc/testsuite/gcc.target/arm/lob3.c
new file mode 100644
index 000000000000..69d22b2f023a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/lob3.c
@@ -0,0 +1,28 @@
+/* Check that GCC does not generate Armv8.1-M low over head loop instructions
+   if causes VFP emulation library calls to happen inside the loop.  */
+/* { dg-do compile } */
+/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" "-mcpu=*" } } */
+/* { dg-skip-if "do not run SMS to prevent loop versioning" { *-*-* } { "-fmodulo-sched" } } */
+/* { dg-options "-march=armv8.1-m.main -O3 --save-temps -mfloat-abi=soft" } */
+/* { dg-require-effective-target arm_softfloat } */
+#include
+#include "lob.h"
+
+double a[N];
+double b[N];
+double c[N];
+
+int
+main (void)
+{
+  for (int i = 0; i < N; i++)
+    {
+      a[i] = i;
+      b[i] = i * 2;
+      c[i] = a[i] + b[i];
+    }
+
+  return 0;
+}
+/* { dg-final { scan-assembler-not {dls\s\S*,\s\S*} } } */
+/* { dg-final { scan-assembler-not {le\slr,\s\S*} } } */
diff --git a/gcc/testsuite/gcc.target/arm/lob4.c b/gcc/testsuite/gcc.target/arm/lob4.c
new file mode 100644
index 000000000000..62be52e31007
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/lob4.c
@@ -0,0 +1,35 @@
+/* Check that GCC does not generate Armv8.1-M low over head loop instructions
+   if LR is modified within the loop.  */
+/* { dg-do compile } */
+/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" "-mcpu=*" } } */
+/* { dg-skip-if "do not run SMS to prevent loop versioning" { *-*-* } { "-fmodulo-sched" } } */
+/* { dg-options "-march=armv8.1-m.main -O3 --save-temps -mfloat-abi=soft" } */
+/* { dg-require-effective-target arm_softfloat } */
+#include
+#include "lob.h"
+
+int a[N];
+int b[N];
+int c[N];
+
+static __attribute__ ((always_inline)) inline int
+foo (int a, int b)
+{
+  NO_LOB;
+  return a + b;
+}
+
+int
+main (void)
+{
+  for (int i = 0; i < N; i++)
+    {
+      a[i] = i;
+      b[i] = i * 2;
+      c[i] = foo(a[i], b[i]);
+    }
+
+  return 0;
+}
+/* { dg-final { scan-assembler-not {dls\s\S*,\s\S*} } } */
+/* { dg-final { scan-assembler-not {le\slr,\s\S*} } } */
diff --git a/gcc/testsuite/gcc.target/arm/lob5.c b/gcc/testsuite/gcc.target/arm/lob5.c
new file mode 100644
index 000000000000..ad8a1b961e40
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/lob5.c
@@ -0,0 +1,36 @@
+/* Check that GCC does not generates Armv8.1-M low over head loop
+   instructions.  Innermost loop has no fixed number of iterations
+   therefore is not optimizable.  Outer loops are not optimized.  */
+/* { dg-do compile } */
+/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" "-mcpu=*" } } */
+/* { dg-skip-if "do not run SMS to prevent loop versioning" { *-*-* } { "-fmodulo-sched" } } */
+/* { dg-options "-march=armv8.1-m.main -O3 --save-temps" } */
+#include
+#include "lob.h"
+
+int a[N];
+int b[N];
+int c[N];
+
+int
+main (void)
+{
+  for (int i = 0; i < N; i++)
+    {
+      a[i] = i;
+      b[i] = i * 2;
+
+      int k = b[i];
+      while (k != 0)
+        {
+          if (k % 2 == 0)
+            c[i - 1] = k % 2;
+          k /= 2;
+        }
+      c[i] = a[i] - b[i];
+    }
+
+  return 0;
+}
+/* { dg-final { scan-assembler-not {dls\s\S*,\s\S*} } } */
+/* { dg-final { scan-assembler-not {le\slr,\s\S*} } } */
diff --git a/gcc/testsuite/gcc.target/arm/lob6.c b/gcc/testsuite/gcc.target/arm/lob6.c
new file mode 100644
index 000000000000..1dbcaff1670d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/lob6.c
@@ -0,0 +1,97 @@
+/* Check that GCC generates Armv8.1-M low over head loop instructions
+   with some less trivial loops and the result is correct.  */
+/* { dg-do run } */
+/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" "-mcpu=*" } } */
+/* { dg-skip-if "do not run SMS to prevent loop versioning" { *-*-* } { "-fmodulo-sched" } } */
+/* { dg-options "-march=armv8.1-m.main -O3 --save-temps" } */
+#include
+#include "lob.h"
+
+#define TEST_CODE1                              \
+  {                                             \
+    for (int i = 0; i < N; i++)                 \
+      {                                         \
+        a[i] = i;                               \
+        b[i] = i * 2;                           \
+                                                \
+        for (int k = 0; k < N; k++)             \
+          {                                     \
+            MAYBE_LOB;                          \
+            c[k] = k / 2;                       \
+          }                                     \
+        c[i] = a[i] - b[i];                     \
+      }                                         \
+  }
+
+#define TEST_CODE2                              \
+  {                                             \
+    for (int i = 0; i < N / 2; i++)             \
+      {                                         \
+        MAYBE_LOB;                              \
+        if (c[i] % 2 == 0)                      \
+          break;                                \
+        a[i]++;                                 \
+        b[i]++;                                 \
+      }                                         \
+  }
+
+int a1[N];
+int b1[N];
+int c1[N];
+
+int a2[N];
+int b2[N];
+int c2[N];
+
+#define MAYBE_LOB
+void __attribute__((noinline))
+loop1 (int *a, int *b, int *c)
+  TEST_CODE1;
+
+void __attribute__((noinline))
+loop2 (int *a, int *b, int *c)
+  TEST_CODE2;
+
+#undef MAYBE_LOB
+#define MAYBE_LOB NO_LOB
+
+void
+ref1 (int *a, int *b, int *c)
+  TEST_CODE1;
+
+void
+ref2 (int *a, int *b, int *c)
+  TEST_CODE2;
+
+void
+check (void)
+{
+  for (int i = 0; i < N; i++)
+    {
+      NO_LOB;
+      if (a1[i] != a2[i]
+          && b1[i] != b2[i]
+          && c1[i] != c2[i])
+        abort ();
+    }
+}
+
+int
+main (void)
+{
+  reset_data (a1, b1, c1);
+  reset_data (a2, b2, c2);
+  loop1 (a1, b1, c1);
+  ref1 (a2, b2, c2);
+  check ();
+
+  reset_data (a1, b1, c1);
+  reset_data (a2, b2, c2);
+  loop2 (a1, b1, c1);
+  ref2 (a2, b2, c2);
+  check ();
+
+  return 0;
+}
+/* { dg-final { scan-assembler-times {dls\s\S*,\s\S*} 2 } } */
+/* { dg-final { scan-assembler-times {le\slr,\s\S*} 2 } } */
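
A note on check () in lob6.c: the three inequalities are joined with '&&', so abort () is reached only when a, b and c all differ at the same index. If the intent is to flag any divergence between the LOB and reference results (an assumption on my part, the patch does not say), '||' may be what is wanted. A minimal sketch of the difference follows; mismatch_all and mismatch_any are hypothetical helper names used only for illustration and are not part of the patch:

```c
#include <assert.h>

/* Hypothetical helper, not part of the patch: mirrors the condition used
   in lob6.c's check ().  Nonzero only when all three pairs differ.  */
int
mismatch_all (int a1, int a2, int b1, int b2, int c1, int c2)
{
  return a1 != a2 && b1 != b2 && c1 != c2;
}

/* Hypothetical alternative: nonzero when any one of the pairs differs.  */
int
mismatch_any (int a1, int a2, int b1, int b2, int c1, int c2)
{
  return a1 != a2 || b1 != b2 || c1 != c2;
}

/* Small self-check showing where the two conditions disagree.  */
void
mismatch_demo (void)
{
  /* Only the c values differ: the '&&' form misses it, '||' catches it.  */
  assert (!mismatch_all (1, 1, 2, 2, 3, 4));
  assert (mismatch_any (1, 1, 2, 2, 3, 4));

  /* Identical data: neither form reports a mismatch.  */
  assert (!mismatch_any (1, 1, 2, 2, 3, 3));
}
```

With the '&&' form, a miscompiled loop that corrupts only one of the three arrays would still pass the run test, so the stricter condition seems worth considering.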