* [PATCH][AArch64] Allow multiple-of-8 immediate offsets for TImode LDP/STP
@ 2016-07-13 16:15 Kyrill Tkachov
2016-07-13 20:10 ` Evandro Menezes
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Kyrill Tkachov @ 2016-07-13 16:15 UTC (permalink / raw)
To: GCC Patches; +Cc: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh
[-- Attachment #1: Type: text/plain, Size: 1999 bytes --]
Hi all,
The most common way to load and store TImode value in aarch64 is to perform an LDP/STP of two X-registers.
This is the *movti_aarch64 pattern in aarch64.md.
There is a bug in the logic in aarch64_classify_address where it validates the offset in the address used
to load a TImode value. It passes down TImode to the aarch64_offset_7bit_signed_scaled_p check which rejects
offsets that are not a multiple of the mode size of TImode (16). However, this is too conservative as X-reg LDP/STP
instructions accept immediate offsets that are a multiple of 8.
Also, considering that the definition of aarch64_offset_7bit_signed_scaled_p is:
return (offset >= -64 * GET_MODE_SIZE (mode)
&& offset < 64 * GET_MODE_SIZE (mode)
&& offset % GET_MODE_SIZE (mode) == 0);
I think the range check may even be wrong for TImode as this will accept offsets in the range [-1024, 1024)
(as long as they are a multiple of 16)
whereas X-reg LDP/STP instructions only accept offsets in the range [-512, 512).
So since the check is for an X-reg LDP/STP address we should be passing down DImode.
This patch does that and enables more aggressive generation of REG+IMM addressing modes for 64-bit aligned
TImode values, eliminating many address calculation instructions.
For the testcase in the patch we currently generate:
bar:
add x1, x1, 8
add x0, x0, 8
ldp x2, x3, [x1]
stp x2, x3, [x0]
ret
whereas with this patch we generate:
bar:
ldp x2, x3, [x1, 8]
stp x2, x3, [x0, 8]
ret
Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for trunk?
Thanks,
Kyrill
2016-07-13 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
* config/aarch64/aarch64.c (aarch64_classify_address): Use DImode when
performing aarch64_offset_7bit_signed_scaled_p check for TImode LDP/STP
addresses.
2016-07-13 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
* gcc.target/aarch64/ldp_stp_unaligned_1.c: New test.
[-- Attachment #2: aarch64-timode-addr.patch --]
[-- Type: text/x-patch, Size: 1803 bytes --]
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index bea67f88b900be39b6f1ae002353b44c5a4a9f7d..8fd93a54c54ab86c6e600afba48fa441101b57c7 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4033,9 +4033,11 @@ aarch64_classify_address (struct aarch64_address_info *info,
X,X: 7-bit signed scaled offset
Q: 9-bit signed offset
We conservatively require an offset representable in either mode.
- */
+ When performing the check for pairs of X registers i.e. LDP/STP
+ pass down DImode since that is the natural size of the LDP/STP
+ instruction memory accesses. */
if (mode == TImode || mode == TFmode)
- return (aarch64_offset_7bit_signed_scaled_p (mode, offset)
+ return (aarch64_offset_7bit_signed_scaled_p (DImode, offset)
&& offset_9bit_signed_unscaled_p (mode, offset));
/* A 7bit offset check because OImode will emit a ldp/stp
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..a70f92100fb91bcfdcfd4af1cab6f58915038568
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c
@@ -0,0 +1,20 @@
+/* { dg-options "-O2" } */
+
+/* Check that we can use a REG + IMM addressing mode when moving an unaligned
+ TImode value to and from memory. */
+
+struct foo
+{
+ long long b;
+ __int128 a;
+} __attribute__ ((packed));
+
+void
+bar (struct foo *p, struct foo *q)
+{
+ p->a = q->a;
+}
+
+/* { dg-final { scan-assembler-not "add\tx\[0-9\]+, x\[0-9\]+" } } */
+/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\], .*8" 1 } } */
+/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\], .*8" 1 } } */
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH][AArch64] Allow multiple-of-8 immediate offsets for TImode LDP/STP
2016-07-13 16:15 [PATCH][AArch64] Allow multiple-of-8 immediate offsets for TImode LDP/STP Kyrill Tkachov
@ 2016-07-13 20:10 ` Evandro Menezes
2016-08-01 10:02 ` Kyrill Tkachov
2016-08-01 10:10 ` Richard Earnshaw (lists)
2 siblings, 0 replies; 4+ messages in thread
From: Evandro Menezes @ 2016-07-13 20:10 UTC (permalink / raw)
To: Kyrill Tkachov, GCC Patches
Cc: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh
On 07/13/16 11:14, Kyrill Tkachov wrote:
> Hi all,
>
> The most common way to load and store TImode value in aarch64 is to
> perform an LDP/STP of two X-registers.
> This is the *movti_aarch64 pattern in aarch64.md.
> There is a bug in the logic in aarch64_classify_address where it
> validates the offset in the address used
> to load a TImode value. It passes down TImode to the
> aarch64_offset_7bit_signed_scaled_p check which rejects
> offsets that are not a multiple of the mode size of TImode (16).
> However, this is too conservative as X-reg LDP/STP
> instructions accept immediate offsets that are a multiple of 8.
>
> Also, considering that the definition of
> aarch64_offset_7bit_signed_scaled_p is:
> return (offset >= -64 * GET_MODE_SIZE (mode)
> && offset < 64 * GET_MODE_SIZE (mode)
> && offset % GET_MODE_SIZE (mode) == 0);
>
> I think the range check may even be wrong for TImode as this will
> accept offsets in the range [-1024, 1024)
> (as long as they are a multiple of 16)
> whereas X-reg LDP/STP instructions only accept offsets in the range
> [-512, 512).
> So since the check is for an X-reg LDP/STP address we should be
> passing down DImode.
>
> This patch does that and enables more aggressive generation of REG+IMM
> addressing modes for 64-bit aligned
> TImode values, eliminating many address calculation instructions.
> For the testcase in the patch we currently generate:
> bar:
> add x1, x1, 8
> add x0, x0, 8
> ldp x2, x3, [x1]
> stp x2, x3, [x0]
> ret
>
> whereas with this patch we generate:
> bar:
> ldp x2, x3, [x1, 8]
> stp x2, x3, [x0, 8]
> ret
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Ok for trunk?
LGTM
--
Evandro Menezes
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH][AArch64] Allow multiple-of-8 immediate offsets for TImode LDP/STP
2016-07-13 16:15 [PATCH][AArch64] Allow multiple-of-8 immediate offsets for TImode LDP/STP Kyrill Tkachov
2016-07-13 20:10 ` Evandro Menezes
@ 2016-08-01 10:02 ` Kyrill Tkachov
2016-08-01 10:10 ` Richard Earnshaw (lists)
2 siblings, 0 replies; 4+ messages in thread
From: Kyrill Tkachov @ 2016-08-01 10:02 UTC (permalink / raw)
To: GCC Patches; +Cc: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh
Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00737.html
Thanks,
Kyrill
On 13/07/16 17:14, Kyrill Tkachov wrote:
> Hi all,
>
> The most common way to load and store TImode value in aarch64 is to perform an LDP/STP of two X-registers.
> This is the *movti_aarch64 pattern in aarch64.md.
> There is a bug in the logic in aarch64_classify_address where it validates the offset in the address used
> to load a TImode value. It passes down TImode to the aarch64_offset_7bit_signed_scaled_p check which rejects
> offsets that are not a multiple of the mode size of TImode (16). However, this is too conservative as X-reg LDP/STP
> instructions accept immediate offsets that are a multiple of 8.
>
> Also, considering that the definition of aarch64_offset_7bit_signed_scaled_p is:
> return (offset >= -64 * GET_MODE_SIZE (mode)
> && offset < 64 * GET_MODE_SIZE (mode)
> && offset % GET_MODE_SIZE (mode) == 0);
>
> I think the range check may even be wrong for TImode as this will accept offsets in the range [-1024, 1024)
> (as long as they are a multiple of 16)
> whereas X-reg LDP/STP instructions only accept offsets in the range [-512, 512).
> So since the check is for an X-reg LDP/STP address we should be passing down DImode.
>
> This patch does that and enables more aggressive generation of REG+IMM addressing modes for 64-bit aligned
> TImode values, eliminating many address calculation instructions.
> For the testcase in the patch we currently generate:
> bar:
> add x1, x1, 8
> add x0, x0, 8
> ldp x2, x3, [x1]
> stp x2, x3, [x0]
> ret
>
> whereas with this patch we generate:
> bar:
> ldp x2, x3, [x1, 8]
> stp x2, x3, [x0, 8]
> ret
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2016-07-13 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> * config/aarch64/aarch64.c (aarch64_classify_address): Use DImode when
> performing aarch64_offset_7bit_signed_scaled_p check for TImode LDP/STP
> addresses.
>
> 2016-07-13 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> * gcc.target/aarch64/ldp_stp_unaligned_1.c: New test.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH][AArch64] Allow multiple-of-8 immediate offsets for TImode LDP/STP
2016-07-13 16:15 [PATCH][AArch64] Allow multiple-of-8 immediate offsets for TImode LDP/STP Kyrill Tkachov
2016-07-13 20:10 ` Evandro Menezes
2016-08-01 10:02 ` Kyrill Tkachov
@ 2016-08-01 10:10 ` Richard Earnshaw (lists)
2 siblings, 0 replies; 4+ messages in thread
From: Richard Earnshaw (lists) @ 2016-08-01 10:10 UTC (permalink / raw)
To: Kyrill Tkachov, GCC Patches; +Cc: Marcus Shawcroft, James Greenhalgh
On 13/07/16 17:14, Kyrill Tkachov wrote:
> Hi all,
>
> The most common way to load and store TImode value in aarch64 is to
> perform an LDP/STP of two X-registers.
> This is the *movti_aarch64 pattern in aarch64.md.
> There is a bug in the logic in aarch64_classify_address where it
> validates the offset in the address used
> to load a TImode value. It passes down TImode to the
> aarch64_offset_7bit_signed_scaled_p check which rejects
> offsets that are not a multiple of the mode size of TImode (16).
> However, this is too conservative as X-reg LDP/STP
> instructions accept immediate offsets that are a multiple of 8.
>
> Also, considering that the definition of
> aarch64_offset_7bit_signed_scaled_p is:
> return (offset >= -64 * GET_MODE_SIZE (mode)
> && offset < 64 * GET_MODE_SIZE (mode)
> && offset % GET_MODE_SIZE (mode) == 0);
>
> I think the range check may even be wrong for TImode as this will accept
> offsets in the range [-1024, 1024)
> (as long as they are a multiple of 16)
> whereas X-reg LDP/STP instructions only accept offsets in the range
> [-512, 512).
> So since the check is for an X-reg LDP/STP address we should be passing
> down DImode.
>
> This patch does that and enables more aggressive generation of REG+IMM
> addressing modes for 64-bit aligned
> TImode values, eliminating many address calculation instructions.
> For the testcase in the patch we currently generate:
> bar:
> add x1, x1, 8
> add x0, x0, 8
> ldp x2, x3, [x1]
> stp x2, x3, [x0]
> ret
>
> whereas with this patch we generate:
> bar:
> ldp x2, x3, [x1, 8]
> stp x2, x3, [x0, 8]
> ret
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Ok for trunk?
OK.
R.
>
> Thanks,
> Kyrill
>
> 2016-07-13 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> * config/aarch64/aarch64.c (aarch64_classify_address): Use DImode when
> performing aarch64_offset_7bit_signed_scaled_p check for TImode LDP/STP
> addresses.
>
> 2016-07-13 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> * gcc.target/aarch64/ldp_stp_unaligned_1.c: New test.
>
> aarch64-timode-addr.patch
>
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index bea67f88b900be39b6f1ae002353b44c5a4a9f7d..8fd93a54c54ab86c6e600afba48fa441101b57c7 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -4033,9 +4033,11 @@ aarch64_classify_address (struct aarch64_address_info *info,
> X,X: 7-bit signed scaled offset
> Q: 9-bit signed offset
> We conservatively require an offset representable in either mode.
> - */
> + When performing the check for pairs of X registers i.e. LDP/STP
> + pass down DImode since that is the natural size of the LDP/STP
> + instruction memory accesses. */
> if (mode == TImode || mode == TFmode)
> - return (aarch64_offset_7bit_signed_scaled_p (mode, offset)
> + return (aarch64_offset_7bit_signed_scaled_p (DImode, offset)
> && offset_9bit_signed_unscaled_p (mode, offset));
>
> /* A 7bit offset check because OImode will emit a ldp/stp
> diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..a70f92100fb91bcfdcfd4af1cab6f58915038568
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c
> @@ -0,0 +1,20 @@
> +/* { dg-options "-O2" } */
> +
> +/* Check that we can use a REG + IMM addressing mode when moving an unaligned
> + TImode value to and from memory. */
> +
> +struct foo
> +{
> + long long b;
> + __int128 a;
> +} __attribute__ ((packed));
> +
> +void
> +bar (struct foo *p, struct foo *q)
> +{
> + p->a = q->a;
> +}
> +
> +/* { dg-final { scan-assembler-not "add\tx\[0-9\]+, x\[0-9\]+" } } */
> +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\], .*8" 1 } } */
> +/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\], .*8" 1 } } */
>
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2016-08-01 10:10 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-13 16:15 [PATCH][AArch64] Allow multiple-of-8 immediate offsets for TImode LDP/STP Kyrill Tkachov
2016-07-13 20:10 ` Evandro Menezes
2016-08-01 10:02 ` Kyrill Tkachov
2016-08-01 10:10 ` Richard Earnshaw (lists)
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).