public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] [ARM] Post-indexed addressing for NEON memory access
@ 2014-06-02 16:47 Charles Baylis
  2014-06-05  6:27 ` Ramana Radhakrishnan
  2014-06-18 10:01 ` Ramana Radhakrishnan
  0 siblings, 2 replies; 9+ messages in thread
From: Charles Baylis @ 2014-06-02 16:47 UTC (permalink / raw)
  To: GCC Patches, Richard Earnshaw, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 484 bytes --]

This patch adds support for post-indexed addressing for NEON structure
memory accesses.

For example VLD1.8 {d0}, [r0], r1


Bootstrapped and checked on arm-unknown-gnueabihf using Qemu.

Ok for trunk?


gcc/Changelog:

2014-06-02  Charles Baylis  <charles.baylis@linaro.org>

        * config/arm/arm.c (neon_vector_mem_operand): Allow register
        POST_MODIFY for neon loads and stores.
        (arm_print_operand): Output post-index register for neon loads and
        stores.

[-- Attachment #2: 0001-post-indexed-addressing-for-vld-vst.patch --]
[-- Type: text/x-patch, Size: 1979 bytes --]

From a8e0bdbceab00d5e5b655611965d3975ba74365c Mon Sep 17 00:00:00 2001
From: Charles Baylis <charles.baylis@linaro.org>
Date: Tue, 6 May 2014 15:23:46 +0100
Subject: [PATCH] post-indexed addressing for vld/vst

2014-05-09  Charles Baylis  <charles.baylis@linaro.org>

	* config/arm/arm.c (neon_vector_mem_operand): Allow register
	POST_MODIFY for neon loads and stores.
	(arm_print_operand): Output post-index register for neon loads and
	stores.
---
 gcc/config/arm/arm.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 1117bd4..6ab02ef 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12786,7 +12786,11 @@ neon_vector_mem_operand (rtx op, int type, bool strict)
       || (type == 0 && GET_CODE (ind) == PRE_DEC))
     return arm_address_register_rtx_p (XEXP (ind, 0), 0);
 
-  /* FIXME: vld1 allows register post-modify.  */
+  /* Allow post-increment by register for VLDn */
+  if (type == 2 && GET_CODE (ind) == POST_MODIFY
+      && GET_CODE (XEXP (ind, 1)) == PLUS
+      && REG_P (XEXP (XEXP (ind, 1), 1)))
+     return true;
 
   /* Match:
      (plus (reg)
@@ -21816,6 +21820,7 @@ arm_print_operand (FILE *stream, rtx x, int code)
       {
 	rtx addr;
 	bool postinc = FALSE;
+	rtx postinc_reg = NULL;
 	unsigned align, memsize, align_bits;
 
 	gcc_assert (MEM_P (x));
@@ -21825,6 +21830,11 @@ arm_print_operand (FILE *stream, rtx x, int code)
 	    postinc = 1;
 	    addr = XEXP (addr, 0);
 	  }
+	if (GET_CODE (addr) == POST_MODIFY)
+	  {
+	    postinc_reg = XEXP( XEXP (addr, 1), 1);
+	    addr = XEXP (addr, 0);
+	  }
 	asm_fprintf (stream, "[%r", REGNO (addr));
 
 	/* We know the alignment of this access, so we can emit a hint in the
@@ -21850,6 +21860,8 @@ arm_print_operand (FILE *stream, rtx x, int code)
 
 	if (postinc)
 	  fputs("!", stream);
+	if (postinc_reg)
+	  asm_fprintf (stream, ", %r", REGNO (postinc_reg));
       }
       return;
 
-- 
1.9.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] [ARM] Post-indexed addressing for NEON memory access
  2014-06-02 16:47 [PATCH] [ARM] Post-indexed addressing for NEON memory access Charles Baylis
@ 2014-06-05  6:27 ` Ramana Radhakrishnan
  2014-06-17 15:04   ` Charles Baylis
  2014-06-18 10:01 ` Ramana Radhakrishnan
  1 sibling, 1 reply; 9+ messages in thread
From: Ramana Radhakrishnan @ 2014-06-05  6:27 UTC (permalink / raw)
  To: Charles Baylis; +Cc: GCC Patches, Richard Earnshaw, Ramana Radhakrishnan

On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis
<charles.baylis@linaro.org> wrote:
> This patch adds support for post-indexed addressing for NEON structure
> memory accesses.
>
> For example VLD1.8 {d0}, [r0], r1
>
>
> Bootstrapped and checked on arm-unknown-gnueabihf using Qemu.
>
> Ok for trunk?

This looks like a reasonable start but this work doesn't look complete
to me yet.

Can you also look at the impact on performance of a range of
benchmarks especially a popular embedded one to see how this behaves
unless you have already done so ?

POST_INC, POST_MODIFY usually have a funny way of biting you with
either ivopts or the way in which address costs work. I think there
maybe further tweaks needed but for a first step I'd like to know what
the performance impact is.

I would also suggest running this through clyon's neon intrinsics
testsuite to see if that catches any issues especially with the large
vector modes.

regards
Ramana

>
>
> gcc/Changelog:
>
> 2014-06-02  Charles Baylis  <charles.baylis@linaro.org>
>
>         * config/arm/arm.c (neon_vector_mem_operand): Allow register
>         POST_MODIFY for neon loads and stores.
>         (arm_print_operand): Output post-index register for neon loads and
>         stores.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] [ARM] Post-indexed addressing for NEON memory access
  2014-06-05  6:27 ` Ramana Radhakrishnan
@ 2014-06-17 15:04   ` Charles Baylis
  2014-06-18 10:06     ` Ramana Radhakrishnan
  0 siblings, 1 reply; 9+ messages in thread
From: Charles Baylis @ 2014-06-17 15:04 UTC (permalink / raw)
  To: ramrad01; +Cc: GCC Patches, Richard Earnshaw, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 1847 bytes --]

On 5 June 2014 07:27, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
> On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis
> <charles.baylis@linaro.org> wrote:
>> This patch adds support for post-indexed addressing for NEON structure
>> memory accesses.
>>
>> For example VLD1.8 {d0}, [r0], r1
>>
>>
>> Bootstrapped and checked on arm-unknown-gnueabihf using Qemu.
>>
>> Ok for trunk?
>
> This looks like a reasonable start but this work doesn't look complete
> to me yet.
>
> Can you also look at the impact on performance of a range of
> benchmarks especially a popular embedded one to see how this behaves
> unless you have already done so ?

I ran a popular suite of embedded benchmarks, and there is no impact
at all on Chromebook (including with the additional attached patch)

The patch was developed to address a performance issue with a new
version of libvpx which uses intrinsics instead of NEON assembler. The
patch results in a 3% improvement for VP8 decode.

> POST_INC, POST_MODIFY usually have a funny way of biting you with
> either ivopts or the way in which address costs work. I think there
> maybe further tweaks needed but for a first step I'd like to know what
> the performance impact is.

> I would also suggest running this through clyon's neon intrinsics
> testsuite to see if that catches any issues especially with the large
> vector modes.

No issues found in clyon's tests.

Your mention of larger vector modes prompted me to check that the
patch has the desired result with them. In fact, the costs are
estimated incorrectly which means the post_modify pattern is not used.
The attached patch fixes that. (used in combination with my original
patch)


2014-06-15  Charles Baylis  <charles.bayls@linaro.org>

        * config/arm/arm.c (arm_new_rtx_costs): Reduce cost for mem with
        embedded side effects.

[-- Attachment #2: 0002-Adjust-costs-for-mem-with-post_modify.patch --]
[-- Type: application/x-download, Size: 1216 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] [ARM] Post-indexed addressing for NEON memory access
  2014-06-02 16:47 [PATCH] [ARM] Post-indexed addressing for NEON memory access Charles Baylis
  2014-06-05  6:27 ` Ramana Radhakrishnan
@ 2014-06-18 10:01 ` Ramana Radhakrishnan
  2014-06-18 13:49   ` Charles Baylis
  1 sibling, 1 reply; 9+ messages in thread
From: Ramana Radhakrishnan @ 2014-06-18 10:01 UTC (permalink / raw)
  To: Charles Baylis; +Cc: GCC Patches, Richard Earnshaw, Ramana Radhakrishnan

On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis
<charles.baylis@linaro.org> wrote:
> This patch adds support for post-indexed addressing for NEON structure
> memory accesses.
>
> For example VLD1.8 {d0}, [r0], r1
>
>
> Bootstrapped and checked on arm-unknown-gnueabihf using Qemu.
>
> Ok for trunk?

This is OK.

Ramana
>
>
> gcc/Changelog:
>
> 2014-06-02  Charles Baylis  <charles.baylis@linaro.org>
>
>         * config/arm/arm.c (neon_vector_mem_operand): Allow register
>         POST_MODIFY for neon loads and stores.
>         (arm_print_operand): Output post-index register for neon loads and
>         stores.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] [ARM] Post-indexed addressing for NEON memory access
  2014-06-17 15:04   ` Charles Baylis
@ 2014-06-18 10:06     ` Ramana Radhakrishnan
  2014-06-18 14:31       ` Charles Baylis
  2015-06-19 18:37       ` Charles Baylis
  0 siblings, 2 replies; 9+ messages in thread
From: Ramana Radhakrishnan @ 2014-06-18 10:06 UTC (permalink / raw)
  To: Charles Baylis; +Cc: GCC Patches, Richard Earnshaw, Ramana Radhakrishnan

On Tue, Jun 17, 2014 at 4:03 PM, Charles Baylis
<charles.baylis@linaro.org> wrote:
> On 5 June 2014 07:27, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
>> On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis
>> <charles.baylis@linaro.org> wrote:
>>> This patch adds support for post-indexed addressing for NEON structure
>>> memory accesses.
>>>
>>> For example VLD1.8 {d0}, [r0], r1
>>>
>>>
>>> Bootstrapped and checked on arm-unknown-gnueabihf using Qemu.
>>>
>>> Ok for trunk?
>>
>> This looks like a reasonable start but this work doesn't look complete
>> to me yet.
>>
>> Can you also look at the impact on performance of a range of
>> benchmarks especially a popular embedded one to see how this behaves
>> unless you have already done so ?
>
> I ran a popular suite of embedded benchmarks, and there is no impact
> at all on Chromebook (including with the additional attached patch)

Thanks for the due diligence

>
> The patch was developed to address a performance issue with a new
> version of libvpx which uses intrinsics instead of NEON assembler. The
> patch results in a 3% improvement for VP8 decode.

Good - 3% not to be sneezed at.

>
>> POST_INC, POST_MODIFY usually have a funny way of biting you with
>> either ivopts or the way in which address costs work. I think there
>> maybe further tweaks needed but for a first step I'd like to know what
>> the performance impact is.
>
>> I would also suggest running this through clyon's neon intrinsics
>> testsuite to see if that catches any issues especially with the large
>> vector modes.

Thanks.

>
> No issues found in clyon's tests.

Please keep an eye out for any regressions.

>
> Your mention of larger vector modes prompted me to check that the
> patch has the desired result with them. In fact, the costs are
> estimated incorrectly which means the post_modify pattern is not used.
> The attached patch fixes that. (used in combination with my original
> patch)
>
>
> 2014-06-15  Charles Baylis  <charles.bayls@linaro.org>
>
>         * config/arm/arm.c (arm_new_rtx_costs): Reduce cost for mem with
>         embedded side effects.

I'm not too thrilled with putting in more special cases that are not
table driven in there. Can you file a PR with some testcases that show
this so that we don't forget and CC me on it please ?


Ramana

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] [ARM] Post-indexed addressing for NEON memory access
  2014-06-18 10:01 ` Ramana Radhakrishnan
@ 2014-06-18 13:49   ` Charles Baylis
  0 siblings, 0 replies; 9+ messages in thread
From: Charles Baylis @ 2014-06-18 13:49 UTC (permalink / raw)
  To: ramrad01; +Cc: GCC Patches, Richard Earnshaw, Ramana Radhakrishnan

On 18 June 2014 11:01, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
> On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis
> <charles.baylis@linaro.org> wrote:
>> This patch adds support for post-indexed addressing for NEON structure
>> memory accesses.
>>
>> For example VLD1.8 {d0}, [r0], r1
>>
>>
>> Bootstrapped and checked on arm-unknown-gnueabihf using Qemu.
>>
>> Ok for trunk?
>
> This is OK.

Committed as r211783.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] [ARM] Post-indexed addressing for NEON memory access
  2014-06-18 10:06     ` Ramana Radhakrishnan
@ 2014-06-18 14:31       ` Charles Baylis
  2015-06-19 18:37       ` Charles Baylis
  1 sibling, 0 replies; 9+ messages in thread
From: Charles Baylis @ 2014-06-18 14:31 UTC (permalink / raw)
  To: ramrad01; +Cc: GCC Patches, Richard Earnshaw, Ramana Radhakrishnan

On 18 June 2014 11:06, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
>> 2014-06-15  Charles Baylis  <charles.bayls@linaro.org>
>>
>>         * config/arm/arm.c (arm_new_rtx_costs): Reduce cost for mem with
>>         embedded side effects.
>
> I'm not too thrilled with putting in more special cases that are not
> table driven in there. Can you file a PR with some testcases that show
> this so that we don't forget and CC me on it please ?

I created PR61551 and CC'd.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] [ARM] Post-indexed addressing for NEON memory access
  2014-06-18 10:06     ` Ramana Radhakrishnan
  2014-06-18 14:31       ` Charles Baylis
@ 2015-06-19 18:37       ` Charles Baylis
  2015-06-23 16:41         ` Ramana Radhakrishnan
  1 sibling, 1 reply; 9+ messages in thread
From: Charles Baylis @ 2015-06-19 18:37 UTC (permalink / raw)
  To: Ramana Radhakrishnan, Kyrill Tkachov; +Cc: GCC Patches

On 18 June 2014 at 11:06, Ramana Radhakrishnan
<ramana.gcc@googlemail.com> wrote:
> On Tue, Jun 17, 2014 at 4:03 PM, Charles Baylis
> <charles.baylis@linaro.org> wrote:
>> Your mention of larger vector modes prompted me to check that the
>> patch has the desired result with them. In fact, the costs are
>> estimated incorrectly which means the post_modify pattern is not used.
>> The attached patch fixes that. (used in combination with my original
>> patch)
>>
>>
>> 2014-06-15  Charles Baylis  <charles.bayls@linaro.org>
>>
>>         * config/arm/arm.c (arm_new_rtx_costs): Reduce cost for mem with
>>         embedded side effects.
>
> I'm not too thrilled with putting in more special cases that are not
> table driven in there. Can you file a PR with some testcases that show
> this so that we don't forget and CC me on it please ?

I created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61551 at the time.

I've come back to look at this again and would like to fix it in this
release cycle. I still don't really understand what you mean by
table-driven in this context. Do you still hold this view, and if so,
could you describe what you'd like to see instead of this patch?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] [ARM] Post-indexed addressing for NEON memory access
  2015-06-19 18:37       ` Charles Baylis
@ 2015-06-23 16:41         ` Ramana Radhakrishnan
  0 siblings, 0 replies; 9+ messages in thread
From: Ramana Radhakrishnan @ 2015-06-23 16:41 UTC (permalink / raw)
  To: Charles Baylis; +Cc: Ramana Radhakrishnan, Kyrill Tkachov, GCC Patches

On Fri, Jun 19, 2015 at 7:04 PM, Charles Baylis
<charles.baylis@linaro.org> wrote:
> On 18 June 2014 at 11:06, Ramana Radhakrishnan
> <ramana.gcc@googlemail.com> wrote:
>> On Tue, Jun 17, 2014 at 4:03 PM, Charles Baylis
>> <charles.baylis@linaro.org> wrote:
>>> Your mention of larger vector modes prompted me to check that the
>>> patch has the desired result with them. In fact, the costs are
>>> estimated incorrectly which means the post_modify pattern is not used.
>>> The attached patch fixes that. (used in combination with my original
>>> patch)
>>>
>>>
>>> 2014-06-15  Charles Baylis  <charles.bayls@linaro.org>
>>>
>>>         * config/arm/arm.c (arm_new_rtx_costs): Reduce cost for mem with
>>>         embedded side effects.
>>
>> I'm not too thrilled with putting in more special cases that are not
>> table driven in there. Can you file a PR with some testcases that show
>> this so that we don't forget and CC me on it please ?
>
> I created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61551 at the time.
>
> I've come back to look at this again and would like to fix it in this
> release cycle. I still don't really understand what you mean by
> table-driven in this context. Do you still hold this view, and if so,
> could you describe what you'd like to see instead of this patch?

By table-driven I mean something similar to the existing rtx_costs
infrastructure with a walker and a data structure holding costs for
each addressing mode which are tweakable on a per-core basis.

Thus for example I'd expect this to be on a per access mode basis
along with different costs with respect to post_inc, reg indirect,
pre_inc etc. that were in a data structure and then a worker function
that peeled rtx's to obtain the appropriate cost from said
data-structure.



Ramana

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-06-23 16:27 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-02 16:47 [PATCH] [ARM] Post-indexed addressing for NEON memory access Charles Baylis
2014-06-05  6:27 ` Ramana Radhakrishnan
2014-06-17 15:04   ` Charles Baylis
2014-06-18 10:06     ` Ramana Radhakrishnan
2014-06-18 14:31       ` Charles Baylis
2015-06-19 18:37       ` Charles Baylis
2015-06-23 16:41         ` Ramana Radhakrishnan
2014-06-18 10:01 ` Ramana Radhakrishnan
2014-06-18 13:49   ` Charles Baylis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).