public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][ARM] PR/66433: Reduce cost of memory instructions with autoincrement
@ 2015-06-16 14:11 Yury Usishchev
  2015-06-17  9:39 ` Kyrill Tkachov
  0 siblings, 1 reply; 2+ messages in thread
From: Yury Usishchev @ 2015-06-16 14:11 UTC (permalink / raw)
  To: gcc-patches; +Cc: Vyacheslav Barinov

[-- Attachment #1: Type: text/plain, Size: 372 bytes --]

Hello!

Following patch fixes PR target/66433.

As described in PR, cost of memory operation with autoincrement is
considered to be greater than same operation without autoincrement. This
causes auto-inc-dec pass not to optimize vector memory operations like
vld and vst.

Bootstrapped and regtested on armv7l-linux-gnueabi on trunk.
OK for trunk?

--
BR,
Yury Usishchev


[-- Attachment #2: pr_66433.CL --]
[-- Type: text/plain, Size: 303 bytes --]


gcc/
2015-06-16  Yury Usishchev  <y.usishchev@samsung.com>

	PR target/66433
	* config/arm/arm.c (arm_new_rtx_costs): Reduce cost of memory instructions
	with autoincrement.

gcc/testsuite/
2015-06-16  Yury Usishchev  <y.usishchev@samsung.com>

	PR target/66433
	* gcc.target/arm/pr66433.c: New test.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: pr_66433.patch --]
[-- Type: text/x-diff, Size: 1425 bytes --]

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f5050cb..a8dc0ed 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9444,7 +9444,9 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code,
     case MEM:
       /* A memory access costs 1 insn if the mode is small, or the address is
 	 a single register, otherwise it costs one insn per word.  */
-      if (REG_P (XEXP (x, 0)))
+      if (REG_P (XEXP (x, 0))
+	  || (GET_RTX_CLASS (GET_CODE (XEXP (x, 0))) == RTX_AUTOINC
+	      && REG_P (XEXP (XEXP (x, 0), 0))))
 	*cost = COSTS_N_INSNS (1);
       else if (flag_pic
 	       && GET_CODE (XEXP (x, 0)) == PLUS
diff --git a/gcc/testsuite/gcc.target/arm/pr66433.c b/gcc/testsuite/gcc.target/arm/pr66433.c
new file mode 100644
index 0000000..22ba158
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr66433.c
@@ -0,0 +1,21 @@
+/* Test the optimization of `vld*' ARM NEON intrinsic with autoincrement. */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_neon } */
+
+#include <arm_neon.h>
+
+void test_vld_autoinc (uint32_t *__restrict__ a, uint32_t *__restrict__ b)
+{
+  int i;
+  for(i = 0; i < 1000000; i++) {
+    vst1q_u32 (b, vld1q_u32 (a));
+    a += 4;
+    b += 4;
+  }
+}
+
+/* { dg-final { scan-assembler "vld1\.32.*!.*\n" } } */
+/* { dg-final { scan-assembler "vst1\.32.*!.*\n" } } */

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH][ARM] PR/66433: Reduce cost of memory instructions with autoincrement
  2015-06-16 14:11 [PATCH][ARM] PR/66433: Reduce cost of memory instructions with autoincrement Yury Usishchev
@ 2015-06-17  9:39 ` Kyrill Tkachov
  0 siblings, 0 replies; 2+ messages in thread
From: Kyrill Tkachov @ 2015-06-17  9:39 UTC (permalink / raw)
  To: Yury Usishchev, gcc-patches
  Cc: Vyacheslav Barinov, Ramana Radhakrishnan, Richard Earnshaw

Hi Yury [cc'ing the ARM maintainers]

On 16/06/15 15:04, Yury Usishchev wrote:
> Hello!
>
> Following patch fixes PR target/66433.
>
> As described in PR, cost of memory operation with autoincrement is
> considered to be greater than same operation without autoincrement. This
> causes auto-inc-dec pass not to optimize vector memory operations like
> vld and vst.

The autoincrement form may not always be as cheap as a
simple memory op, since it does involve an implicit addition
operation.

I've tried out your patch and I do see the autoincrement forms
being used more aggressively. Do you have any benchmark data
for making this change?


>
> Bootstrapped and regtested on armv7l-linux-gnueabi on trunk.
> OK for trunk?

      case MEM:
        /* A memory access costs 1 insn if the mode is small, or the address is
  	 a single register, otherwise it costs one insn per word.  */
-      if (REG_P (XEXP (x, 0)))
+      if (REG_P (XEXP (x, 0))
+	  || (GET_RTX_CLASS (GET_CODE (XEXP (x, 0))) == RTX_AUTOINC
+	      && REG_P (XEXP (XEXP (x, 0), 0))))
  	*cost = COSTS_N_INSNS (1);
        else if (flag_pic
  	       && GET_CODE (XEXP (x, 0)) == PLUS


I would have hoped that auto-inc-dec.c would be using address costs rather than rtx costs
here, but I don't think it's well defined who is responsible for choosing preferences between
these autoinc ops :(
I note that in our arm_arm_address_cost we already consider the autoinc modes to be cheap.

One situation that we want to avoid is for non-NEON memory ops sequences of the form:
ldr ra, [rn, #4]
ldr rb, [rn, #8]
ldr rc, [rn, #12]
add rn, rn, #16

being transformed into:
ldr ra, [rn]!
ldr rb, [rn]!
ldr rc, [rn]!

So I think at least for non-vector/FP modes where we can use offsets we should consider
autoinc ops to be slightly more expensive (COSTS_N_INSNS (2) instead of COSTS_N_INSNS (1)).

But when optimising for size, we should prefer the autoinc forms since they can save us on
add/sub instructions.

Thanks,
Kyrill

>
> --
> BR,
> Yury Usishchev
>

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2015-06-17  9:18 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-16 14:11 [PATCH][ARM] PR/66433: Reduce cost of memory instructions with autoincrement Yury Usishchev
2015-06-17  9:39 ` Kyrill Tkachov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).