public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Andrew Pinski <pinskia@gmail.com>
To: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH/AARCH64] Improve ThunderX code generation slightly with load/store pair
Date: Mon, 08 Aug 2016 17:48:00 -0000	[thread overview]
Message-ID: <CA+=Sn1mn=G2L0L49-rTSMyOLrMDuJdmsGhFCzH-DyXz6Sh4JWg@mail.gmail.com> (raw)
In-Reply-To: <CA+=Sn1mTJPVOke20gDmH=RrjEA-qvdg=b+CgSJReoi3rvzM25g@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1855 bytes --]

On Fri, Aug 5, 2016 at 12:18 AM, Andrew Pinski <pinskia@gmail.com> wrote:
> Hi,
>   On ThunderX, load (and store) pair that does a pair of two word
> (32bits) load/stores is slower in some cases than doing two
> load/stores.  For some internal benchmarks, it provides a 2-5%
> improvement.
>
> This patch disables the forming of the load/store pairs for SImode if
> we are tuning for ThunderX.  I used the tuning flags route so it can
> be overridden if needed later on or if someone else wants to use the
> same method for their core.
>
> OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.


Here is a new version based on feedback both on the list and off.
I added a check for alignment to greater than 8 bytes as that is
alignment < 8 causes the slow down.
I also added two new testcases testing this to make sure it did the
load pair optimization when it is profitable.

OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

ChangeLog:
* config/aarch64/aarch64-tuning-flags.def (slow_ldpw): New tuning option.
* config/aarch64/aarch64.c (thunderx_tunings): Enable
AARCH64_EXTRA_TUNE_SLOW_LDPW.
(aarch64_operands_ok_for_ldpstp): Return false if
AARCH64_EXTRA_TUNE_SLOW_LDPW and the mode was SImode
and the alignment is less than 8 byte.
(aarch64_operands_adjust_ok_for_ldpstp): Likewise.

testsuite/ChangeLog:
* gcc.target/aarch64/thunderxloadpair.c: New testcase.
* gcc.target/aarch64/thunderxnoloadpair.c: New testcase.

>
> Thanks,
> Andrew Pinski
>
> ChangeLog:
> * config/aarch64/aarch64-tuning-flags.def (slow_ldpw): New tuning option.
> * config/aarch64/aarch64.c (thunderx_tunings): Enable
> AARCH64_EXTRA_TUNE_SLOW_LDPW.
> (aarch64_operands_ok_for_ldpstp): Return false if
> AARCH64_EXTRA_TUNE_SLOW_LDPW and the mode was SImode.
> (aarch64_operands_adjust_ok_for_ldpstp): Likewise.

[-- Attachment #2: stldpw.diff.txt --]
[-- Type: text/plain, Size: 3172 bytes --]

Index: config/aarch64/aarch64-tuning-flags.def
===================================================================
--- config/aarch64/aarch64-tuning-flags.def	(revision 239228)
+++ config/aarch64/aarch64-tuning-flags.def	(working copy)
@@ -29,3 +29,4 @@
      AARCH64_TUNE_ to give an enum name. */
 
 AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
+AARCH64_EXTRA_TUNING_OPTION ("slow_ldpw", SLOW_LDPW)
Index: config/aarch64/aarch64.c
===================================================================
--- config/aarch64/aarch64.c	(revision 239228)
+++ config/aarch64/aarch64.c	(working copy)
@@ -712,7 +712,7 @@
   0,	/* max_case_values.  */
   0,	/* cache_line_size.  */
   tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_SLOW_LDPW)	/* tune_flags.  */
 };
 
 static const struct tune_params xgene1_tunings =
@@ -13593,6 +13593,15 @@
   if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2))
     return false;
 
+  /* If we have SImode and slow ldp, check the alignment to be greater
+     than 8 byte. */
+  if (mode == SImode
+      && (aarch64_tune_params.extra_tuning_flags
+          & AARCH64_EXTRA_TUNE_SLOW_LDPW)
+      && !optimize_size
+      && MEM_ALIGN (mem_1) < 8 * BITS_PER_UNIT)
+    return false;
+
   /* Check if the addresses are in the form of [base+offset].  */
   extract_base_offset_in_addr (mem_1, &base_1, &offset_1);
   if (base_1 == NULL_RTX || offset_1 == NULL_RTX)
@@ -13752,6 +13761,15 @@
 	return false;
     }
 
+  /* If we have SImode and slow ldp, check the alignment to be greater
+     than 8 byte. */
+  if (mode == SImode
+      && (aarch64_tune_params.extra_tuning_flags
+          & AARCH64_EXTRA_TUNE_SLOW_LDPW)
+      && !optimize_size
+      && MEM_ALIGN (mem_1) < 8 * BITS_PER_UNIT)
+    return false;
+
   if (REG_P (reg_1) && FP_REGNUM_P (REGNO (reg_1)))
     rclass_1 = FP_REGS;
   else
Index: testsuite/gcc.target/aarch64/thunderxloadpair.c
===================================================================
--- testsuite/gcc.target/aarch64/thunderxloadpair.c	(nonexistent)
+++ testsuite/gcc.target/aarch64/thunderxloadpair.c	(working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=thunderx" } */
+
+struct ldp
+{
+  long long c;
+  int a, b;
+};
+
+
+int f(struct ldp *a)
+{
+  return a->a + a->b;
+}
+
+
+/* We know the alignement of a->a to be 8 byte aligned so it is profitable
+   to do ldp. */
+/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 1 } } */
+
Index: testsuite/gcc.target/aarch64/thunderxnoloadpair.c
===================================================================
--- testsuite/gcc.target/aarch64/thunderxnoloadpair.c	(nonexistent)
+++ testsuite/gcc.target/aarch64/thunderxnoloadpair.c	(working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=thunderx" } */
+
+struct noldp
+{
+  int a, b;
+};
+
+
+int f(struct noldp *a)
+{
+  return a->a + a->b;
+}
+
+/* We know the alignement of a->a to be 4 byte aligned so it is not profitable
+   to do ldp. */
+/* { dg-final { scan-assembler-time "ldp\tw\[0-9\]+, w\[0-9\]" 1 } } */

  parent reply	other threads:[~2016-08-08 17:48 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-05  7:18 Andrew Pinski
2016-08-05 20:47 ` Jim Wilson
2016-08-08 17:48 ` Andrew Pinski [this message]
2016-08-09  9:43   ` Richard Earnshaw (lists)
2016-09-12 21:29     ` Andrew Pinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+=Sn1mn=G2L0L49-rTSMyOLrMDuJdmsGhFCzH-DyXz6Sh4JWg@mail.gmail.com' \
    --to=pinskia@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).