Re: [PATCH] [AArch64] PR target/71663 Improve Vector Initializtion

public inbox for gcc-patches@gcc.gnu.org

From: "Hurugalawadi, Naveen" <Naveen.Hurugalawadi@cavium.com>
To: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>,
	"gcc-patches@gcc.gnu.org"	<gcc-patches@gcc.gnu.org>
Cc: "Pinski, Andrew" <Andrew.Pinski@cavium.com>,
	James Greenhalgh	<james.greenhalgh@arm.com>,
	Marcus Shawcroft <marcus.shawcroft@arm.com>,
	Richard Earnshaw <Richard.Earnshaw@arm.com>
Subject: Re: [PATCH] [AArch64]  PR target/71663 Improve Vector Initializtion
Date: Wed, 26 Apr 2017 09:04:00 -0000
Message-ID: <CO2PR07MB26941DBF2C22CDA616A2967983110@CO2PR07MB2694.namprd07.prod.outlook.com>
In-Reply-To: <58FF0803.5070007@foss.arm.com>

[-- Attachment #1: Type: text/plain, Size: 1190 bytes --]

Hi Kyrill,

Thanks for the review and your comments.

>> It would be useful if you expanded a bit on the approach used to
>> generate the improved codegen

The patch handles the case where every element of the vector is variable.
It finds the most common element, broadcasts it to all lanes with a dup,
and then inserts only the lanes that hold a different value.

Current code:
============================================
        movi    v2.4s, 0
        ins     v2.s[0], v0.s[0]
        ins     v2.s[1], v1.s[0]
        ins     v2.s[2], v0.s[0]
        orr     v0.16b, v2.16b, v2.16b
        ins     v0.s[3], v3.s[0]
        ret
============================================

Code after the patch:
============================================
        dup     v0.4s, v0.s[0]
        ins     v0.s[1], v1.s[0]
        ins     v0.s[3], v3.s[0]
        ret
============================================
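
To show the selection step outside the RTL machinery, here is a minimal
standalone sketch in plain C. It is not the patch code: elements are modelled
as ints instead of rtx values, `==` stands in for rtx_equal_p, and the names
`init_plan`, `plan_vector_init` and `MAX_LANES` are hypothetical and exist
only for this illustration.

```c
#include <assert.h>

#define MAX_LANES 16

/* Hypothetical result type: the lane whose value gets broadcast, and the
   lanes that still need a separate insert afterwards.  */
struct init_plan
{
  int dup_lane;                 /* Lane whose value is broadcast (dup).  */
  int n_inserts;                /* Number of lanes needing an insert (ins).  */
  int insert_lanes[MAX_LANES];  /* Those lanes, in ascending order.  */
};

/* Decide how to build a vector from N_ELTS variable elements.
   Plain ints stand in for rtx values; == stands in for rtx_equal_p.  */
static struct init_plan
plan_vector_init (const int *elts, int n_elts)
{
  int first[MAX_LANES];  /* first[i]: lowest lane equal to lane i.  */
  int count[MAX_LANES];  /* count[j]: lanes whose first match is lane j.  */
  struct init_plan plan = { 0, 0, { 0 } };

  assert (n_elts > 0 && n_elts <= MAX_LANES);

  for (int i = 0; i < n_elts; i++)
    count[i] = 0;

  /* For each lane, find the first lane with the same value and credit it.
     The search always terminates because lane i matches itself.  */
  for (int i = 0; i < n_elts; i++)
    for (int j = 0; j <= i; j++)
      if (elts[i] == elts[j])
        {
          first[i] = j;
          count[j]++;
          break;
        }

  /* Pick the most common value; ties go to the lowest lane.  */
  int best = 0;
  for (int i = 1; i < n_elts; i++)
    if (count[i] > count[best])
      best = i;
  plan.dup_lane = best;

  /* Every lane that does not share the broadcast value needs an insert.  */
  for (int i = 0; i < n_elts; i++)
    if (first[i] != best)
      plan.insert_lanes[plan.n_inserts++] = i;

  return plan;
}
```

For an input shaped like { a, b, a, d }, this sketch picks lane 0 for the
broadcast and leaves lanes 1 and 3 to be inserted, which lines up with the
dup followed by two ins in the output above.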

>> Some typos

Modified as required

>> worth adding a testcase where one of the vector elements appears more than
>> the others?

Modified the testcase as required so that one element appears more often
than the others.
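
The testcase in the patch only scans the generated assembly. As a rough
illustration of the property the new sequence has to preserve, a run-time
check of the same initializer shape could look like the sketch below. It
assumes the GCC vector extensions (vector_size and lane subscripting); the
helper `check_combine` and the typedef `v4sf` are hypothetical names and are
not part of the patch.

```c
/* Four-lane float vector, using the GCC vector extension.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Same shape as combine () in the testcase: lanes 0 and 2 share a value
   and the third argument is unused.  */
static v4sf
combine (float a, float b, float c, float d)
{
  (void) c;
  return (v4sf) { a, b, a, d };
}

/* Hypothetical helper: returns 1 when every lane holds the value the
   initializer asked for, 0 otherwise.  */
static int
check_combine (float a, float b, float c, float d)
{
  v4sf v = combine (a, b, c, d);
  return v[0] == a && v[1] == b && v[2] == a && v[3] == d;
}
```

A check like this says nothing about which instructions are emitted; it only
guards against the dup overwriting a lane that should have been inserted.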

Please review the patch and let us know if it's okay.
Bootstrapped and regression tested on aarch64-thunder-linux.

Thanks,
Naveen

[-- Attachment #2: pr71663-2.patch --]
[-- Type: text/x-diff; name="pr71663-2.patch", Size: 2469 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 2e385c4..8747a23 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11671,11 +11671,54 @@ aarch64_expand_vector_init (rtx target, rtx vals)
       aarch64_expand_vector_init (target, copy);
     }
 
-  /* Insert the variable lanes directly.  */
-
   enum insn_code icode = optab_handler (vec_set_optab, mode);
   gcc_assert (icode != CODE_FOR_nothing);
 
+  /* If there are only variable elements, try to optimize
+     the insertion using dup for the most common element
+     followed by insertions.  */
+  if (n_var == n_elts && n_elts <= 16)
+    {
+      int matches[16][2];
+      int nummatches = 0;
+      memset (matches, 0, sizeof (matches));
+      for (int i = 0; i < n_elts; i++)
+	{
+	  for (int j = 0; j <= i; j++)
+	    {
+	      if (rtx_equal_p (XVECEXP (vals, 0, i), XVECEXP (vals, 0, j)))
+		{
+		  matches[i][0] = j;
+		  matches[j][1]++;
+		  if (i != j)
+		    nummatches++;
+		  break;
+		}
+	    }
+	}
+      int maxelement = 0;
+      int maxv = 0;
+      for (int i = 0; i < n_elts; i++)
+	if (matches[i][1] > maxv)
+	  maxelement = i, maxv = matches[i][1];
+
+      /* Create a duplicate of the most common element.  */
+      rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
+      aarch64_emit_move (target, gen_rtx_VEC_DUPLICATE (mode, x));
+
+      /* Insert the rest.  */
+      for (int i = 0; i < n_elts; i++)
+	{
+	  rtx x = XVECEXP (vals, 0, i);
+	  if (matches[i][0] == maxelement)
+	    continue;
+	  x = copy_to_mode_reg (inner_mode, x);
+	  emit_insn (GEN_FCN (icode) (target, x, GEN_INT (i)));
+	}
+      return;
+    }
+
+  /* Insert the variable lanes directly.  */
   for (int i = 0; i < n_elts; i++)
     {
       rtx x = XVECEXP (vals, 0, i);
diff --git a/gcc/testsuite/gcc.target/aarch64/pr71663.c b/gcc/testsuite/gcc.target/aarch64/pr71663.c
new file mode 100644
index 0000000..a043a21
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr71663.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#define vector __attribute__((vector_size(16)))
+
+vector float combine (float a, float b, float c, float d)
+{
+  return (vector float) { a, b, a, d };
+}
+
+/* { dg-final { scan-assembler-not "movi\t" } } */
+/* { dg-final { scan-assembler-not "orr\t" } } */
+/* { dg-final { scan-assembler-times "ins\t" 2 } } */
+/* { dg-final { scan-assembler-times "dup\t" 1 } } */

Thread overview: 28+ messages
2016-12-09  3:30 Hurugalawadi, Naveen
2016-12-09  7:02 ` Hurugalawadi, Naveen
2017-02-06  6:46   ` Hurugalawadi, Naveen
2017-04-25  7:03     ` [PING] " Hurugalawadi, Naveen
2017-04-25  8:37   ` Kyrill Tkachov
2017-04-26  9:04     ` Hurugalawadi, Naveen [this message]
2017-05-11  4:56       ` [PING] " Hurugalawadi, Naveen
2017-05-26  6:27         ` [PING 2] " Hurugalawadi, Naveen
2017-06-09 14:16       ` James Greenhalgh
2017-06-13 10:25         ` Hurugalawadi, Naveen
2017-06-13 13:56           ` James Greenhalgh
2017-06-14  8:53             ` Hurugalawadi, Naveen
2017-06-14  9:10               ` James Greenhalgh
2016-12-12  3:16 ` [PATCH] [AArch64] Implement popcount pattern Hurugalawadi, Naveen
2016-12-12 10:53   ` Kyrill Tkachov
2016-12-13 11:51     ` Hurugalawadi, Naveen
2016-12-13 11:59       ` Kyrill Tkachov
2017-01-25  8:07         ` [PATCH] [AArch64] Enable AES and cmp_branch fusion for Thunderx2t99 Hurugalawadi, Naveen
2017-01-25  9:29           ` Kyrill Tkachov
2017-02-02  5:03             ` Hurugalawadi, Naveen
2017-02-02 11:43               ` James Greenhalgh
2017-02-01 13:56         ` [PATCH] [AArch64] Implement popcount pattern James Greenhalgh
2017-02-02  4:03           ` Hurugalawadi, Naveen
2017-02-02 11:56             ` James Greenhalgh
2017-02-03  5:56               ` Andrew Pinski
2017-02-03  7:02                 ` Hurugalawadi, Naveen
2017-02-03 10:28                   ` Kyrill Tkachov
2017-02-04  4:04                   ` James Greenhalgh
