[PATCH] [AArch64] PR target/71663 Improve Vector Initializtion

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] [AArch64]  PR target/71663 Improve Vector Initializtion
@ 2016-12-09  3:30 Hurugalawadi, Naveen
  2016-12-09  7:02 ` Hurugalawadi, Naveen
  2016-12-12  3:16 ` [PATCH] [AArch64] Implement popcount pattern Hurugalawadi, Naveen
  0 siblings, 2 replies; 28+ messages in thread
From: Hurugalawadi, Naveen @ 2016-12-09  3:30 UTC (permalink / raw)
  To: gcc-patches
  Cc: Pinski, Andrew, James Greenhalgh, Marcus Shawcroft, Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 542 bytes --]

Hi,

The AArch64 vector initialization sequence can be optimized to generate
better code. The attached patch handles for the case where the vector
contains only variables. It checks for the common elements in the vector
and inserts the values in optimized way.

Bootstrapped and Regression tested on aarch64-thunder-linux.
Please review the patch and let us know if its okay?

2016-12-09  Andrew PInski  <apinski@cavium.com>

gcc
	* config/aarch64/aarch64.c (aarch64_expand_vector_init):
	Improve vector initialization code gen.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: pr71663.patch --]
[-- Type: text/x-diff; name="pr71663.patch", Size: 1807 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e87831f..da5b6fa 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11609,11 +11609,54 @@ aarch64_expand_vector_init (rtx target, rtx vals)
       aarch64_expand_vector_init (target, copy);
     }
 
-  /* Insert the variable lanes directly.  */
-
   enum insn_code icode = optab_handler (vec_set_optab, mode);
   gcc_assert (icode != CODE_FOR_nothing);
 
+  /* If there is only varables, try to optimize
+     the inseration using dup for the most common element
+     followed by insertations. */
+  if (n_var == n_elts && n_elts <= 16)
+    {
+      int matches[16][2];
+      int nummatches = 0;
+      memset (matches, 0, sizeof(matches));
+      for(int i = 0; i < n_elts; i++)
+	{
+	  for (int j = 0; j <= i; j++)
+	    {
+	      if (rtx_equal_p (XVECEXP (vals, 0, i), XVECEXP (vals, 0, j)))
+		{
+		  matches[i][0] = j;
+		  matches[j][1]++;
+		  if (i != j)
+		    nummatches++;
+		  break;
+		}
+	    }
+	}
+      int maxelement = 0;
+      int maxv = 0;
+      for (int i = 0; i < n_elts; i++)
+	if (matches[i][1] > maxv)
+	  maxelement = i, maxv = matches[i][1];
+
+      /* Create a duplicate of the most common element. */
+      rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
+      aarch64_emit_move (target, gen_rtx_VEC_DUPLICATE (mode, x));
+      /* Insert the rest. */
+      for (int i = 0; i < n_elts; i++)
+	{
+	  rtx x = XVECEXP (vals, 0, i);
+	  if (matches[i][0] == maxelement)
+	    continue;
+	  x = copy_to_mode_reg (inner_mode, x);
+	  emit_insn (GEN_FCN (icode) (target, x, GEN_INT (i)));
+	}
+      return;
+    }
+
+  /* Insert the variable lanes directly.  */
+
   for (int i = 0; i < n_elts; i++)
     {
       rtx x = XVECEXP (vals, 0, i);

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] [AArch64]  PR target/71663 Improve Vector Initializtion
  2016-12-09  3:30 [PATCH] [AArch64] PR target/71663 Improve Vector Initializtion Hurugalawadi, Naveen
@ 2016-12-09  7:02 ` Hurugalawadi, Naveen
  2017-02-06  6:46   ` Hurugalawadi, Naveen
  2017-04-25  8:37   ` Kyrill Tkachov
  2016-12-12  3:16 ` [PATCH] [AArch64] Implement popcount pattern Hurugalawadi, Naveen
  1 sibling, 2 replies; 28+ messages in thread
From: Hurugalawadi, Naveen @ 2016-12-09  7:02 UTC (permalink / raw)
  To: gcc-patches
  Cc: Pinski, Andrew, James Greenhalgh, Marcus Shawcroft, Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 414 bytes --]

Hi,

Sorry. Missed out the testcase in patch submission.
Added the missing testcase along with the ChangeLog.
Please review the same and let us know if thats okay?

2016-12-09  Andrew PInski  <apinski@cavium.com>

gcc
        * config/aarch64/aarch64.c (aarch64_expand_vector_init):
        Improve vector initialization code gen.    
gcc/testsuite
        * gcc.target/aarch64/pr71663.c: New Testcase.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: pr71663-1.patch --]
[-- Type: text/x-diff; name="pr71663-1.patch", Size: 2459 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e87831f..da5b6fa 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11609,11 +11609,54 @@ aarch64_expand_vector_init (rtx target, rtx vals)
       aarch64_expand_vector_init (target, copy);
     }
 
-  /* Insert the variable lanes directly.  */
-
   enum insn_code icode = optab_handler (vec_set_optab, mode);
   gcc_assert (icode != CODE_FOR_nothing);
 
+  /* If there is only varables, try to optimize
+     the inseration using dup for the most common element
+     followed by insertations. */
+  if (n_var == n_elts && n_elts <= 16)
+    {
+      int matches[16][2];
+      int nummatches = 0;
+      memset (matches, 0, sizeof(matches));
+      for(int i = 0; i < n_elts; i++)
+	{
+	  for (int j = 0; j <= i; j++)
+	    {
+	      if (rtx_equal_p (XVECEXP (vals, 0, i), XVECEXP (vals, 0, j)))
+		{
+		  matches[i][0] = j;
+		  matches[j][1]++;
+		  if (i != j)
+		    nummatches++;
+		  break;
+		}
+	    }
+	}
+      int maxelement = 0;
+      int maxv = 0;
+      for (int i = 0; i < n_elts; i++)
+	if (matches[i][1] > maxv)
+	  maxelement = i, maxv = matches[i][1];
+
+      /* Create a duplicate of the most common element. */
+      rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
+      aarch64_emit_move (target, gen_rtx_VEC_DUPLICATE (mode, x));
+      /* Insert the rest. */
+      for (int i = 0; i < n_elts; i++)
+	{
+	  rtx x = XVECEXP (vals, 0, i);
+	  if (matches[i][0] == maxelement)
+	    continue;
+	  x = copy_to_mode_reg (inner_mode, x);
+	  emit_insn (GEN_FCN (icode) (target, x, GEN_INT (i)));
+	}
+      return;
+    }
+
+  /* Insert the variable lanes directly.  */
+
   for (int i = 0; i < n_elts; i++)
     {
       rtx x = XVECEXP (vals, 0, i);
diff --git a/gcc/testsuite/gcc.target/aarch64/pr71663.c b/gcc/testsuite/gcc.target/aarch64/pr71663.c
new file mode 100644
index 0000000..c8df847
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr71663.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#define vector __attribute__((vector_size(16)))
+
+vector float combine (float a, float b, float c, float d)
+{
+  return (vector float) { a, b, c, d };
+}
+
+/* { dg-final { scan-assembler-not "movi\t" } } */
+/* { dg-final { scan-assembler-not "orr\t" } } */
+/* { dg-final { scan-assembler-times "ins\t" 3 } } */
+/* { dg-final { scan-assembler-times "dup\t" 1 } } */

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] [AArch64]  PR target/71663 Improve Vector Initializtion
  2016-12-09  7:02 ` Hurugalawadi, Naveen
@ 2017-02-06  6:46   ` Hurugalawadi, Naveen
  2017-04-25  7:03     ` [PING] " Hurugalawadi, Naveen
  2017-04-25  8:37   ` Kyrill Tkachov
  1 sibling, 1 reply; 28+ messages in thread
From: Hurugalawadi, Naveen @ 2017-02-06  6:46 UTC (permalink / raw)
  To: gcc-patches
  Cc: Pinski, Andrew, James Greenhalgh, Marcus Shawcroft, Richard Earnshaw

Hi,

Please consider this as a personal reminder to review the patch
at following link and let me know your comments on the same.

https://gcc.gnu.org/ml/gcc-patches/2016-12/msg00718.html

Thanks,
Naveen




^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PING] [PATCH] [AArch64]  PR target/71663 Improve Vector Initializtion
  2017-02-06  6:46   ` Hurugalawadi, Naveen
@ 2017-04-25  7:03     ` Hurugalawadi, Naveen
  0 siblings, 0 replies; 28+ messages in thread
From: Hurugalawadi, Naveen @ 2017-04-25  7:03 UTC (permalink / raw)
  To: gcc-patches
  Cc: Pinski, Andrew, James Greenhalgh, Marcus Shawcroft, Richard Earnshaw, nd

Hi,

Please consider this as a personal reminder to review the patch
at following link and let me know your comments on the same.

https://gcc.gnu.org/ml/gcc-patches/2016-12/msg00718.html

Thanks,
Naveen




    

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] [AArch64]  PR target/71663 Improve Vector Initializtion
  2016-12-09  7:02 ` Hurugalawadi, Naveen
  2017-02-06  6:46   ` Hurugalawadi, Naveen
@ 2017-04-25  8:37   ` Kyrill Tkachov
  2017-04-26  9:04     ` Hurugalawadi, Naveen
  1 sibling, 1 reply; 28+ messages in thread
From: Kyrill Tkachov @ 2017-04-25  8:37 UTC (permalink / raw)
  To: Hurugalawadi, Naveen, gcc-patches
  Cc: Pinski, Andrew, James Greenhalgh, Marcus Shawcroft, Richard Earnshaw

Hi Naveen,

On 09/12/16 07:02, Hurugalawadi, Naveen wrote:
> Hi,
>
> Sorry. Missed out the testcase in patch submission.
> Added the missing testcase along with the ChangeLog.
> Please review the same and let us know if thats okay?

It would be useful if you expanded a bit on the approach used to
generate the improved codegen, or at least show for the testcase
what code was generated before this patch and what is generated
after this patch.

> 2016-12-09  Andrew PInski  <apinski@cavium.com>
>
> gcc
>          * config/aarch64/aarch64.c (aarch64_expand_vector_init):
>          Improve vector initialization code gen.
> gcc/testsuite
>          * gcc.target/aarch64/pr71663.c: New Testcase.

+  /* If there is only varables, try to optimize
+     the inseration using dup for the most common element
+     followed by insertations. */

Some typos: s/is only varables/are only variable elements/, s/inseration/insertion/,
s/insertations/insertions/.

  +  if (n_var == n_elts && n_elts <= 16)


diff --git a/gcc/testsuite/gcc.target/aarch64/pr71663.c b/gcc/testsuite/gcc.target/aarch64/pr71663.c
new file mode 100644
index 0000000..c8df847
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr71663.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#define vector __attribute__((vector_size(16)))
+
+vector float combine (float a, float b, float c, float d)
+{
+  return (vector float) { a, b, c, d };
+}

A large part of the aarch64.c hunk of your patch deals with finding the most commonly-occuring
element in the vector of variables, yet in your testcase all variables appear exactly once.
Perhaps worth adding a testcase where one of the vector elements appears more than the others?
I'd guess the codegen then would be better with this patch?

Cheers,
Kyrill

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] [AArch64]  PR target/71663 Improve Vector Initializtion
  2017-04-25  8:37   ` Kyrill Tkachov
@ 2017-04-26  9:04     ` Hurugalawadi, Naveen
  2017-05-11  4:56       ` [PING] " Hurugalawadi, Naveen
  2017-06-09 14:16       ` James Greenhalgh
  0 siblings, 2 replies; 28+ messages in thread
From: Hurugalawadi, Naveen @ 2017-04-26  9:04 UTC (permalink / raw)
  To: Kyrill Tkachov, gcc-patches
  Cc: Pinski, Andrew, James Greenhalgh, Marcus Shawcroft, Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 1190 bytes --]

Hi Kyrill,

Thanks for the review and your comments.

>> It would be useful if you expanded a bit on the approach used to
>> generate the improved codegen

The patch creates a duplicate of most common element and tries to optimize
the insertion using dup for the element followed by insertions.

Current code:
============================================
        movi    v2.4s, 0
        ins     v2.s[0], v0.s[0]
        ins     v2.s[1], v1.s[0]
        ins     v2.s[2], v0.s[0]
        orr     v0.16b, v2.16b, v2.16b
        ins     v0.s[3], v3.s[0]
        ret
============================================

Code after the patch:
============================================
        dup     v0.4s, v0.s[0]
        ins     v0.s[1], v1.s[0]
        ins     v0.s[3], v3.s[0]
        ret
============================================

>> Some typos

Modified as required

>> worth adding a testcase where one of the vector elements appears more than
>> the others?

Modified the testcase as required using common element.

Please review the patch and let us know if its okay?
Bootstrapped and Regression tested on aarch64-thunder-linux.

Thanks,
Naveen

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: pr71663-2.patch --]
[-- Type: text/x-diff; name="pr71663-2.patch", Size: 2469 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 2e385c4..8747a23 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11671,11 +11671,54 @@ aarch64_expand_vector_init (rtx target, rtx vals)
       aarch64_expand_vector_init (target, copy);
     }
 
-  /* Insert the variable lanes directly.  */
-
   enum insn_code icode = optab_handler (vec_set_optab, mode);
   gcc_assert (icode != CODE_FOR_nothing);
 
+  /* If there are only variable elements, try to optimize
+     the insertion using dup for the most common element
+     followed by insertions.  */
+  if (n_var == n_elts && n_elts <= 16)
+    {
+      int matches[16][2];
+      int nummatches = 0;
+      memset (matches, 0, sizeof(matches));
+      for(int i = 0; i < n_elts; i++)
+	{
+	  for (int j = 0; j <= i; j++)
+	    {
+	      if (rtx_equal_p (XVECEXP (vals, 0, i), XVECEXP (vals, 0, j)))
+		{
+		  matches[i][0] = j;
+		  matches[j][1]++;
+		  if (i != j)
+		    nummatches++;
+		  break;
+		}
+	    }
+	}
+      int maxelement = 0;
+      int maxv = 0;
+      for (int i = 0; i < n_elts; i++)
+	if (matches[i][1] > maxv)
+	  maxelement = i, maxv = matches[i][1];
+
+      /* Create a duplicate of the most common element.  */
+      rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
+      aarch64_emit_move (target, gen_rtx_VEC_DUPLICATE (mode, x));
+
+      /* Insert the rest.  */
+      for (int i = 0; i < n_elts; i++)
+	{
+	  rtx x = XVECEXP (vals, 0, i);
+	  if (matches[i][0] == maxelement)
+	    continue;
+	  x = copy_to_mode_reg (inner_mode, x);
+	  emit_insn (GEN_FCN (icode) (target, x, GEN_INT (i)));
+	}
+      return;
+    }
+
+  /* Insert the variable lanes directly.  */
   for (int i = 0; i < n_elts; i++)
     {
       rtx x = XVECEXP (vals, 0, i);
diff --git a/gcc/testsuite/gcc.target/aarch64/pr71663.c b/gcc/testsuite/gcc.target/aarch64/pr71663.c
new file mode 100644
index 0000000..a043a21
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr71663.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#define vector __attribute__((vector_size(16)))
+
+vector float combine (float a, float b, float c, float d)
+{
+  return (vector float) { a, b, a, d };
+}
+
+/* { dg-final { scan-assembler-not "movi\t" } } */
+/* { dg-final { scan-assembler-not "orr\t" } } */
+/* { dg-final { scan-assembler-times "ins\t" 2 } } */
+/* { dg-final { scan-assembler-times "dup\t" 1 } } */

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PING] [PATCH] [AArch64]  PR target/71663 Improve Vector Initializtion
  2017-04-26  9:04     ` Hurugalawadi, Naveen
@ 2017-05-11  4:56       ` Hurugalawadi, Naveen
  2017-05-26  6:27         ` [PING 2] " Hurugalawadi, Naveen
  2017-06-09 14:16       ` James Greenhalgh
  1 sibling, 1 reply; 28+ messages in thread
From: Hurugalawadi, Naveen @ 2017-05-11  4:56 UTC (permalink / raw)
  To: Kyrill Tkachov, gcc-patches
  Cc: Pinski, Andrew, James Greenhalgh, Marcus Shawcroft, Richard Earnshaw

Hi,  

Please consider this as a personal reminder to review the patch
at following link and let me know your comments on the same.  

https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01260.html

Thanks,
Naveen



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PING 2] [PATCH] [AArch64]  PR target/71663 Improve Vector Initializtion
  2017-05-11  4:56       ` [PING] " Hurugalawadi, Naveen
@ 2017-05-26  6:27         ` Hurugalawadi, Naveen
  0 siblings, 0 replies; 28+ messages in thread
From: Hurugalawadi, Naveen @ 2017-05-26  6:27 UTC (permalink / raw)
  To: Kyrill Tkachov, gcc-patches
  Cc: Pinski, Andrew, James Greenhalgh, Marcus Shawcroft, Richard Earnshaw

Hi,  

Please consider this as a personal reminder to review the patch
at following link and let me know your comments on the same.  

https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01260.html

Thanks,
Naveen



    

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] [AArch64]  PR target/71663 Improve Vector Initializtion
  2017-04-26  9:04     ` Hurugalawadi, Naveen
  2017-05-11  4:56       ` [PING] " Hurugalawadi, Naveen
@ 2017-06-09 14:16       ` James Greenhalgh
  2017-06-13 10:25         ` Hurugalawadi, Naveen
  1 sibling, 1 reply; 28+ messages in thread
From: James Greenhalgh @ 2017-06-09 14:16 UTC (permalink / raw)
  To: Hurugalawadi, Naveen
  Cc: Kyrill Tkachov, gcc-patches, Pinski, Andrew, Marcus Shawcroft,
	Richard Earnshaw, nd

On Wed, Apr 26, 2017 at 06:34:36AM +0000, Hurugalawadi, Naveen wrote:
> Hi Kyrill,
> 
> Thanks for the review and your comments.
> 
> >> It would be useful if you expanded a bit on the approach used to
> >> generate the improved codegen
> 
> The patch creates a duplicate of most common element and tries to optimize
> the insertion using dup for the element followed by insertions.
> 
> Current code:
> ============================================
>         movi    v2.4s, 0
>         ins     v2.s[0], v0.s[0]
>         ins     v2.s[1], v1.s[0]
>         ins     v2.s[2], v0.s[0]
>         orr     v0.16b, v2.16b, v2.16b
>         ins     v0.s[3], v3.s[0]
>         ret
> ============================================
> 
> Code after the patch:
> ============================================
>         dup     v0.4s, v0.s[0]
>         ins     v0.s[1], v1.s[0]
>         ins     v0.s[3], v3.s[0]
>         ret
> ============================================
> 
> >> Some typos
> 
> Modified as required
> 
> >> worth adding a testcase where one of the vector elements appears more than
> >> the others?
> 
> Modified the testcase as required using common element.
> 
> Please review the patch and let us know if its okay?
> Bootstrapped and Regression tested on aarch64-thunder-linux.

This patch fell through the cracks as it shows up in a reply chain with a
few others. If you could try to keep one reply chain for each patch series
you're submitting, that would make tracking the submissions much
easier :-).


> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 2e385c4..8747a23 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -11671,11 +11671,54 @@ aarch64_expand_vector_init (rtx target, rtx vals)
>        aarch64_expand_vector_init (target, copy);
>      }
>  
> -  /* Insert the variable lanes directly.  */
> -
>    enum insn_code icode = optab_handler (vec_set_optab, mode);
>    gcc_assert (icode != CODE_FOR_nothing);
>  
> +  /* If there are only variable elements, try to optimize
> +     the insertion using dup for the most common element
> +     followed by insertions.  */
> +  if (n_var == n_elts && n_elts <= 16)
> +    {
> +      int matches[16][2];
> +      int nummatches = 0;
> +      memset (matches, 0, sizeof(matches));

Very minor, but what is wrong with:

  int matches[16][2] = {0};

Rather than the explicit memset?

> +      for(int i = 0; i < n_elts; i++)
> +	{
> +	  for (int j = 0; j <= i; j++)
> +	    {
> +	      if (rtx_equal_p (XVECEXP (vals, 0, i), XVECEXP (vals, 0, j)))
> +		{
> +		  matches[i][0] = j;
> +		  matches[j][1]++;
> +		  if (i != j)
> +		    nummatches++;

nummatches is unused.

This search algorithm is tough to follow. A comment explaining that you will
fill matches[*][0] with the earliest matching element, and matches[X][1]
with the count of duplicate elements (if X is the earliest element which has
duplicates). Would be useful to understand exactly what you're aiming for.
Certainly it took me a while to understand that for:

  { a, b, c, b, b, d, c, e }

matches would look like:

  { { 0, 1 },
    { 1, 3 },
    { 2. 2 },
    { 1, 0 },
    { 1, 0 },
    { 5, 1 },
    { 2, 0 },
    { 7, 1 } }

> +		}
> +	    }
> +	}
> +      int maxelement = 0;
> +      int maxv = 0;
> +      for (int i = 0; i < n_elts; i++)
> +	if (matches[i][1] > maxv)
> +	  maxelement = i, maxv = matches[i][1];

Put braces round this and write it as two statements, or eliminate the use of
maxv and use matches[i][1] > matches[maxelement][1], but don't use the comma
operator like this please.

> +      /* Create a duplicate of the most common element.  */
> +      rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
> +      aarch64_emit_move (target, gen_rtx_VEC_DUPLICATE (mode, x));
> +
> +      /* Insert the rest.  */
> +      for (int i = 0; i < n_elts; i++)
> +	{
> +	  rtx x = XVECEXP (vals, 0, i);
> +	  if (matches[i][0] == maxelement)
> +	    continue;
> +	  x = copy_to_mode_reg (inner_mode, x);
> +	  emit_insn (GEN_FCN (icode) (target, x, GEN_INT (i)));
> +	}
> +      return;
> +    }
> +
> +  /* Insert the variable lanes directly.  */

This code would read better if you rearranged the cases. As it stands your
code is added in the middle of a logical operation (deal with constant lanes,
then deal with variable lanes). Move your new code above the part-variable
case.

>    for (int i = 0; i < n_elts; i++)
>      {
>        rtx x = XVECEXP (vals, 0, i);
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr71663.c b/gcc/testsuite/gcc.target/aarch64/pr71663.c
> new file mode 100644
> index 0000000..a043a21
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr71663.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#define vector __attribute__((vector_size(16)))
> +
> +vector float combine (float a, float b, float c, float d)

c is unused.

> +{
> +  return (vector float) { a, b, a, d };
> +}
> +
> +/* { dg-final { scan-assembler-not "movi\t" } } */
> +/* { dg-final { scan-assembler-not "orr\t" } } */
> +/* { dg-final { scan-assembler-times "ins\t" 2 } } */
> +/* { dg-final { scan-assembler-times "dup\t" 1 } } */

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] [AArch64]  PR target/71663 Improve Vector Initializtion
  2017-06-09 14:16       ` James Greenhalgh
@ 2017-06-13 10:25         ` Hurugalawadi, Naveen
  2017-06-13 13:56           ` James Greenhalgh
  0 siblings, 1 reply; 28+ messages in thread
From: Hurugalawadi, Naveen @ 2017-06-13 10:25 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: Kyrill Tkachov, gcc-patches, Pinski, Andrew, Marcus Shawcroft,
	Richard Earnshaw, nd

[-- Attachment #1: Type: text/plain, Size: 678 bytes --]

Hi James,

Thanks for your review and useful comments.

>> If you could try to keep one reply chain for each patch series
Will keep that in mind for sure :-)

>> Very minor, but what is wrong with:
>> int matches[16][2] = {0};
Done.

>> nummatches is unused.
Removed.

>> This search algorithm is tough to follow
Updated as per your comments.

>> Put braces round this and write it as two statements
Done.

>> Move your new code above the part-variable case.
Done.

>> c is unused.
Removed.

Bootstrapped and Regression tested on aarch64-thunder-linux.

Please review the patch and let us know if any comments or suggestions.

Thanks,
Naveen
    

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: pr71663-2.patch --]
[-- Type: text/x-diff; name="pr71663-2.patch", Size: 2857 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index bce490f..239ba72 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11707,6 +11707,57 @@ aarch64_expand_vector_init (rtx target, rtx vals)
       return;
     }
 
+  enum insn_code icode = optab_handler (vec_set_optab, mode);
+  gcc_assert (icode != CODE_FOR_nothing);
+
+  /* If there are only variable elements, try to optimize
+     the insertion using dup for the most common element
+     followed by insertions.  */
+
+  /* The algorithm will fill matches[*][0] with the earliest matching element,
+     and matches[X][1] with the count of duplicate elements (if X is the
+     earliest element which has duplicates).  */
+
+  if (n_var == n_elts && n_elts <= 16)
+    {
+      int matches[16][2] = {0};
+      for (int i = 0; i < n_elts; i++)
+	{
+	  for (int j = 0; j <= i; j++)
+	    {
+	      if (rtx_equal_p (XVECEXP (vals, 0, i), XVECEXP (vals, 0, j)))
+		{
+		  matches[i][0] = j;
+		  matches[j][1]++;
+		  break;
+		}
+	    }
+	}
+      int maxelement = 0;
+      int maxv = 0;
+      for (int i = 0; i < n_elts; i++)
+	if (matches[i][1] > maxv)
+	  {
+	    maxelement = i;
+	    maxv = matches[i][1];
+	  }
+
+      /* Create a duplicate of the most common element.  */
+      rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
+      aarch64_emit_move (target, gen_rtx_VEC_DUPLICATE (mode, x));
+
+      /* Insert the rest.  */
+      for (int i = 0; i < n_elts; i++)
+	{
+	  rtx x = XVECEXP (vals, 0, i);
+	  if (matches[i][0] == maxelement)
+	    continue;
+	  x = copy_to_mode_reg (inner_mode, x);
+	  emit_insn (GEN_FCN (icode) (target, x, GEN_INT (i)));
+	}
+      return;
+    }
+
   /* Initialise a vector which is part-variable.  We want to first try
      to build those lanes which are constant in the most efficient way we
      can.  */
@@ -11740,10 +11791,6 @@ aarch64_expand_vector_init (rtx target, rtx vals)
     }
 
   /* Insert the variable lanes directly.  */
-
-  enum insn_code icode = optab_handler (vec_set_optab, mode);
-  gcc_assert (icode != CODE_FOR_nothing);
-
   for (int i = 0; i < n_elts; i++)
     {
       rtx x = XVECEXP (vals, 0, i);
diff --git a/gcc/testsuite/gcc.target/aarch64/pr71663.c b/gcc/testsuite/gcc.target/aarch64/pr71663.c
new file mode 100644
index 0000000..65f368d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr71663.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#define vector __attribute__((vector_size(16)))
+
+vector float combine (float a, float b, float d)
+{
+  return (vector float) { a, b, a, d };
+}
+
+/* { dg-final { scan-assembler-not "movi\t" } } */
+/* { dg-final { scan-assembler-not "orr\t" } } */
+/* { dg-final { scan-assembler-times "ins\t" 2 } } */
+/* { dg-final { scan-assembler-times "dup\t" 1 } } */

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] [AArch64]  PR target/71663 Improve Vector Initializtion
  2017-06-13 10:25         ` Hurugalawadi, Naveen
@ 2017-06-13 13:56           ` James Greenhalgh
  2017-06-14  8:53             ` Hurugalawadi, Naveen
  0 siblings, 1 reply; 28+ messages in thread
From: James Greenhalgh @ 2017-06-13 13:56 UTC (permalink / raw)
  To: Hurugalawadi, Naveen
  Cc: Kyrill Tkachov, gcc-patches, Pinski, Andrew, Marcus Shawcroft,
	Richard Earnshaw, nd

On Tue, Jun 13, 2017 at 10:24:59AM +0000, Hurugalawadi, Naveen wrote:
> Hi James,
> 
> Thanks for your review and useful comments.
> 
> >> If you could try to keep one reply chain for each patch series
> Will keep that in mind for sure :-)
> 
> >> Very minor, but what is wrong with:
> >> int matches[16][2] = {0};
> Done.
> 
> >> nummatches is unused.
> Removed.
> 
> >> This search algorithm is tough to follow
> Updated as per your comments.
> 
> >> Put braces round this and write it as two statements
> Done.
> 
> >> Move your new code above the part-variable case.
> Done.
> 
> >> c is unused.
> Removed.
> 
> Bootstrapped and Regression tested on aarch64-thunder-linux.
> 
> Please review the patch and let us know if any comments or suggestions.

Almost OK. Could you make the testcase a bit more comprehensive? Test
that the right thing happens with multiple duplicates, that the right thing
happens when there are no duplicates, etc. At the moment the test does not
provide good coverage of the cases your code handles.

With a fuller testcase this will likely be OK, but please repost the patch
for another review.

Thanks,
James

> diff --git a/gcc/testsuite/gcc.target/aarch64/pr71663.c b/gcc/testsuite/gcc.target/aarch64/pr71663.c
> new file mode 100644
> index 0000000..65f368d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr71663.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#define vector __attribute__((vector_size(16)))
> +
> +vector float combine (float a, float b, float d)
> +{
> +  return (vector float) { a, b, a, d };
> +}
> +
> +/* { dg-final { scan-assembler-not "movi\t" } } */
> +/* { dg-final { scan-assembler-not "orr\t" } } */
> +/* { dg-final { scan-assembler-times "ins\t" 2 } } */
> +/* { dg-final { scan-assembler-times "dup\t" 1 } } */

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] [AArch64]  PR target/71663 Improve Vector Initializtion
  2017-06-13 13:56           ` James Greenhalgh
@ 2017-06-14  8:53             ` Hurugalawadi, Naveen
  2017-06-14  9:10               ` James Greenhalgh
  0 siblings, 1 reply; 28+ messages in thread
From: Hurugalawadi, Naveen @ 2017-06-14  8:53 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: Kyrill Tkachov, gcc-patches, Pinski, Andrew, Marcus Shawcroft,
	Richard Earnshaw, nd

[-- Attachment #1: Type: text/plain, Size: 255 bytes --]

Hi James,

>> Could you make the testcase a bit more comprehensive? 

Modified the testcase considering all the possible cases.
Split up the test based on different scenarios.

Please review the patch and let us know if its okay?

Thanks,
Naveen

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: pr71663-3.patch --]
[-- Type: text/x-diff; name="pr71663-3.patch", Size: 4921 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index bce490f..239ba72 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11707,6 +11707,57 @@ aarch64_expand_vector_init (rtx target, rtx vals)
       return;
     }
 
+  enum insn_code icode = optab_handler (vec_set_optab, mode);
+  gcc_assert (icode != CODE_FOR_nothing);
+
+  /* If there are only variable elements, try to optimize
+     the insertion using dup for the most common element
+     followed by insertions.  */
+
+  /* The algorithm will fill matches[*][0] with the earliest matching element,
+     and matches[X][1] with the count of duplicate elements (if X is the
+     earliest element which has duplicates).  */
+
+  if (n_var == n_elts && n_elts <= 16)
+    {
+      int matches[16][2] = {0};
+      for (int i = 0; i < n_elts; i++)
+	{
+	  for (int j = 0; j <= i; j++)
+	    {
+	      if (rtx_equal_p (XVECEXP (vals, 0, i), XVECEXP (vals, 0, j)))
+		{
+		  matches[i][0] = j;
+		  matches[j][1]++;
+		  break;
+		}
+	    }
+	}
+      int maxelement = 0;
+      int maxv = 0;
+      for (int i = 0; i < n_elts; i++)
+	if (matches[i][1] > maxv)
+	  {
+	    maxelement = i;
+	    maxv = matches[i][1];
+	  }
+
+      /* Create a duplicate of the most common element.  */
+      rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
+      aarch64_emit_move (target, gen_rtx_VEC_DUPLICATE (mode, x));
+
+      /* Insert the rest.  */
+      for (int i = 0; i < n_elts; i++)
+	{
+	  rtx x = XVECEXP (vals, 0, i);
+	  if (matches[i][0] == maxelement)
+	    continue;
+	  x = copy_to_mode_reg (inner_mode, x);
+	  emit_insn (GEN_FCN (icode) (target, x, GEN_INT (i)));
+	}
+      return;
+    }
+
   /* Initialise a vector which is part-variable.  We want to first try
      to build those lanes which are constant in the most efficient way we
      can.  */
@@ -11740,10 +11791,6 @@ aarch64_expand_vector_init (rtx target, rtx vals)
     }
 
   /* Insert the variable lanes directly.  */
-
-  enum insn_code icode = optab_handler (vec_set_optab, mode);
-  gcc_assert (icode != CODE_FOR_nothing);
-
   for (int i = 0; i < n_elts; i++)
     {
       rtx x = XVECEXP (vals, 0, i);
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-init-1.c b/gcc/testsuite/gcc.target/aarch64/vect-init-1.c
new file mode 100644
index 0000000..90ba3ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-init-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#define vector __attribute__((vector_size(16)))
+
+vector float combine (float a, float b, float c, float d)
+{
+  return (vector float) { a, b, c, d };
+}
+
+/* { dg-final { scan-assembler-not "movi\t" } } */
+/* { dg-final { scan-assembler-not "orr\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-init-2.c b/gcc/testsuite/gcc.target/aarch64/vect-init-2.c
new file mode 100644
index 0000000..0444675
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-init-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#define vector __attribute__((vector_size(16)))
+
+vector float combine (float a, float b, float d)
+{
+  return (vector float) { a, b, a, d };
+}
+
+/* { dg-final { scan-assembler-not "movi\t" } } */
+/* { dg-final { scan-assembler-not "orr\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-init-3.c b/gcc/testsuite/gcc.target/aarch64/vect-init-3.c
new file mode 100644
index 0000000..b5822b7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-init-3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#define vector __attribute__((vector_size(16)))
+
+vector float combine (float a, float b)
+{
+  return (vector float) { a, b, a, b };
+}
+
+/* { dg-final { scan-assembler-not "movi\t" } } */
+/* { dg-final { scan-assembler-not "orr\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-init-4.c b/gcc/testsuite/gcc.target/aarch64/vect-init-4.c
new file mode 100644
index 0000000..09a0095
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-init-4.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#define vector __attribute__((vector_size(16)))
+
+vector float combine (float a, float b)
+{
+  return (vector float) { a, b, b, a };
+}
+
+/* { dg-final { scan-assembler-not "movi\t" } } */
+/* { dg-final { scan-assembler-not "orr\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-init-5.c b/gcc/testsuite/gcc.target/aarch64/vect-init-5.c
new file mode 100644
index 0000000..76d5502
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-init-5.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#define vector __attribute__((vector_size(16)))
+
+vector float combine (float a, float b)
+{
+  return (vector float) { a, b, a, a };
+}
+
+/* { dg-final { scan-assembler-not "movi\t" } } */
+/* { dg-final { scan-assembler-not "orr\t" } } */

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] [AArch64]  PR target/71663 Improve Vector Initializtion
  2017-06-14  8:53             ` Hurugalawadi, Naveen
@ 2017-06-14  9:10               ` James Greenhalgh
  0 siblings, 0 replies; 28+ messages in thread
From: James Greenhalgh @ 2017-06-14  9:10 UTC (permalink / raw)
  To: Hurugalawadi, Naveen
  Cc: Kyrill Tkachov, gcc-patches, Pinski, Andrew, Marcus Shawcroft,
	Richard Earnshaw, nd

On Wed, Jun 14, 2017 at 08:53:27AM +0000, Hurugalawadi, Naveen wrote:
> Hi James,
> 
> >> Could you make the testcase a bit more comprehensive? 
> 
> Modified the testcase considering all the possible cases.
> Split up the test based on different scenarios.
> 
> Please review the patch and let us know if its okay?

OK with an appropriate ChangeLog. Thanks for the patch!

Thanks,
James


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH] [AArch64]  Implement popcount pattern
  2016-12-09  3:30 [PATCH] [AArch64] PR target/71663 Improve Vector Initializtion Hurugalawadi, Naveen
  2016-12-09  7:02 ` Hurugalawadi, Naveen
@ 2016-12-12  3:16 ` Hurugalawadi, Naveen
  2016-12-12 10:53   ` Kyrill Tkachov
  1 sibling, 1 reply; 28+ messages in thread
From: Hurugalawadi, Naveen @ 2016-12-12  3:16 UTC (permalink / raw)
  To: gcc-patches
  Cc: Pinski, Andrew, James Greenhalgh, Marcus Shawcroft, Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 460 bytes --]

Hi,

Please find attached the patch that implements the support for popcount
patterns in AArch64.

The implementation improves OVS-DPDK on ThunderX by 3%. It would have a
similar effect on other AArch64 targets.

Please review the patch and let us know if its okay?

2016-12-12  Andrew Pinski  <apinski@cavium.com>

gcc
	* config/aarch64/aarch64.md (popcount<mode>2): New pattern.

gcc/testsuite
	* gcc.target/aarch64/popcount.c : New Testcase.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: popcount.patch --]
[-- Type: text/x-diff; name="popcount.patch", Size: 1733 bytes --]

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 65eb326..c688ddc 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3785,6 +3785,39 @@
   }
 )
 
+/* Pop count be done via the pop count instruction in NEON. */
+/*
+  mov v.1d, x0
+  Cnt v1.8b, v.8b
+  Addv b2, v1.8b
+  Mov w0, v2.b[0]
+*/
+(define_expand "popcount<mode>2"
+  [(match_operand:GPI 0 "register_operand")
+   (match_operand:GPI 1 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx v = gen_reg_rtx (V8QImode);
+  rtx v1 = gen_reg_rtx (V8QImode);
+  rtx r = gen_reg_rtx (QImode);
+  rtx in = operands[1];
+  rtx out = operands[0];
+  if(<MODE>mode == SImode)
+    {
+      rtx tmp;
+      tmp = gen_reg_rtx (DImode);
+      /* If we have SImode, zero extend to DImode, pop count does
+         not change if we have extra zeros. */
+      emit_insn (gen_zero_extendsidi2 (tmp, in));
+      in = tmp;
+    }
+  emit_move_insn (v, gen_lowpart (V8QImode, in));
+  emit_insn (gen_popcountv8qi2 (v1, v));
+  emit_insn (gen_reduc_plus_scal_v8qi (r, v1));
+  emit_insn (gen_zero_extendqi<mode>2 (out, r));
+  DONE;
+})
+
 (define_insn "clrsb<mode>2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
         (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
diff --git a/gcc/testsuite/gcc.target/aarch64/popcount.c b/gcc/testsuite/gcc.target/aarch64/popcount.c
new file mode 100644
index 0000000..2d71168
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcount.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int foo(int x)
+{
+  return __builtin_popcount(x);
+}
+
+/* { dg-final { scan-assembler-not "popcountdi2" } } */
+/* { dg-final { scan-assembler "cnt\t" } } */

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] [AArch64]  Implement popcount pattern
  2016-12-12  3:16 ` [PATCH] [AArch64] Implement popcount pattern Hurugalawadi, Naveen
@ 2016-12-12 10:53   ` Kyrill Tkachov
  2016-12-13 11:51     ` Hurugalawadi, Naveen
  0 siblings, 1 reply; 28+ messages in thread
From: Kyrill Tkachov @ 2016-12-12 10:53 UTC (permalink / raw)
  To: Hurugalawadi, Naveen, gcc-patches
  Cc: Pinski, Andrew, James Greenhalgh, Marcus Shawcroft, Richard Earnshaw

Hi Naveen,

On 12/12/16 03:16, Hurugalawadi, Naveen wrote:
> Hi,
>
> Please find attached the patch that implements the support for popcount
> patterns in AArch64.
>
> The implementation improves OVS-DPDK on ThunderX by 3%. It would have a
> similar effect on other AArch64 targets.
>
> Please review the patch and let us know if its okay?

Besides a few comments below this looks good to me if it has gone through
the normal required bootstrap and testing, but I can't approve.
It's at the discretion of the target maintainers/reviewers if they want to
accept it at this stage.

> 2016-12-12  Andrew Pinski  <apinski@cavium.com>
>
> gcc
> 	* config/aarch64/aarch64.md (popcount<mode>2): New pattern.
>
> gcc/testsuite
> 	* gcc.target/aarch64/popcount.c : New Testcase.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 65eb326..c688ddc 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3785,6 +3785,39 @@
    }
  )
  
+/* Pop count be done via the pop count instruction in NEON. */

The rest of the MD file uses the term AdvSIMD. Also, the instrurction
is CNT rather than "pop count".

+/*
+  mov v.1d, x0
+  Cnt v1.8b, v.8b
+  Addv b2, v1.8b
+  Mov w0, v2.b[0]
+*/

Minor nit, but please use semicolons for MD file comments. The convention (at least in the aarch64 md files)
is to use ";;" at the beginning of the line. Also, please use uppercase letters for the assembly instructions
in the comment.

+(define_expand "popcount<mode>2"
+  [(match_operand:GPI 0 "register_operand")
+   (match_operand:GPI 1 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx v = gen_reg_rtx (V8QImode);
+  rtx v1 = gen_reg_rtx (V8QImode);
+  rtx r = gen_reg_rtx (QImode);
+  rtx in = operands[1];
+  rtx out = operands[0];
+  if(<MODE>mode == SImode)
+    {
+      rtx tmp;
+      tmp = gen_reg_rtx (DImode);
+      /* If we have SImode, zero extend to DImode, pop count does
+         not change if we have extra zeros. */
+      emit_insn (gen_zero_extendsidi2 (tmp, in));
+      in = tmp;
+    }
+  emit_move_insn (v, gen_lowpart (V8QImode, in));
+  emit_insn (gen_popcountv8qi2 (v1, v));
+  emit_insn (gen_reduc_plus_scal_v8qi (r, v1));
+  emit_insn (gen_zero_extendqi<mode>2 (out, r));
+  DONE;
+})
+
  (define_insn "clrsb<mode>2"
    [(set (match_operand:GPI 0 "register_operand" "=r")
          (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
diff --git a/gcc/testsuite/gcc.target/aarch64/popcount.c b/gcc/testsuite/gcc.target/aarch64/popcount.c
new file mode 100644
index 0000000..2d71168
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcount.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int foo(int x)
+{
+  return __builtin_popcount(x);
+}
+
+/* { dg-final { scan-assembler-not "popcountdi2" } } */

__builtin_popcount takes an unsigned int, so this should be scanning for absence of popcountsi2 instead?
I suggest you also add a test for popcountdi2 using__builtin_popcountll.
Thanks,
Kyrill

+/* { dg-final { scan-assembler "cnt\t" } } */

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] [AArch64]  Implement popcount pattern
  2016-12-12 10:53   ` Kyrill Tkachov
@ 2016-12-13 11:51     ` Hurugalawadi, Naveen
  2016-12-13 11:59       ` Kyrill Tkachov
  0 siblings, 1 reply; 28+ messages in thread
From: Hurugalawadi, Naveen @ 2016-12-13 11:51 UTC (permalink / raw)
  To: Kyrill Tkachov, gcc-patches
  Cc: Pinski, Andrew, James Greenhalgh, Marcus Shawcroft, Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 640 bytes --]

Hi Kyrill,

Thanks for reviewing the patch and your useful comments.

>> looks good to me if it has gone through the normal required
>> bootstrap and testing, but I can't approve.

Bootstrapped and Regression Tested on aarch64-thunderx-linux.

>> The rest of the MD file uses the term AdvSIMD. Also, the instrurction
>> is CNT rather than "pop count".

Done.

>> __builtin_popcount takes an unsigned int, so this should be 
>> scanning for absence of popcountsi2 instead?

Done.

Please find attached the modified patch as per review comments
and let me know if its okay for Stage-1 or current branch.

Thanks,
Naveen

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: popcount-1.patch --]
[-- Type: text/x-diff; name="popcount-1.patch", Size: 1796 bytes --]

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 65eb326..0acb3f0 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3785,6 +3785,39 @@
   }
 )
 
+;; Pop count be done via the "CNT" instruction in AdvSIMD.
+;;
+;; MOV	v.1d, x0
+;; CNT	v1.8b, v.8b
+;; ADDV b2, v1.8b
+;; MOV	w0, v2.b[0]
+
+(define_expand "popcount<mode>2"
+  [(match_operand:GPI 0 "register_operand")
+   (match_operand:GPI 1 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx v = gen_reg_rtx (V8QImode);
+  rtx v1 = gen_reg_rtx (V8QImode);
+  rtx r = gen_reg_rtx (QImode);
+  rtx in = operands[1];
+  rtx out = operands[0];
+  if(<MODE>mode == SImode)
+    {
+      rtx tmp;
+      tmp = gen_reg_rtx (DImode);
+      /* If we have SImode, zero extend to DImode, pop count does
+         not change if we have extra zeros. */
+      emit_insn (gen_zero_extendsidi2 (tmp, in));
+      in = tmp;
+    }
+  emit_move_insn (v, gen_lowpart (V8QImode, in));
+  emit_insn (gen_popcountv8qi2 (v1, v));
+  emit_insn (gen_reduc_plus_scal_v8qi (r, v1));
+  emit_insn (gen_zero_extendqi<mode>2 (out, r));
+  DONE;
+})
+
 (define_insn "clrsb<mode>2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
         (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt.c b/gcc/testsuite/gcc.target/aarch64/popcnt.c
new file mode 100644
index 0000000..37cf4b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+foo (int x)
+{
+  return __builtin_popcount (x);
+}
+
+long
+foo1 (int x)
+{
+  return __builtin_popcountl (x);
+}
+
+/* { dg-final { scan-assembler-not "popcount" } } */
+/* { dg-final { scan-assembler-times "cnt\t" 2 } } */

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] [AArch64]  Implement popcount pattern
  2016-12-13 11:51     ` Hurugalawadi, Naveen
@ 2016-12-13 11:59       ` Kyrill Tkachov
  2017-01-25  8:07         ` [PATCH] [AArch64] Enable AES and cmp_branch fusion for Thunderx2t99 Hurugalawadi, Naveen
  2017-02-01 13:56         ` [PATCH] [AArch64] Implement popcount pattern James Greenhalgh
  0 siblings, 2 replies; 28+ messages in thread
From: Kyrill Tkachov @ 2016-12-13 11:59 UTC (permalink / raw)
  To: Hurugalawadi, Naveen, gcc-patches
  Cc: Pinski, Andrew, James Greenhalgh, Marcus Shawcroft, Richard Earnshaw

Hi Naveen,

On 13/12/16 11:51, Hurugalawadi, Naveen wrote:
> Hi Kyrill,
>
> Thanks for reviewing the patch and your useful comments.
>
>>> looks good to me if it has gone through the normal required
>>> bootstrap and testing, but I can't approve.
> Bootstrapped and Regression Tested on aarch64-thunderx-linux.
>
>>> The rest of the MD file uses the term AdvSIMD. Also, the instrurction
>>> is CNT rather than "pop count".
> Done.
>
>>> __builtin_popcount takes an unsigned int, so this should be
>>> scanning for absence of popcountsi2 instead?
> Done.
>
> Please find attached the modified patch as per review comments
> and let me know if its okay for Stage-1 or current branch.

This looks much better, thanks.
I still have a minor nit about the testcase.

+long
+foo1 (int x)
+{
+  return __builtin_popcountl (x);
+}

On ILP32 systems this would still use the SImode patterns, so I suggest you use __builtin_popcountll and
an unsigned long long return type to ensure you always exercise the 64-bit code.

+
+/* { dg-final { scan-assembler-not "popcount" } } */


This looks ok to me otherwise, but you'll need an ok from the aarch64 folk.
Kyrill

> Thanks,
> Naveen

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH] [AArch64] Enable AES and cmp_branch fusion for Thunderx2t99
  2016-12-13 11:59       ` Kyrill Tkachov
@ 2017-01-25  8:07         ` Hurugalawadi, Naveen
  2017-01-25  9:29           ` Kyrill Tkachov
  2017-02-01 13:56         ` [PATCH] [AArch64] Implement popcount pattern James Greenhalgh
  1 sibling, 1 reply; 28+ messages in thread
From: Hurugalawadi, Naveen @ 2017-01-25  8:07 UTC (permalink / raw)
  To: gcc-patches; +Cc: James Greenhalgh, Marcus Shawcroft, Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 395 bytes --]

Hi,

Please find attached the patch that adds AES and CMP_BRANCH
fusion for Thunderx2t99.

Bootstrapped and Regression tested on aarch64-thunderx2t99.
Please review the patch and let us know if its okay?

2017-1-25  Naveen H.S <Naveen.Hurugalawadi@cavium.com>

gcc
        * config/aarch64/aarch64.c (thunderx2t99_tunings):
	Improve vector initialization code gen.

Thanks,
Naveen

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: aes-cmp.patch --]
[-- Type: text/x-diff; name="aes-cmp.patch", Size: 526 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f343d92..acaa975 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -780,7 +780,7 @@ static const struct tune_params thunderx2t99_tunings =
   &generic_approx_modes,
   4, /* memmov_cost.  */
   4, /* issue_rate.  */
-  AARCH64_FUSE_NOTHING, /* fuseable_ops.  */
+  (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
   16,	/* function_align.  */
   8,	/* jump_align.  */
   16,	/* loop_align.  */

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] [AArch64] Enable AES and cmp_branch fusion for Thunderx2t99
  2017-01-25  8:07         ` [PATCH] [AArch64] Enable AES and cmp_branch fusion for Thunderx2t99 Hurugalawadi, Naveen
@ 2017-01-25  9:29           ` Kyrill Tkachov
  2017-02-02  5:03             ` Hurugalawadi, Naveen
  0 siblings, 1 reply; 28+ messages in thread
From: Kyrill Tkachov @ 2017-01-25  9:29 UTC (permalink / raw)
  To: Hurugalawadi, Naveen, gcc-patches
  Cc: James Greenhalgh, Marcus Shawcroft, Richard Earnshaw

Hi Naveen,

On 25/01/17 06:16, Hurugalawadi, Naveen wrote:
> Hi,
>
> Please find attached the patch that adds AES and CMP_BRANCH
> fusion for Thunderx2t99.
>
> Bootstrapped and Regression tested on aarch64-thunderx2t99.
> Please review the patch and let us know if its okay?

Code looks ok (it's quite simple), but I can't approve.
but there are a couple of issues with the ChangeLog

> 2017-1-25  Naveen H.S <Naveen.Hurugalawadi@cavium.com>

2017-01-25.
Also, two spaces between name and email

>
> gcc
>          * config/aarch64/aarch64.c (thunderx2t99_tunings):
> 	Improve vector initialization code gen.

This doesn't fit the code in the patch

Cheers,
Kyrill

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] [AArch64] Enable AES and cmp_branch fusion for Thunderx2t99
  2017-01-25  9:29           ` Kyrill Tkachov
@ 2017-02-02  5:03             ` Hurugalawadi, Naveen
  2017-02-02 11:43               ` James Greenhalgh
  0 siblings, 1 reply; 28+ messages in thread
From: Hurugalawadi, Naveen @ 2017-02-02  5:03 UTC (permalink / raw)
  To: Kyrill Tkachov, gcc-patches
  Cc: James Greenhalgh, Marcus Shawcroft, Richard Earnshaw

Hi Kyrill,

Thanks for the review and comments.

>> but there are a couple of issues with the ChangeLog

2017-02-02  Naveen H.S  <Naveen.Hurugalawadi@cavium.com>

	* config/aarch64/aarch64.c (thunderx2t99_tunings): Enable AES and
	cmp_branch fusion.

Thanks,
Naveen

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] [AArch64] Enable AES and cmp_branch fusion for Thunderx2t99
  2017-02-02  5:03             ` Hurugalawadi, Naveen
@ 2017-02-02 11:43               ` James Greenhalgh
  0 siblings, 0 replies; 28+ messages in thread
From: James Greenhalgh @ 2017-02-02 11:43 UTC (permalink / raw)
  To: Hurugalawadi, Naveen
  Cc: Kyrill Tkachov, gcc-patches, Marcus Shawcroft, Richard Earnshaw, nd

On Thu, Feb 02, 2017 at 05:03:13AM +0000, Hurugalawadi, Naveen wrote:
> Hi Kyrill,
> 
> Thanks for the review and comments.
> 
> >> but there are a couple of issues with the ChangeLog
> 
> 2017-02-02  Naveen H.S  <Naveen.Hurugalawadi@cavium.com>
> 
> 	* config/aarch64/aarch64.c (thunderx2t99_tunings): Enable AES and
> 	cmp_branch fusion.
> 

OK.

Thanks,
James

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] [AArch64]  Implement popcount pattern
  2016-12-13 11:59       ` Kyrill Tkachov
  2017-01-25  8:07         ` [PATCH] [AArch64] Enable AES and cmp_branch fusion for Thunderx2t99 Hurugalawadi, Naveen
@ 2017-02-01 13:56         ` James Greenhalgh
  2017-02-02  4:03           ` Hurugalawadi, Naveen
  1 sibling, 1 reply; 28+ messages in thread
From: James Greenhalgh @ 2017-02-01 13:56 UTC (permalink / raw)
  To: Kyrill Tkachov
  Cc: Hurugalawadi, Naveen, gcc-patches, Pinski, Andrew,
	Marcus Shawcroft, Richard Earnshaw, nd

On Tue, Dec 13, 2016 at 11:59:36AM +0000, Kyrill Tkachov wrote:
> Hi Naveen,
> 
> On 13/12/16 11:51, Hurugalawadi, Naveen wrote:
> >Hi Kyrill,
> >
> >Thanks for reviewing the patch and your useful comments.
> >
> >>>looks good to me if it has gone through the normal required
> >>>bootstrap and testing, but I can't approve.
> >Bootstrapped and Regression Tested on aarch64-thunderx-linux.
> >
> >>>The rest of the MD file uses the term AdvSIMD. Also, the instrurction
> >>>is CNT rather than "pop count".
> >Done.
> >
> >>>__builtin_popcount takes an unsigned int, so this should be
> >>>scanning for absence of popcountsi2 instead?
> >Done.
> >
> >Please find attached the modified patch as per review comments
> >and let me know if its okay for Stage-1 or current branch.
> 
> This looks much better, thanks.
> I still have a minor nit about the testcase.
> 
> +long
> +foo1 (int x)
> +{
> +  return __builtin_popcountl (x);
> +}
> 
> On ILP32 systems this would still use the SImode patterns, so I suggest you use __builtin_popcountll and
> an unsigned long long return type to ensure you always exercise the 64-bit code.
> 
> +
> +/* { dg-final { scan-assembler-not "popcount" } } */
> 
> 
> This looks ok to me otherwise, but you'll need an ok from the aarch64 folk.

I didn't see a follow-up patch posted with Kyrill's comments addressed?

Thanks,
James

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] [AArch64]  Implement popcount pattern
  2017-02-01 13:56         ` [PATCH] [AArch64] Implement popcount pattern James Greenhalgh
@ 2017-02-02  4:03           ` Hurugalawadi, Naveen
  2017-02-02 11:56             ` James Greenhalgh
  0 siblings, 1 reply; 28+ messages in thread
From: Hurugalawadi, Naveen @ 2017-02-02  4:03 UTC (permalink / raw)
  To: James Greenhalgh, Kyrill Tkachov
  Cc: gcc-patches, Pinski, Andrew, Marcus Shawcroft, Richard Earnshaw, nd

[-- Attachment #1: Type: text/plain, Size: 716 bytes --]

Hi James and Kyrill,

Thanks for the review and comments on the patch.

>> On ILP32 systems this would still use the SImode patterns, 
>> so I suggest you use __builtin_popcountll and
>> an unsigned long long return type to ensure you always exercise the 64-bit code.

Sorry for not commenting on this part.
The issue is that code generates "__popcountdi2" for all the codes in testcase
for LP64 and ILP32 variants.
__builtin_popcount, __builtin_popcountl and __builtin_popcount.

Hence, modified the patch to check for "popcount".

Bootstrapped and regression tested on AArch64-Thunderx-Linux machine.

Please find attached the modified patch and let us know if its okay?

Thanks,
Naveen

    

[-- Attachment #2: popcount-2.patch --]
[-- Type: application/octet-stream, Size: 1892 bytes --]

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a693a3b..684a833 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3778,6 +3778,39 @@
   }
 )
 
+;; Pop count be done via the "CNT" instruction in AdvSIMD.
+;;
+;; MOV	v.1d, x0
+;; CNT	v1.8b, v.8b
+;; ADDV b2, v1.8b
+;; MOV	w0, v2.b[0]
+
+(define_expand "popcount<mode>2"
+  [(match_operand:GPI 0 "register_operand")
+   (match_operand:GPI 1 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx v = gen_reg_rtx (V8QImode);
+  rtx v1 = gen_reg_rtx (V8QImode);
+  rtx r = gen_reg_rtx (QImode);
+  rtx in = operands[1];
+  rtx out = operands[0];
+  if(<MODE>mode == SImode)
+    {
+      rtx tmp;
+      tmp = gen_reg_rtx (DImode);
+      /* If we have SImode, zero extend to DImode, pop count does
+         not change if we have extra zeros. */
+      emit_insn (gen_zero_extendsidi2 (tmp, in));
+      in = tmp;
+    }
+  emit_move_insn (v, gen_lowpart (V8QImode, in));
+  emit_insn (gen_popcountv8qi2 (v1, v));
+  emit_insn (gen_reduc_plus_scal_v8qi (r, v1));
+  emit_insn (gen_zero_extendqi<mode>2 (out, r));
+  DONE;
+})
+
 (define_insn "clrsb<mode>2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
         (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt.c b/gcc/testsuite/gcc.target/aarch64/popcnt.c
new file mode 100644
index 0000000..3b8cb31
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int
+foo (int x)
+{
+  return __builtin_popcount (x);
+}
+
+unsigned long
+foo1 (int x)
+{
+  return __builtin_popcountl (x);
+}
+
+unsigned long long
+foo2 (int x)
+{
+  return __builtin_popcountll (x);
+}
+
+/* { dg-final { scan-assembler-not "popcount" } } */
+/* { dg-final { scan-assembler-times "cnt\t" 3 } } */

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] [AArch64]  Implement popcount pattern
  2017-02-02  4:03           ` Hurugalawadi, Naveen
@ 2017-02-02 11:56             ` James Greenhalgh
  2017-02-03  5:56               ` Andrew Pinski
  0 siblings, 1 reply; 28+ messages in thread
From: James Greenhalgh @ 2017-02-02 11:56 UTC (permalink / raw)
  To: Hurugalawadi, Naveen
  Cc: Kyrill Tkachov, gcc-patches, Pinski, Andrew, Marcus Shawcroft,
	Richard Earnshaw, nd

On Thu, Feb 02, 2017 at 04:03:42AM +0000, Hurugalawadi, Naveen wrote:
> Hi James and Kyrill,
> 
> Thanks for the review and comments on the patch.
> 
> >> On ILP32 systems this would still use the SImode patterns, 
> >> so I suggest you use __builtin_popcountll and
> >> an unsigned long long return type to ensure you always exercise the 64-bit code.
> 
> Sorry for not commenting on this part.
> The issue is that code generates "__popcountdi2" for all the codes in testcase
> for LP64 and ILP32 variants.
> __builtin_popcount, __builtin_popcountl and __builtin_popcount.
> 
> Hence, modified the patch to check for "popcount".

I don't understand this comment and how it relates to your updated
patch (which still checks for CNT).

> Bootstrapped and regression tested on AArch64-Thunderx-Linux machine.
> 
> Please find attached the modified patch and let us know if its okay?

Could you please clarify what you meant above? The patch looks fine (though
possibly not for Stage 4), but I'm not sure of your intent.

Thanks,
James

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] [AArch64] Implement popcount pattern
  2017-02-02 11:56             ` James Greenhalgh
@ 2017-02-03  5:56               ` Andrew Pinski
  2017-02-03  7:02                 ` Hurugalawadi, Naveen
  0 siblings, 1 reply; 28+ messages in thread
From: Andrew Pinski @ 2017-02-03  5:56 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: Hurugalawadi, Naveen, Kyrill Tkachov, gcc-patches, Pinski,
	Andrew, Marcus Shawcroft, Richard Earnshaw, nd

On Thu, Feb 2, 2017 at 3:55 AM, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
> On Thu, Feb 02, 2017 at 04:03:42AM +0000, Hurugalawadi, Naveen wrote:
>> Hi James and Kyrill,
>>
>> Thanks for the review and comments on the patch.
>>
>> >> On ILP32 systems this would still use the SImode patterns,
>> >> so I suggest you use __builtin_popcountll and
>> >> an unsigned long long return type to ensure you always exercise the 64-bit code.
>>
>> Sorry for not commenting on this part.
>> The issue is that code generates "__popcountdi2" for all the codes in testcase
>> for LP64 and ILP32 variants.
>> __builtin_popcount, __builtin_popcountl and __builtin_popcount.
>>
>> Hence, modified the patch to check for "popcount".
>
> I don't understand this comment and how it relates to your updated
> patch (which still checks for CNT).

Now I understand Kyrill's comments as Naveen and I did not understand
before.  What Naveen was trying to test was that there was no longer
any call to __popcountdi2 or __popcountsi2 any more for all three:
__builtin_popcount, __builtin_popcountl and __builtin_popcountll.
Kyrill was asking him to change the test for __builtin_popcountl to
__builtin_popcountll even though there was already one test in that
file already.  Now of course what should change still is the argument
types to foo1/foo2 so they take long and long long respectively.  He
will post a new version of the patch soon.

>
>> Bootstrapped and regression tested on AArch64-Thunderx-Linux machine.
>>
>> Please find attached the modified patch and let us know if its okay?
>
> Could you please clarify what you meant above? The patch looks fine (though
> possibly not for Stage 4), but I'm not sure of your intent.

What he meant was he tested on aarch64-linux-gnu with no regressions.
Is it ok for when stage 1 opens with the change of the testcase?

Thanks,
Andrew

>
> Thanks,
> James

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] [AArch64] Implement popcount pattern
  2017-02-03  5:56               ` Andrew Pinski
@ 2017-02-03  7:02                 ` Hurugalawadi, Naveen
  2017-02-03 10:28                   ` Kyrill Tkachov
  2017-02-04  4:04                   ` James Greenhalgh
  0 siblings, 2 replies; 28+ messages in thread
From: Hurugalawadi, Naveen @ 2017-02-03  7:02 UTC (permalink / raw)
  To: Andrew Pinski, James Greenhalgh
  Cc: Kyrill Tkachov, gcc-patches, Pinski, Andrew, Marcus Shawcroft,
	Richard Earnshaw, nd

[-- Attachment #1: Type: text/plain, Size: 715 bytes --]

Hi Andrew,

Thanks for clearing the confusion.

> I don't understand this comment and how it relates to your updated patch

foo, foo1 and foo2 generates calls to "popcountdi2" which should have
been "popcountsi2" for foo1. When Kyrill commented on using the
popcountsi2; I was confused :).

Hence, the testcase generally checks for the absence of call to "popcount"
and the presence of "cnt" instruction instead.

>> Now of course what should change still is the argument 
>> types to foo1/foo2 

The arguments to foo1 and foo2 are modified as required.

Bootstrapped and regression tested on aarch64-linux-gnu with no regressions.

Please let us know if its okay for stage 1?

Thanks,
Naveen

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: popcnt-3.patch --]
[-- Type: text/x-diff; name="popcnt-3.patch", Size: 1872 bytes --]

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a693a3b..684a833 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3778,6 +3778,39 @@
   }
 )
 
+;; Pop count be done via the "CNT" instruction in AdvSIMD.
+;;
+;; MOV	v.1d, x0
+;; CNT	v1.8b, v.8b
+;; ADDV b2, v1.8b
+;; MOV	w0, v2.b[0]
+
+(define_expand "popcount<mode>2"
+  [(match_operand:GPI 0 "register_operand")
+   (match_operand:GPI 1 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx v = gen_reg_rtx (V8QImode);
+  rtx v1 = gen_reg_rtx (V8QImode);
+  rtx r = gen_reg_rtx (QImode);
+  rtx in = operands[1];
+  rtx out = operands[0];
+  if(<MODE>mode == SImode)
+    {
+      rtx tmp;
+      tmp = gen_reg_rtx (DImode);
+      /* If we have SImode, zero extend to DImode, pop count does
+         not change if we have extra zeros. */
+      emit_insn (gen_zero_extendsidi2 (tmp, in));
+      in = tmp;
+    }
+  emit_move_insn (v, gen_lowpart (V8QImode, in));
+  emit_insn (gen_popcountv8qi2 (v1, v));
+  emit_insn (gen_reduc_plus_scal_v8qi (r, v1));
+  emit_insn (gen_zero_extendqi<mode>2 (out, r));
+  DONE;
+})
+
 (define_insn "clrsb<mode>2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
         (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt.c b/gcc/testsuite/gcc.target/aarch64/popcnt.c
new file mode 100644
index 0000000..7e95796
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+foo (int x)
+{
+  return __builtin_popcount (x);
+}
+
+long
+foo1 (long x)
+{
+  return __builtin_popcountl (x);
+}
+
+long long
+foo2 (long long x)
+{
+  return __builtin_popcountll (x);
+}
+
+/* { dg-final { scan-assembler-not "popcount" } } */
+/* { dg-final { scan-assembler-times "cnt\t" 3 } } */

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] [AArch64] Implement popcount pattern
  2017-02-03  7:02                 ` Hurugalawadi, Naveen
@ 2017-02-03 10:28                   ` Kyrill Tkachov
  2017-02-04  4:04                   ` James Greenhalgh
  1 sibling, 0 replies; 28+ messages in thread
From: Kyrill Tkachov @ 2017-02-03 10:28 UTC (permalink / raw)
  To: Hurugalawadi, Naveen, Andrew Pinski, James Greenhalgh
  Cc: gcc-patches, Pinski, Andrew, Marcus Shawcroft, Richard Earnshaw


On 03/02/17 06:57, Hurugalawadi, Naveen wrote:
> Hi Andrew,
>
> Thanks for clearing the confusion.
>
>> I don't understand this comment and how it relates to your updated patch
> foo, foo1 and foo2 generates calls to "popcountdi2" which should have
> been "popcountsi2" for foo1. When Kyrill commented on using the
> popcountsi2; I was confused :).
>
> Hence, the testcase generally checks for the absence of call to "popcount"
> and the presence of "cnt" instruction instead.
>

Sorry for the confusion, I should have been more explicit that I was talking
about the optab.

>>> Now of course what should change still is the argument
>>> types to foo1/foo2
> The arguments to foo1 and foo2 are modified as required.
>
> Bootstrapped and regression tested on aarch64-linux-gnu with no regressions.
>
> Please let us know if its okay for stage 1?

This looks good to me, but you'll need James (or another aarch64 maintainer) to approve.

Thanks,
Kyrill

> Thanks,
> Naveen

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] [AArch64] Implement popcount pattern
  2017-02-03  7:02                 ` Hurugalawadi, Naveen
  2017-02-03 10:28                   ` Kyrill Tkachov
@ 2017-02-04  4:04                   ` James Greenhalgh
  1 sibling, 0 replies; 28+ messages in thread
From: James Greenhalgh @ 2017-02-04  4:04 UTC (permalink / raw)
  To: Hurugalawadi, Naveen
  Cc: Andrew Pinski, Kyrill Tkachov, gcc-patches, Pinski, Andrew,
	Marcus Shawcroft, Richard Earnshaw, nd

On Fri, Feb 03, 2017 at 06:57:54AM +0000, Hurugalawadi, Naveen wrote:
> Hi Andrew,
> 
> Thanks for clearing the confusion.
> 
> > I don't understand this comment and how it relates to your updated patch
> 
> foo, foo1 and foo2 generates calls to "popcountdi2" which should have
> been "popcountsi2" for foo1. When Kyrill commented on using the
> popcountsi2; I was confused :).
> 
> Hence, the testcase generally checks for the absence of call to "popcount"
> and the presence of "cnt" instruction instead.
> 
> >> Now of course what should change still is the argument 
> >> types to foo1/foo2 
> 
> The arguments to foo1 and foo2 are modified as required.
> 
> Bootstrapped and regression tested on aarch64-linux-gnu with no regressions.
> 
> Please let us know if its okay for stage 1?

This looks OK to me now. The change is self contained, looks safe, and
has only been held up for the testcase issue.

But, please give Richard/Marcus a reasonable time to object (say, end of day
Tuesday) before committing.

Thanks,
James

> 
> Thanks,
> Naveen

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index a693a3b..684a833 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -3778,6 +3778,39 @@
>    }
>  )
>  
> +;; Pop count be done via the "CNT" instruction in AdvSIMD.
> +;;
> +;; MOV	v.1d, x0
> +;; CNT	v1.8b, v.8b
> +;; ADDV b2, v1.8b
> +;; MOV	w0, v2.b[0]
> +
> +(define_expand "popcount<mode>2"
> +  [(match_operand:GPI 0 "register_operand")
> +   (match_operand:GPI 1 "register_operand")]
> +  "TARGET_SIMD"
> +{
> +  rtx v = gen_reg_rtx (V8QImode);
> +  rtx v1 = gen_reg_rtx (V8QImode);
> +  rtx r = gen_reg_rtx (QImode);
> +  rtx in = operands[1];
> +  rtx out = operands[0];
> +  if(<MODE>mode == SImode)
> +    {
> +      rtx tmp;
> +      tmp = gen_reg_rtx (DImode);
> +      /* If we have SImode, zero extend to DImode, pop count does
> +         not change if we have extra zeros. */
> +      emit_insn (gen_zero_extendsidi2 (tmp, in));
> +      in = tmp;
> +    }
> +  emit_move_insn (v, gen_lowpart (V8QImode, in));
> +  emit_insn (gen_popcountv8qi2 (v1, v));
> +  emit_insn (gen_reduc_plus_scal_v8qi (r, v1));
> +  emit_insn (gen_zero_extendqi<mode>2 (out, r));
> +  DONE;
> +})
> +
>  (define_insn "clrsb<mode>2"
>    [(set (match_operand:GPI 0 "register_operand" "=r")
>          (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
> diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt.c b/gcc/testsuite/gcc.target/aarch64/popcnt.c
> new file mode 100644
> index 0000000..7e95796
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/popcnt.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +int
> +foo (int x)
> +{
> +  return __builtin_popcount (x);
> +}
> +
> +long
> +foo1 (long x)
> +{
> +  return __builtin_popcountl (x);
> +}
> +
> +long long
> +foo2 (long long x)
> +{
> +  return __builtin_popcountll (x);
> +}
> +
> +/* { dg-final { scan-assembler-not "popcount" } } */
> +/* { dg-final { scan-assembler-times "cnt\t" 3 } } */


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2017-06-14  9:10 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-09  3:30 [PATCH] [AArch64] PR target/71663 Improve Vector Initializtion Hurugalawadi, Naveen
2016-12-09  7:02 ` Hurugalawadi, Naveen
2017-02-06  6:46   ` Hurugalawadi, Naveen
2017-04-25  7:03     ` [PING] " Hurugalawadi, Naveen
2017-04-25  8:37   ` Kyrill Tkachov
2017-04-26  9:04     ` Hurugalawadi, Naveen
2017-05-11  4:56       ` [PING] " Hurugalawadi, Naveen
2017-05-26  6:27         ` [PING 2] " Hurugalawadi, Naveen
2017-06-09 14:16       ` James Greenhalgh
2017-06-13 10:25         ` Hurugalawadi, Naveen
2017-06-13 13:56           ` James Greenhalgh
2017-06-14  8:53             ` Hurugalawadi, Naveen
2017-06-14  9:10               ` James Greenhalgh
2016-12-12  3:16 ` [PATCH] [AArch64] Implement popcount pattern Hurugalawadi, Naveen
2016-12-12 10:53   ` Kyrill Tkachov
2016-12-13 11:51     ` Hurugalawadi, Naveen
2016-12-13 11:59       ` Kyrill Tkachov
2017-01-25  8:07         ` [PATCH] [AArch64] Enable AES and cmp_branch fusion for Thunderx2t99 Hurugalawadi, Naveen
2017-01-25  9:29           ` Kyrill Tkachov
2017-02-02  5:03             ` Hurugalawadi, Naveen
2017-02-02 11:43               ` James Greenhalgh
2017-02-01 13:56         ` [PATCH] [AArch64] Implement popcount pattern James Greenhalgh
2017-02-02  4:03           ` Hurugalawadi, Naveen
2017-02-02 11:56             ` James Greenhalgh
2017-02-03  5:56               ` Andrew Pinski
2017-02-03  7:02                 ` Hurugalawadi, Naveen
2017-02-03 10:28                   ` Kyrill Tkachov
2017-02-04  4:04                   ` James Greenhalgh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).