public inbox for gcc-patches@gcc.gnu.org
* [Patch combine] Don't create vector mode ZERO_EXTEND from subregs
@ 2017-12-11 14:19 James Greenhalgh
  2017-12-11 15:46 ` [patch AArch64] Do not perform a vector splat for vector initialisation if it is not useful James Greenhalgh
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: James Greenhalgh @ 2017-12-11 14:19 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, law

[-- Attachment #1: Type: text/plain, Size: 1016 bytes --]


Hi,

In simplify_set we try transforming the paradoxical subreg expression:

  (set FOO (subreg:M (mem:N BAR) 0))

in to:

  (set FOO (zero_extend:M (mem:N BAR)))

However, this code does not consider the case where M is a vector
mode, allowing it to construct (for example):

  (zero_extend:V4SI (mem:SI))

This would clearly have the wrong semantics, but fortunately we fail long
before then, in expand_compound_operation, as we really don't want a vector
zero_extend of a scalar value.

We need to explicitly reject vector modes from this transformation.

This fixes a failure I'm seeing on a branch in which I'm trying to
tackle some performance regressions, so I have no live testcase for
this, but it is wrong by observation.

Tested on aarch64-none-elf and bootstrapped on aarch64-none-linux-gnu with
no issues.

OK?

Thanks,
James

---
2017-12-11  James Greenhalgh  <james.greenhalgh@arm.com>

	* combine.c (simplify_set): Do not transform subregs to zero_extends
	if the destination mode is a vector mode.


[-- Attachment #2: 0001-Patch-combine-Don-t-create-vector-mode-ZERO_EXTEND-f.patch --]
[-- Type: text/x-patch; name="0001-Patch-combine-Don-t-create-vector-mode-ZERO_EXTEND-f.patch", Size: 789 bytes --]

diff --git a/gcc/combine.c b/gcc/combine.c
index 786a840..562eae6 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -6962,11 +6962,13 @@ simplify_set (rtx x)
 
   /* If we have (set FOO (subreg:M (mem:N BAR) 0)) with M wider than N, this
      would require a paradoxical subreg.  Replace the subreg with a
-     zero_extend to avoid the reload that would otherwise be required.  */
+     zero_extend to avoid the reload that would otherwise be required.
+     Don't do this for vector modes, as the transformation is incorrect.  */
 
   enum rtx_code extend_op;
   if (paradoxical_subreg_p (src)
       && MEM_P (SUBREG_REG (src))
+      && !VECTOR_MODE_P (GET_MODE (src))
       && (extend_op = load_extend_op (GET_MODE (SUBREG_REG (src)))) != UNKNOWN)
     {
       SUBST (SET_SRC (x),

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [patch AArch64] Do not perform a vector splat for vector initialisation if it is not useful
  2017-12-11 14:19 [Patch combine] Don't create vector mode ZERO_EXTEND from subregs James Greenhalgh
@ 2017-12-11 15:46 ` James Greenhalgh
  2017-12-18 23:37   ` Jeff Law
  2017-12-11 21:29 ` [Patch combine] Don't create vector mode ZERO_EXTEND from subregs Jeff Law
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: James Greenhalgh @ 2017-12-11 15:46 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, law, marcus.shawcroft, richard.earnshaw

[-- Attachment #1: Type: text/plain, Size: 2298 bytes --]


Hi,

In the testcase in this patch we create an SLP vector with only two
elements. Our current vector initialisation code will first duplicate
the first element to both lanes, then overwrite the top lane with a new
value.

This duplication can be clunky and wasteful.

Better would be to use the fact that we will always be overwriting
the remaining bits, and simply move the first element to the correct place
(implicitly zeroing all other bits).

This improves code generation for this case, allows more efficient
addressing modes, and has other second-order benefits for AArch64
code which has been vectorized to V2DI mode.

Note that the change is generic enough to catch the case for any vector
mode, but is expected to be most useful for 2x64-bit vectorization.

Unfortunately, on its own, this would cause failures in
gcc.target/aarch64/load_v2vec_lanes_1.c and
gcc.target/aarch64/store_v2vec_lanes.c, which expect to see many more
vec_merge and vec_duplicate for their simplifications to apply. To fix this,
add a special case to the AArch64 code if we are loading from two memory
addresses, and use the load_pair_lanes patterns directly.

We also need a new pattern in simplify-rtx.c:simplify_ternary_operation, to
catch:

  (vec_merge:OUTER
     (vec_duplicate:OUTER x:INNER)
     (subreg:OUTER y:INNER 0)
     (const_int N))

And simplify it to:

  (vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x)

This is similar to the existing patterns which are tested in this function,
without requiring the second operand to also be a vec_duplicate.

Bootstrapped and tested on aarch64-none-linux-gnu and tested on
aarch64-none-elf.

Note that this requires https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html
if we don't want to ICE creating broken vector zero extends.

Are the non-AArch64 parts OK?

Thanks,
James

---
2017-12-11  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify code
	generation for cases where splatting a value is not useful.
	* simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge
	across a vec_duplicate and a paradoxical subreg forming a vector
	mode to a vec_concat.

2017-12-11  James Greenhalgh  <james.greenhalgh@arm.com>

	* gcc.target/aarch64/vect-slp-dup.c: New.


[-- Attachment #2: 0001-patch-AArch64-Do-not-perform-a-vector-splat-for-vect.patch --]
[-- Type: text/x-patch; name="0001-patch-AArch64-Do-not-perform-a-vector-splat-for-vect.patch", Size: 4158 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 83d8607..8abb8e4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -12105,9 +12105,51 @@ aarch64_expand_vector_init (rtx target, rtx vals)
 	    maxv = matches[i][1];
 	  }
 
-      /* Create a duplicate of the most common element.  */
-      rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
-      aarch64_emit_move (target, gen_vec_duplicate (mode, x));
+      /* Create a duplicate of the most common element, unless all elements
+	 are equally useless to us, in which case just immediately set the
+	 vector register using the first element.  */
+
+      if (maxv == 1)
+	{
+	  /* For vectors of two 64-bit elements, we can do even better.  */
+	  if (n_elts == 2
+	      && (inner_mode == E_DImode
+		  || inner_mode == E_DFmode))
+
+	    {
+	      rtx x0 = XVECEXP (vals, 0, 0);
+	      rtx x1 = XVECEXP (vals, 0, 1);
+	      /* Combine can pick up this case, but handling it directly
+		 here leaves clearer RTL.
+
+		 This is load_pair_lanes<mode>, and also gives us a clean-up
+		 for store_pair_lanes<mode>.  */
+	      if (memory_operand (x0, inner_mode)
+		  && memory_operand (x1, inner_mode)
+		  && !STRICT_ALIGNMENT
+		  && rtx_equal_p (XEXP (x1, 0),
+				  plus_constant (Pmode,
+						 XEXP (x0, 0),
+						 GET_MODE_SIZE (inner_mode))))
+		{
+		  rtx t;
+		  if (inner_mode == DFmode)
+		    t = gen_load_pair_lanesdf (target, x0, x1);
+		  else
+		    t = gen_load_pair_lanesdi (target, x0, x1);
+		  emit_insn (t);
+		  return;
+		}
+	    }
+	  rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, 0));
+	  aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
+	  maxelement = 0;
+	}
+      else
+	{
+	  rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
+	  aarch64_emit_move (target, gen_vec_duplicate (mode, x));
+	}
 
       /* Insert the rest.  */
       for (int i = 0; i < n_elts; i++)
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 806c309..ed16f70 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -5785,6 +5785,36 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode,
 		return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1);
 	    }
 
+	  /* Replace:
+
+	      (vec_merge:outer (vec_duplicate:outer x:inner)
+			       (subreg:outer y:inner 0)
+			       (const_int N))
+
+	     with (vec_concat:outer x:inner y:inner) if N == 1,
+	     or (vec_concat:outer y:inner x:inner) if N == 2.
+
+	     Implicitly, this means we have a paradoxical subreg, but such
+	     a check is cheap, so make it anyway.
+
+	     Only applies for vectors of two elements.  */
+	  if (GET_CODE (op0) == VEC_DUPLICATE
+	      && GET_CODE (op1) == SUBREG
+	      && GET_MODE (op1) == GET_MODE (op0)
+	      && GET_MODE (SUBREG_REG (op1)) == GET_MODE (XEXP (op0, 0))
+	      && paradoxical_subreg_p (op1)
+	      && SUBREG_BYTE (op1) == 0
+	      && GET_MODE_NUNITS (GET_MODE (op0)) == 2
+	      && GET_MODE_NUNITS (GET_MODE (op1)) == 2
+	      && IN_RANGE (sel, 1, 2))
+	    {
+	      rtx newop0 = XEXP (op0, 0);
+	      rtx newop1 = SUBREG_REG (op1);
+	      if (sel == 2)
+		std::swap (newop0, newop1);
+	      return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1);
+	    }
+
 	  /* Replace (vec_merge (vec_duplicate x) (vec_duplicate y)
 				 (const_int n))
 	     with (vec_concat x y) or (vec_concat y x) depending on value
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-slp-dup.c b/gcc/testsuite/gcc.target/aarch64/vect-slp-dup.c
new file mode 100644
index 0000000..0541e48
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-slp-dup.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+
+/* { dg-options "-O3 -ftree-vectorize -fno-vect-cost-model" } */
+
+void bar (double);
+
+void
+foo (double *restrict in, double *restrict in2,
+     double *restrict out1, double *restrict out2)
+{
+  for (int i = 0; i < 1024; i++)
+    {
+      out1[i] = in[i] + 2.0 * in[i+128];
+      out1[i+1] = in[i+1] + 2.0 * in2[i];
+      bar (in[i]);
+    }
+}
+
+/* { dg-final { scan-assembler-not "dup\tv\[0-9\]+.2d, v\[0-9\]+" } } */
+


* Re: [Patch combine] Don't create vector mode ZERO_EXTEND from subregs
  2017-12-11 14:19 [Patch combine] Don't create vector mode ZERO_EXTEND from subregs James Greenhalgh
  2017-12-11 15:46 ` [patch AArch64] Do not perform a vector splat for vector initialisation if it is not useful James Greenhalgh
@ 2017-12-11 21:29 ` Jeff Law
  2017-12-16 14:04 ` Marc Glisse
  2017-12-17  3:14 ` Segher Boessenkool
  3 siblings, 0 replies; 8+ messages in thread
From: Jeff Law @ 2017-12-11 21:29 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches; +Cc: nd

On 12/11/2017 07:18 AM, James Greenhalgh wrote:
> 
> Hi,
> 
> In simplify_set we try transforming the paradoxical subreg expression:
> 
>   (set FOO (subreg:M (mem:N BAR) 0))
> 
> in to:
> 
>   (set FOO (zero_extend:M (mem:N BAR)))
> 
> However, this code does not consider the case where M is a vector
> mode, allowing it to construct (for example):
> 
>   (zero_extend:V4SI (mem:SI))
> 
> This would clearly have the wrong semantics, but fortunately we fail long
> before then in expand_compound_operation. As we really don't want a vector
> zero_extend of a scalar value.
> 
> We need to explicitly reject vector modes from this transformation.
> 
> This fixes a failure I'm seeing on a branch in which I'm trying to
> tackle some performance regressions, so I have no live testcase for
> this, but it is wrong by observation.
> 
> Tested on aarch64-none-elf and bootstrapped on aarch64-none-linux-gnu with
> no issues.
> 
> OK?
> 
> Thanks,
> James
> 
> ---
> 2017-12-11  James Greenhalgh  <james.greenhalgh@arm.com>
> 
> 	* combine.c (simplify_set): Do not transform subregs to zero_extends
> 	if the destination mode is a vector mode.
> 
OK.  Ideally you'd have a test for the testsuite as well, but I won't
stress without it :-)


jeff


* Re: [Patch combine] Don't create vector mode ZERO_EXTEND from subregs
  2017-12-11 14:19 [Patch combine] Don't create vector mode ZERO_EXTEND from subregs James Greenhalgh
  2017-12-11 15:46 ` [patch AArch64] Do not perform a vector splat for vector initialisation if it is not useful James Greenhalgh
  2017-12-11 21:29 ` [Patch combine] Don't create vector mode ZERO_EXTEND from subregs Jeff Law
@ 2017-12-16 14:04 ` Marc Glisse
  2017-12-17  3:14 ` Segher Boessenkool
  3 siblings, 0 replies; 8+ messages in thread
From: Marc Glisse @ 2017-12-16 14:04 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: gcc-patches, nd, law

I was happily going to close PR 55549, but it looks like we are trying the 
same transformation in a second place in that file :-(

On Mon, 11 Dec 2017, James Greenhalgh wrote:

> Hi,
>
> In simplify_set we try transforming the paradoxical subreg expression:
>
>  (set FOO (subreg:M (mem:N BAR) 0))
>
> in to:
>
>  (set FOO (zero_extend:M (mem:N BAR)))
>
> However, this code does not consider the case where M is a vector
> mode, allowing it to construct (for example):
>
>  (zero_extend:V4SI (mem:SI))
>
> This would clearly have the wrong semantics, but fortunately we fail long
> before then in expand_compound_operation. As we really don't want a vector
> zero_extend of a scalar value.
>
> We need to explicitly reject vector modes from this transformation.
>
> This fixes a failure I'm seeing on a branch in which I'm trying to
> tackle some performance regressions, so I have no live testcase for
> this, but it is wrong by observation.
>
> Tested on aarch64-none-elf and bootstrapped on aarch64-none-linux-gnu with
> no issues.
>
> OK?
>
> Thanks,
> James
>
> ---
> 2017-12-11  James Greenhalgh  <james.greenhalgh@arm.com>
>
> 	* combine.c (simplify_set): Do not transform subregs to zero_extends
> 	if the destination mode is a vector mode.
>
>

-- 
Marc Glisse


* Re: [Patch combine] Don't create vector mode ZERO_EXTEND from subregs
  2017-12-11 14:19 [Patch combine] Don't create vector mode ZERO_EXTEND from subregs James Greenhalgh
                   ` (2 preceding siblings ...)
  2017-12-16 14:04 ` Marc Glisse
@ 2017-12-17  3:14 ` Segher Boessenkool
  2017-12-21 16:35   ` James Greenhalgh
  3 siblings, 1 reply; 8+ messages in thread
From: Segher Boessenkool @ 2017-12-17  3:14 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: gcc-patches, nd, law

Hi!

On Mon, Dec 11, 2017 at 02:18:53PM +0000, James Greenhalgh wrote:
> 
> In simplify_set we try transforming the paradoxical subreg expression:
> 
>   (set FOO (subreg:M (mem:N BAR) 0))
> 
> in to:
> 
>   (set FOO (zero_extend:M (mem:N BAR)))
> 
> However, this code does not consider the case where M is a vector
> mode, allowing it to construct (for example):
> 
>   (zero_extend:V4SI (mem:SI))
> 
> This would clearly have the wrong semantics, but fortunately we fail long
> before then in expand_compound_operation. As we really don't want a vector
> zero_extend of a scalar value.
> 
> We need to explicitly reject vector modes from this transformation.

It does not consider any other modes either.  Both modes involved are
required to be SCALAR_INT_MODE_P (for zero_extend to be valid at all and
to be equivalent to the subreg); could you test for that instead please?


Segher


* Re: [patch AArch64] Do not perform a vector splat for vector initialisation if it is not useful
  2017-12-11 15:46 ` [patch AArch64] Do not perform a vector splat for vector initialisation if it is not useful James Greenhalgh
@ 2017-12-18 23:37   ` Jeff Law
  2018-01-03  9:55     ` Christophe Lyon
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Law @ 2017-12-18 23:37 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches; +Cc: nd, marcus.shawcroft, richard.earnshaw

On 12/11/2017 08:44 AM, James Greenhalgh wrote:
> Hi,
> 
> In the testcase in this patch we create an SLP vector with only two
> elements. Our current vector initialisation code will first duplicate
> the first element to both lanes, then overwrite the top lane with a new
> value.
> 
> This duplication can be clunky and wasteful.
> 
> Better would be to simply use the fact that we will always be overwriting
> the remaining bits, and simply move the first element to the correct place
> (implicitly zeroing all other bits).
> 
> This reduces the code generation for this case, and can allow more
> efficient addressing modes, and other second order benefits for AArch64
> code which has been vectorized to V2DI mode.
> 
> Note that the change is generic enough to catch the case for any vector
> mode, but is expected to be most useful for 2x64-bit vectorization.
> 
> Unfortunately, on its own, this would cause failures in
> gcc.target/aarch64/load_v2vec_lanes_1.c and
> gcc.target/aarch64/store_v2vec_lanes.c , which expect to see many more
> vec_merge and vec_duplicate for their simplifications to apply. To fix this,
> add a special case to the AArch64 code if we are loading from two memory
> addresses, and use the load_pair_lanes patterns directly.
> 
> We also need a new pattern in simplify-rtx.c:simplify_ternary_operation , to
> catch:
> 
>   (vec_merge:OUTER
>      (vec_duplicate:OUTER x:INNER)
>      (subreg:OUTER y:INNER 0)
>      (const_int N))
> 
> And simplify it to:
> 
>   (vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x)
> 
> This is similar to the existing patterns which are tested in this function,
> without requiring the second operand to also be a vec_duplicate.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu and tested on
> aarch64-none-elf.
> 
> Note that this requires https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html
> if we don't want to ICE creating broken vector zero extends.
> 
> Are the non-AArch64 parts OK?
> 
> Thanks,
> James
> 
> ---
> 2017-12-11  James Greenhalgh  <james.greenhalgh@arm.com>
> 
> 	* config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify code
> 	generation for cases where splatting a value is not useful.
> 	* simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge
> 	across a vec_duplicate and a paradoxical subreg forming a vector
> 	mode to a vec_concat.
> 
> 2017-12-11  James Greenhalgh  <james.greenhalgh@arm.com>
> 
> 	* gcc.target/aarch64/vect-slp-dup.c: New.
> 
> 
> 0001-patch-AArch64-Do-not-perform-a-vector-splat-for-vect.patch
> 
> 

> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index 806c309..ed16f70 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -5785,6 +5785,36 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode,
>  		return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1);
>  	    }
>  
> +	  /* Replace:
> +
> +	      (vec_merge:outer (vec_duplicate:outer x:inner)
> +			       (subreg:outer y:inner 0)
> +			       (const_int N))
> +
> +	     with (vec_concat:outer x:inner y:inner) if N == 1,
> +	     or (vec_concat:outer y:inner x:inner) if N == 2.
> +
> +	     Implicitly, this means we have a paradoxical subreg, but such
> +	     a check is cheap, so make it anyway.
I'm going to assume that N == 0 and N == 3 are handled elsewhere and do
not show up here in practice.


So is it advisable to handle the case where the VEC_DUPLICATE and SUBREG
show up in the opposite order?  Or is there some canonicalization that
prevents that?

simplify-rtx bits are OK as-is if we're certain we're not going to get
the alternate ordering of the VEC_MERGE operands.  Also OK if you either
generalize this chunk of code or duplicate & twiddle it to handle the
alternate order.

I didn't look at the aarch64 specific bits.

jeff


* Re: [Patch combine] Don't create vector mode ZERO_EXTEND from subregs
  2017-12-17  3:14 ` Segher Boessenkool
@ 2017-12-21 16:35   ` James Greenhalgh
  0 siblings, 0 replies; 8+ messages in thread
From: James Greenhalgh @ 2017-12-21 16:35 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, segher, law

[-- Attachment #1: Type: text/plain, Size: 1291 bytes --]


On Sun, Dec 17, 2017 at 03:14:08AM +0000, Segher Boessenkool wrote:
> Hi!
>
> On Mon, Dec 11, 2017 at 02:18:53PM +0000, James Greenhalgh wrote:
> >
> > In simplify_set we try transforming the paradoxical subreg expression:
> >
> >   (set FOO (subreg:M (mem:N BAR) 0))
> >
> > in to:
> >
> >   (set FOO (zero_extend:M (mem:N BAR)))
> >
> > However, this code does not consider the case where M is a vector
> > mode, allowing it to construct (for example):
> >
> >   (zero_extend:V4SI (mem:SI))
> >
> > This would clearly have the wrong semantics, but fortunately we fail long
> > before then in expand_compound_operation. As we really don't want a vector
> > zero_extend of a scalar value.
> >
> > We need to explicitly reject vector modes from this transformation.
>
> It does not consider any other modes either.  Both modes involved are
> required to be SCALAR_INT_MODE_P (for zero_extend to be valid at all and
> to be equivalent to the subreg); could you test for that instead please?

Makes sense. I've committed this as obvious after an AArch64 bootstrap and
test as revision 255945.

Thanks,
James

---
gcc/

2017-12-21  James Greenhalgh  <james.greenhalgh@arm.com>

	* combine.c (simplify_set): Do not transform subregs to zero_extends
	if the destination is not a scalar int mode.

[-- Attachment #2: 0001-Re-Patch-combine-Don-t-create-vector-mode-ZERO_EXTEN.patch --]
[-- Type: text/x-patch; name="0001-Re-Patch-combine-Don-t-create-vector-mode-ZERO_EXTEN.patch", Size: 867 bytes --]

diff --git a/gcc/combine.c b/gcc/combine.c
index 9f19ee4..91bed06 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -6967,12 +6967,13 @@ simplify_set (rtx x)
   /* If we have (set FOO (subreg:M (mem:N BAR) 0)) with M wider than N, this
      would require a paradoxical subreg.  Replace the subreg with a
      zero_extend to avoid the reload that would otherwise be required.
-     Don't do this for vector modes, as the transformation is incorrect.  */
+     Don't do this unless we have a scalar integer mode, otherwise the
+     transformation is incorrect.  */
 
   enum rtx_code extend_op;
   if (paradoxical_subreg_p (src)
       && MEM_P (SUBREG_REG (src))
-      && !VECTOR_MODE_P (GET_MODE (src))
+      && SCALAR_INT_MODE_P (GET_MODE (src))
       && (extend_op = load_extend_op (GET_MODE (SUBREG_REG (src)))) != UNKNOWN)
     {
       SUBST (SET_SRC (x),


* Re: [patch AArch64] Do not perform a vector splat for vector initialisation if it is not useful
  2017-12-18 23:37   ` Jeff Law
@ 2018-01-03  9:55     ` Christophe Lyon
  0 siblings, 0 replies; 8+ messages in thread
From: Christophe Lyon @ 2018-01-03  9:55 UTC (permalink / raw)
  To: Jeff Law
  Cc: James Greenhalgh, gcc-patches, nd, Marcus Shawcroft, Richard Earnshaw

On 19 December 2017 at 00:36, Jeff Law <law@redhat.com> wrote:
> On 12/11/2017 08:44 AM, James Greenhalgh wrote:
>> Hi,
>>
>> In the testcase in this patch we create an SLP vector with only two
>> elements. Our current vector initialisation code will first duplicate
>> the first element to both lanes, then overwrite the top lane with a new
>> value.
>>
>> This duplication can be clunky and wasteful.
>>
>> Better would be to simply use the fact that we will always be overwriting
>> the remaining bits, and simply move the first element to the correct place
>> (implicitly zeroing all other bits).
>>
>> This reduces the code generation for this case, and can allow more
>> efficient addressing modes, and other second order benefits for AArch64
>> code which has been vectorized to V2DI mode.
>>
>> Note that the change is generic enough to catch the case for any vector
>> mode, but is expected to be most useful for 2x64-bit vectorization.
>>
>> Unfortunately, on its own, this would cause failures in
>> gcc.target/aarch64/load_v2vec_lanes_1.c and
>> gcc.target/aarch64/store_v2vec_lanes.c , which expect to see many more
>> vec_merge and vec_duplicate for their simplifications to apply. To fix this,
>> add a special case to the AArch64 code if we are loading from two memory
>> addresses, and use the load_pair_lanes patterns directly.
>>
>> We also need a new pattern in simplify-rtx.c:simplify_ternary_operation , to
>> catch:
>>
>>   (vec_merge:OUTER
>>      (vec_duplicate:OUTER x:INNER)
>>      (subreg:OUTER y:INNER 0)
>>      (const_int N))
>>
>> And simplify it to:
>>
>>   (vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x)
>>
>> This is similar to the existing patterns which are tested in this function,
>> without requiring the second operand to also be a vec_duplicate.
>>
>> Bootstrapped and tested on aarch64-none-linux-gnu and tested on
>> aarch64-none-elf.
>>
>> Note that this requires https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html
>> if we don't want to ICE creating broken vector zero extends.
>>
>> Are the non-AArch64 parts OK?
>>
>> Thanks,
>> James
>>
>> ---
>> 2017-12-11  James Greenhalgh  <james.greenhalgh@arm.com>
>>
>>       * config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify code
>>       generation for cases where splatting a value is not useful.
>>       * simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge
>>       across a vec_duplicate and a paradoxical subreg forming a vector
>>       mode to a vec_concat.
>>
>> 2017-12-11  James Greenhalgh  <james.greenhalgh@arm.com>
>>
>>       * gcc.target/aarch64/vect-slp-dup.c: New.
>>
>>
>> 0001-patch-AArch64-Do-not-perform-a-vector-splat-for-vect.patch
>>
>>
>
>> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
>> index 806c309..ed16f70 100644
>> --- a/gcc/simplify-rtx.c
>> +++ b/gcc/simplify-rtx.c
>> @@ -5785,6 +5785,36 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode,
>>               return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1);
>>           }
>>
>> +       /* Replace:
>> +
>> +           (vec_merge:outer (vec_duplicate:outer x:inner)
>> +                            (subreg:outer y:inner 0)
>> +                            (const_int N))
>> +
>> +          with (vec_concat:outer x:inner y:inner) if N == 1,
>> +          or (vec_concat:outer y:inner x:inner) if N == 2.
>> +
>> +          Implicitly, this means we have a paradoxical subreg, but such
>> +          a check is cheap, so make it anyway.
> I'm going to assume that N == 0 and N == 3 are handled elsewhere and do
> not show up here in practice.
>
>
> So is it advisable to handle the case where the VEC_DUPLICATE and SUBREG
> show up in the opposite order?  Or is there some canonicalization that
> prevents that?
>
> simplify-rtx bits are OK as-is if we're certain we're not going to get
> the alternate ordering of the VEC_MERGE operands.  Also OK if you either
> generalize this chunk of code or duplicate & twiddle it to handle the
> alternate order.
>
> I didn't look at the aarch64 specific bits.
>

Hi James,

Your patch (r255946) introduced regressions on aarch64_be:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255946/report-build-info.html

I filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83663
to track this.

Thanks,

Christophe

> jeff
>

