public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 1/2]middle-end Teach CSE to be able to do vector extracts.
@ 2021-08-31 13:29 Tamar Christina
  2021-09-01 23:48 ` Jeff Law
  2021-09-03 10:26 ` Richard Sandiford
  0 siblings, 2 replies; 5+ messages in thread
From: Tamar Christina @ 2021-08-31 13:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jeffreyalaw

[-- Attachment #1: Type: text/plain, Size: 6125 bytes --]

Hi All,

This patch gets CSE to re-use constants already inside a vector rather than
re-materializing the constant again.

Basically consider the following case:

#include <stdint.h>
#include <arm_neon.h>

uint64_t
test (uint64_t a, uint64x2_t b, uint64x2_t* rt)
{
  uint64_t arr[2] = { 0x0942430810234076UL, 0x0942430810234076UL};
  uint64_t res = a | arr[0];
  uint64x2_t val = vld1q_u64 (arr);
  *rt = vaddq_u64 (val, b);
  return res;
}

The actual behavior is inconsequential however notice that the same constants
are used in the vector (arr and later val) and in the calculation of res.

The code we generate for this however is quite sub-optimal:

test:
        adrp    x2, .LC0
        sub     sp, sp, #16
        ldr     q1, [x2, #:lo12:.LC0]
        mov     x2, 16502
        movk    x2, 0x1023, lsl 16
        movk    x2, 0x4308, lsl 32
        add     v1.2d, v1.2d, v0.2d
        movk    x2, 0x942, lsl 48
        orr     x0, x0, x2
        str     q1, [x1]
        add     sp, sp, 16
        ret
.LC0:
        .xword  667169396713799798
        .xword  667169396713799798

Essentially we materialize the same constant twice.  The reason for this is
because the front-end lowers the constant extracted from arr[0] quite early on.
If you look into the result of fre you'll find

  <bb 2> :
  arr[0] = 667169396713799798;
  arr[1] = 667169396713799798;
  res_7 = a_6(D) | 667169396713799798;
  _16 = __builtin_aarch64_ld1v2di (&arr);
  _17 = VIEW_CONVERT_EXPR<uint64x2_t>(_16);
  _11 = b_10(D) + _17;
  *rt_12(D) = _11;
  arr ={v} {CLOBBER};
  return res_7;

Which makes sense for further optimization.  However come expand time if the
constant isn't representable in the target arch it will be assigned to a
register again.

(insn 8 5 9 2 (set (reg:V2DI 99)
        (const_vector:V2DI [
                (const_int 667169396713799798 [0x942430810234076]) repeated x2
            ])) "cse.c":7:12 -1
     (nil))
...
(insn 14 13 15 2 (set (reg:DI 103)
        (const_int 667169396713799798 [0x942430810234076])) "cse.c":8:12 -1
     (nil))
(insn 15 14 16 2 (set (reg:DI 102 [ res ])
        (ior:DI (reg/v:DI 96 [ a ])
            (reg:DI 103))) "cse.c":8:12 -1
     (nil))

And since it's out of the immediate range of the scalar instruction used
combine won't be able to do anything here.

This will then trigger the re-materialization of the constant twice.

To fix this this patch extends CSE to be able to generate an extract for a
constant from another vector, or to make a vector for a constant by duplicating
another constant.

Whether this transformation is done or not depends entirely on the costing for
the target for the different constants and operations.

I Initially also investigated doing this in PRE, but PRE requires at least 2 BB
to work and does not currently have any way to remove redundancies within a
single BB and it did not look easy to support.

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* cse.c (find_sets_in_insn): Register constants in sets.
	(cse_insn): Try materializing using vec_dup.

--- inline copy of patch -- 
diff --git a/gcc/cse.c b/gcc/cse.c
index 330c1e90ce05b8f95b58f24576ec93e10ec55d89..d76e01b6478e22e9dd5760b7c78cecb536d7daef 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "regs.h"
 #include "function-abi.h"
 #include "rtlanal.h"
+#include "expr.h"
 
 /* The basic idea of common subexpression elimination is to go
    through the code, keeping a record of expressions that would
@@ -4274,6 +4275,25 @@ find_sets_in_insn (rtx_insn *insn, struct set **psets)
 	 someplace else, so it isn't worth cse'ing.  */
       else if (GET_CODE (SET_SRC (x)) == CALL)
 	;
+      else if (GET_CODE (SET_SRC (x)) == CONST_VECTOR
+	       && GET_MODE_CLASS (GET_MODE (SET_SRC (x))) != MODE_VECTOR_BOOL)
+	{
+	  /* First register the vector itself.  */
+	  sets[n_sets++].rtl = x;
+	  rtx src = SET_SRC (x);
+	  machine_mode elem_mode = GET_MODE_INNER (GET_MODE (src));
+	  /* Go over the constants of the CONST_VECTOR in forward order, to
+	     put them in the same order in the SETS array.  */
+	  for (unsigned i = 0; i < const_vector_encoded_nelts (src) ; i++)
+	    {
+	      /* These are templates and don't actually get emitted but are
+		 used to tell CSE how to get to a particular constant.  */
+	      rtx tmp = gen_rtx_PARALLEL (VOIDmode,
+					  gen_rtvec (1, GEN_INT (i)));
+	      rtx y = gen_rtx_VEC_SELECT (elem_mode, SET_DEST (x), tmp);
+	      sets[n_sets++].rtl = gen_rtx_SET (y, CONST_VECTOR_ELT (src, i));
+	    }
+	}
       else
 	sets[n_sets++].rtl = x;
     }
@@ -4513,7 +4533,14 @@ cse_insn (rtx_insn *insn)
   struct set *sets = (struct set *) 0;
 
   if (GET_CODE (x) == SET)
-    sets = XALLOCA (struct set);
+    {
+      /* For CONST_VECTOR we wants to be able to CSE the vector itself along with
+	 elements inside the vector if the target says it's cheap.  */
+      if (GET_CODE (SET_SRC (x)) == CONST_VECTOR)
+	sets = XALLOCAVEC (struct set, const_vector_encoded_nelts (SET_SRC (x)) + 1);
+      else
+	sets = XALLOCA (struct set);
+    }
   else if (GET_CODE (x) == PARALLEL)
     sets = XALLOCAVEC (struct set, XVECLEN (x, 0));
 
@@ -4997,6 +5024,26 @@ cse_insn (rtx_insn *insn)
 	  src_related_is_const_anchor = src_related != NULL_RTX;
 	}
 
+      /* Try to re-materialize a vec_dup with an existing constant.   */
+      if (GET_CODE (src) == CONST_VECTOR
+	  && const_vector_encoded_nelts (src) == 1)
+	{
+	   rtx const_rtx = CONST_VECTOR_ELT (src, 0);
+	   machine_mode const_mode = GET_MODE_INNER (GET_MODE (src));
+	   struct table_elt *related_elt
+		= lookup (const_rtx, HASH (const_rtx, const_mode), const_mode);
+	   if (related_elt)
+	    {
+	      for (related_elt = related_elt->first_same_value;
+		   related_elt; related_elt = related_elt->next_same_value)
+		if (REG_P (related_elt->exp))
+		  {
+		    src_eqv_here
+			= gen_rtx_VEC_DUPLICATE (GET_MODE (src),
+						 related_elt->exp);
+		  }
+	    }
+	}
 
       if (src == src_folded)
 	src_folded = 0;


-- 

[-- Attachment #2: rb14773.patch --]
[-- Type: text/x-diff, Size: 2969 bytes --]

diff --git a/gcc/cse.c b/gcc/cse.c
index 330c1e90ce05b8f95b58f24576ec93e10ec55d89..d76e01b6478e22e9dd5760b7c78cecb536d7daef 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "regs.h"
 #include "function-abi.h"
 #include "rtlanal.h"
+#include "expr.h"
 
 /* The basic idea of common subexpression elimination is to go
    through the code, keeping a record of expressions that would
@@ -4274,6 +4275,25 @@ find_sets_in_insn (rtx_insn *insn, struct set **psets)
 	 someplace else, so it isn't worth cse'ing.  */
       else if (GET_CODE (SET_SRC (x)) == CALL)
 	;
+      else if (GET_CODE (SET_SRC (x)) == CONST_VECTOR
+	       && GET_MODE_CLASS (GET_MODE (SET_SRC (x))) != MODE_VECTOR_BOOL)
+	{
+	  /* First register the vector itself.  */
+	  sets[n_sets++].rtl = x;
+	  rtx src = SET_SRC (x);
+	  machine_mode elem_mode = GET_MODE_INNER (GET_MODE (src));
+	  /* Go over the constants of the CONST_VECTOR in forward order, to
+	     put them in the same order in the SETS array.  */
+	  for (unsigned i = 0; i < const_vector_encoded_nelts (src) ; i++)
+	    {
+	      /* These are templates and don't actually get emitted but are
+		 used to tell CSE how to get to a particular constant.  */
+	      rtx tmp = gen_rtx_PARALLEL (VOIDmode,
+					  gen_rtvec (1, GEN_INT (i)));
+	      rtx y = gen_rtx_VEC_SELECT (elem_mode, SET_DEST (x), tmp);
+	      sets[n_sets++].rtl = gen_rtx_SET (y, CONST_VECTOR_ELT (src, i));
+	    }
+	}
       else
 	sets[n_sets++].rtl = x;
     }
@@ -4513,7 +4533,14 @@ cse_insn (rtx_insn *insn)
   struct set *sets = (struct set *) 0;
 
   if (GET_CODE (x) == SET)
-    sets = XALLOCA (struct set);
+    {
+      /* For CONST_VECTOR we wants to be able to CSE the vector itself along with
+	 elements inside the vector if the target says it's cheap.  */
+      if (GET_CODE (SET_SRC (x)) == CONST_VECTOR)
+	sets = XALLOCAVEC (struct set, const_vector_encoded_nelts (SET_SRC (x)) + 1);
+      else
+	sets = XALLOCA (struct set);
+    }
   else if (GET_CODE (x) == PARALLEL)
     sets = XALLOCAVEC (struct set, XVECLEN (x, 0));
 
@@ -4997,6 +5024,26 @@ cse_insn (rtx_insn *insn)
 	  src_related_is_const_anchor = src_related != NULL_RTX;
 	}
 
+      /* Try to re-materialize a vec_dup with an existing constant.   */
+      if (GET_CODE (src) == CONST_VECTOR
+	  && const_vector_encoded_nelts (src) == 1)
+	{
+	   rtx const_rtx = CONST_VECTOR_ELT (src, 0);
+	   machine_mode const_mode = GET_MODE_INNER (GET_MODE (src));
+	   struct table_elt *related_elt
+		= lookup (const_rtx, HASH (const_rtx, const_mode), const_mode);
+	   if (related_elt)
+	    {
+	      for (related_elt = related_elt->first_same_value;
+		   related_elt; related_elt = related_elt->next_same_value)
+		if (REG_P (related_elt->exp))
+		  {
+		    src_eqv_here
+			= gen_rtx_VEC_DUPLICATE (GET_MODE (src),
+						 related_elt->exp);
+		  }
+	    }
+	}
 
       if (src == src_folded)
 	src_folded = 0;


^ permalink raw reply	[flat|nested] 5+ messages in thread
[parent not found: <VI1PR08MB5325CA9228123B678C0D8770FFB99@VI1PR08MB5325.eurprd08.prod.outlook.com>]

end of thread, other threads:[~2021-11-01  8:43 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-31 13:29 [PATCH 1/2]middle-end Teach CSE to be able to do vector extracts Tamar Christina
2021-09-01 23:48 ` Jeff Law
2021-09-03 10:26 ` Richard Sandiford
2021-09-08 12:34   ` Tamar Christina
     [not found] <VI1PR08MB5325CA9228123B678C0D8770FFB99@VI1PR08MB5325.eurprd08.prod.outlook.com>
     [not found] ` <VI1PR08MB5325B91D2383557AD7DFD4D3FF819@VI1PR08MB5325.eurprd08.prod.outlook.com>
     [not found]   ` <VI1PR08MB5325B96A92E4B412462AE628FF859@VI1PR08MB5325.eurprd08.prod.outlook.com>
     [not found]     ` <mpt1r43hlre.fsf@arm.com>
2021-11-01  8:43       ` [PATCH 1/2] middle-end " Tamar Christina

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).