public inbox for gcc-cvs@sourceware.org
* [gcc r13-1607] Improved Scalar-To-Vector (STV) support for TImode to V1TImode on x86_64.
@ 2022-07-11 15:06 Roger Sayle
From: Roger Sayle @ 2022-07-11 15:06 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:c3ed9e0d6e96d8697e4bab994f8acbc5506240ee

commit r13-1607-gc3ed9e0d6e96d8697e4bab994f8acbc5506240ee
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Mon Jul 11 16:04:46 2022 +0100

    Improved Scalar-To-Vector (STV) support for TImode to V1TImode on x86_64.
    
    This patch upgrades x86_64's scalar-to-vector (STV) pass to more
    aggressively transform 128-bit scalar TImode operations into vector
    V1TImode operations performed on SSE registers.  TImode functionality
    already exists in STV, but only for move operations.  This change
    brings support for logical operations (AND, IOR, XOR, NOT and ANDN)
    and comparisons.
    
    The effect of these changes is conveniently demonstrated by the new
    sse4_1-stv-5.c test case:
    
    __int128 a[16];
    __int128 b[16];
    __int128 c[16];
    
    void foo()
    {
      for (unsigned int i=0; i<16; i++)
        a[i] = b[i] & ~c[i];
    }
    
    which when compiled on current mainline with -O2 -msse4 produces:
    
    foo:    xorl    %eax, %eax
    .L2:    movq    c(%rax), %rsi
            movq    c+8(%rax), %rdi
            addq    $16, %rax
            notq    %rsi
            notq    %rdi
            andq    b-16(%rax), %rsi
            andq    b-8(%rax), %rdi
            movq    %rsi, a-16(%rax)
            movq    %rdi, a-8(%rax)
            cmpq    $256, %rax
            jne     .L2
            ret
    
    but with this patch now produces:
    
    foo:    xorl    %eax, %eax
    .L2:    movdqa  c(%rax), %xmm0
            pandn   b(%rax), %xmm0
            addq    $16, %rax
            movaps  %xmm0, a-16(%rax)
            cmpq    $256, %rax
            jne     .L2
            ret
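
    Comparisons are handled as well.  The new sse4_1-stv-6.c test case
    (reproduced here for convenience; the full file, with its dg directives
    and -O2 -msse4.1 -mstv options, appears in the diff below) checks that
    a 128-bit equality test is now carried out with SSE, by scanning the
    assembler output for ptest:

    __int128 a[16];
    __int128 b[16];

    int foo()
    {
      for (unsigned int i=0; i<16; i++)
        if (a[i] == b[i])
          return i;
      return -1;
    }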
    
    Technically, the STV pass is implemented by three C++ classes: an
    abstract base class, "scalar_chain", that holds the shared
    functionality, and two derived classes, general_scalar_chain (which
    handles SImode and DImode) and timode_scalar_chain (which handles
    TImode).  As mentioned previously, because only TImode moves were
    handled, the two worker classes behaved significantly differently.
    These changes bring the functionality of the two classes closer
    together, which is reflected by refactoring more shared code from
    general_scalar_chain into the parent scalar_chain and reusing it from
    timode_scalar_chain.  Significant differences (and simplifications)
    still remain, so the existing division into specialized classes
    continues to make sense.
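
    For reference, a simplified sketch of the resulting class layout
    (member names follow gcc/config/i386/i386-features.h, but most members
    are elided, access specifiers are approximate, and the GCC-internal
    types are replaced by opaque stand-ins so the outline is self-contained):

    struct rtx_def;      typedef rtx_def *rtx;          /* stand-in */
    class rtx_insn;                                     /* stand-in */
    struct df_ref_stub;  typedef df_ref_stub *df_ref;   /* stand-in */
    struct bitmap_stub;  typedef bitmap_stub *bitmap;   /* stand-in */

    class scalar_chain                    /* common abstract base */
    {
     public:
      bitmap insns_conv;                  /* moved here from general_scalar_chain */
      unsigned n_sse_to_integer;
      unsigned n_integer_to_sse;
      virtual int compute_convert_gain () = 0;
     protected:
      rtx convert_compare (rtx op1, rtx op2, rtx_insn *insn);  /* now shared */
      void mark_dual_mode_def (df_ref def);                    /* now shared */
     private:
      virtual void convert_insn (rtx_insn *insn) = 0;
      virtual void convert_op (rtx *op, rtx_insn *insn) = 0;   /* new pure virtual */
    };

    class general_scalar_chain : public scalar_chain  /* SImode/DImode chains */
    {
     public:
      int compute_convert_gain () final override;
      /* convert_insn, convert_op, convert_reg, make_vector_copies, ...  */
    };

    class timode_scalar_chain : public scalar_chain   /* TImode -> V1TImode chains */
    {
     public:
      int compute_convert_gain () final override;     /* new TImode cost model */
      /* convert_insn, convert_op, fix_debug_reg_uses, ...  */
    };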
    
    2022-07-11  Roger Sayle  <roger@nextmovesoftware.com>
    
    gcc/ChangeLog
            * config/i386/i386-features.h (scalar_chain): Add fields
            insns_conv, n_sse_to_integer and n_integer_to_sse to this
            parent class, moved from general_scalar_chain.
            (scalar_chain::convert_compare): Protected method moved
            from general_scalar_chain.
            (mark_dual_mode_def): Make protected, not private virtual.
            (scalar_chain::convert_op): New private virtual method.
    
            (general_scalar_chain::general_scalar_chain): Simplify constructor.
            (general_scalar_chain::~general_scalar_chain): Delete destructor.
            (general_scalar_chain): Move insns_conv, n_sse_to_integer and
            n_integer_to_sse fields to parent class, scalar_chain.
            (general_scalar_chain::mark_dual_mode_def): Delete prototype.
            (general_scalar_chain::convert_compare): Delete prototype.
    
            (timode_scalar_chain::compute_convert_gain): Remove simplistic
            implementation, convert to a method prototype.
            (timode_scalar_chain::mark_dual_mode_def): Delete prototype.
            (timode_scalar_chain::convert_op): Prototype new virtual method.
    
            * config/i386/i386-features.cc (scalar_chain::scalar_chain):
            Allocate insns_conv and initialize n_sse_to_integer and
            n_integer_to_sse fields in constructor.
            (scalar_chain::~scalar_chain): Free insns_conv in destructor.
    
            (general_scalar_chain::general_scalar_chain): Delete
            constructor, now defined in the class declaration.
            (general_scalar_chain::~general_scalar_chain): Delete destructor.
    
            (scalar_chain::mark_dual_mode_def): Renamed from
            general_scalar_chain::mark_dual_mode_def.
            (timode_scalar_chain::mark_dual_mode_def): Delete.
            (scalar_chain::convert_compare): Renamed from
            general_scalar_chain::convert_compare.
    
            (timode_scalar_chain::compute_convert_gain): New method to
            determine the gain from converting a TImode chain to V1TImode.
            (timode_scalar_chain::convert_op): New method to convert an
            operand from TImode to V1TImode.
    
            (timode_scalar_chain::convert_insn) <case REG>: Only PUT_MODE
            on REG_EQUAL notes that were originally TImode (not CONST_INT).
            Handle AND, ANDN, XOR, IOR, NOT and COMPARE.
            (timode_mem_p): Helper predicate to check whether an operand
            is a memory reference with sufficient alignment for TImode STV.
            (timode_scalar_to_vector_candidate_p): Use convertible_comparison_p
            to check whether COMPARE is convertible.  Handle SET_DESTs
            that are REG_P or MEM_P and SET_SRCs that are REG, CONST_INT,
            CONST_WIDE_INT, MEM, AND, ANDN, IOR, XOR or NOT.
    
    gcc/testsuite/ChangeLog
            * gcc.target/i386/sse4_1-stv-2.c: New test case, pand.
            * gcc.target/i386/sse4_1-stv-3.c: New test case, por.
            * gcc.target/i386/sse4_1-stv-4.c: New test case, pxor.
            * gcc.target/i386/sse4_1-stv-5.c: New test case, pandn.
            * gcc.target/i386/sse4_1-stv-6.c: New test case, ptest.

Diff:
---
 gcc/config/i386/i386-features.cc             | 330 +++++++++++++++++++++------
 gcc/config/i386/i386-features.h              |  31 +--
 gcc/testsuite/gcc.target/i386/sse4_1-stv-2.c |  14 ++
 gcc/testsuite/gcc.target/i386/sse4_1-stv-3.c |  14 ++
 gcc/testsuite/gcc.target/i386/sse4_1-stv-4.c |  14 ++
 gcc/testsuite/gcc.target/i386/sse4_1-stv-5.c |  14 ++
 gcc/testsuite/gcc.target/i386/sse4_1-stv-6.c |  15 ++
 7 files changed, 347 insertions(+), 85 deletions(-)

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index a7bd17240ca..f1b03c3bfc6 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -291,7 +291,11 @@ scalar_chain::scalar_chain (enum machine_mode smode_, enum machine_mode vmode_)
   insns = BITMAP_ALLOC (NULL);
   defs = BITMAP_ALLOC (NULL);
   defs_conv = BITMAP_ALLOC (NULL);
+  insns_conv = BITMAP_ALLOC (NULL);
   queue = NULL;
+
+  n_sse_to_integer = 0;
+  n_integer_to_sse = 0;
 }
 
 /* Free chain's data.  */
@@ -301,6 +305,7 @@ scalar_chain::~scalar_chain ()
   BITMAP_FREE (insns);
   BITMAP_FREE (defs);
   BITMAP_FREE (defs_conv);
+  BITMAP_FREE (insns_conv);
   bitmap_obstack_release (NULL);
 }
 
@@ -319,25 +324,11 @@ scalar_chain::add_to_queue (unsigned insn_uid)
   bitmap_set_bit (queue, insn_uid);
 }
 
-general_scalar_chain::general_scalar_chain (enum machine_mode smode_,
-					    enum machine_mode vmode_)
-     : scalar_chain (smode_, vmode_)
-{
-  insns_conv = BITMAP_ALLOC (NULL);
-  n_sse_to_integer = 0;
-  n_integer_to_sse = 0;
-}
-
-general_scalar_chain::~general_scalar_chain ()
-{
-  BITMAP_FREE (insns_conv);
-}
-
 /* For DImode conversion, mark register defined by DEF as requiring
    conversion.  */
 
 void
-general_scalar_chain::mark_dual_mode_def (df_ref def)
+scalar_chain::mark_dual_mode_def (df_ref def)
 {
   gcc_assert (DF_REF_REG_DEF_P (def));
 
@@ -364,14 +355,6 @@ general_scalar_chain::mark_dual_mode_def (df_ref def)
 	     DF_REF_REGNO (def), DF_REF_INSN_UID (def), chain_id);
 }
 
-/* For TImode conversion, it is unused.  */
-
-void
-timode_scalar_chain::mark_dual_mode_def (df_ref)
-{
-  gcc_unreachable ();
-}
-
 /* Check REF's chain to add new insns into a queue
    and find registers requiring conversion.  */
 
@@ -934,7 +917,7 @@ general_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
 /* Convert COMPARE to vector mode.  */
 
 rtx
-general_scalar_chain::convert_compare (rtx op1, rtx op2, rtx_insn *insn)
+scalar_chain::convert_compare (rtx op1, rtx op2, rtx_insn *insn)
 {
   rtx tmp = gen_reg_rtx (vmode);
   rtx src;
@@ -1068,7 +1051,7 @@ general_scalar_chain::convert_insn (rtx_insn *insn)
       emit_conversion_insns (gen_move_insn (dst, tmp), insn);
       dst = gen_rtx_SUBREG (vmode, tmp, 0);
     }
-  else if (REG_P (dst))
+  else if (REG_P (dst) && GET_MODE (dst) == smode)
     {
       /* Replace the definition with a SUBREG to the definition we
          use inside the chain.  */
@@ -1173,6 +1156,83 @@ general_scalar_chain::convert_insn (rtx_insn *insn)
   df_insn_rescan (insn);
 }
 
+/* Compute a gain for chain conversion.  */
+
+int
+timode_scalar_chain::compute_convert_gain ()
+{
+  /* Assume that if we have to move TImode values between units,
+     then transforming this chain isn't worth it.  */
+  if (n_sse_to_integer || n_integer_to_sse)
+    return -1;
+
+  bitmap_iterator bi;
+  unsigned insn_uid;
+
+  /* Split ties to prefer V1TImode when not optimizing for size.  */
+  int gain = optimize_size ? 0 : 1;
+
+  if (dump_file)
+    fprintf (dump_file, "Computing gain for chain #%d...\n", chain_id);
+
+  EXECUTE_IF_SET_IN_BITMAP (insns, 0, insn_uid, bi)
+    {
+      rtx_insn *insn = DF_INSN_UID_GET (insn_uid)->insn;
+      rtx def_set = single_set (insn);
+      rtx src = SET_SRC (def_set);
+      rtx dst = SET_DEST (def_set);
+      int igain = 0;
+
+      switch (GET_CODE (src))
+	{
+	case REG:
+	  if (optimize_insn_for_size_p ())
+	    igain = MEM_P (dst) ? COSTS_N_BYTES (6) : COSTS_N_BYTES (3);
+	  else
+	    igain = COSTS_N_INSNS (1);
+	  break;
+
+	case MEM:
+	  igain = optimize_insn_for_size_p () ? COSTS_N_BYTES (7)
+					      : COSTS_N_INSNS (1);
+	  break;
+
+	case CONST_INT:
+	  if (MEM_P (dst)
+	      && standard_sse_constant_p (src, V1TImode))
+	    igain = optimize_insn_for_size_p() ? COSTS_N_BYTES (11) : 1;
+	  break;
+
+	case NOT:
+	  if (MEM_P (dst))
+	    igain = -COSTS_N_INSNS (1);
+	  break;
+
+	case AND:
+	case XOR:
+	case IOR:
+	  if (!MEM_P (dst))
+	    igain = COSTS_N_INSNS (1);
+	  break;
+
+	default:
+	  break;
+	}
+
+      if (igain != 0 && dump_file)
+	{
+	  fprintf (dump_file, "  Instruction gain %d for ", igain);
+	  dump_insn_slim (dump_file, insn);
+	}
+      gain += igain;
+    }
+
+  if (dump_file)
+    fprintf (dump_file, "  Total gain: %d\n", gain);
+
+  return gain;
+}
+
 /* Fix uses of converted REG in debug insns.  */
 
 void
@@ -1211,6 +1271,61 @@ timode_scalar_chain::fix_debug_reg_uses (rtx reg)
     }
 }
 
+/* Convert operand OP in INSN from TImode to V1TImode.  */
+
+void
+timode_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
+{
+  *op = copy_rtx_if_shared (*op);
+
+  if (REG_P (*op))
+    *op = gen_rtx_SUBREG (V1TImode, *op, 0);
+  else if (MEM_P (*op))
+    {
+      rtx tmp = gen_reg_rtx (V1TImode);
+      emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (V1TImode, tmp, 0),
+				     gen_gpr_to_xmm_move_src (V1TImode, *op)),
+			insn);
+      *op = gen_rtx_SUBREG (V1TImode, tmp, 0);
+
+      if (dump_file)
+	fprintf (dump_file, "  Preloading operand for insn %d into r%d\n",
+		 INSN_UID (insn), REGNO (tmp));
+    }
+  else if (CONST_INT_P (*op))
+    {
+      rtx vec_cst;
+      rtx tmp = gen_rtx_SUBREG (V1TImode, gen_reg_rtx (TImode), 0);
+
+      /* Prefer all ones vector in case of -1.  */
+      if (constm1_operand (*op, TImode))
+	vec_cst = CONSTM1_RTX (V1TImode);
+      else
+	{
+	  rtx *v = XALLOCAVEC (rtx, 1);
+	  v[0] = *op;
+	  vec_cst = gen_rtx_CONST_VECTOR (V1TImode, gen_rtvec_v (1, v));
+	}
+
+      if (!standard_sse_constant_p (vec_cst, V1TImode))
+	{
+	  start_sequence ();
+	  vec_cst = validize_mem (force_const_mem (V1TImode, vec_cst));
+	  rtx_insn *seq = get_insns ();
+	  end_sequence ();
+	  emit_insn_before (seq, insn);
+	}
+
+      emit_insn_before (gen_move_insn (copy_rtx (tmp), vec_cst), insn);
+      *op = tmp;
+    }
+  else
+    {
+      gcc_assert (SUBREG_P (*op));
+      gcc_assert (GET_MODE (*op) == vmode);
+    }
+}
+
 /* Convert INSN from TImode to V1T1mode.  */
 
 void
@@ -1219,17 +1334,24 @@ timode_scalar_chain::convert_insn (rtx_insn *insn)
   rtx def_set = single_set (insn);
   rtx src = SET_SRC (def_set);
   rtx dst = SET_DEST (def_set);
+  rtx tmp;
 
+  if (MEM_P (dst) && !REG_P (src))
+    {
+      /* There are no scalar integer instructions and therefore
+	 temporary register usage is required.  */
+    }
   switch (GET_CODE (dst))
     {
     case REG:
-      {
-	rtx tmp = find_reg_equal_equiv_note (insn);
-	if (tmp)
-	  PUT_MODE (XEXP (tmp, 0), V1TImode);
-	PUT_MODE (dst, V1TImode);
-	fix_debug_reg_uses (dst);
-      }
+      if (GET_MODE (dst) == TImode)
+	{
+	  tmp = find_reg_equal_equiv_note (insn);
+	  if (tmp && GET_MODE (XEXP (tmp, 0)) == TImode)
+	    PUT_MODE (XEXP (tmp, 0), V1TImode);
+	  PUT_MODE (dst, V1TImode);
+	  fix_debug_reg_uses (dst);
+	}
       break;
     case MEM:
       PUT_MODE (dst, V1TImode);
@@ -1257,7 +1379,6 @@ timode_scalar_chain::convert_insn (rtx_insn *insn)
 	{
 	  /* Since there are no instructions to store 128-bit constant,
 	     temporary register usage is required.  */
-	  rtx tmp = gen_reg_rtx (V1TImode);
 	  start_sequence ();
 	  src = gen_rtx_CONST_VECTOR (V1TImode, gen_rtvec (1, src));
 	  src = validize_mem (force_const_mem (V1TImode, src));
@@ -1265,8 +1386,12 @@ timode_scalar_chain::convert_insn (rtx_insn *insn)
 	  end_sequence ();
 	  if (seq)
 	    emit_insn_before (seq, insn);
-	  emit_conversion_insns (gen_rtx_SET (dst, tmp), insn);
-	  dst = tmp;
+	  if (MEM_P (dst))
+	    {
+	      tmp = gen_reg_rtx (V1TImode);
+	      emit_insn_before (gen_rtx_SET (tmp, src), insn);
+	      src = tmp;
+	    }
 	}
       break;
 
@@ -1282,16 +1407,57 @@ timode_scalar_chain::convert_insn (rtx_insn *insn)
 	default:
 	  gcc_unreachable ();
 	}
-      if (NONDEBUG_INSN_P (insn))
+      if (MEM_P (dst))
+	{
+	  tmp = gen_reg_rtx (V1TImode);
+          emit_insn_before (gen_rtx_SET (tmp, src), insn);
+          src = tmp;
+	}
+      break;
+
+    case AND:
+      if (GET_CODE (XEXP (src, 0)) == NOT)
+	{
+	  convert_op (&XEXP (XEXP (src, 0), 0), insn);
+	  convert_op (&XEXP (src, 1), insn);
+	  PUT_MODE (XEXP (src, 0), V1TImode);
+	  PUT_MODE (src, V1TImode);
+	  break;
+	}
+      /* FALLTHRU */
+
+    case XOR:
+    case IOR:
+      convert_op (&XEXP (src, 0), insn);
+      convert_op (&XEXP (src, 1), insn);
+      PUT_MODE (src, V1TImode);
+      if (MEM_P (dst))
 	{
-	  rtx tmp = gen_reg_rtx (V1TImode);
-	  /* Since there are no instructions to store standard SSE
-	     constant, temporary register usage is required.  */
-	  emit_conversion_insns (gen_rtx_SET (dst, tmp), insn);
-	  dst = tmp;
+	  tmp = gen_reg_rtx (V1TImode);
+          emit_insn_before (gen_rtx_SET (tmp, src), insn);
+          src = tmp;
 	}
       break;
 
+    case NOT:
+      src = XEXP (src, 0);
+      convert_op (&src, insn);
+      tmp = gen_reg_rtx (V1TImode);
+      emit_insn_before (gen_move_insn (tmp, CONSTM1_RTX (V1TImode)), insn);
+      src = gen_rtx_XOR (V1TImode, src, tmp);
+      if (MEM_P (dst))
+	{
+	  tmp = gen_reg_rtx (V1TImode);
+          emit_insn_before (gen_rtx_SET (tmp, src), insn);
+          src = tmp;
+	}
+      break;
+
+    case COMPARE:
+      dst = gen_rtx_REG (CCmode, FLAGS_REG);
+      src = convert_compare (XEXP (src, 0), XEXP (src, 1), insn);
+      break;
+
     default:
       gcc_unreachable ();
     }
@@ -1551,6 +1717,16 @@ general_scalar_to_vector_candidate_p (rtx_insn *insn, enum machine_mode mode)
   return true;
 }
 
+/* Check for a suitable TImode memory operand.  */
+
+static bool
+timode_mem_p (rtx x)
+{
+  return MEM_P (x)
+	 && (TARGET_SSE_UNALIGNED_LOAD_OPTIMAL
+	     || !misaligned_operand (x, TImode));
+}
+
 /* The TImode version of scalar_to_vector_candidate_p.  */
 
 static bool
@@ -1564,45 +1740,59 @@ timode_scalar_to_vector_candidate_p (rtx_insn *insn)
   rtx src = SET_SRC (def_set);
   rtx dst = SET_DEST (def_set);
 
-  /* Only TImode load and store are allowed.  */
-  if (GET_MODE (dst) != TImode)
+  if (GET_CODE (src) == COMPARE)
+    return convertible_comparison_p (insn, TImode);
+
+  if (GET_MODE (dst) != TImode
+      || (GET_MODE (src) != TImode
+          && !CONST_SCALAR_INT_P (src)))
     return false;
 
-  if (MEM_P (dst))
-    {
-      /* Check for store.  Memory must be aligned or unaligned store
-	 is optimal.  Only support store from register, standard SSE
-	 constant or CONST_WIDE_INT generated from piecewise store.
+  if (!REG_P (dst) && !MEM_P (dst))
+    return false;
 
-	 ??? Verify performance impact before enabling CONST_INT for
-	 __int128 store.  */
-      if (misaligned_operand (dst, TImode)
-	  && !TARGET_SSE_UNALIGNED_STORE_OPTIMAL)
-	return false;
+  if (MEM_P (dst)
+      && misaligned_operand (dst, TImode)
+      && !TARGET_SSE_UNALIGNED_STORE_OPTIMAL)
+    return false;
 
-      switch (GET_CODE (src))
-	{
-	default:
-	  return false;
+  switch (GET_CODE (src))
+    {
+    case REG:
+    case CONST_WIDE_INT:
+      return true;
 
-	case REG:
-	case CONST_WIDE_INT:
-	  return true;
+    case CONST_INT:
+      /* ??? Verify performance impact before enabling CONST_INT for
+	 __int128 store.  */
+      return standard_sse_constant_p (src, TImode);
 
-	case CONST_INT:
-	  return standard_sse_constant_p (src, TImode);
-	}
-    }
-  else if (MEM_P (src))
-    {
-      /* Check for load.  Memory must be aligned or unaligned load is
-	 optimal.  */
+    case MEM:
+      /* Memory must be aligned or unaligned load is optimal.  */
       return (REG_P (dst)
 	      && (!misaligned_operand (src, TImode)
 		  || TARGET_SSE_UNALIGNED_LOAD_OPTIMAL));
-    }
 
-  return false;
+    case AND:
+      if (!MEM_P (dst)
+	  && GET_CODE (XEXP (src, 0)) == NOT
+	  && REG_P (XEXP (XEXP (src, 0), 0))
+	  && (REG_P (XEXP (src, 1)) || timode_mem_p (XEXP (src, 1))))
+	return true;
+      return REG_P (XEXP (src, 0))
+	     && (REG_P (XEXP (src, 1)) || timode_mem_p (XEXP (src, 1)));
+
+    case IOR:
+    case XOR:
+      return REG_P (XEXP (src, 0))
+	     && (REG_P (XEXP (src, 1)) || timode_mem_p (XEXP (src, 1)));
+
+    case NOT:
+      return REG_P (XEXP (src, 0)) || timode_mem_p (XEXP (src, 0));
+
+    default:
+      return false;
+    }
 }
 
 /* For a register REGNO, scan instructions for its defs and uses.
diff --git a/gcc/config/i386/i386-features.h b/gcc/config/i386/i386-features.h
index 839b63c569b..88b222debc9 100644
--- a/gcc/config/i386/i386-features.h
+++ b/gcc/config/i386/i386-features.h
@@ -148,6 +148,10 @@ class scalar_chain
   /* Registers used in both vector and sclar modes.  */
   bitmap defs_conv;
 
+  bitmap insns_conv;
+  unsigned n_sse_to_integer;
+  unsigned n_integer_to_sse;
+
   void build (bitmap candidates, unsigned insn_uid);
   virtual int compute_convert_gain () = 0;
   int convert ();
@@ -155,33 +159,31 @@ class scalar_chain
  protected:
   void add_to_queue (unsigned insn_uid);
   void emit_conversion_insns (rtx insns, rtx_insn *pos);
+  rtx convert_compare (rtx op1, rtx op2, rtx_insn *insn);
+  void mark_dual_mode_def (df_ref def);
 
  private:
   void add_insn (bitmap candidates, unsigned insn_uid);
   void analyze_register_chain (bitmap candidates, df_ref ref);
-  virtual void mark_dual_mode_def (df_ref def) = 0;
   virtual void convert_insn (rtx_insn *insn) = 0;
   virtual void convert_registers () = 0;
+  virtual void convert_op (rtx *op, rtx_insn *insn) = 0;
 };
 
 class general_scalar_chain : public scalar_chain
 {
  public:
-  general_scalar_chain (enum machine_mode smode_, enum machine_mode vmode_);
-  ~general_scalar_chain ();
+  general_scalar_chain (enum machine_mode smode_, enum machine_mode vmode_)
+    : scalar_chain (smode_, vmode_) {}
   int compute_convert_gain () final override;
+
  private:
   hash_map<rtx, rtx> defs_map;
-  bitmap insns_conv;
-  unsigned n_sse_to_integer;
-  unsigned n_integer_to_sse;
-  void mark_dual_mode_def (df_ref def) final override;
   void convert_insn (rtx_insn *insn) final override;
-  void convert_op (rtx *op, rtx_insn *insn);
   void convert_reg (rtx_insn *insn, rtx dst, rtx src);
+  void convert_op (rtx *op, rtx_insn *insn);
   void make_vector_copies (rtx_insn *, rtx);
   void convert_registers () final override;
-  rtx convert_compare (rtx op1, rtx op2, rtx_insn *insn);
   int vector_const_cost (rtx exp);
 };
 
@@ -189,21 +191,20 @@ class timode_scalar_chain : public scalar_chain
 {
  public:
   timode_scalar_chain () : scalar_chain (TImode, V1TImode) {}
-
-  /* Convert from TImode to V1TImode is always faster.  */
-  int compute_convert_gain () final override { return 1; }
+  int compute_convert_gain () final override;
 
  private:
-  void mark_dual_mode_def (df_ref def) final override;
   void fix_debug_reg_uses (rtx reg);
   void convert_insn (rtx_insn *insn) final override;
-  /* We don't convert registers to difference size.  */
+  void convert_op (rtx *op, rtx_insn *insn);
+  /* We don't convert registers to different size.  */
   void convert_registers () final override {}
 };
 
 } // anon namespace
 
-bool ix86_save_reg (unsigned int regno, bool maybe_eh_return, bool ignore_outlined);
+bool ix86_save_reg (unsigned int regno, bool maybe_eh_return,
+		    bool ignore_outlined);
 int ix86_compare_version_priority (tree decl1, tree decl2);
 tree ix86_generate_version_dispatcher_body (void *node_p);
 tree ix86_get_function_versions_dispatcher (void *decl);
diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-stv-2.c b/gcc/testsuite/gcc.target/i386/sse4_1-stv-2.c
new file mode 100644
index 00000000000..e637927b79f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse4_1-stv-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -msse4.1 -mstv -mno-stackrealign" } */
+
+__int128 a[16];
+__int128 b[16];
+__int128 c[16];
+
+void foo()
+{
+  for (unsigned int i=0; i<16; i++)
+    a[i] = b[i] & c[i];
+}
+
+/* { dg-final { scan-assembler "pand" } } */
diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-stv-3.c b/gcc/testsuite/gcc.target/i386/sse4_1-stv-3.c
new file mode 100644
index 00000000000..bdc0bac36d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse4_1-stv-3.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -msse4.1 -mstv -mno-stackrealign" } */
+
+__int128 a[16];
+__int128 b[16];
+__int128 c[16];
+
+void foo()
+{
+  for (unsigned int i=0; i<16; i++)
+    a[i] = b[i] | c[i];
+}
+
+/* { dg-final { scan-assembler "por" } } */
diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-stv-4.c b/gcc/testsuite/gcc.target/i386/sse4_1-stv-4.c
new file mode 100644
index 00000000000..a9ef61980a9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse4_1-stv-4.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -msse4.1 -mstv -mno-stackrealign" } */
+
+__int128 a[16];
+__int128 b[16];
+__int128 c[16];
+
+void foo()
+{
+  for (unsigned int i=0; i<16; i++)
+    a[i] = b[i] ^ c[i];
+}
+
+/* { dg-final { scan-assembler "pxor" } } */
diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-stv-5.c b/gcc/testsuite/gcc.target/i386/sse4_1-stv-5.c
new file mode 100644
index 00000000000..7eff3c1726e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse4_1-stv-5.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -msse4.1 -mstv -mno-stackrealign" } */
+
+__int128 a[16];
+__int128 b[16];
+__int128 c[16];
+
+void foo()
+{
+  for (unsigned int i=0; i<16; i++)
+    a[i] = b[i] & ~c[i];
+}
+
+/* { dg-final { scan-assembler "pandn" } } */
diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-stv-6.c b/gcc/testsuite/gcc.target/i386/sse4_1-stv-6.c
new file mode 100644
index 00000000000..407bc551b64
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse4_1-stv-6.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -msse4.1 -mstv -mno-stackrealign" } */
+
+__int128 a[16];
+__int128 b[16];
+
+int foo()
+{
+  for (unsigned int i=0; i<16; i++)
+    if (a[i] == b[i])
+      return i;
+  return -1;
+}
+
+/* { dg-final { scan-assembler "ptest" } } */

