From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <meissner@sourceware.org>
Received: by sourceware.org (Postfix, from userid 1005)
	id 0F8603858D37; Fri, 28 Apr 2023 17:57:44 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 0F8603858D37
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1682704664;
	bh=309l5cRayZX89EqdPeWVWe34d9mZtBZFzVDhcaATSxs=;
	h=From:To:Subject:Date:From;
	b=cln+cEIsZThv/3aU+zb3OBwPnlI2UYd76xtfKgfrm8T3p0WAAwo/SoFIzVF7jynY1
	 2l2VM80hibkHWYP9lR++FJPj3wpGxnAPTg4xxSA9bR2N30/AnO/37OQFeuBa4jyaCx
	 PADuy/BKdOz19rMqnpwK7OCsGIlfhWEVhN3kDnzY=
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: Michael Meissner <meissner@gcc.gnu.org>
To: gcc-cvs@gcc.gnu.org
Subject: [gcc(refs/users/meissner/heads/work119)] Optimize vec_extract of V4SF
 from memory with constant element numbers.
X-Act-Checkin: gcc
X-Git-Author: Michael Meissner <meissner@linux.ibm.com>
X-Git-Refname: refs/users/meissner/heads/work119
X-Git-Oldrev: d983d746c931a0ee8ae86f2f1407779a4fafc8f3
X-Git-Newrev: 984b341d78ddbc4ed3ad90dad7cb607edfa1fd12
Message-Id: <20230428175744.0F8603858D37@sourceware.org>
Date: Fri, 28 Apr 2023 17:57:44 +0000 (GMT)
List-Id: <gcc-cvs.sourceware.org>

https://gcc.gnu.org/g:984b341d78ddbc4ed3ad90dad7cb607edfa1fd12

commit 984b341d78ddbc4ed3ad90dad7cb607edfa1fd12
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Fri Apr 28 13:57:19 2023 -0400

    Optimize vec_extract of V4SF from memory with constant element numbers.
    
    This patch updates vec_extract of V4SF from memory with constant element
    numbers.
    
    This patch changes the splits so that they can be done before register
    allocation.
    
    This patch corrects the ISA for loading SF values to altivec registers to be
    power8 vector, and not power7.
    
    This patch adds a combiner pattern that combines loading an SF element and
    converting it to double.
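
As a rough illustration of the source shape the new pattern is aimed at, the
sketch below extracts a constant-numbered float element from a vector in memory
and widens it to double.  It uses the generic GCC vector extension rather than
<altivec.h> so that it compiles on any target; the type name v4sf and the
function name extract_v4sf_to_df are made up for this example.  On PowerPC the
same access could be written as vec_extract (*p, 1).

```c
/* Generic GCC/Clang vector of four floats.  This is an assumption made for
   portability; on PowerPC "vector float" (V4SF) has the same layout.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Load element 1 of the vector that P points to and convert it to double.
   With a constant element number the compiler is free to turn this into a
   single scalar load of the element followed by the widening.  */
double
extract_v4sf_to_df (const v4sf *p)
{
  return (double) (*p)[1];
}
```

Whether the load and the conversion are actually emitted as one insn depends on
the target, the options, and the compiler version; the sketch only shows the
kind of source that produces the float_extend of a vec_select from memory.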
    
    To do the splitting before register allocation, I modified the various
    vec_extract insns so that the split is no longer restricted to running after
    reload.  This patch also goes through the support function
    rs6000_adjust_vec_address and the functions it calls so that they can be
    called before register allocation.  The places that take a scratch register
    now allocate a new pseudo register if they are passed a SCRATCH rtx.
    
    I also added a new predicate that checks whether the operand is a normal
    memory address and not an Altivec vector address (i.e. one with an AND -16).
    These addresses are used in power8 as part of the vector swap optimization.
    In the past, because we used the 'Q' constraint, ira/reload would handle the
    AND etc. so that the address was only a single register.
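
The address arithmetic behind the constant-element case can be sketched in plain
C.  This is only a model under stated assumptions, not the code in rs6000.cc:
the function names element_offset, altivec_masked_address and load_element are
hypothetical, and the model assumes that element N lives at byte offset
N * scalar_size from the start of the vector in memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of a constant element number: element * scalar_size.  */
size_t
element_offset (unsigned element, size_t scalar_size)
{
  return (size_t) element * scalar_size;
}

/* Model of an Altivec vector address: the address is ANDed with -16, so the
   low four bits are dropped.  An address of this form cannot simply have an
   element offset added to it, which is why it is excluded here.  */
uintptr_t
altivec_masked_address (uintptr_t addr)
{
  return addr & ~(uintptr_t) 15;
}

/* Scalar load that stands in for the split form: read one float at
   base + element offset instead of loading the whole vector.  */
float
load_element (const float *vec_base, unsigned element)
{
  const char *p = (const char *) vec_base
		  + element_offset (element, sizeof (float));
  float value;

  memcpy (&value, p, sizeof (value));
  return value;
}
```

For example, element 2 of a vector of 4-byte floats is at byte offset 8, and an
Altivec-style address of 0x1007 refers to the vector at 0x1000.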
    
    2023-04-28   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/predicates.md (non_altivec_memory_operand): New
            predicate.
            * config/rs6000/rs6000.cc (get_vector_offset): Allow function to be
            called before register allocation.
            (adjust_vec_address_pcrel): Likewise.
            (rs6000_adjust_vec_address): Likewise.
            * config/rs6000/vsx.md (vsx_extract_v4sf_load): Allow splitting
            before register allocation.  Fix ISA for loading up SFmode values to
            traditional Altivec registers.  Require that the memory being optimized
            does not use Altivec memory addresses.
            (vsx_extract_v4sf_load_to_df): New insn.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vec-extract-mem-float-1.c: New file.

Diff:
---
 gcc/config/rs6000/predicates.md                    | 10 ++++
 gcc/config/rs6000/rs6000.cc                        | 58 +++++++++++++++-------
 gcc/config/rs6000/vsx.md                           | 28 +++++++++--
 .../gcc.target/powerpc/vec-extract-mem-float-1.c   | 29 +++++++++++
 4 files changed, 104 insertions(+), 21 deletions(-)

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 52c65534e51..3b9265ef1c0 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -957,6 +957,16 @@
   return memory_operand (op, mode);
 })
 
+;; Anything that matches memory_operand but does not match
+;; altivec_indexed_or_indirect_operand.  This used by vec_extract memory
+;; optimizations.
+(define_predicate "non_altivec_memory_operand"
+  (match_code "mem")
+{
+  return (memory_operand (op, mode)
+	  && !altivec_indexed_or_indirect_operand (op, mode));
+})
+
 ;; Return 1 if the operand is a MEM with an indexed-form address.
 (define_special_predicate "indexed_address_mem"
   (match_test "(MEM_P (op)
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 3be5860dd9b..332cb862f54 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -7686,9 +7686,13 @@ get_vector_offset (rtx mem, rtx element, rtx base_tmp, unsigned scalar_size)
   if (CONST_INT_P (element))
     return GEN_INT (INTVAL (element) * scalar_size);
 
-  /* All insns should use the 'Q' constraint (address is a single register) if
-     the element number is not a constant.  */
-  gcc_assert (satisfies_constraint_Q (mem));
+  if (GET_CODE (base_tmp) == SCRATCH)
+    base_tmp = gen_reg_rtx (Pmode);
+
+  /* After register allocation, all insns should use the 'Q' constraint
+     (address is a single register) if the element number is not a
+     constant.  */
+  gcc_assert (can_create_pseudo_p () || satisfies_constraint_Q (mem));
 
   /* Mask the element to make sure the element number is between 0 and the
      maximum number of elements - 1 so that we don't generate an address
@@ -7704,6 +7708,9 @@ get_vector_offset (rtx mem, rtx element, rtx base_tmp, unsigned scalar_size)
   if (shift > 0)
     {
       rtx shift_op = gen_rtx_ASHIFT (Pmode, base_tmp, GEN_INT (shift));
+      if (can_create_pseudo_p ())
+	base_tmp = gen_reg_rtx (Pmode);
+
       emit_insn (gen_rtx_SET (base_tmp, shift_op));
     }
 
@@ -7747,6 +7754,9 @@ adjust_vec_address_pcrel (rtx addr, rtx element_offset, rtx base_tmp)
 
       else
 	{
+	  if (GET_CODE (base_tmp) == SCRATCH)
+	    base_tmp = gen_reg_rtx (Pmode);
+
 	  emit_move_insn (base_tmp, addr);
 	  new_addr = gen_rtx_PLUS (Pmode, base_tmp, element_offset);
 	}
@@ -7769,9 +7779,8 @@ adjust_vec_address_pcrel (rtx addr, rtx element_offset, rtx base_tmp)
    temporary (BASE_TMP) to fixup the address.  Return the new memory address
    that is valid for reads or writes to a given register (SCALAR_REG).
 
-   This function is expected to be called after reload is completed when we are
-   splitting insns.  The temporary BASE_TMP might be set multiple times with
-   this code.  */
+   The temporary BASE_TMP might be set multiple times with this code if this is
+   called after register allocation.  */
 
 rtx
 rs6000_adjust_vec_address (rtx scalar_reg,
@@ -7784,8 +7793,11 @@ rs6000_adjust_vec_address (rtx scalar_reg,
   rtx addr = XEXP (mem, 0);
   rtx new_addr;
 
-  gcc_assert (!reg_mentioned_p (base_tmp, addr));
-  gcc_assert (!reg_mentioned_p (base_tmp, element));
+  if (GET_CODE (base_tmp) != SCRATCH)
+    {
+      gcc_assert (!reg_mentioned_p (base_tmp, addr));
+      gcc_assert (!reg_mentioned_p (base_tmp, element));
+    }
 
   /* Vector addresses should not have PRE_INC, PRE_DEC, or PRE_MODIFY.  */
   gcc_assert (GET_RTX_CLASS (GET_CODE (addr)) != RTX_AUTOINC);
@@ -7841,6 +7853,9 @@ rs6000_adjust_vec_address (rtx scalar_reg,
 	     offset, it has the benefit that if D-FORM instructions are
 	     allowed, the offset is part of the memory access to the vector
 	     element. */
+	  if (GET_CODE (base_tmp) == SCRATCH)
+	    base_tmp = gen_reg_rtx (Pmode);
+
 	  emit_insn (gen_rtx_SET (base_tmp, gen_rtx_PLUS (Pmode, op0, op1)));
 	  new_addr = gen_rtx_PLUS (Pmode, base_tmp, element_offset);
 	}
@@ -7848,26 +7863,33 @@ rs6000_adjust_vec_address (rtx scalar_reg,
 
   else
     {
-      emit_move_insn (base_tmp, addr);
+      if (GET_CODE (base_tmp) == SCRATCH)
+	base_tmp = gen_reg_rtx (Pmode);
+
+      emit_insn (gen_rtx_SET (base_tmp, addr));
       new_addr = gen_rtx_PLUS (Pmode, base_tmp, element_offset);
     }
 
-    /* If the address isn't valid, move the address into the temporary base
-       register.  Some reasons it could not be valid include:
+    /* If register allocation has been done and the address isn't valid, move
+       the address into the temporary base register.  Some reasons it could not
+       be valid include:
 
        The address offset overflowed the 16 or 34 bit offset size;
        We need to use a DS-FORM load, and the bottom 2 bits are non-zero;
        We need to use a DQ-FORM load, and the bottom 4 bits are non-zero;
        Only X_FORM loads can be done, and the address is D_FORM.  */
 
-  enum insn_form iform
-    = address_to_insn_form (new_addr, scalar_mode,
-			    reg_to_non_prefixed (scalar_reg, scalar_mode));
-
-  if (iform == INSN_FORM_BAD)
+  if (!can_create_pseudo_p ())
     {
-      emit_move_insn (base_tmp, new_addr);
-      new_addr = base_tmp;
+      enum insn_form iform
+	= address_to_insn_form (new_addr, scalar_mode,
+				reg_to_non_prefixed (scalar_reg, scalar_mode));
+
+      if (iform == INSN_FORM_BAD)
+	{
+	  emit_move_insn (base_tmp, new_addr);
+	  new_addr = base_tmp;
+	}
     }
 
   return change_address (mem, scalar_mode, new_addr);
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 417aff5e24b..ed4636f1e06 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3549,15 +3549,16 @@
   [(set_attr "length" "8")
    (set_attr "type" "fp")])
 
+;; V4SF extract from memory with constant element number.
 (define_insn_and_split "*vsx_extract_v4sf_load"
   [(set (match_operand:SF 0 "register_operand" "=f,v,v,?r")
 	(vec_select:SF
-	 (match_operand:V4SF 1 "memory_operand" "m,Z,m,m")
+	 (match_operand:V4SF 1 "non_altivec_memory_operand" "m,Z,m,m")
 	 (parallel [(match_operand:QI 2 "const_0_to_3_operand" "n,n,n,n")])))
    (clobber (match_scratch:P 3 "=&b,&b,&b,&b"))]
   "VECTOR_MEM_VSX_P (V4SFmode)"
   "#"
-  "&& reload_completed"
+  "&& 1"
   [(set (match_dup 0) (match_dup 4))]
 {
   operands[4] = rs6000_adjust_vec_address (operands[0], operands[1], operands[2],
@@ -3565,7 +3566,28 @@
 }
   [(set_attr "type" "fpload,fpload,fpload,load")
    (set_attr "length" "8")
-   (set_attr "isa" "*,p7v,p9v,*")])
+   (set_attr "isa" "*,p8v,p9v,*")])
+
+;; V4SF extract from memory with constant element number and convert to DFmode.
+(define_insn_and_split "*vsx_extract_v4sf_load_to_df"
+  [(set (match_operand:DF 0 "register_operand" "=f,v,v")
+	(float_extend:DF
+	 (vec_select:SF
+	  (match_operand:V4SF 1 "non_altivec_memory_operand" "m,Z,m")
+	  (parallel [(match_operand:QI 2 "const_0_to_3_operand" "n,n,n")]))))
+   (clobber (match_scratch:P 3 "=&b,&b,&b"))]
+  "VECTOR_MEM_VSX_P (V4SFmode)"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(float_extend:DF (match_dup 4)))]
+{
+  operands[4] = rs6000_adjust_vec_address (operands[0], operands[1], operands[2],
+					   operands[3], SFmode);
+}
+  [(set_attr "type" "fpload")
+   (set_attr "length" "8")
+   (set_attr "isa" "*,p8v,p9v")])
 
 ;; Variable V4SF extract from a register
 (define_insn_and_split "vsx_extract_v4sf_var"
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-extract-mem-float-1.c b/gcc/testsuite/gcc.target/powerpc/vec-extract-mem-float-1.c
new file mode 100644
index 00000000000..4670e261ba8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-extract-mem-float-1.c
@@ -0,0 +1,29 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mdejagnu-cpu=power8 -O2" } */
+
+/* Test to verify that the vec_extract with constant element numbers can load
+   float elements into a GPR register without doing a LFS/STFS.  */
+
+#include <altivec.h>
+
+void
+extract_v4sf_gpr_0 (vector float *p, float *q)
+{
+  float x = vec_extract (*p, 0);
+  __asm__ (" # %0" : "+r" (x));		/* lwz, no lfs/stfs.  */
+  *q = x;
+}
+
+void
+extract_v4sf_gpr_1 (vector float *p, float *q)
+{
+  float x = vec_extract (*p, 1);
+  __asm__ (" # %0" : "+r" (x));		/* lwz, no lfs/stfs.  */
+  *q = x;
+}
+
+/* { dg-final { scan-assembler-times {\mlwzx?\M}               2 } } */
+/* { dg-final { scan-assembler-times {\mstw\M}                 2 } } */
+/* { dg-final { scan-assembler-not   {\mlfsx?\M|\mlxsspx?\M}     } } */
+/* { dg-final { scan-assembler-not   {\mstfsx?\M|\mstxsspx?\M}   } } */