From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 1005)
        id 0F8603858D37; Fri, 28 Apr 2023 17:57:44 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 0F8603858D37
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
        s=default; t=1682704664;
        bh=309l5cRayZX89EqdPeWVWe34d9mZtBZFzVDhcaATSxs=;
        h=From:To:Subject:Date:From;
        b=cln+cEIsZThv/3aU+zb3OBwPnlI2UYd76xtfKgfrm8T3p0WAAwo/SoFIzVF7jynY1
         2l2VM80hibkHWYP9lR++FJPj3wpGxnAPTg4xxSA9bR2N30/AnO/37OQFeuBa4jyaCx
         PADuy/BKdOz19rMqnpwK7OCsGIlfhWEVhN3kDnzY=
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: Michael Meissner
To: gcc-cvs@gcc.gnu.org
Subject: [gcc(refs/users/meissner/heads/work119)] Optimize vec_extract of V4SF from memory with constant element numbers.
X-Act-Checkin: gcc
X-Git-Author: Michael Meissner
X-Git-Refname: refs/users/meissner/heads/work119
X-Git-Oldrev: d983d746c931a0ee8ae86f2f1407779a4fafc8f3
X-Git-Newrev: 984b341d78ddbc4ed3ad90dad7cb607edfa1fd12
Message-Id: <20230428175744.0F8603858D37@sourceware.org>
Date: Fri, 28 Apr 2023 17:57:44 +0000 (GMT)
List-Id:

https://gcc.gnu.org/g:984b341d78ddbc4ed3ad90dad7cb607edfa1fd12

commit 984b341d78ddbc4ed3ad90dad7cb607edfa1fd12
Author: Michael Meissner
Date:   Fri Apr 28 13:57:19 2023 -0400

    Optimize vec_extract of V4SF from memory with constant element numbers.

    This patch updates vec_extract of V4SF from memory with constant element
    numbers.

    This patch changes the splits so that they can be done before register
    allocation.

    This patch corrects the ISA for loading SF values to altivec registers to be
    power8 vector, and not power7.

    This patch adds a combiner patch to combine loading up a SF element and
    converting it to double.

    In order to do the splitting before register allocation, I modified the
    various vec_extract insns to allow the split to occur before register
    allocation.
    This patch goes through the support function rs6000_adjust_vec_address and
    the functions it calls to allow them to be called before register
    allocation. The places that take a scratch register will allocate a new
    pseudo register if they are passed a SCRATCH register.

    I also added a new predicate that checks if the operand is a normal memory
    address but not an Altivec vector address (i.e. with an AND -16). These
    addresses are used in power8 as part of the vector swap optimization. In
    the past, because we use the 'Q' constraint, ira/reload would handle the
    AND etc. so that the address was only a single register.

    2023-04-28   Michael Meissner

    gcc/

            * config/rs6000/predicates.md (non_altivec_memory_operand): New
            predicate.
            * config/rs6000/rs6000.cc (get_vector_offset): Allow function to be
            called before register allocation.
            (adjust_vec_address_pcrel): Likewise.
            (rs6000_adjust_vec_address): Likewise.
            * config/rs6000/vsx.md (vsx_extract_v4sf_load): Allow splitting
            before register allocation. Fix ISA for loading up SFmode values to
            traditional Altivec registers. Require that the memory being
            optimized does not use Altivec memory addresses.
            (vsx_extract_v4sf_load_to_df): New insn.

    gcc/testsuite/

            * gcc.target/powerpc/vec-extract-mem-float-1.c: New file.

Diff:
---
 gcc/config/rs6000/predicates.md                    | 10 ++++
 gcc/config/rs6000/rs6000.cc                        | 58 +++++++++++++++-------
 gcc/config/rs6000/vsx.md                           | 28 +++++++++--
 .../gcc.target/powerpc/vec-extract-mem-float-1.c   | 29 +++++++++++
 4 files changed, 104 insertions(+), 21 deletions(-)

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 52c65534e51..3b9265ef1c0 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -957,6 +957,16 @@
   return memory_operand (op, mode);
 })
 
+;; Anything that matches memory_operand but does not match
+;; altivec_indexed_or_indirect_operand. This used by vec_extract memory
+;; optimizations.
+(define_predicate "non_altivec_memory_operand"
+  (match_code "mem")
+{
+  return (memory_operand (op, mode)
+          && !altivec_indexed_or_indirect_operand (op, mode));
+})
+
 ;; Return 1 if the operand is a MEM with an indexed-form address.
 (define_special_predicate "indexed_address_mem"
   (match_test "(MEM_P (op)
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 3be5860dd9b..332cb862f54 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -7686,9 +7686,13 @@ get_vector_offset (rtx mem, rtx element, rtx base_tmp, unsigned scalar_size)
   if (CONST_INT_P (element))
     return GEN_INT (INTVAL (element) * scalar_size);
 
-  /* All insns should use the 'Q' constraint (address is a single register) if
-     the element number is not a constant. */
-  gcc_assert (satisfies_constraint_Q (mem));
+  if (GET_CODE (base_tmp) == SCRATCH)
+    base_tmp = gen_reg_rtx (Pmode);
+
+  /* After register allocation, all insns should use the 'Q' constraint
+     (address is a single register) if the element number is not a
+     constant. */
+  gcc_assert (can_create_pseudo_p () || satisfies_constraint_Q (mem));
 
   /* Mask the element to make sure the element number is between 0 and the
      maximum number of elements - 1 so that we don't generate an address
@@ -7704,6 +7708,9 @@ get_vector_offset (rtx mem, rtx element, rtx base_tmp, unsigned scalar_size)
   if (shift > 0)
     {
       rtx shift_op = gen_rtx_ASHIFT (Pmode, base_tmp, GEN_INT (shift));
+      if (can_create_pseudo_p ())
+        base_tmp = gen_reg_rtx (Pmode);
+
       emit_insn (gen_rtx_SET (base_tmp, shift_op));
     }
 
@@ -7747,6 +7754,9 @@ adjust_vec_address_pcrel (rtx addr, rtx element_offset, rtx base_tmp)
 
   else
     {
+      if (GET_CODE (base_tmp) == SCRATCH)
+        base_tmp = gen_reg_rtx (Pmode);
+
       emit_move_insn (base_tmp, addr);
       new_addr = gen_rtx_PLUS (Pmode, base_tmp, element_offset);
     }
@@ -7769,9 +7779,8 @@ adjust_vec_address_pcrel (rtx addr, rtx element_offset, rtx base_tmp)
    temporary (BASE_TMP) to fixup the address. Return the new memory address
    that is valid for reads or writes to a given register (SCALAR_REG).
 
-   This function is expected to be called after reload is completed when we are
-   splitting insns. The temporary BASE_TMP might be set multiple times with
-   this code. */
+   The temporary BASE_TMP might be set multiple times with this code if this is
+   called after register allocation. */
 
 rtx
 rs6000_adjust_vec_address (rtx scalar_reg,
@@ -7784,8 +7793,11 @@ rs6000_adjust_vec_address (rtx scalar_reg,
   rtx addr = XEXP (mem, 0);
   rtx new_addr;
 
-  gcc_assert (!reg_mentioned_p (base_tmp, addr));
-  gcc_assert (!reg_mentioned_p (base_tmp, element));
+  if (GET_CODE (base_tmp) != SCRATCH)
+    {
+      gcc_assert (!reg_mentioned_p (base_tmp, addr));
+      gcc_assert (!reg_mentioned_p (base_tmp, element));
+    }
 
   /* Vector addresses should not have PRE_INC, PRE_DEC, or PRE_MODIFY. */
   gcc_assert (GET_RTX_CLASS (GET_CODE (addr)) != RTX_AUTOINC);
@@ -7841,6 +7853,9 @@ rs6000_adjust_vec_address (rtx scalar_reg,
              offset, it has the benefit that if D-FORM instructions are
              allowed, the offset is part of the memory access to the vector
              element. */
+          if (GET_CODE (base_tmp) == SCRATCH)
+            base_tmp = gen_reg_rtx (Pmode);
+
           emit_insn (gen_rtx_SET (base_tmp, gen_rtx_PLUS (Pmode, op0, op1)));
           new_addr = gen_rtx_PLUS (Pmode, base_tmp, element_offset);
         }
@@ -7848,26 +7863,33 @@ rs6000_adjust_vec_address (rtx scalar_reg,
 
   else
     {
-      emit_move_insn (base_tmp, addr);
+      if (GET_CODE (base_tmp) == SCRATCH)
+        base_tmp = gen_reg_rtx (Pmode);
+
+      emit_insn (gen_rtx_SET (base_tmp, addr));
       new_addr = gen_rtx_PLUS (Pmode, base_tmp, element_offset);
     }
 
-  /* If the address isn't valid, move the address into the temporary base
-     register. Some reasons it could not be valid include:
+  /* If register allocation has been done and the address isn't valid, move
+     the address into the temporary base register. Some reasons it could not
+     be valid include:
 
      The address offset overflowed the 16 or 34 bit offset size;
      We need to use a DS-FORM load, and the bottom 2 bits are non-zero;
      We need to use a DQ-FORM load, and the bottom 4 bits are non-zero;
      Only X_FORM loads can be done, and the address is D_FORM. */
 
-  enum insn_form iform
-    = address_to_insn_form (new_addr, scalar_mode,
-                            reg_to_non_prefixed (scalar_reg, scalar_mode));
-
-  if (iform == INSN_FORM_BAD)
+  if (!can_create_pseudo_p ())
     {
-      emit_move_insn (base_tmp, new_addr);
-      new_addr = base_tmp;
+      enum insn_form iform
+        = address_to_insn_form (new_addr, scalar_mode,
+                                reg_to_non_prefixed (scalar_reg, scalar_mode));
+
+      if (iform == INSN_FORM_BAD)
+        {
+          emit_move_insn (base_tmp, new_addr);
+          new_addr = base_tmp;
+        }
     }
 
   return change_address (mem, scalar_mode, new_addr);
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 417aff5e24b..ed4636f1e06 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3549,15 +3549,16 @@
   [(set_attr "length" "8")
    (set_attr "type" "fp")])
 
+;; V4SF extract from memory with constant element number.
 (define_insn_and_split "*vsx_extract_v4sf_load"
   [(set (match_operand:SF 0 "register_operand" "=f,v,v,?r")
         (vec_select:SF
-         (match_operand:V4SF 1 "memory_operand" "m,Z,m,m")
+         (match_operand:V4SF 1 "non_altivec_memory_operand" "m,Z,m,m")
          (parallel [(match_operand:QI 2 "const_0_to_3_operand" "n,n,n,n")])))
    (clobber (match_scratch:P 3 "=&b,&b,&b,&b"))]
   "VECTOR_MEM_VSX_P (V4SFmode)"
   "#"
-  "&& reload_completed"
+  "&& 1"
   [(set (match_dup 0) (match_dup 4))]
 {
   operands[4] = rs6000_adjust_vec_address (operands[0], operands[1], operands[2],
@@ -3565,7 +3566,28 @@
 }
   [(set_attr "type" "fpload,fpload,fpload,load")
    (set_attr "length" "8")
-   (set_attr "isa" "*,p7v,p9v,*")])
+   (set_attr "isa" "*,p8v,p9v,*")])
+
+;; V4SF extract from memory with constant element number and convert to DFmode.
+(define_insn_and_split "*vsx_extract_v4sf_load_to_df"
+  [(set (match_operand:DF 0 "register_operand" "=f,v,v")
+        (float_extend:DF
+         (vec_select:SF
+          (match_operand:V4SF 1 "non_altivec_memory_operand" "m,Z,m")
+          (parallel [(match_operand:QI 2 "const_0_to_3_operand" "n,n,n")]))))
+   (clobber (match_scratch:P 3 "=&b,&b,&b"))]
+  "VECTOR_MEM_VSX_P (V4SFmode)"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+        (float_extend:DF (match_dup 4)))]
+{
+  operands[4] = rs6000_adjust_vec_address (operands[0], operands[1], operands[2],
+                                           operands[3], SFmode);
+}
+  [(set_attr "type" "fpload")
+   (set_attr "length" "8")
+   (set_attr "isa" "*,p8v,p9v")])
 
 ;; Variable V4SF extract from a register
 (define_insn_and_split "vsx_extract_v4sf_var"
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-extract-mem-float-1.c b/gcc/testsuite/gcc.target/powerpc/vec-extract-mem-float-1.c
new file mode 100644
index 00000000000..4670e261ba8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-extract-mem-float-1.c
@@ -0,0 +1,29 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mdejagnu-cpu=power8 -O2" } */
+
+/* Test to verify that the vec_extract with constant element numbers can load
+   float elements into a GPR register without doing a LFS/STFS. */
+
+#include <altivec.h>
+
+void
+extract_v4sf_gpr_0 (vector float *p, float *q)
+{
+  float x = vec_extract (*p, 0);
+  __asm__ (" # %0" : "+r" (x)); /* lwz, no lfs/stfs. */
+  *q = x;
+}
+
+void
+extract_v4sf_gpr_1 (vector float *p, float *q)
+{
+  float x = vec_extract (*p, 1);
+  __asm__ (" # %0" : "+r" (x)); /* lwz, no lfs/stfs. */
+  *q = x;
+}
+
+/* { dg-final { scan-assembler-times {\mlwzx?\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstw\M} 2 } } */
+/* { dg-final { scan-assembler-not {\mlfsx?\M|\mlxsspx?\M} } } */
+/* { dg-final { scan-assembler-not {\mstfsx?\M|\mstxsspx?\M} } } */
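
Illustration (not part of the commit): the source-level operation that the two
patterns above describe, namely extracting a constant-numbered float element
from a vector in memory and optionally widening it to double, can be sketched
in portable C. This is a minimal sketch under stated assumptions: it uses the
GCC/Clang generic vector extension instead of <altivec.h> so that it builds on
non-PowerPC hosts, and the names v4sf_t, extract_elem1 and
extract_elem1_as_double are hypothetical. It does not show which insn pattern
a given compiler actually selects for this code.

```c
/* Illustrative sketch only; not part of the commit.  */

/* Hypothetical type name: four packed floats, similar in layout to the
   PowerPC "vector float" type used in the test above.  */
typedef float v4sf_t __attribute__ ((vector_size (16)));

/* Read element 1 of a vector that lives in memory.  The element number is
   a compile-time constant, as in the vec_extract calls in the test.  */
float
extract_elem1 (const v4sf_t *p)
{
  return (*p)[1];
}

/* Same extraction, followed by a float-to-double conversion.  This is the
   shape (extract from memory, then extend) that the new load_to_df pattern
   is described as combining.  */
double
extract_elem1_as_double (const v4sf_t *p)
{
  return (double) (*p)[1];
}
```

Both functions only read one 4-byte element, so a scalar load from the
adjusted address is sufficient; no full vector load is required by the
semantics.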