Subject: Re: rs6000: Generate an lxvp instead of two adjacent lxv instructions
From: Peter Bergner
To: Segher Boessenkool
Cc: GCC Patches
References: <20210708232836.GT1583@gate.crashing.org> <680c1d6a-0662-f609-f0b5-2547011ea4b6@linux.ibm.com>
In-Reply-To: <680c1d6a-0662-f609-f0b5-2547011ea4b6@linux.ibm.com>
Message-ID: <43d74cc8-2019-8173-7bdb-110bb5dc3e29@linux.ibm.com>
Date: Fri, 9 Jul 2021 18:14:49 -0500

On 7/8/21 8:26 PM, Peter Bergner wrote:
> We do need different code for LE versus BE.  So you want something like
>
>     if (WORDS_BIG_ENDIAN) {...} else {...}
>
> ...instead?  I can try that to see if the code is easier to read.
[snip]
> Let me make the changes you want and I'll repost with what I come up with.

Ok, I removed the consecutive_mem_locations() function from the previous
patch and just call adjacent_mem_locations() directly now.  I also moved
rs6000_split_multireg_move() to later in the file to fix the declaration
issue.  However, since rs6000_split_multireg_move() is where the new code
was added to emit the lxvp's, it can be hard to see what I changed because
of the move.  I'll note that all of my changes are restricted to within the

  if (GET_CODE (src) == UNSPEC)
    {
      gcc_assert (XINT (src, 1) == UNSPEC_MMA_ASSEMBLE);
      ...
    }

...code section.  Does this look better?  I'm currently running bootstraps
and regtests on LE and BE.

Peter


gcc/
	* config/rs6000/rs6000.c (rs6000_split_multireg_move): Move to later
	in the file.  Handle MMA build built-ins with operands in adjacent
	memory locations.
	(adjacent_mem_locations): Test that MEM1 and MEM2 are MEMs.
	Return the lower addressed memory rtx, if any.
	(power6_sched_reorder2): Update for adjacent_mem_locations change.

gcc/testsuite/
	* gcc.target/powerpc/mma-builtin-9.c: New test.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 9a5db63d0ef..8edf7a4a81c 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -16690,382 +16690,6 @@ rs6000_expand_atomic_op (enum rtx_code code, rtx mem, rtx val,
   emit_move_insn (orig_after, after);
 }
 
-/* Emit instructions to move SRC to DST.  Called by splitters for
-   multi-register moves.  It will emit at most one instruction for
-   each register that is accessed; that is, it won't emit li/lis pairs
-   (or equivalent for 64-bit code).  One of SRC or DST must be a hard
-   register.  */
-
-void
-rs6000_split_multireg_move (rtx dst, rtx src)
-{
-  /* The register number of the first register being moved.  */
-  int reg;
-  /* The mode that is to be moved.  */
-  machine_mode mode;
-  /* The mode that the move is being done in, and its size.  */
-  machine_mode reg_mode;
-  int reg_mode_size;
-  /* The number of registers that will be moved.
*/ - int nregs; - - reg = REG_P (dst) ? REGNO (dst) : REGNO (src); - mode = GET_MODE (dst); - nregs = hard_regno_nregs (reg, mode); - - /* If we have a vector quad register for MMA, and this is a load or store, - see if we can use vector paired load/stores. */ - if (mode == XOmode && TARGET_MMA - && (MEM_P (dst) || MEM_P (src))) - { - reg_mode = OOmode; - nregs /= 2; - } - /* If we have a vector pair/quad mode, split it into two/four separate - vectors. */ - else if (mode == OOmode || mode == XOmode) - reg_mode = V1TImode; - else if (FP_REGNO_P (reg)) - reg_mode = DECIMAL_FLOAT_MODE_P (mode) ? DDmode : - (TARGET_HARD_FLOAT ? DFmode : SFmode); - else if (ALTIVEC_REGNO_P (reg)) - reg_mode = V16QImode; - else - reg_mode = word_mode; - reg_mode_size = GET_MODE_SIZE (reg_mode); - - gcc_assert (reg_mode_size * nregs == GET_MODE_SIZE (mode)); - - /* TDmode residing in FP registers is special, since the ISA requires that - the lower-numbered word of a register pair is always the most significant - word, even in little-endian mode. This does not match the usual subreg - semantics, so we cannnot use simplify_gen_subreg in those cases. Access - the appropriate constituent registers "by hand" in little-endian mode. - - Note we do not need to check for destructive overlap here since TDmode - can only reside in even/odd register pairs. 
*/ - if (FP_REGNO_P (reg) && DECIMAL_FLOAT_MODE_P (mode) && !BYTES_BIG_ENDIAN) - { - rtx p_src, p_dst; - int i; - - for (i = 0; i < nregs; i++) - { - if (REG_P (src) && FP_REGNO_P (REGNO (src))) - p_src = gen_rtx_REG (reg_mode, REGNO (src) + nregs - 1 - i); - else - p_src = simplify_gen_subreg (reg_mode, src, mode, - i * reg_mode_size); - - if (REG_P (dst) && FP_REGNO_P (REGNO (dst))) - p_dst = gen_rtx_REG (reg_mode, REGNO (dst) + nregs - 1 - i); - else - p_dst = simplify_gen_subreg (reg_mode, dst, mode, - i * reg_mode_size); - - emit_insn (gen_rtx_SET (p_dst, p_src)); - } - - return; - } - - /* The __vector_pair and __vector_quad modes are multi-register - modes, so if we have to load or store the registers, we have to be - careful to properly swap them if we're in little endian mode - below. This means the last register gets the first memory - location. We also need to be careful of using the right register - numbers if we are splitting XO to OO. */ - if (mode == OOmode || mode == XOmode) - { - nregs = hard_regno_nregs (reg, mode); - int reg_mode_nregs = hard_regno_nregs (reg, reg_mode); - if (MEM_P (dst)) - { - unsigned offset = 0; - unsigned size = GET_MODE_SIZE (reg_mode); - - /* If we are reading an accumulator register, we have to - deprime it before we can access it. */ - if (TARGET_MMA - && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src))) - emit_insn (gen_mma_xxmfacc (src, src)); - - for (int i = 0; i < nregs; i += reg_mode_nregs) - { - unsigned subreg = - (WORDS_BIG_ENDIAN) ? i : (nregs - reg_mode_nregs - i); - rtx dst2 = adjust_address (dst, reg_mode, offset); - rtx src2 = gen_rtx_REG (reg_mode, reg + subreg); - offset += size; - emit_insn (gen_rtx_SET (dst2, src2)); - } - - return; - } - - if (MEM_P (src)) - { - unsigned offset = 0; - unsigned size = GET_MODE_SIZE (reg_mode); - - for (int i = 0; i < nregs; i += reg_mode_nregs) - { - unsigned subreg = - (WORDS_BIG_ENDIAN) ? 
i : (nregs - reg_mode_nregs - i); - rtx dst2 = gen_rtx_REG (reg_mode, reg + subreg); - rtx src2 = adjust_address (src, reg_mode, offset); - offset += size; - emit_insn (gen_rtx_SET (dst2, src2)); - } - - /* If we are writing an accumulator register, we have to - prime it after we've written it. */ - if (TARGET_MMA - && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst))) - emit_insn (gen_mma_xxmtacc (dst, dst)); - - return; - } - - if (GET_CODE (src) == UNSPEC) - { - gcc_assert (XINT (src, 1) == UNSPEC_MMA_ASSEMBLE); - gcc_assert (REG_P (dst)); - if (GET_MODE (src) == XOmode) - gcc_assert (FP_REGNO_P (REGNO (dst))); - if (GET_MODE (src) == OOmode) - gcc_assert (VSX_REGNO_P (REGNO (dst))); - - reg_mode = GET_MODE (XVECEXP (src, 0, 0)); - int nvecs = XVECLEN (src, 0); - for (int i = 0; i < nvecs; i++) - { - int index = WORDS_BIG_ENDIAN ? i : nvecs - 1 - i; - rtx dst_i = gen_rtx_REG (reg_mode, reg + index); - emit_insn (gen_rtx_SET (dst_i, XVECEXP (src, 0, i))); - } - - /* We are writing an accumulator register, so we have to - prime it after we've written it. */ - if (GET_MODE (src) == XOmode) - emit_insn (gen_mma_xxmtacc (dst, dst)); - - return; - } - - /* Register -> register moves can use common code. */ - } - - if (REG_P (src) && REG_P (dst) && (REGNO (src) < REGNO (dst))) - { - /* If we are reading an accumulator register, we have to - deprime it before we can access it. */ - if (TARGET_MMA - && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src))) - emit_insn (gen_mma_xxmfacc (src, src)); - - /* Move register range backwards, if we might have destructive - overlap. */ - int i; - /* XO/OO are opaque so cannot use subregs. 
*/ - if (mode == OOmode || mode == XOmode ) - { - for (i = nregs - 1; i >= 0; i--) - { - rtx dst_i = gen_rtx_REG (reg_mode, REGNO (dst) + i); - rtx src_i = gen_rtx_REG (reg_mode, REGNO (src) + i); - emit_insn (gen_rtx_SET (dst_i, src_i)); - } - } - else - { - for (i = nregs - 1; i >= 0; i--) - emit_insn (gen_rtx_SET (simplify_gen_subreg (reg_mode, dst, mode, - i * reg_mode_size), - simplify_gen_subreg (reg_mode, src, mode, - i * reg_mode_size))); - } - - /* If we are writing an accumulator register, we have to - prime it after we've written it. */ - if (TARGET_MMA - && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst))) - emit_insn (gen_mma_xxmtacc (dst, dst)); - } - else - { - int i; - int j = -1; - bool used_update = false; - rtx restore_basereg = NULL_RTX; - - if (MEM_P (src) && INT_REGNO_P (reg)) - { - rtx breg; - - if (GET_CODE (XEXP (src, 0)) == PRE_INC - || GET_CODE (XEXP (src, 0)) == PRE_DEC) - { - rtx delta_rtx; - breg = XEXP (XEXP (src, 0), 0); - delta_rtx = (GET_CODE (XEXP (src, 0)) == PRE_INC - ? GEN_INT (GET_MODE_SIZE (GET_MODE (src))) - : GEN_INT (-GET_MODE_SIZE (GET_MODE (src)))); - emit_insn (gen_add3_insn (breg, breg, delta_rtx)); - src = replace_equiv_address (src, breg); - } - else if (! 
rs6000_offsettable_memref_p (src, reg_mode, true)) - { - if (GET_CODE (XEXP (src, 0)) == PRE_MODIFY) - { - rtx basereg = XEXP (XEXP (src, 0), 0); - if (TARGET_UPDATE) - { - rtx ndst = simplify_gen_subreg (reg_mode, dst, mode, 0); - emit_insn (gen_rtx_SET (ndst, - gen_rtx_MEM (reg_mode, - XEXP (src, 0)))); - used_update = true; - } - else - emit_insn (gen_rtx_SET (basereg, - XEXP (XEXP (src, 0), 1))); - src = replace_equiv_address (src, basereg); - } - else - { - rtx basereg = gen_rtx_REG (Pmode, reg); - emit_insn (gen_rtx_SET (basereg, XEXP (src, 0))); - src = replace_equiv_address (src, basereg); - } - } - - breg = XEXP (src, 0); - if (GET_CODE (breg) == PLUS || GET_CODE (breg) == LO_SUM) - breg = XEXP (breg, 0); - - /* If the base register we are using to address memory is - also a destination reg, then change that register last. */ - if (REG_P (breg) - && REGNO (breg) >= REGNO (dst) - && REGNO (breg) < REGNO (dst) + nregs) - j = REGNO (breg) - REGNO (dst); - } - else if (MEM_P (dst) && INT_REGNO_P (reg)) - { - rtx breg; - - if (GET_CODE (XEXP (dst, 0)) == PRE_INC - || GET_CODE (XEXP (dst, 0)) == PRE_DEC) - { - rtx delta_rtx; - breg = XEXP (XEXP (dst, 0), 0); - delta_rtx = (GET_CODE (XEXP (dst, 0)) == PRE_INC - ? GEN_INT (GET_MODE_SIZE (GET_MODE (dst))) - : GEN_INT (-GET_MODE_SIZE (GET_MODE (dst)))); - - /* We have to update the breg before doing the store. - Use store with update, if available. */ - - if (TARGET_UPDATE) - { - rtx nsrc = simplify_gen_subreg (reg_mode, src, mode, 0); - emit_insn (TARGET_32BIT - ? (TARGET_POWERPC64 - ? 
gen_movdi_si_update (breg, breg, delta_rtx, nsrc) - : gen_movsi_si_update (breg, breg, delta_rtx, nsrc)) - : gen_movdi_di_update (breg, breg, delta_rtx, nsrc)); - used_update = true; - } - else - emit_insn (gen_add3_insn (breg, breg, delta_rtx)); - dst = replace_equiv_address (dst, breg); - } - else if (!rs6000_offsettable_memref_p (dst, reg_mode, true) - && GET_CODE (XEXP (dst, 0)) != LO_SUM) - { - if (GET_CODE (XEXP (dst, 0)) == PRE_MODIFY) - { - rtx basereg = XEXP (XEXP (dst, 0), 0); - if (TARGET_UPDATE) - { - rtx nsrc = simplify_gen_subreg (reg_mode, src, mode, 0); - emit_insn (gen_rtx_SET (gen_rtx_MEM (reg_mode, - XEXP (dst, 0)), - nsrc)); - used_update = true; - } - else - emit_insn (gen_rtx_SET (basereg, - XEXP (XEXP (dst, 0), 1))); - dst = replace_equiv_address (dst, basereg); - } - else - { - rtx basereg = XEXP (XEXP (dst, 0), 0); - rtx offsetreg = XEXP (XEXP (dst, 0), 1); - gcc_assert (GET_CODE (XEXP (dst, 0)) == PLUS - && REG_P (basereg) - && REG_P (offsetreg) - && REGNO (basereg) != REGNO (offsetreg)); - if (REGNO (basereg) == 0) - { - rtx tmp = offsetreg; - offsetreg = basereg; - basereg = tmp; - } - emit_insn (gen_add3_insn (basereg, basereg, offsetreg)); - restore_basereg = gen_sub3_insn (basereg, basereg, offsetreg); - dst = replace_equiv_address (dst, basereg); - } - } - else if (GET_CODE (XEXP (dst, 0)) != LO_SUM) - gcc_assert (rs6000_offsettable_memref_p (dst, reg_mode, true)); - } - - /* If we are reading an accumulator register, we have to - deprime it before we can access it. */ - if (TARGET_MMA && REG_P (src) - && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src))) - emit_insn (gen_mma_xxmfacc (src, src)); - - for (i = 0; i < nregs; i++) - { - /* Calculate index to next subword. */ - ++j; - if (j == nregs) - j = 0; - - /* If compiler already emitted move of first word by - store with update, no need to do anything. */ - if (j == 0 && used_update) - continue; - - /* XO/OO are opaque so cannot use subregs. 
*/ - if (mode == OOmode || mode == XOmode ) - { - rtx dst_i = gen_rtx_REG (reg_mode, REGNO (dst) + j); - rtx src_i = gen_rtx_REG (reg_mode, REGNO (src) + j); - emit_insn (gen_rtx_SET (dst_i, src_i)); - } - else - emit_insn (gen_rtx_SET (simplify_gen_subreg (reg_mode, dst, mode, - j * reg_mode_size), - simplify_gen_subreg (reg_mode, src, mode, - j * reg_mode_size))); - } - - /* If we are writing an accumulator register, we have to - prime it after we've written it. */ - if (TARGET_MMA && REG_P (dst) - && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst))) - emit_insn (gen_mma_xxmtacc (dst, dst)); - - if (restore_basereg != NULL_RTX) - emit_insn (restore_basereg); - } -} - static GTY(()) alias_set_type TOC_alias_set = -1; alias_set_type @@ -18427,23 +18051,29 @@ get_memref_parts (rtx mem, rtx *base, HOST_WIDE_INT *offset, return true; } -/* The function returns true if the target storage location of - mem1 is adjacent to the target storage location of mem2 */ -/* Return 1 if memory locations are adjacent. */ +/* If the target storage locations of arguments MEM1 and MEM2 are + adjacent, then return the argument that has the lower address. + Otherwise, return NULL_RTX. 
*/
 
-static bool
+static rtx
 adjacent_mem_locations (rtx mem1, rtx mem2)
 {
   rtx reg1, reg2;
   HOST_WIDE_INT off1, size1, off2, size2;
 
-  if (get_memref_parts (mem1, &reg1, &off1, &size1)
-      && get_memref_parts (mem2, &reg2, &off2, &size2))
-    return ((REGNO (reg1) == REGNO (reg2))
-            && ((off1 + size1 == off2)
-                || (off2 + size2 == off1)));
+  if (MEM_P (mem1)
+      && MEM_P (mem2)
+      && get_memref_parts (mem1, &reg1, &off1, &size1)
+      && get_memref_parts (mem2, &reg2, &off2, &size2)
+      && REGNO (reg1) == REGNO (reg2))
+    {
+      if (off1 + size1 == off2)
+        return mem1;
+      else if (off2 + size2 == off1)
+        return mem2;
+    }
 
-  return false;
+  return NULL_RTX;
 }
 
 /* This function returns true if it can be determined that the two MEM
@@ -19009,7 +18639,7 @@ power6_sched_reorder2 (rtx_insn **ready, int lastpos)
                     first_store_pos = pos;
 
                   if (is_store_insn (last_scheduled_insn, &str_mem2)
-                      && adjacent_mem_locations (str_mem, str_mem2))
+                      && adjacent_mem_locations (str_mem, str_mem2) != NULL_RTX)
                     {
                       /* Found an adjacent store.  Move it to the head of the
                          ready list, and adjust it's priority so that it is
@@ -26982,6 +26612,416 @@ rs6000_split_logical (rtx operands[3],
   return;
 }
 
+/* Emit instructions to move SRC to DST.  Called by splitters for
+   multi-register moves.  It will emit at most one instruction for
+   each register that is accessed; that is, it won't emit li/lis pairs
+   (or equivalent for 64-bit code).  One of SRC or DST must be a hard
+   register.  */
+
+void
+rs6000_split_multireg_move (rtx dst, rtx src)
+{
+  /* The register number of the first register being moved.  */
+  int reg;
+  /* The mode that is to be moved.  */
+  machine_mode mode;
+  /* The mode that the move is being done in, and its size.  */
+  machine_mode reg_mode;
+  int reg_mode_size;
+  /* The number of registers that will be moved.  */
+  int nregs;
+
+  reg = REG_P (dst) ?
REGNO (dst) : REGNO (src); + mode = GET_MODE (dst); + nregs = hard_regno_nregs (reg, mode); + + /* If we have a vector quad register for MMA, and this is a load or store, + see if we can use vector paired load/stores. */ + if (mode == XOmode && TARGET_MMA + && (MEM_P (dst) || MEM_P (src))) + { + reg_mode = OOmode; + nregs /= 2; + } + /* If we have a vector pair/quad mode, split it into two/four separate + vectors. */ + else if (mode == OOmode || mode == XOmode) + reg_mode = V1TImode; + else if (FP_REGNO_P (reg)) + reg_mode = DECIMAL_FLOAT_MODE_P (mode) ? DDmode : + (TARGET_HARD_FLOAT ? DFmode : SFmode); + else if (ALTIVEC_REGNO_P (reg)) + reg_mode = V16QImode; + else + reg_mode = word_mode; + reg_mode_size = GET_MODE_SIZE (reg_mode); + + gcc_assert (reg_mode_size * nregs == GET_MODE_SIZE (mode)); + + /* TDmode residing in FP registers is special, since the ISA requires that + the lower-numbered word of a register pair is always the most significant + word, even in little-endian mode. This does not match the usual subreg + semantics, so we cannnot use simplify_gen_subreg in those cases. Access + the appropriate constituent registers "by hand" in little-endian mode. + + Note we do not need to check for destructive overlap here since TDmode + can only reside in even/odd register pairs. 
*/ + if (FP_REGNO_P (reg) && DECIMAL_FLOAT_MODE_P (mode) && !BYTES_BIG_ENDIAN) + { + rtx p_src, p_dst; + int i; + + for (i = 0; i < nregs; i++) + { + if (REG_P (src) && FP_REGNO_P (REGNO (src))) + p_src = gen_rtx_REG (reg_mode, REGNO (src) + nregs - 1 - i); + else + p_src = simplify_gen_subreg (reg_mode, src, mode, + i * reg_mode_size); + + if (REG_P (dst) && FP_REGNO_P (REGNO (dst))) + p_dst = gen_rtx_REG (reg_mode, REGNO (dst) + nregs - 1 - i); + else + p_dst = simplify_gen_subreg (reg_mode, dst, mode, + i * reg_mode_size); + + emit_insn (gen_rtx_SET (p_dst, p_src)); + } + + return; + } + + /* The __vector_pair and __vector_quad modes are multi-register + modes, so if we have to load or store the registers, we have to be + careful to properly swap them if we're in little endian mode + below. This means the last register gets the first memory + location. We also need to be careful of using the right register + numbers if we are splitting XO to OO. */ + if (mode == OOmode || mode == XOmode) + { + nregs = hard_regno_nregs (reg, mode); + int reg_mode_nregs = hard_regno_nregs (reg, reg_mode); + if (MEM_P (dst)) + { + unsigned offset = 0; + unsigned size = GET_MODE_SIZE (reg_mode); + + /* If we are reading an accumulator register, we have to + deprime it before we can access it. */ + if (TARGET_MMA + && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src))) + emit_insn (gen_mma_xxmfacc (src, src)); + + for (int i = 0; i < nregs; i += reg_mode_nregs) + { + unsigned subreg = + (WORDS_BIG_ENDIAN) ? i : (nregs - reg_mode_nregs - i); + rtx dst2 = adjust_address (dst, reg_mode, offset); + rtx src2 = gen_rtx_REG (reg_mode, reg + subreg); + offset += size; + emit_insn (gen_rtx_SET (dst2, src2)); + } + + return; + } + + if (MEM_P (src)) + { + unsigned offset = 0; + unsigned size = GET_MODE_SIZE (reg_mode); + + for (int i = 0; i < nregs; i += reg_mode_nregs) + { + unsigned subreg = + (WORDS_BIG_ENDIAN) ? 
i : (nregs - reg_mode_nregs - i);
+              rtx dst2 = gen_rtx_REG (reg_mode, reg + subreg);
+              rtx src2 = adjust_address (src, reg_mode, offset);
+              offset += size;
+              emit_insn (gen_rtx_SET (dst2, src2));
+            }
+
+          /* If we are writing an accumulator register, we have to
+             prime it after we've written it.  */
+          if (TARGET_MMA
+              && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
+            emit_insn (gen_mma_xxmtacc (dst, dst));
+
+          return;
+        }
+
+      if (GET_CODE (src) == UNSPEC)
+        {
+          gcc_assert (XINT (src, 1) == UNSPEC_MMA_ASSEMBLE);
+          gcc_assert (REG_P (dst));
+          if (GET_MODE (src) == XOmode)
+            gcc_assert (FP_REGNO_P (REGNO (dst)));
+          if (GET_MODE (src) == OOmode)
+            gcc_assert (VSX_REGNO_P (REGNO (dst)));
+
+          int nvecs = XVECLEN (src, 0);
+          for (int i = 0; i < nvecs; i++)
+            {
+              rtx opnd;
+              int regno = reg + i;
+
+              if (WORDS_BIG_ENDIAN)
+                opnd = XVECEXP (src, 0, i);
+              else
+                opnd = XVECEXP (src, 0, nvecs - i - 1);
+
+              /* If we are loading an even VSX register and the memory location
+                 is adjacent to the next register's memory location (if any),
+                 then we can load them both with one LXVP instruction.  */
+              if ((regno & 1) == 0)
+                {
+                  if (WORDS_BIG_ENDIAN)
+                    {
+                      rtx opnd2 = XVECEXP (src, 0, i + 1);
+                      if (adjacent_mem_locations (opnd, opnd2) == opnd)
+                        {
+                          opnd = adjust_address (opnd, OOmode, 0);
+                          /* Skip the next register, since we're going to
+                             load it together with this register.  */
+                          i++;
+                        }
+                    }
+                  else
+                    {
+                      rtx opnd2 = XVECEXP (src, 0, nvecs - i - 2);
+                      if (adjacent_mem_locations (opnd2, opnd) == opnd2)
+                        {
+                          opnd = adjust_address (opnd2, OOmode, 0);
+                          /* Skip the next register, since we're going to
+                             load it together with this register.  */
+                          i++;
+                        }
+                    }
+                }
+
+              rtx dst_i = gen_rtx_REG (GET_MODE (opnd), regno);
+              emit_insn (gen_rtx_SET (dst_i, opnd));
+            }
+
+          /* We are writing an accumulator register, so we have to
+             prime it after we've written it.
*/ + if (GET_MODE (src) == XOmode) + emit_insn (gen_mma_xxmtacc (dst, dst)); + + return; + } + + /* Register -> register moves can use common code. */ + } + + if (REG_P (src) && REG_P (dst) && (REGNO (src) < REGNO (dst))) + { + /* If we are reading an accumulator register, we have to + deprime it before we can access it. */ + if (TARGET_MMA + && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src))) + emit_insn (gen_mma_xxmfacc (src, src)); + + /* Move register range backwards, if we might have destructive + overlap. */ + int i; + /* XO/OO are opaque so cannot use subregs. */ + if (mode == OOmode || mode == XOmode ) + { + for (i = nregs - 1; i >= 0; i--) + { + rtx dst_i = gen_rtx_REG (reg_mode, REGNO (dst) + i); + rtx src_i = gen_rtx_REG (reg_mode, REGNO (src) + i); + emit_insn (gen_rtx_SET (dst_i, src_i)); + } + } + else + { + for (i = nregs - 1; i >= 0; i--) + emit_insn (gen_rtx_SET (simplify_gen_subreg (reg_mode, dst, mode, + i * reg_mode_size), + simplify_gen_subreg (reg_mode, src, mode, + i * reg_mode_size))); + } + + /* If we are writing an accumulator register, we have to + prime it after we've written it. */ + if (TARGET_MMA + && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst))) + emit_insn (gen_mma_xxmtacc (dst, dst)); + } + else + { + int i; + int j = -1; + bool used_update = false; + rtx restore_basereg = NULL_RTX; + + if (MEM_P (src) && INT_REGNO_P (reg)) + { + rtx breg; + + if (GET_CODE (XEXP (src, 0)) == PRE_INC + || GET_CODE (XEXP (src, 0)) == PRE_DEC) + { + rtx delta_rtx; + breg = XEXP (XEXP (src, 0), 0); + delta_rtx = (GET_CODE (XEXP (src, 0)) == PRE_INC + ? GEN_INT (GET_MODE_SIZE (GET_MODE (src))) + : GEN_INT (-GET_MODE_SIZE (GET_MODE (src)))); + emit_insn (gen_add3_insn (breg, breg, delta_rtx)); + src = replace_equiv_address (src, breg); + } + else if (! 
rs6000_offsettable_memref_p (src, reg_mode, true)) + { + if (GET_CODE (XEXP (src, 0)) == PRE_MODIFY) + { + rtx basereg = XEXP (XEXP (src, 0), 0); + if (TARGET_UPDATE) + { + rtx ndst = simplify_gen_subreg (reg_mode, dst, mode, 0); + emit_insn (gen_rtx_SET (ndst, + gen_rtx_MEM (reg_mode, + XEXP (src, 0)))); + used_update = true; + } + else + emit_insn (gen_rtx_SET (basereg, + XEXP (XEXP (src, 0), 1))); + src = replace_equiv_address (src, basereg); + } + else + { + rtx basereg = gen_rtx_REG (Pmode, reg); + emit_insn (gen_rtx_SET (basereg, XEXP (src, 0))); + src = replace_equiv_address (src, basereg); + } + } + + breg = XEXP (src, 0); + if (GET_CODE (breg) == PLUS || GET_CODE (breg) == LO_SUM) + breg = XEXP (breg, 0); + + /* If the base register we are using to address memory is + also a destination reg, then change that register last. */ + if (REG_P (breg) + && REGNO (breg) >= REGNO (dst) + && REGNO (breg) < REGNO (dst) + nregs) + j = REGNO (breg) - REGNO (dst); + } + else if (MEM_P (dst) && INT_REGNO_P (reg)) + { + rtx breg; + + if (GET_CODE (XEXP (dst, 0)) == PRE_INC + || GET_CODE (XEXP (dst, 0)) == PRE_DEC) + { + rtx delta_rtx; + breg = XEXP (XEXP (dst, 0), 0); + delta_rtx = (GET_CODE (XEXP (dst, 0)) == PRE_INC + ? GEN_INT (GET_MODE_SIZE (GET_MODE (dst))) + : GEN_INT (-GET_MODE_SIZE (GET_MODE (dst)))); + + /* We have to update the breg before doing the store. + Use store with update, if available. */ + + if (TARGET_UPDATE) + { + rtx nsrc = simplify_gen_subreg (reg_mode, src, mode, 0); + emit_insn (TARGET_32BIT + ? (TARGET_POWERPC64 + ? 
gen_movdi_si_update (breg, breg, delta_rtx, nsrc) + : gen_movsi_si_update (breg, breg, delta_rtx, nsrc)) + : gen_movdi_di_update (breg, breg, delta_rtx, nsrc)); + used_update = true; + } + else + emit_insn (gen_add3_insn (breg, breg, delta_rtx)); + dst = replace_equiv_address (dst, breg); + } + else if (!rs6000_offsettable_memref_p (dst, reg_mode, true) + && GET_CODE (XEXP (dst, 0)) != LO_SUM) + { + if (GET_CODE (XEXP (dst, 0)) == PRE_MODIFY) + { + rtx basereg = XEXP (XEXP (dst, 0), 0); + if (TARGET_UPDATE) + { + rtx nsrc = simplify_gen_subreg (reg_mode, src, mode, 0); + emit_insn (gen_rtx_SET (gen_rtx_MEM (reg_mode, + XEXP (dst, 0)), + nsrc)); + used_update = true; + } + else + emit_insn (gen_rtx_SET (basereg, + XEXP (XEXP (dst, 0), 1))); + dst = replace_equiv_address (dst, basereg); + } + else + { + rtx basereg = XEXP (XEXP (dst, 0), 0); + rtx offsetreg = XEXP (XEXP (dst, 0), 1); + gcc_assert (GET_CODE (XEXP (dst, 0)) == PLUS + && REG_P (basereg) + && REG_P (offsetreg) + && REGNO (basereg) != REGNO (offsetreg)); + if (REGNO (basereg) == 0) + { + rtx tmp = offsetreg; + offsetreg = basereg; + basereg = tmp; + } + emit_insn (gen_add3_insn (basereg, basereg, offsetreg)); + restore_basereg = gen_sub3_insn (basereg, basereg, offsetreg); + dst = replace_equiv_address (dst, basereg); + } + } + else if (GET_CODE (XEXP (dst, 0)) != LO_SUM) + gcc_assert (rs6000_offsettable_memref_p (dst, reg_mode, true)); + } + + /* If we are reading an accumulator register, we have to + deprime it before we can access it. */ + if (TARGET_MMA && REG_P (src) + && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src))) + emit_insn (gen_mma_xxmfacc (src, src)); + + for (i = 0; i < nregs; i++) + { + /* Calculate index to next subword. */ + ++j; + if (j == nregs) + j = 0; + + /* If compiler already emitted move of first word by + store with update, no need to do anything. */ + if (j == 0 && used_update) + continue; + + /* XO/OO are opaque so cannot use subregs. 
*/
+          if (mode == OOmode || mode == XOmode )
+            {
+              rtx dst_i = gen_rtx_REG (reg_mode, REGNO (dst) + j);
+              rtx src_i = gen_rtx_REG (reg_mode, REGNO (src) + j);
+              emit_insn (gen_rtx_SET (dst_i, src_i));
+            }
+          else
+            emit_insn (gen_rtx_SET (simplify_gen_subreg (reg_mode, dst, mode,
+                                                         j * reg_mode_size),
+                                    simplify_gen_subreg (reg_mode, src, mode,
+                                                         j * reg_mode_size)));
+        }
+
+      /* If we are writing an accumulator register, we have to
+         prime it after we've written it.  */
+      if (TARGET_MMA && REG_P (dst)
+          && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
+        emit_insn (gen_mma_xxmtacc (dst, dst));
+
+      if (restore_basereg != NULL_RTX)
+        emit_insn (restore_basereg);
+    }
+}
 
 /* Return true if the peephole2 can combine a load involving a combination of
    an addis instruction and a load with an offset that can be fused together on
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-9.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-9.c
new file mode 100644
index 00000000000..397d0f1db35
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-9.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+typedef unsigned char vec_t __attribute__((vector_size(16)));
+
+void
+foo (__vector_pair *dst, vec_t *src)
+{
+  __vector_pair pair;
+  /* Adjacent loads should be combined into one lxvp instruction.  */
+  __builtin_vsx_build_pair (&pair, src[0], src[1]);
+  *dst = pair;
+}
+
+void
+bar (__vector_quad *dst, vec_t *src)
+{
+  __vector_quad quad;
+  /* Adjacent loads should be combined into two lxvp instructions.  */
+  __builtin_mma_build_acc (&quad, src[0], src[1], src[2], src[3]);
+  *dst = quad;
+}
+
+/* { dg-final { scan-assembler-not {\mlxv\M} } } */
+/* { dg-final { scan-assembler-not {\mstxv\M} } } */
+/* { dg-final { scan-assembler-times {\mlxvp\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 3 } } */
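
For anyone who wants to poke at the pairing logic outside of GCC, here is a
rough standalone model of it.  This is only a sketch and not code from the
patch: struct memref, adjacent_memrefs() and count_loads() are hypothetical
names, with struct memref standing in for the (base register, offset, size)
triple that get_memref_parts() extracts from a MEM.  It models just the
big-endian operand order, and the "i + 1 < nvecs" bounds check is an
addition of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for what get_memref_parts() reports about a MEM:
   base register number, byte offset and access size.  Not a GCC type.  */
struct memref
{
  int base_regno;
  long offset;
  long size;
};

/* Model of the reworked adjacent_mem_locations(): if M1 and M2 use the
   same base register and touch back-to-back bytes, return whichever one
   has the lower address.  Otherwise return NULL.  */
static const struct memref *
adjacent_memrefs (const struct memref *m1, const struct memref *m2)
{
  if (m1 == NULL || m2 == NULL || m1->base_regno != m2->base_regno)
    return NULL;
  if (m1->offset + m1->size == m2->offset)
    return m1;
  if (m2->offset + m2->size == m1->offset)
    return m2;
  return NULL;
}

/* Count the loads needed to fill NVECS consecutive registers starting at
   FIRST_REGNO from OPS, using one paired load whenever an even register's
   operand sits directly below the next operand in memory.  Big-endian
   operand order only; the bounds check is an assumption of this sketch.  */
static int
count_loads (const struct memref *ops, int nvecs, int first_regno)
{
  assert (ops != NULL && nvecs >= 0);

  int loads = 0;
  for (int i = 0; i < nvecs; i++)
    {
      int regno = first_regno + i;

      if ((regno & 1) == 0
          && i + 1 < nvecs
          && adjacent_memrefs (&ops[i], &ops[i + 1]) == &ops[i])
        i++;  /* One paired load covers this register and the next.  */

      loads++;
    }
  return loads;
}
```

Feeding count_loads() four 16-byte operands at offsets 0, 16, 32 and 48 off
the same base register gives two loads, which lines up with the two lxvp's
expected for bar() in the new test.  Operands that are adjacent but in
descending address order are not paired in this model, because the lower
address is then not the first operand.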