Subject: Re: rs6000: Generate an lxvp instead of two adjacent lxv instructions
To: segher@gate.crashing.org
Cc: GCC Patches
References: <20210708232836.GT1583@gate.crashing.org> <680c1d6a-0662-f609-f0b5-2547011ea4b6@linux.ibm.com> <43d74cc8-2019-8173-7bdb-110bb5dc3e29@linux.ibm.com> <20210711003921.GA1583@gate.crashing.org>
From: Peter Bergner
Message-ID: <269df715-1ced-b0fe-df9d-d88efa810944@linux.ibm.com>
Date: Tue, 13 Jul 2021 12:14:22 -0500
In-Reply-To: <20210711003921.GA1583@gate.crashing.org>

...and patch 2:

On 7/10/21 7:39 PM, segher@gate.crashing.org wrote:
>> +          unsigned subreg =
>> +            (WORDS_BIG_ENDIAN) ? i : (nregs - reg_mode_nregs - i);
>
> This is not new code, but it caught my eye, so just for the record: the
> "=" should start a new line:
>   unsigned subreg
>     = WORDS_BIG_ENDIAN ? i : (nregs - reg_mode_nregs - i);
> (and don't put parens around random words please :-) ).

Fixed.

>> +      int nvecs = XVECLEN (src, 0);
>> +      for (int i = 0; i < nvecs; i++)
>> +        {
>> +          rtx opnd;
>
> Just "op" (and "op2") please?  If you use long names you might as well
> just spell "operand" :-)

Done.

>> +          if (WORDS_BIG_ENDIAN)
>> +            opnd = XVECEXP (src, 0, i);
>> +          else
>> +            opnd = XVECEXP (src, 0, nvecs - i - 1);
>
> Put this together with the case below as well?  Probably keep the
> WORDS_BIG_ENDIAN test as the outer "if"?

Ok, reworked a little bit.

I'm currently bootstrapping and regtesting these two patches and will
report back.  Better now?

Peter


rs6000: Generate an lxvp instead of two adjacent lxv instructions

The MMA build built-ins currently use individual lxv instructions to
load up the registers of a __vector_pair or __vector_quad.  If the
built-in operands are in adjacent memory locations, then we can use an
lxvp in some cases to load up two registers at once.  The patch below
adds support for checking whether memory addresses are adjacent and
emitting an lxvp instead of two lxv instructions.

gcc/
	* config/rs6000/rs6000.c (adjacent_mem_locations): Return the lower
	addressed memory rtx, if any.
	(power6_sched_reorder2): Update for adjacent_mem_locations change.
	(rs6000_split_multireg_move): Fix code formatting.
	Handle MMA build built-ins with operands in adjacent memory locations.

gcc/testsuite/
	* gcc.target/powerpc/mma-builtin-9.c: New test.
---
 gcc/config/rs6000/rs6000.c                    | 84 ++++++++++++++-----
 .../gcc.target/powerpc/mma-builtin-9.c        | 28 +++++++
 2 files changed, 93 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/mma-builtin-9.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ae11b8d52cb..5fed3bc3ac1 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -18051,23 +18051,29 @@ get_memref_parts (rtx mem, rtx *base, HOST_WIDE_INT *offset,
   return true;
 }
 
-/* The function returns true if the target storage location of
-   mem1 is adjacent to the target storage location of mem2 */
-/* Return 1 if memory locations are adjacent.  */
+/* If the target storage locations of arguments MEM1 and MEM2 are
+   adjacent, then return the argument that has the lower address.
+   Otherwise, return NULL_RTX.  */
 
-static bool
+static rtx
 adjacent_mem_locations (rtx mem1, rtx mem2)
 {
   rtx reg1, reg2;
   HOST_WIDE_INT off1, size1, off2, size2;
 
-  if (get_memref_parts (mem1, &reg1, &off1, &size1)
-      && get_memref_parts (mem2, &reg2, &off2, &size2))
-    return ((REGNO (reg1) == REGNO (reg2))
-            && ((off1 + size1 == off2)
-                || (off2 + size2 == off1)));
+  if (MEM_P (mem1)
+      && MEM_P (mem2)
+      && get_memref_parts (mem1, &reg1, &off1, &size1)
+      && get_memref_parts (mem2, &reg2, &off2, &size2)
+      && REGNO (reg1) == REGNO (reg2))
+    {
+      if (off1 + size1 == off2)
+        return mem1;
+      else if (off2 + size2 == off1)
+        return mem2;
+    }
 
-  return false;
+  return NULL_RTX;
 }
 
 /* This function returns true if it can be determined that the two MEM
@@ -18633,7 +18639,7 @@ power6_sched_reorder2 (rtx_insn **ready, int lastpos)
                     first_store_pos = pos;
 
                   if (is_store_insn (last_scheduled_insn, &str_mem2)
-                      && adjacent_mem_locations (str_mem, str_mem2))
+                      && adjacent_mem_locations (str_mem, str_mem2) != NULL_RTX)
                     {
                       /* Found an adjacent store.  Move it to the head of the
                          ready list, and adjust it's priority so that it is
@@ -26708,8 +26714,8 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 
           for (int i = 0; i < nregs; i += reg_mode_nregs)
             {
-              unsigned subreg =
-                (WORDS_BIG_ENDIAN) ? i : (nregs - reg_mode_nregs - i);
+              unsigned subreg
+                = WORDS_BIG_ENDIAN ? i : (nregs - reg_mode_nregs - i);
               rtx dst2 = adjust_address (dst, reg_mode, offset);
               rtx src2 = gen_rtx_REG (reg_mode, reg + subreg);
               offset += size;
@@ -26726,8 +26732,8 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 
           for (int i = 0; i < nregs; i += reg_mode_nregs)
             {
-              unsigned subreg =
-                (WORDS_BIG_ENDIAN) ? i : (nregs - reg_mode_nregs - i);
+              unsigned subreg
+                = WORDS_BIG_ENDIAN ? i : (nregs - reg_mode_nregs - i);
               rtx dst2 = gen_rtx_REG (reg_mode, reg + subreg);
               rtx src2 = adjust_address (src, reg_mode, offset);
               offset += size;
@@ -26752,13 +26758,53 @@ rs6000_split_multireg_move (rtx dst, rtx src)
           if (GET_MODE (src) == OOmode)
             gcc_assert (VSX_REGNO_P (REGNO (dst)));
 
-          reg_mode = GET_MODE (XVECEXP (src, 0, 0));
           int nvecs = XVECLEN (src, 0);
           for (int i = 0; i < nvecs; i++)
             {
-              int index = WORDS_BIG_ENDIAN ? i : nvecs - 1 - i;
-              rtx dst_i = gen_rtx_REG (reg_mode, reg + index);
-              emit_insn (gen_rtx_SET (dst_i, XVECEXP (src, 0, i)));
+              rtx op;
+              int regno = reg + i;
+
+              if (WORDS_BIG_ENDIAN)
+                {
+                  op = XVECEXP (src, 0, i);
+
+                  /* If we are loading an even VSX register and the memory location
+                     is adjacent to the next register's memory location (if any),
+                     then we can load them both with one LXVP instruction.  */
+                  if ((regno & 1) == 0)
+                    {
+                      rtx op2 = XVECEXP (src, 0, i + 1);
+                      if (adjacent_mem_locations (op, op2) == op)
+                        {
+                          op = adjust_address (op, OOmode, 0);
+                          /* Skip the next register, since we're going to
+                             load it together with this register.  */
+                          i++;
+                        }
+                    }
+                }
+              else
+                {
+                  op = XVECEXP (src, 0, nvecs - i - 1);
+
+                  /* If we are loading an even VSX register and the memory location
+                     is adjacent to the next register's memory location (if any),
+                     then we can load them both with one LXVP instruction.  */
+                  if ((regno & 1) == 0)
+                    {
+                      rtx op2 = XVECEXP (src, 0, nvecs - i - 2);
+                      if (adjacent_mem_locations (op2, op) == op2)
+                        {
+                          op = adjust_address (op2, OOmode, 0);
+                          /* Skip the next register, since we're going to
+                             load it together with this register.  */
+                          i++;
+                        }
+                    }
+                }
+
+              rtx dst_i = gen_rtx_REG (GET_MODE (op), regno);
+              emit_insn (gen_rtx_SET (dst_i, op));
             }
 
           /* We are writing an accumulator register, so we have to
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-9.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-9.c
new file mode 100644
index 00000000000..397d0f1db35
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-9.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+typedef unsigned char vec_t __attribute__((vector_size(16)));
+
+void
+foo (__vector_pair *dst, vec_t *src)
+{
+  __vector_pair pair;
+  /* Adjacent loads should be combined into one lxvp instruction.  */
+  __builtin_vsx_build_pair (&pair, src[0], src[1]);
+  *dst = pair;
+}
+
+void
+bar (__vector_quad *dst, vec_t *src)
+{
+  __vector_quad quad;
+  /* Adjacent loads should be combined into two lxvp instructions.  */
+  __builtin_mma_build_acc (&quad, src[0], src[1], src[2], src[3]);
+  *dst = quad;
+}
+
+/* { dg-final { scan-assembler-not {\mlxv\M} } } */
+/* { dg-final { scan-assembler-not {\mstxv\M} } } */
+/* { dg-final { scan-assembler-times {\mlxvp\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 3 } } */
-- 
2.17.1
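
For anyone who wants to follow the pairing logic without the RTL plumbing, here is a rough standalone sketch of the idea. It is not the GCC code: struct mem_ref, lower_adjacent and count_loads are hypothetical names made up for illustration, the memory operand is modelled as a plain (base register, offset, size) triple, only the big-endian register order is covered, and the bounds check on i + 1 is an addition of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a MEM rtx: base register number, byte
   offset from that register, and access size in bytes.  */
struct mem_ref
{
  unsigned base_regno;
  long offset;
  long size;
};

/* If M1 and M2 are adjacent, return whichever has the lower address.
   Otherwise return NULL.  This mirrors the new return convention of
   adjacent_mem_locations in the patch.  */
static const struct mem_ref *
lower_adjacent (const struct mem_ref *m1, const struct mem_ref *m2)
{
  if (m1 == NULL || m2 == NULL || m1->base_regno != m2->base_regno)
    return NULL;
  if (m1->offset + m1->size == m2->offset)
    return m1;
  if (m2->offset + m2->size == m1->offset)
    return m2;
  return NULL;
}

/* Count the loads needed to fill NVECS consecutive registers starting
   at FIRST_REGNO from OPS, assuming big-endian register order.  An
   even register whose operand is the lower half of an adjacent pair
   is loaded together with the next register by one paired load.  */
static int
count_loads (unsigned first_regno, const struct mem_ref *ops, int nvecs)
{
  assert (nvecs > 0);
  int loads = 0;
  for (int i = 0; i < nvecs; i++)
    {
      unsigned regno = first_regno + i;
      if ((regno & 1) == 0
          && i + 1 < nvecs
          && lower_adjacent (&ops[i], &ops[i + 1]) == &ops[i])
        i++;  /* The paired load also covers the next register.  */
      loads++;
    }
  return loads;
}
```

With four 16-byte operands at offsets 0, 16, 32 and 48 off the same base register and an even first register, count_loads reports two loads, which corresponds to the two lxvp instructions the bar test expects. If the two operands of a pair are swapped in memory, lower_adjacent returns the second operand rather than the first, so the sketch falls back to one load per register.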