From: Stefan Schulze Frielinghaus
To: krebbel@linux.ibm.com, gcc-patches@gcc.gnu.org
Cc: Stefan Schulze Frielinghaus
Subject: [PATCH 1/3] s390: Recognize further vpdi and vmr{l,h} pattern
Date: Thu, 9 Nov 2023 09:22:09 +0100
Message-ID: <20231109082211.2505-1-stefansf@linux.ibm.com>

Deal with cases where vpdi and vmr{l,h} are still applicable if the
operands of those instructions are swapped.  For example, currently for

V2DI
foo (V2DI x)
{
  return (V2DI) {x[1], x[0]};
}

the assembler sequence

        vlgvg   %r1,%v24,1
        vzero   %v0
        vlvgg   %v0,%r1,0
        vmrhg   %v24,%v0,%v24

is emitted.  With this patch a single vpdi is emitted.

Extensive tests are included in a subsequent patch of this series where
more cases are covered.

Bootstrapped and regtested on s390.  Ok for mainline?

gcc/ChangeLog:

        * config/s390/s390.cc (expand_perm_with_merge): Deal with cases
        where vmr{l,h} are still applicable if the operands are swapped.
        (expand_perm_with_vpdi): Likewise for vpdi.
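
To make the vpdi part concrete, the following is a minimal standalone
sketch of the pattern selection, not the GCC code itself.  It assumes the
usual two-operand permute numbering in which indices 0 and 1 select from
op0 and indices 2 and 3 select from op1, and it infers from the patterns
in this patch that vpdi1 yields {x[0], y[1]} and vpdi4 yields
{x[1], y[0]}.  The names VpdiChoice, classify_vpdi, apply_perm and
apply_vpdi are hypothetical and exist only for illustration.

```cpp
#include <array>
#include <cassert>

// Hypothetical model of a two-element vector (V2DI / V2DF).
using V2 = std::array<long long, 2>;

struct VpdiChoice
{
  bool matched;       // false if none of the patterns from the patch applies
  int variant;        // 1 models gen_vpdi1, 4 models gen_vpdi4
  bool swap_operands; // true if op0 and op1 are exchanged before emitting
};

// Classify the permutation {a, b} the way the patched
// expand_perm_with_vpdi does, including the two new swapped patterns.
inline VpdiChoice
classify_vpdi (unsigned a, unsigned b)
{
  if (a == 0 && b == 3)
    return {true, 1, false};
  if (a == 2 && b == 1)
    return {true, 1, true};
  if ((a == 1 && b == 2) || (a == 1 && b == 0) || (a == 3 && b == 2))
    return {true, 4, false};
  if (a == 3 && b == 0)
    return {true, 4, true};
  return {false, 0, false};
}

// Reference semantics of the permute: index i < 2 selects op0[i],
// any other index selects op1[i - 2].
inline V2
apply_perm (unsigned a, unsigned b, const V2 &op0, const V2 &op1)
{
  auto pick = [&] (unsigned i) { return i < 2 ? op0[i] : op1[i - 2]; };
  return {pick (a), pick (b)};
}

// Assumed effect of the two vpdi variants on operands x and y.
inline V2
apply_vpdi (int variant, const V2 &x, const V2 &y)
{
  assert (variant == 1 || variant == 4);
  return variant == 1 ? V2{x[0], y[1]} : V2{x[1], y[0]};
}
```

Under these assumptions the permutation {1, 0} from the example above is
classified as vpdi4 without a swap, and apply_vpdi (4, x, x) gives
{x[1], x[0]}.  For the new patterns {2, 1} and {3, 0}, calling apply_vpdi
with the operands exchanged gives the same result as apply_perm.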
---
 gcc/config/s390/s390.cc | 118 ++++++++++++++++++++++++++++++----------
 1 file changed, 90 insertions(+), 28 deletions(-)

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 64f56d8effa..185eb59f8b8 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -17532,40 +17532,86 @@ struct expand_vec_perm_d
 static bool
 expand_perm_with_merge (const struct expand_vec_perm_d &d)
 {
-  bool merge_lo_p = true;
-  bool merge_hi_p = true;
-
-  if (d.nelt % 2)
+  static const unsigned char hi_perm_di[2] = {0, 2};
+  static const unsigned char hi_perm_si[4] = {0, 4, 1, 5};
+  static const unsigned char hi_perm_hi[8] = {0, 8, 1, 9, 2, 10, 3, 11};
+  static const unsigned char hi_perm_qi[16]
+    = {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23};
+
+  static const unsigned char hi_perm_di_swap[2] = {2, 0};
+  static const unsigned char hi_perm_si_swap[4] = {4, 0, 6, 2};
+  static const unsigned char hi_perm_hi_swap[8] = {8, 0, 10, 2, 12, 4, 14, 6};
+  static const unsigned char hi_perm_qi_swap[16]
+    = {16, 0, 18, 2, 20, 4, 22, 6, 24, 8, 26, 10, 28, 12, 30, 14};
+
+  static const unsigned char lo_perm_di[2] = {1, 3};
+  static const unsigned char lo_perm_si[4] = {2, 6, 3, 7};
+  static const unsigned char lo_perm_hi[8] = {4, 12, 5, 13, 6, 14, 7, 15};
+  static const unsigned char lo_perm_qi[16]
+    = {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31};
+
+  static const unsigned char lo_perm_di_swap[2] = {3, 1};
+  static const unsigned char lo_perm_si_swap[4] = {5, 1, 7, 3};
+  static const unsigned char lo_perm_hi_swap[8] = {9, 1, 11, 3, 13, 5, 15, 7};
+  static const unsigned char lo_perm_qi_swap[16]
+    = {17, 1, 19, 3, 21, 5, 23, 7, 25, 9, 27, 11, 29, 13, 31, 15};
+
+  bool merge_lo_p = false;
+  bool merge_hi_p = false;
+  bool swap_operands_p = false;
+
+  if ((d.nelt == 2 && memcmp (d.perm, hi_perm_di, 2) == 0)
+      || (d.nelt == 4 && memcmp (d.perm, hi_perm_si, 4) == 0)
+      || (d.nelt == 8 && memcmp (d.perm, hi_perm_hi, 8) == 0)
+      || (d.nelt == 16 && memcmp (d.perm, hi_perm_qi, 16) == 0))
+    {
+      merge_hi_p = true;
+    }
+  else if ((d.nelt == 2 && memcmp (d.perm, hi_perm_di_swap, 2) == 0)
+           || (d.nelt == 4 && memcmp (d.perm, hi_perm_si_swap, 4) == 0)
+           || (d.nelt == 8 && memcmp (d.perm, hi_perm_hi_swap, 8) == 0)
+           || (d.nelt == 16 && memcmp (d.perm, hi_perm_qi_swap, 16) == 0))
+    {
+      merge_hi_p = true;
+      swap_operands_p = true;
+    }
+  else if ((d.nelt == 2 && memcmp (d.perm, lo_perm_di, 2) == 0)
+           || (d.nelt == 4 && memcmp (d.perm, lo_perm_si, 4) == 0)
+           || (d.nelt == 8 && memcmp (d.perm, lo_perm_hi, 8) == 0)
+           || (d.nelt == 16 && memcmp (d.perm, lo_perm_qi, 16) == 0))
+    {
+      merge_lo_p = true;
+    }
+  else if ((d.nelt == 2 && memcmp (d.perm, lo_perm_di_swap, 2) == 0)
+           || (d.nelt == 4 && memcmp (d.perm, lo_perm_si_swap, 4) == 0)
+           || (d.nelt == 8 && memcmp (d.perm, lo_perm_hi_swap, 8) == 0)
+           || (d.nelt == 16 && memcmp (d.perm, lo_perm_qi_swap, 16) == 0))
+    {
+      merge_lo_p = true;
+      swap_operands_p = true;
+    }
+
+  if (!merge_lo_p && !merge_hi_p)
     return false;
 
-  // For V4SI this checks for: { 0, 4, 1, 5 }
-  for (int telt = 0; telt < d.nelt; telt++)
-    if (d.perm[telt] != telt / 2 + (telt % 2) * d.nelt)
-      {
-        merge_hi_p = false;
-        break;
-      }
+  if (d.testing_p)
+    return merge_lo_p || merge_hi_p;
 
-  if (!merge_hi_p)
+  rtx op0, op1;
+  if (swap_operands_p)
     {
-      // For V4SI this checks for: { 2, 6, 3, 7 }
-      for (int telt = 0; telt < d.nelt; telt++)
-        if (d.perm[telt] != (telt + d.nelt) / 2 + (telt % 2) * d.nelt)
-          {
-            merge_lo_p = false;
-            break;
-          }
+      op0 = d.op1;
+      op1 = d.op0;
     }
   else
-    merge_lo_p = false;
-
-  if (d.testing_p)
-    return merge_lo_p || merge_hi_p;
+    {
+      op0 = d.op0;
+      op1 = d.op1;
+    }
 
-  if (merge_lo_p || merge_hi_p)
-    s390_expand_merge (d.target, d.op0, d.op1, merge_hi_p);
+  s390_expand_merge (d.target, op0, op1, merge_hi_p);
 
-  return merge_lo_p || merge_hi_p;
+  return true;
 }
 
 /* Try to expand the vector permute operation described by D using the
@@ -17582,6 +17628,7 @@ expand_perm_with_vpdi (const struct expand_vec_perm_d &d)
 {
   bool vpdi1_p = false;
   bool vpdi4_p = false;
+  bool swap_operands_p = false;
   rtx op0_reg, op1_reg;
 
   // Only V2DI and V2DF are supported here.
@@ -17590,11 +17637,20 @@ expand_perm_with_vpdi (const struct expand_vec_perm_d &d)
 
   if (d.perm[0] == 0 && d.perm[1] == 3)
     vpdi1_p = true;
-
-  if ((d.perm[0] == 1 && d.perm[1] == 2)
+  else if (d.perm[0] == 2 && d.perm[1] == 1)
+    {
+      vpdi1_p = true;
+      swap_operands_p = true;
+    }
+  else if ((d.perm[0] == 1 && d.perm[1] == 2)
       || (d.perm[0] == 1 && d.perm[1] == 0)
       || (d.perm[0] == 3 && d.perm[1] == 2))
     vpdi4_p = true;
+  else if (d.perm[0] == 3 && d.perm[1] == 0)
+    {
+      vpdi4_p = true;
+      swap_operands_p = true;
+    }
 
   if (!vpdi1_p && !vpdi4_p)
     return false;
@@ -17611,6 +17667,12 @@ expand_perm_with_vpdi (const struct expand_vec_perm_d &d)
     op1_reg = op0_reg;
   else if (d.only_op1)
     op0_reg = op1_reg;
+  else if (swap_operands_p)
+    {
+      rtx tmp = op0_reg;
+      op0_reg = op1_reg;
+      op1_reg = tmp;
+    }
 
   if (vpdi1_p)
     emit_insn (gen_vpdi1 (d.vmode, d.target, op0_reg, op1_reg));
-- 
2.41.0
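
As a cross-check of the merge tables, here is a small standalone sketch
that derives the patterns instead of listing them.  It reuses the index
formulas from the loops removed by this patch for the unswapped case.
For the swapped case it assumes that a merge emitted on (op1, op0) is
described, in the original numbering, by moving every index into the
other operand's range.  The helper names merge_high_perm, merge_low_perm
and swap_operands are hypothetical and not part of GCC.

```cpp
#include <vector>

// Merge-high pattern for NELT elements, e.g. {0, 4, 1, 5} for V4SI.
// Formula taken from the loop removed by the patch.
inline std::vector<unsigned>
merge_high_perm (unsigned nelt)
{
  std::vector<unsigned> p (nelt);
  for (unsigned t = 0; t < nelt; t++)
    p[t] = t / 2 + (t % 2) * nelt;
  return p;
}

// Merge-low pattern for NELT elements, e.g. {2, 6, 3, 7} for V4SI.
// Formula taken from the loop removed by the patch.
inline std::vector<unsigned>
merge_low_perm (unsigned nelt)
{
  std::vector<unsigned> p (nelt);
  for (unsigned t = 0; t < nelt; t++)
    p[t] = (t + nelt) / 2 + (t % 2) * nelt;
  return p;
}

// Assumption: exchanging op0 and op1 maps index i to the same position in
// the other operand, i.e. (i + nelt) modulo (2 * nelt).
inline std::vector<unsigned>
swap_operands (std::vector<unsigned> p, unsigned nelt)
{
  for (unsigned &i : p)
    i = (i + nelt) % (2 * nelt);
  return p;
}
```

For two elements the swapped patterns come out as {2, 0} and {3, 1},
matching hi_perm_di_swap and lo_perm_di_swap.  For four elements this
model gives {4, 0, 5, 1} for the swapped merge-high case, which is not
what hi_perm_si_swap lists ({4, 0, 6, 2}), so either the model's
assumption about s390_expand_merge or the wider swap tables may deserve a
second look.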