From: Xionghu Luo
To: David Edelsohn
Cc: GCC Patches, Segher Boessenkool, Bill Schmidt, guojiufu, linkw@gcc.gnu.org
Subject: [PATCH v2] rs6000: Optimize __builtin_shuffle when it's used to zero the upper bits [PR102868]
Date: Thu, 28 Oct 2021 13:38:27 +0800
Message-ID: <279fd64f-505a-5b1d-dd06-2ec201db3009@linux.ibm.com>
References: <20211025025056.1002396-1-luoxhu@linux.ibm.com>
List-Id: Gcc-patches mailing list

On 2021/10/27 21:24, David Edelsohn wrote:
> On Sun, Oct 24, 2021 at 10:51 PM Xionghu Luo wrote:
>>
>> If the second operand of __builtin_shuffle is const vector 0, and with
>> specific mask, it can be optimized to vspltisw+xxpermdi instead of lxv.
>>
>> gcc/ChangeLog:
>>
>>         * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Add
>>         patterns match and emit for VSX xxpermdi.
>>
>> gcc/testsuite/ChangeLog:
>>
>>         * gcc.target/powerpc/pr102868.c: New test.
>> ---
>>  gcc/config/rs6000/rs6000.c                  | 47 ++++++++++++++++--
>>  gcc/testsuite/gcc.target/powerpc/pr102868.c | 53 +++++++++++++++++++++
>>  2 files changed, 97 insertions(+), 3 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102868.c
>>
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index d0730253bcc..5d802c1fa96 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -23046,7 +23046,23 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
>>      {OPTION_MASK_P8_VECTOR,
>>       BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgow_v4sf_direct
>>                        : CODE_FOR_p8_vmrgew_v4sf_direct,
>> -     {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}}};
>> +     {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}},
>> +    {OPTION_MASK_VSX,
>> +     (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
>> +                       : CODE_FOR_vsx_xxpermdi_v16qi),
>> +     {0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23}},
>> +    {OPTION_MASK_VSX,
>> +     (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
>> +                       : CODE_FOR_vsx_xxpermdi_v16qi),
>> +     {8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23}},
>> +    {OPTION_MASK_VSX,
>> +     (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
>> +                       : CODE_FOR_vsx_xxpermdi_v16qi),
>> +     {0, 1, 2, 3, 4, 5, 6, 7, 24, 25, 26, 27, 28, 29, 30, 31}},
>> +    {OPTION_MASK_VSX,
>> +     (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
>> +                       : CODE_FOR_vsx_xxpermdi_v16qi),
>> +     {8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31}}};
>
> If the insn_code is the same for big endian and little endian, why
> does the new code test BYTES_BIG_ENDIAN to set the same value
> (CODE_FOR_vsx_xxpermdi_v16qi)?
>

Thanks for the catch.  The BYTES_BIG_ENDIAN test was redundant, so v2
drops it.  Updated patch below:


[PATCH v2] rs6000: Optimize __builtin_shuffle when it's used to zero the upper bits [PR102868]

If the second operand of __builtin_shuffle is a constant zero vector and
the mask matches one of a few specific patterns, the shuffle can be
expanded to vspltisw+xxpermdi instead of lxv.

gcc/ChangeLog:

        * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Add
        pattern matching and emit for VSX xxpermdi.

gcc/testsuite/ChangeLog:

        * gcc.target/powerpc/pr102868.c: New test.
---
 gcc/config/rs6000/rs6000.c                  | 39 +++++++++++++--
 gcc/testsuite/gcc.target/powerpc/pr102868.c | 53 +++++++++++++++++++++
 2 files changed, 89 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102868.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index d0730253bcc..533560bb9ba 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -23046,7 +23046,15 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
     {OPTION_MASK_P8_VECTOR,
      BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgow_v4sf_direct
                       : CODE_FOR_p8_vmrgew_v4sf_direct,
-     {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}}};
+     {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}},
+    {OPTION_MASK_VSX, CODE_FOR_vsx_xxpermdi_v16qi,
+     {0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23}},
+    {OPTION_MASK_VSX, CODE_FOR_vsx_xxpermdi_v16qi,
+     {8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23}},
+    {OPTION_MASK_VSX, CODE_FOR_vsx_xxpermdi_v16qi,
+     {0, 1, 2, 3, 4, 5, 6, 7, 24, 25, 26, 27, 28, 29, 30, 31}},
+    {OPTION_MASK_VSX, CODE_FOR_vsx_xxpermdi_v16qi,
+     {8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31}}};
 
   unsigned int i, j, elt, which;
   unsigned char perm[16];
@@ -23169,6 +23177,27 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
           machine_mode omode = insn_data[icode].operand[0].mode;
           machine_mode imode = insn_data[icode].operand[1].mode;
 
+          rtx perm_idx = GEN_INT (0);
+          if (icode == CODE_FOR_vsx_xxpermdi_v16qi)
+            {
+              int perm_val = 0;
+              if (one_vec)
+                {
+                  if (perm[0] == 8)
+                    perm_val |= 2;
+                  if (perm[8] == 8)
+                    perm_val |= 1;
+                }
+              else
+                {
+                  if (perm[0] != 0)
+                    perm_val |= 2;
+                  if (perm[8] != 16)
+                    perm_val |= 1;
+                }
+              perm_idx = GEN_INT (perm_val);
+            }
+
           /* For little-endian, don't use vpkuwum and vpkuhum if the
              underlying vector type is not V4SI and V8HI, respectively.
              For example, using vpkuwum with a V8HI picks up the even
@@ -23192,7 +23221,8 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
           /* For little-endian, the two input operands must be swapped
              (or swapped back) to ensure proper right-to-left numbering
              from 0 to 2N-1.  */
-          if (swapped ^ !BYTES_BIG_ENDIAN)
+          if (swapped ^ !BYTES_BIG_ENDIAN
+              && icode != CODE_FOR_vsx_xxpermdi_v16qi)
             std::swap (op0, op1);
           if (imode != V16QImode)
             {
@@ -23203,7 +23233,10 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
             x = target;
           else
             x = gen_reg_rtx (omode);
-          emit_insn (GEN_FCN (icode) (x, op0, op1));
+          if (icode == CODE_FOR_vsx_xxpermdi_v16qi)
+            emit_insn (GEN_FCN (icode) (x, op0, op1, perm_idx));
+          else
+            emit_insn (GEN_FCN (icode) (x, op0, op1));
           if (omode != V16QImode)
             emit_move_insn (target, gen_lowpart (V16QImode, x));
           return true;
diff --git a/gcc/testsuite/gcc.target/powerpc/pr102868.c b/gcc/testsuite/gcc.target/powerpc/pr102868.c
new file mode 100644
index 00000000000..eb45d193f66
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr102868.c
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+#include <altivec.h>
+vector float b = {0.0f, 0.0f, 0.0f, 0.0f};
+
+
+vector float foo1 (vector float x)
+{
+  vector int c = {0, 1, 4, 5};
+  return __builtin_shuffle (x, b, c);
+}
+
+vector float foo2 (vector float x)
+{
+  vector int c = {2, 3, 4, 5};
+  return __builtin_shuffle (x, b, c);
+}
+
+vector float foo3 (vector float x)
+{
+  vector int c = {0, 1, 6, 7};
+  return __builtin_shuffle (x, b, c);
+}
+
+vector float foo4 (vector float x)
+{
+  vector int c = {2, 3, 6, 7};
+  return __builtin_shuffle (x, b, c);
+}
+
+vector unsigned char foo5 (vector unsigned char x)
+{
+  vector unsigned char c = {0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7};
+  return __builtin_shuffle (x, c);
+}
+
+vector unsigned char foo6 (vector unsigned char x)
+{
+  vector unsigned char c = {8, 9, 10, 11, 12, 13, 14, 15, 8, 9, 10, 11, 12, 13, 14, 15};
+  return __builtin_shuffle (x, c);
+}
+
+vector unsigned char foo7 (vector unsigned char x)
+{
+  vector unsigned char c = {8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2, 3, 4, 5, 6, 7};
+  return __builtin_shuffle (x, c);
+}
+
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 7 { target has_arch_pwr9 } } } */
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 7 { target { {! has_arch_pwr9} && be } } } } */
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 11 { target { {! has_arch_pwr9} && le } } } } */
-- 
2.25.1
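
As a reading aid for the perm_idx hunk in the patch, here is a minimal
standalone sketch of how the 2-bit value passed as the extra xxpermdi
operand follows from the byte permutation.  It only mirrors the logic in
the hunk.  The names xxpermdi_selector and pr102868_masks are made up for
this illustration and are not GCC functions or variables; the real code
works on rtx operands inside altivec_expand_vec_perm_const.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The four two-operand byte masks the patch adds to the pattern table.
   Bytes 0..15 index op0 and bytes 16..31 index op1.  The array name is
   hypothetical.  */
static const unsigned char pr102868_masks[4][16] = {
  {0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23},
  {8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23},
  {0, 1, 2, 3, 4, 5, 6, 7, 24, 25, 26, 27, 28, 29, 30, 31},
  {8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31}
};

/* Hypothetical helper, not part of GCC.  Bit value 2 is set when the
   first half of the result starts at byte 8 (the second doubleword of
   the first input); bit value 1 is set when the second half starts at
   the second doubleword of its input.  */
static int
xxpermdi_selector (const unsigned char perm[16], bool one_vec)
{
  int perm_val = 0;

  assert (perm != NULL);
  if (one_vec)
    {
      /* Both halves come from the same vector, so indices stay in 0..15.  */
      if (perm[0] == 8)
        perm_val |= 2;
      if (perm[8] == 8)
        perm_val |= 1;
    }
  else
    {
      /* Second half comes from op1, whose bytes are numbered 16..31.  */
      if (perm[0] != 0)
        perm_val |= 2;
      if (perm[8] != 16)
        perm_val |= 1;
    }
  return perm_val;
}
```

Under these assumptions the four table masks map to selectors 0, 2, 1 and
3 in table order, and the single-vector mask used by foo7 in the test
({8..15, 0..7}) maps to 2.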