From: "Kewen.Lin"
Date: Mon, 24 Jul 2023 10:43:24 +0800
Subject: Re: [PATCHv2, rs6000] Generate mfvsrwz for all subtargets and remove redundant zero extend [PR106769]
To: HAO CHEN GUI
Cc: Segher Boessenkool, David, Peter Bergner, gcc-patches
In-Reply-To: <894768a2-5ebe-60f0-e6e9-73bdc9f1425d@linux.ibm.com>

Hi Haochen,

on 2023/7/21 09:32, HAO CHEN GUI wrote:
> Hi,
> This patch modifies vsx extract expand and generates mfvsrwz/stxsiwx
> for all subtargets when the mode is V4SI and the index of extracted element
> is 1 for BE and 2 for LE. Also this patch adds an insn pattern for mfvsrwz
> which can help eliminate redundant zero extend.
>
> Compared to last version, the main change is to add a new expand for V4SI
> and separate "vsx_extract_si" into 2 insn patterns.
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622101.html
>
> Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
>
> Thanks
> Gui Haochen
>
>
> ChangeLog
> rs6000: Generate mfvsrwz for all subtargets and remove redundant zero extend
>
> mfvsrwz has lower latency than xxextractuw or vextuw[lr]x.  So it should be
> generated even with p9 vector enabled.  Also the instruction is already
> zero extended.  A combine pattern is needed to eliminate redundant zero
> extend instructions.
>
> gcc/
> 	PR target/106769
> 	* config/rs6000/vsx.md (expand vsx_extract_): Set it only
> 	for V8HI and V16QI.
> 	(vsx_extract_v4si): New expand for V4SI.
> 	(*vsx_extract__di_p9): Do not generate the insn when it can
> 	be generated by mfvsrwz.
> 	(mfvsrwz): New insn pattern for zero extended vsx_extract_v4si.
> 	(*vsx_extract_si): Removed.
> 	(vsx_extract_v4si_0): New insn pattern to deal with V4SI extract
> 	when the index of extracted element is 1 with BE and 2 with LE.
> 	(vsx_extract_v4si_1): New insn and split pattern which deals with
> 	the cases not handled by vsx_extract_v4si_0.
>
> gcc/testsuite/
> 	PR target/106769
> 	* gcc.target/powerpc/pr106769.h: New.
> 	* gcc.target/powerpc/pr106769-p8.c: New.
> 	* gcc.target/powerpc/pr106769-p9.c: New.
>
> patch.diff
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 0a34ceebeb5..ad249441bcf 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -3722,9 +3722,9 @@ (define_insn "vsx_xxpermdi2__1"
>  (define_expand "vsx_extract_"
>    [(parallel [(set (match_operand: 0 "gpc_reg_operand")
>                     (vec_select:
> -                    (match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand")
> +                    (match_operand:VSX_EXTRACT_I2 1 "gpc_reg_operand")
>                      (parallel [(match_operand:QI 2 "const_int_operand")])))
> -              (clobber (match_scratch:VSX_EXTRACT_I 3))])]
> +              (clobber (match_scratch:VSX_EXTRACT_I2 3))])]
>    "VECTOR_MEM_VSX_P (mode) && TARGET_DIRECT_MOVE_64BIT"
>  {
>    /* If we have ISA 3.0, we can do a xxextractuw/vextractu{b,h}. */
> @@ -3736,6 +3736,23 @@ (define_expand "vsx_extract_"
>      }
>  })
>
> +(define_expand "vsx_extract_v4si"
> +  [(parallel [(set (match_operand:SI 0 "gpc_reg_operand")
> +                   (vec_select:SI
> +                    (match_operand:V4SI 1 "gpc_reg_operand")
> +                    (parallel [(match_operand:QI 2 "const_0_to_3_operand")])))
> +              (clobber (match_scratch:V4SI 3))])]
> +  "TARGET_DIRECT_MOVE_64BIT"
> +{

Nit: Maybe add a comment here explaining why we special-case op2.

> +  if (TARGET_P9_VECTOR
> +      && INTVAL (operands[2]) != (BYTES_BIG_ENDIAN ? 1 : 2))
> +    {
> +      emit_insn (gen_vsx_extract_v4si_p9 (operands[0], operands[1],
> +                                          operands[2]));
> +      DONE;
> +    }
> +})
> +

Nit: Move the define_insn "vsx_extract_v4si_0" up here so that it takes
first priority in matching.
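To make the lane choice concrete: mfvsrwz always reads word 1 of the VSR in
ISA (big-endian) word order and zero-extends it into the GPR, so it serves
vec_extract index 1 on BE and index 2 on LE (3 - 2 = 1, the same adjustment
the splitter makes).  The portable C model below is only an illustration
under those assumptions, not GCC code; v4si_reg and the model_* / check_*
helpers are hypothetical names.  It also shows why a following zero
extension (e.g. rldicl ...,0,32) is redundant.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a V4SI vector register.  words[] is kept in ISA
   (big-endian) word order: words[0] is the leftmost word.  */
typedef struct
{
  uint32_t words[4];
} v4si_reg;

/* Map a vec_extract element index to the ISA word number.  Little-endian
   numbers elements from the other end, so word = 3 - index there.  */
int
element_to_isa_word (int index, int big_endian)
{
  return big_endian ? index : 3 - index;
}

/* vec_extract (v, index) as the program sees it.  */
uint32_t
model_vec_extract (const v4si_reg *v, int index, int big_endian)
{
  return v->words[element_to_isa_word (index, big_endian)];
}

/* Model of mfvsrwz: copy ISA word 1 into a 64-bit GPR, zero-extended.  */
uint64_t
model_mfvsrwz (const v4si_reg *v)
{
  return (uint64_t) v->words[1];
}

/* A 32-to-64-bit zero extension, i.e. what rldicl rD,rS,0,32 computes.  */
uint64_t
model_zero_extend (uint64_t x)
{
  return x & 0xffffffffULL;
}

/* mfvsrwz serves element 1 on BE and element 2 on LE, and zero-extending
   its result again leaves it unchanged.  */
void
check_mfvsrwz_lane (const v4si_reg *v)
{
  assert (element_to_isa_word (1, 1) == 1);
  assert (element_to_isa_word (2, 0) == 1);
  assert (model_mfvsrwz (v) == model_vec_extract (v, 1, 1));
  assert (model_mfvsrwz (v) == model_vec_extract (v, 2, 0));
  assert (model_zero_extend (model_mfvsrwz (v)) == model_mfvsrwz (v));
}
```

For example, with words {10, 0xdeadbeef, 30, 40}, both vec_extract (v, 1)
on BE and vec_extract (v, 2) on LE yield 0xdeadbeef, which is exactly what
the model of mfvsrwz returns.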
>  (define_insn "vsx_extract__p9"
>    [(set (match_operand: 0 "gpc_reg_operand" "=r,")
>          (vec_select:
> @@ -3798,7 +3815,9 @@ (define_insn_and_split "*vsx_extract__di_p9"
>            (match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" "v,")
>            (parallel [(match_operand:QI 2 "const_int_operand" "n,n")]))))
>     (clobber (match_scratch:SI 3 "=r,X"))]
> -  "VECTOR_MEM_VSX_P (mode) && TARGET_VEXTRACTUB"
> +  "TARGET_VEXTRACTUB
> +   && (mode != V4SImode
> +       || INTVAL (operands[2]) != (BYTES_BIG_ENDIAN ? 1 : 2))"

I'd expect that, under TARGET_VEXTRACTUB, we no longer get this pattern for
V4SI with op2 equal to 1 (BE) or 2 (LE) now?  Instead of adding a condition
to exclude it, IMHO it's better to assert in its splitter that op2 isn't
1 (BE) or 2 (LE).

>    "#"
>    "&& reload_completed"
>    [(parallel [(set (match_dup 4)
> @@ -3830,58 +3849,78 @@ (define_insn_and_split "*vsx_extract__store_p9"
>     (set (match_dup 0)
>          (match_dup 3))])
>
> -(define_insn_and_split "*vsx_extract_si"
> +(define_insn "mfvsrwz"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> +        (zero_extend:DI
> +         (vec_select:SI
> +          (match_operand:V4SI 1 "vsx_register_operand" "wa")
> +          (parallel [(match_operand:QI 2 "const_int_operand" "n")]))))
> +   (clobber (match_scratch:V4SI 3 "=v"))]
> +  "TARGET_DIRECT_MOVE_64BIT
> +   && INTVAL (operands[2]) == (BYTES_BIG_ENDIAN ? 1 : 2)"
> +  "mfvsrwz %0,%x1"
> +  [(set_attr "type" "mfvsr")
> +   (set_attr "isa" "p8v")])

Nit: there is no call to gen_mfvsrwz, so make it "*mfvsrwz"?

> +
> +(define_insn "vsx_extract_v4si_0"

Nit: since it cares about word 1 (in ISA, i.e. BE, order) of the vector, it
may be better to name it "vsx_extract_v4si_w1" and add a comment.

> +  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,wa,Z,wa")
> +        (vec_select:SI
> +         (match_operand:V4SI 1 "gpc_reg_operand" "v,v,v,0")
> +         (parallel [(match_operand:QI 2 "const_0_to_3_operand" "n,n,n,n")])))
> +   (clobber (match_scratch:V4SI 3 "=v,v,v,v"))]
> +  "TARGET_DIRECT_MOVE_64BIT
> +   && (!TARGET_P9_VECTOR || INTVAL (operands[2]) == (BYTES_BIG_ENDIAN ? 1 : 2))"

Could we remove the "!TARGET_P9_VECTOR"?  It would be clearer to make this
pattern focus only on op2 == 1 (BE) or 2 (LE) ...

> +{
> +  if (which_alternative == 0)
> +    return "mfvsrwz %0,%x1";
> +
> +  if (which_alternative == 1)
> +    return "xxlor %x0,%x1,%x1";
> +
> +  if (which_alternative == 2)
> +    return "stxsiwx %x1,%y0";
> +
> +  return ASM_COMMENT_START " vec_extract to same register";
> +}
> +  [(set_attr "type" "mfvsr,veclogical,fpstore,*")
> +   (set_attr "length" "4,4,4,0")
> +   (set_attr "isa" "p8v,*,p8v,*")])
> +
> +(define_insn_and_split "vsx_extract_v4si_1"

... and use this one for the other indices ...

>    [(set (match_operand:SI 0 "nonimmediate_operand" "=r,wa,Z")
>          (vec_select:SI
>           (match_operand:V4SI 1 "gpc_reg_operand" "v,v,v")
>           (parallel [(match_operand:QI 2 "const_0_to_3_operand" "n,n,n")])))
>     (clobber (match_scratch:V4SI 3 "=v,v,v"))]
> -  "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT && !TARGET_P9_VECTOR"
> +  "TARGET_DIRECT_MOVE_64BIT
> +   && !TARGET_P9_VECTOR

I'd expect TARGET_P9_VECTOR never to use this pattern?  If so, as above,
it's better to add an assert below.

BR,
Kewen

> +   && INTVAL (operands[2]) != (BYTES_BIG_ENDIAN ? 1 : 2)"
>    "#"
> -  "&& reload_completed"
> +  "&& 1"
>    [(const_int 0)]
>  {
>    rtx dest = operands[0];
>    rtx src = operands[1];
>    rtx element = operands[2];
> -  rtx vec_tmp = operands[3];
> -  int value;
> +  rtx vec_tmp;
> +
> +  if (GET_CODE (operands[3]) == SCRATCH)
> +    vec_tmp = gen_reg_rtx (V4SImode);
> +  else
> +    vec_tmp = operands[3];
>
>    /* Adjust index for LE element ordering, the below minuend 3 is computed by
>       GET_MODE_NUNITS (V4SImode) - 1.  */
>    if (!BYTES_BIG_ENDIAN)
>      element = GEN_INT (3 - INTVAL (element));
>
> -  /* If the value is in the correct position, we can avoid doing the VSPLT
> -     instruction.  */
> -  value = INTVAL (element);
> -  if (value != 1)
> -    emit_insn (gen_altivec_vspltw_direct (vec_tmp, src, element));
> -  else
> -    vec_tmp = src;
> +  emit_insn (gen_altivec_vspltw_direct (vec_tmp, src, element));
>
> -  if (MEM_P (operands[0]))
> -    {
> -      if (can_create_pseudo_p ())
> -        dest = rs6000_force_indexed_or_indirect_mem (dest);
> -
> -      if (TARGET_P8_VECTOR)
> -        emit_move_insn (dest, gen_rtx_REG (SImode, REGNO (vec_tmp)));
> -      else
> -        emit_insn (gen_stfiwx (dest, gen_rtx_REG (DImode, REGNO (vec_tmp))));
> -    }
> -
> -  else if (TARGET_P8_VECTOR)
> -    emit_move_insn (dest, gen_rtx_REG (SImode, REGNO (vec_tmp)));
> -  else
> -    emit_move_insn (gen_rtx_REG (DImode, REGNO (dest)),
> -                    gen_rtx_REG (DImode, REGNO (vec_tmp)));
> +  int value = BYTES_BIG_ENDIAN ? 1 : 2;
> +  emit_insn (gen_vsx_extract_v4si_0 (dest, vec_tmp, GEN_INT (value)));
>
>    DONE;
> -}
> -  [(set_attr "type" "mfvsr,vecperm,fpstore")
> -   (set_attr "length" "8")
> -   (set_attr "isa" "*,p8v,*")])
> +})
>
>  (define_insn_and_split "*vsx_extract__p8"
>    [(set (match_operand: 0 "nonimmediate_operand" "=r")
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106769-p8.c b/gcc/testsuite/gcc.target/powerpc/pr106769-p8.c
> new file mode 100644
> index 00000000000..e7cdbc76298
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106769-p8.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-mdejagnu-cpu=power8 -O2" } */
> +/* { dg-require-effective-target has_arch_ppc64 } */
> +
> +#include "pr106769.h"
> +
> +/* { dg-final { scan-assembler {\mmfvsrwz\M} } } */
> +/* { dg-final { scan-assembler {\mstxsiwx\M} } } */
> +/* { dg-final { scan-assembler-not {\mrldicl\M} } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106769-p9.c b/gcc/testsuite/gcc.target/powerpc/pr106769-p9.c
> new file mode 100644
> index 00000000000..2205e434a86
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106769-p9.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-options "-mdejagnu-cpu=power9 -O2" } */
> +/* { dg-require-effective-target has_arch_ppc64 } */
> +
> +#include "pr106769.h"
> +
> +/* { dg-final { scan-assembler {\mmfvsrwz\M} } } */
> +/* { dg-final { scan-assembler {\mstxsiwx\M} } } */
> +/* { dg-final { scan-assembler-not {\mrldicl\M} } } */
> +/* { dg-final { scan-assembler-not {\mxxextractuw\M} } } */
> +/* { dg-final { scan-assembler-not "vextuw\[rl\]x" } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106769.h b/gcc/testsuite/gcc.target/powerpc/pr106769.h
> new file mode 100644
> index 00000000000..1c8c8a024f3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106769.h
> @@ -0,0 +1,17 @@
> +#include
> +
> +#ifdef __BIG_ENDIAN__
> +#define LANE 1
> +#else
> +#define LANE 2
> +#endif
> +
> +unsigned int foo1 (vector unsigned int v)
> +{
> +  return vec_extract(v, LANE);
> +}
> +
> +void foo2 (vector unsigned int v, unsigned int* p)
> +{
> +  *p = vec_extract(v, LANE);
> +}
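For the remaining indices, the vsx_extract_v4si_1 splitter in this version
converts an LE index with 3 - index, splats that word with vspltw, and then
reuses the word-1 extraction of vsx_extract_v4si_0.  The portable C model
below is only a sketch of that data flow under the assumption that vspltw's
index uses ISA (BE) word numbering; vreg, model_vspltw,
model_extract_via_splat, model_extract_direct and check_all_indices are
hypothetical names.  It checks the splat-then-word-1 path against a direct
element read for every index on both endiannesses, and does not model
register allocation or the store alternative's addressing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: w[0..3] is a V4SI register in ISA (BE) word order.  */
typedef struct
{
  uint32_t w[4];
} vreg;

/* vspltw: replicate ISA word 'word' of src into every word of the result.  */
vreg
model_vspltw (vreg src, int word)
{
  vreg r;
  for (int i = 0; i < 4; i++)
    r.w[i] = src.w[word];
  return r;
}

/* Split path: adjust the index for LE, splat it, then read word 1, which
   is the word mfvsrwz / stxsiwx operate on.  */
uint32_t
model_extract_via_splat (vreg src, int index, int big_endian)
{
  int word = big_endian ? index : 3 - index;
  vreg tmp = model_vspltw (src, word);
  return tmp.w[1];
}

/* Reference: read the element directly.  */
uint32_t
model_extract_direct (vreg src, int index, int big_endian)
{
  return src.w[big_endian ? index : 3 - index];
}

/* The split must agree with the direct read for all indices, BE and LE.  */
void
check_all_indices (vreg src)
{
  for (int be = 0; be <= 1; be++)
    for (int i = 0; i < 4; i++)
      assert (model_extract_via_splat (src, i, be)
              == model_extract_direct (src, i, be));
}
```

Since a splat puts the wanted element into every word, word 1 always holds
it afterwards, which is why a single word-1 extraction pattern can finish
every index.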