Subject: Re: [PATCH 3/4] rs6000: Enable vec_insert for P8 with rs6000_expand_vector_set_var_p8
To: gcc-patches@gcc.gnu.org
Cc: segher@kernel.crashing.org, dje.gcc@gmail.com, wschmidt@linux.ibm.com, guojiufu@linux.ibm.com, linkw@gcc.gnu.org
References: <20201010080825.3599892-1-luoxhu@linux.ibm.com> <20201010080825.3599892-4-luoxhu@linux.ibm.com>
From: Xionghu Luo
Message-ID: <812895eb-bacc-2ffc-932b-f0836e198702@linux.ibm.com>
Date: Fri, 27 Nov 2020 09:04:33 +0800
In-Reply-To: <20201010080825.3599892-4-luoxhu@linux.ibm.com>

Hi Segher,

Thanks for approving [PATCH 1/4] and [PATCH 2/4]. What's your opinion of
this [PATCH 3/4] for P8, please?  xxinsertw has only existed since ISA v3.0,
so P8 needs a different implementation.


Xionghu


On 2020/10/10 16:08, Xionghu Luo wrote:
> gcc/ChangeLog:
>
> 2020-10-10  Xionghu Luo
>
>         * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
>         Generate ARRAY_REF(VIEW_CONVERT_EXPR) for P8 and later
>         platforms.
>         * config/rs6000/rs6000.c (rs6000_expand_vector_set_var): Update
>         to call different path for P8 and P9.
>         (rs6000_expand_vector_set_var_p9): New function.
>         (rs6000_expand_vector_set_var_p8): New function.
>
> gcc/testsuite/ChangeLog:
>
> 2020-10-10  Xionghu Luo
>
>         * gcc.target/powerpc/pr79251.p8.c: New test.
> ---
>  gcc/config/rs6000/rs6000-c.c                  |  27 +++-
>  gcc/config/rs6000/rs6000.c                    | 117 +++++++++++++++++-
>  gcc/testsuite/gcc.target/powerpc/pr79251.p8.c |  17 +++
>  3 files changed, 155 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251.p8.c
>
> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
> index 5551a21d738..4bea8001ec6 100644
> --- a/gcc/config/rs6000/rs6000-c.c
> +++ b/gcc/config/rs6000/rs6000-c.c
> @@ -1599,10 +1599,29 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
>            SET_EXPR_LOCATION (stmt, loc);
>            stmt = build1 (COMPOUND_LITERAL_EXPR, arg1_type, stmt);
>          }
> -      stmt = build_array_ref (loc, stmt, arg2);
> -      stmt = fold_build2 (MODIFY_EXPR, TREE_TYPE (arg0), stmt,
> -                          convert (TREE_TYPE (stmt), arg0));
> -      stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
> +
> +      if (TARGET_P8_VECTOR)
> +        {
> +          stmt = build_array_ref (loc, stmt, arg2);
> +          stmt = fold_build2 (MODIFY_EXPR, TREE_TYPE (arg0), stmt,
> +                              convert (TREE_TYPE (stmt), arg0));
> +          stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
> +        }
> +      else
> +        {
> +          tree arg1_inner_type;
> +          tree innerptrtype;
> +          arg1_inner_type = TREE_TYPE (arg1_type);
> +          innerptrtype = build_pointer_type (arg1_inner_type);
> +
> +          stmt = build_unary_op (loc, ADDR_EXPR, stmt, 0);
> +          stmt = convert (innerptrtype, stmt);
> +          stmt = build_binary_op (loc, PLUS_EXPR, stmt, arg2, 1);
> +          stmt = build_indirect_ref (loc, stmt, RO_NULL);
> +          stmt = build2 (MODIFY_EXPR, TREE_TYPE (stmt), stmt,
> +                         convert (TREE_TYPE (stmt), arg0));
> +          stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
> +        }
>        return stmt;
>      }
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 96f76c7a74c..33ca839cb28 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -6806,10 +6806,10 @@ rs6000_expand_vector_set (rtx target, rtx val, rtx elt_rtx)
>  }
>
>  /* Insert VAL into IDX of TARGET, VAL size is same of the vector element, IDX
> -   is variable and also counts by vector element size.  */
> +   is variable and also counts by vector element size for p9 and above.  */
>
>  void
> -rs6000_expand_vector_set_var (rtx target, rtx val, rtx idx)
> +rs6000_expand_vector_set_var_p9 (rtx target, rtx val, rtx idx)
>  {
>    machine_mode mode = GET_MODE (target);
>
> @@ -6852,6 +6852,119 @@ rs6000_expand_vector_set_var (rtx target, rtx val, rtx idx)
>    emit_insn (perml);
>  }
>
> +/* Insert VAL into IDX of TARGET, VAL size is same of the vector element, IDX
> +   is variable and also counts by vector element size for p8.  */
> +
> +void
> +rs6000_expand_vector_set_var_p8 (rtx target, rtx val, rtx idx)
> +{
> +  machine_mode mode = GET_MODE (target);
> +
> +  gcc_assert (VECTOR_MEM_VSX_P (mode) && !CONST_INT_P (idx));
> +
> +  gcc_assert (GET_MODE (idx) == E_SImode);
> +
> +  machine_mode inner_mode = GET_MODE (val);
> +  HOST_WIDE_INT mode_mask = GET_MODE_MASK (inner_mode);
> +
> +  rtx tmp = gen_reg_rtx (GET_MODE (idx));
> +  int width = GET_MODE_SIZE (inner_mode);
> +
> +  gcc_assert (width >= 1 && width <= 4);
> +
> +  if (!BYTES_BIG_ENDIAN)
> +    {
> +      /*  idx = idx * width.  */
> +      emit_insn (gen_mulsi3 (tmp, idx, GEN_INT (width)));
> +      /*  idx = idx + 8.  */
> +      emit_insn (gen_addsi3 (tmp, tmp, GEN_INT (8)));
> +    }
> +  else
> +    {
> +      emit_insn (gen_mulsi3 (tmp, idx, GEN_INT (width)));
> +      emit_insn (gen_subsi3 (tmp, GEN_INT (24 - width), tmp));
> +    }
> +
> +  /*  lxv vs33, mask.
> +      DImode: 0xffffffffffffffff0000000000000000
> +      SImode: 0x00000000ffffffff0000000000000000
> +      HImode: 0x000000000000ffff0000000000000000.
> +      QImode: 0x00000000000000ff0000000000000000.  */
> +  rtx mask = gen_reg_rtx (V16QImode);
> +  rtx mask_v2di = gen_reg_rtx (V2DImode);
> +  rtvec v = rtvec_alloc (2);
> +  if (!BYTES_BIG_ENDIAN)
> +    {
> +      RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (DImode, 0);
> +      RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (DImode, mode_mask);
> +    }
> +  else
> +    {
> +      RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (DImode, mode_mask);
> +      RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (DImode, 0);
> +    }
> +  emit_insn (gen_vec_initv2didi (mask_v2di, gen_rtx_PARALLEL (V2DImode, v)));
> +  rtx sub_mask = simplify_gen_subreg (V16QImode, mask_v2di, V2DImode, 0);
> +  emit_insn (gen_rtx_SET (mask, sub_mask));
> +
> +  /*  mtvsrd[wz] f0,tmp_val.  */
> +  rtx tmp_val = gen_reg_rtx (SImode);
> +  if (inner_mode == E_SFmode)
> +    emit_insn (gen_movsi_from_sf (tmp_val, val));
> +  else
> +    tmp_val = force_reg (SImode, val);
> +
> +  rtx val_v16qi = gen_reg_rtx (V16QImode);
> +  rtx val_v2di = gen_reg_rtx (V2DImode);
> +  rtvec vec_val = rtvec_alloc (2);
> +  if (!BYTES_BIG_ENDIAN)
> +    {
> +      RTVEC_ELT (vec_val, 0) = gen_rtx_CONST_INT (DImode, 0);
> +      RTVEC_ELT (vec_val, 1) = tmp_val;
> +    }
> +  else
> +    {
> +      RTVEC_ELT (vec_val, 0) = tmp_val;
> +      RTVEC_ELT (vec_val, 1) = gen_rtx_CONST_INT (DImode, 0);
> +    }
> +  emit_insn (
> +    gen_vec_initv2didi (val_v2di, gen_rtx_PARALLEL (V2DImode, vec_val)));
> +  rtx sub_val = simplify_gen_subreg (V16QImode, val_v2di, V2DImode, 0);
> +  emit_insn (gen_rtx_SET (val_v16qi, sub_val));
> +
> +  /*  lvsl    13,0,idx.  */
> +  tmp = convert_modes (DImode, SImode, tmp, 1);
> +  rtx pcv = gen_reg_rtx (V16QImode);
> +  emit_insn (gen_altivec_lvsl_reg (pcv, tmp));
> +
> +  /*  vperm 1,1,1,13.  */
> +  /*  vperm 0,0,0,13.  */
> +  rtx val_perm = gen_reg_rtx (V16QImode);
> +  rtx mask_perm = gen_reg_rtx (V16QImode);
> +  emit_insn (gen_altivec_vperm_v8hiv16qi (val_perm, val_v16qi, val_v16qi, pcv));
> +  emit_insn (gen_altivec_vperm_v8hiv16qi (mask_perm, mask, mask, pcv));
> +
> +  rtx target_v16qi = simplify_gen_subreg (V16QImode, target, mode, 0);
> +
> +  /*  xxsel 34,34,32,33.  */
> +  emit_insn (
> +    gen_vector_select_v16qi (target_v16qi, target_v16qi, val_perm, mask_perm));
> +}
> +
> +/* Insert VAL into IDX of TARGET, VAL size is same of the vector element, IDX
> +   is variable and also counts by vector element size.  */
> +
> +void
> +rs6000_expand_vector_set_var (rtx target, rtx val, rtx idx)
> +{
> +  machine_mode mode = GET_MODE (target);
> +  machine_mode inner_mode = GET_MODE_INNER (mode);
> +  if (TARGET_P9_VECTOR || GET_MODE_SIZE (inner_mode) == 8)
> +    rs6000_expand_vector_set_var_p9 (target, val, idx);
> +  else
> +    rs6000_expand_vector_set_var_p8 (target, val, idx);
> +}
> +
>  /* Extract field ELT from VEC into TARGET.  */
>
>  void
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr79251.p8.c b/gcc/testsuite/gcc.target/powerpc/pr79251.p8.c
> new file mode 100644
> index 00000000000..06da47b7758
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr79251.p8.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8 -maltivec" } */
> +
> +#include
> +#include
> +#include "pr79251.h"
> +
> +TEST_VEC_INSERT_ALL (test)
> +
> +/* { dg-final { scan-assembler-not {\mstxw\M} } } */
> +/* { dg-final { scan-assembler-times {\mlvsl\M} 10 } } */
> +/* { dg-final { scan-assembler-times {\mlvsr\M} 3 } } */
> +/* { dg-final { scan-assembler-times {\mvperm\M} 20 } } */
> +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 10 } } */
> +/* { dg-final { scan-assembler-times {\mxxsel\M} 7 } } */
> +
>

-- 
Thanks,
Xionghu