From: Xionghu Luo
To: gcc-patches@gcc.gnu.org
Cc: segher@kernel.crashing.org, dje.gcc@gmail.com, wschmidt@linux.ibm.com, guojiufu@linux.ibm.com, linkw@gcc.gnu.org, Xionghu Luo
Subject: [PATCH 2/4] rs6000: Support variable insert and Expand vec_insert in expander [PR79251]
Date: Sat, 10 Oct 2020 03:08:23 -0500
Message-Id: <20201010080825.3599892-3-luoxhu@linux.ibm.com>
In-Reply-To: <20201010080825.3599892-1-luoxhu@linux.ibm.com>
References: <20201010080825.3599892-1-luoxhu@linux.ibm.com>

vec_insert accepts 3 arguments: arg0 is the value to be inserted, arg1 is
the input vector, and arg2 is the position in arg1 at which arg0 is
inserted.
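As a reference point, the semantics above can be written in portable C. This is a minimal sketch, not the rs6000 implementation: it assumes GCC/Clang generic vector types (`vector_size`) as stand-ins for AltiVec's `vector int` and `vector unsigned char`, and the helper names `vec_insert_ref_v4si` and `vec_insert_ref_v16qi` are hypothetical. The index is reduced modulo the element count with `n & (N - 1)`, matching the `v[n&3] = i` form this patch builds for V4SI.

```c
#include <assert.h>

/* Generic vector types used as stand-ins for the AltiVec vector types
   (assumption: no altivec.h, so this builds on any GCC/Clang target).  */
typedef int v4si __attribute__ ((vector_size (16)));
typedef unsigned char v16qi __attribute__ ((vector_size (16)));

static_assert (sizeof (v4si) / sizeof (int) == 4, "n & 3 assumes 4 lanes");
static_assert (sizeof (v16qi) == 16, "n & 15 assumes 16 lanes");

/* Hypothetical reference model of vec_insert (i, v, n) for 4 x int:
   returns a copy of V whose element (n mod 4) is replaced by I.
   V is passed by value, so the caller's vector is left untouched.  */
static v4si
vec_insert_ref_v4si (int i, v4si v, int n)
{
  v[n & 3] = i;   /* Same shape as the GIMPLE form "v[n&3] = i".  */
  return v;
}

/* The same model for 16 x unsigned char: the mask becomes n & 15.  */
static v16qi
vec_insert_ref_v16qi (unsigned char i, v16qi v, int n)
{
  v[n & 15] = i;
  return v;
}
```

For example, `vec_insert_ref_v4si (254, (v4si){1, 2, 3, 4}, 6)` replaces element 2 (since 6 & 3 == 2) and yields `{1, 2, 254, 4}`.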
When arg2 is a variable rather than a constant, the current expander
generates stxv+stwx+lxv, which causes a serious store-hit-load performance
issue on Power.  This patch:

1) Builds a VIEW_CONVERT_EXPR for vec_insert (i, v, n), like v[n&3] = i, to
   unify the GIMPLE code, so that the expander can use vec_set_optab to
   expand it.
2) Expands the IFN VEC_SET to fast instructions: lvsr+insert+lvsl.

In this way, "vec_insert (i, v, n)" and "v[n&3] = i" are not expanded too
early in the GIMPLE stage when arg2 is variable, which avoids generating
store-hit-load instructions.

For Power9 V4SI:

	addi 9,1,-16
	rldic 6,6,2,60
	stxv 34,-16(1)
	stwx 5,9,6
	lxv 34,-16(1)
=>
	rlwinm 6,6,2,28,29
	mtvsrwz 0,5
	lvsr 1,0,6
	lvsl 0,0,6
	xxperm 34,34,33
	xxinsertw 34,0,12
	xxperm 34,34,32

Although the instruction count increases from 5 to 7, performance improves
by 60% in typical cases.

Tested with V2DI, V2DF, V4SI, V4SF, V8HI, V16QI on Power9-LE.

gcc/ChangeLog:

2020-10-10  Xionghu Luo

	* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
	Adjust variable index vec_insert from address dereference to
	ARRAY_REF(VIEW_CONVERT_EXPR) tree expression.
	* config/rs6000/rs6000-protos.h (rs6000_expand_vector_set_var):
	New declaration.
	* config/rs6000/rs6000.c (rs6000_expand_vector_set_var): New function.
	* config/rs6000/vector.md (vec_set): Support both constant and
	variable index vec_set.

gcc/testsuite/ChangeLog:

2020-10-10  Xionghu Luo

	* gcc.target/powerpc/pr79251.p9.c: New test.
	* gcc.target/powerpc/pr79251-run.c: New test.
	* gcc.target/powerpc/pr79251.h: New header.
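To see why the lvsr + insert + lvsl sequence puts the value in the right lane, it can be modeled at the byte level. This is our own sketch, not GCC code, under these assumptions: lvsl/lvsr build a permute control from the low 4 bits of the address, vperm selects byte `c[i] & 0x1F` from the 32-byte concatenation of its two inputs, bytes are numbered big-endian as in the Power ISA, and elements are numbered little-endian as on the Power9-LE target the patch reports testing. The model uses vperm (the patch calls `gen_altivec_vperm_v8hiv16qi`); the listing shows xxperm, which we assume behaves the same here because both inputs are the same register. All names (`vr_t`, `model_*`, `set_le_elt`, `vr_equal`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A 128-bit vector register as 16 bytes; byte 0 is the most significant
   (big-endian byte numbering, as the Power ISA defines it).  */
typedef struct { uint8_t b[16]; } vr_t;

/* lvsl: control bytes {sh, sh+1, ..., sh+15}, sh = low 4 bits of EA.  */
static vr_t
model_lvsl (unsigned ea)
{
  vr_t r;
  unsigned sh = ea & 0xF;
  for (unsigned i = 0; i < 16; i++)
    r.b[i] = (uint8_t) (sh + i);
  return r;
}

/* lvsr: control bytes {16-sh, 17-sh, ..., 31-sh}.  */
static vr_t
model_lvsr (unsigned ea)
{
  vr_t r;
  unsigned sh = ea & 0xF;
  for (unsigned i = 0; i < 16; i++)
    r.b[i] = (uint8_t) (16 - sh + i);
  return r;
}

/* vperm: result byte i is byte (c[i] & 0x1F) of the concatenation A:B.  */
static vr_t
model_vperm (vr_t a, vr_t b, vr_t c)
{
  vr_t r;
  for (unsigned i = 0; i < 16; i++)
    {
      unsigned k = c.b[i] & 0x1F;
      r.b[i] = k < 16 ? a.b[k] : b.b[k - 16];
    }
  return r;
}

/* Store VAL into little-endian element K of width W bytes.  That element
   occupies bytes [16 - (K+1)*W, 16 - K*W), most significant byte first.  */
static void
set_le_elt (vr_t *v, unsigned k, unsigned w, uint64_t val)
{
  unsigned first = 16 - (k + 1) * w;
  for (unsigned j = 0; j < w; j++)
    v->b[first + j] = (uint8_t) (val >> (8 * (w - 1 - j)));
}

/* Rotate, insert at element 0, rotate back: our reading of the steps in
   rs6000_expand_vector_set_var.  */
static vr_t
model_set_var (vr_t v, uint64_t val, unsigned idx, unsigned width)
{
  assert (width == 1 || width == 2 || width == 4 || width == 8);
  unsigned shift = width == 8 ? 3 : width == 4 ? 2 : width == 2 ? 1 : 0;
  unsigned sh = idx << shift;                   /* idx = idx * width */
  v = model_vperm (v, v, model_lvsr (sh));      /* element IDX -> element 0 */
  set_le_elt (&v, 0, width, val);               /* constant-index insert */
  return model_vperm (v, v, model_lvsl (sh));   /* rotate back */
}

static int
vr_equal (vr_t a, vr_t b)
{
  return memcmp (a.b, b.b, sizeof a.b) == 0;
}
```

The first vperm rotates right by `sh` bytes, so little-endian element `idx` lands in element 0; the second rotates left by the same amount and restores every other byte. For every width and every index below `16 / width`, `model_set_var` should agree with a direct `set_le_elt` store on the original vector.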
---
 gcc/config/rs6000/rs6000-c.c                  | 25 ++++-----
 gcc/config/rs6000/rs6000-protos.h             |  1 +
 gcc/config/rs6000/rs6000.c                    | 53 +++++++++++++++++++
 .../gcc.target/powerpc/pr79251-run.c          | 28 ++++++++++
 gcc/testsuite/gcc.target/powerpc/pr79251.h    | 19 +++++++
 gcc/testsuite/gcc.target/powerpc/pr79251.p9.c | 18 +++++++
 6 files changed, 130 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251-run.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251.p9.c

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index cc1e997524e..5551a21d738 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -1512,9 +1512,7 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
       tree arg1;
       tree arg2;
       tree arg1_type;
-      tree arg1_inner_type;
       tree decl, stmt;
-      tree innerptrtype;
       machine_mode mode;
 
       /* No second or third arguments. */
@@ -1566,8 +1564,13 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
           return build_call_expr (call, 3, arg1, arg0, arg2);
         }
 
-      /* Build *(((arg1_inner_type*)&(vector type){arg1})+arg2) = arg0. */
-      arg1_inner_type = TREE_TYPE (arg1_type);
+      /* Build *(((arg1_inner_type*)&(vector type){arg1})+arg2) = arg0 with
+         VIEW_CONVERT_EXPR.  i.e.:
+         D.3192 = v1;
+         _1 = n & 3;
+         VIEW_CONVERT_EXPR(D.3192)[_1] = i;
+         v1 = D.3192;
+         D.3194 = v1; */
       if (TYPE_VECTOR_SUBPARTS (arg1_type) == 1)
         arg2 = build_int_cst (TREE_TYPE (arg2), 0);
       else
@@ -1582,6 +1585,7 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
       TREE_USED (decl) = 1;
       TREE_TYPE (decl) = arg1_type;
       TREE_READONLY (decl) = TYPE_READONLY (arg1_type);
+      TREE_ADDRESSABLE (decl) = 1;
       if (c_dialect_cxx ())
         {
           stmt = build4 (TARGET_EXPR, arg1_type, decl, arg1,
@@ -1592,19 +1596,12 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
         {
           DECL_INITIAL (decl) = arg1;
           stmt = build1 (DECL_EXPR, arg1_type, decl);
-          TREE_ADDRESSABLE (decl) = 1;
           SET_EXPR_LOCATION (stmt, loc);
           stmt = build1 (COMPOUND_LITERAL_EXPR, arg1_type, stmt);
         }
-
-      innerptrtype = build_pointer_type (arg1_inner_type);
-
-      stmt = build_unary_op (loc, ADDR_EXPR, stmt, 0);
-      stmt = convert (innerptrtype, stmt);
-      stmt = build_binary_op (loc, PLUS_EXPR, stmt, arg2, 1);
-      stmt = build_indirect_ref (loc, stmt, RO_NULL);
-      stmt = build2 (MODIFY_EXPR, TREE_TYPE (stmt), stmt,
-                     convert (TREE_TYPE (stmt), arg0));
+      stmt = build_array_ref (loc, stmt, arg2);
+      stmt = fold_build2 (MODIFY_EXPR, TREE_TYPE (arg0), stmt,
+                          convert (TREE_TYPE (stmt), arg0));
       stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
       return stmt;
     }
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 3578136e79b..4b6131a5145 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -58,6 +58,7 @@ extern bool rs6000_split_128bit_ok_p (rtx []);
 extern void rs6000_expand_float128_convert (rtx, rtx, bool);
 extern void rs6000_expand_vector_init (rtx, rtx);
 extern void rs6000_expand_vector_set (rtx, rtx, rtx);
+extern void rs6000_expand_vector_set_var (rtx, rtx, rtx);
 extern void rs6000_expand_vector_extract (rtx, rtx, rtx);
 extern void rs6000_split_vec_extract_var (rtx, rtx, rtx, rtx, rtx);
 extern rtx rs6000_adjust_vec_address (rtx, rtx, rtx, rtx, machine_mode);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index a5b59395abd..96f76c7a74c 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6709,6 +6709,12 @@ rs6000_expand_vector_set (rtx target, rtx val, rtx elt_rtx)
 
   if (VECTOR_MEM_VSX_P (mode))
     {
+      if (!CONST_INT_P (elt_rtx))
+        {
+          rs6000_expand_vector_set_var (target, val, elt_rtx);
+          return;
+        }
+
       rtx insn = NULL_RTX;
 
       if (mode == V2DFmode)
@@ -6799,6 +6805,53 @@ rs6000_expand_vector_set (rtx target, rtx val, rtx elt_rtx)
   emit_insn (gen_rtx_SET (target, x));
 }
 
+/* Insert VAL into IDX of TARGET, VAL size is same of the vector element, IDX
+   is variable and also counts by vector element size. */
+
+void
+rs6000_expand_vector_set_var (rtx target, rtx val, rtx idx)
+{
+  machine_mode mode = GET_MODE (target);
+
+  gcc_assert (VECTOR_MEM_VSX_P (mode) && !CONST_INT_P (idx));
+
+  gcc_assert (GET_MODE (idx) == E_SImode);
+
+  machine_mode inner_mode = GET_MODE (val);
+
+  rtx tmp = gen_reg_rtx (GET_MODE (idx));
+  int width = GET_MODE_SIZE (inner_mode);
+
+  gcc_assert (width >= 1 && width <= 8);
+
+  int shift = exact_log2 (width);
+  /* Generate the IDX for permute shift, width is the vector element size.
+     idx = idx * width. */
+  emit_insn (gen_ashlsi3 (tmp, idx, GEN_INT (shift)));
+
+  tmp = convert_modes (DImode, SImode, tmp, 1);
+
+  /* lvsr v1,0,idx. */
+  rtx pcvr = gen_reg_rtx (V16QImode);
+  emit_insn (gen_altivec_lvsr_reg (pcvr, tmp));
+
+  /* lvsl v2,0,idx. */
+  rtx pcvl = gen_reg_rtx (V16QImode);
+  emit_insn (gen_altivec_lvsl_reg (pcvl, tmp));
+
+  rtx sub_target = simplify_gen_subreg (V16QImode, target, mode, 0);
+
+  rtx permr
+    = gen_altivec_vperm_v8hiv16qi (sub_target, sub_target, sub_target, pcvr);
+  emit_insn (permr);
+
+  rs6000_expand_vector_set (target, val, const0_rtx);
+
+  rtx perml
+    = gen_altivec_vperm_v8hiv16qi (sub_target, sub_target, sub_target, pcvl);
+  emit_insn (perml);
+}
+
 /* Extract field ELT from VEC into TARGET. */
 
 void
diff --git a/gcc/testsuite/gcc.target/powerpc/pr79251-run.c b/gcc/testsuite/gcc.target/powerpc/pr79251-run.c
new file mode 100644
index 00000000000..08f69df1146
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr79251-run.c
@@ -0,0 +1,28 @@
+/* { dg-options "-O2 -maltivec" } */
+
+#include
+#include
+#include "pr79251.h"
+
+TEST_VEC_INSERT_ALL (test)
+
+#define run_test(TYPE, num) \
+  { \
+    vector TYPE v; \
+    vector TYPE u = {0x0}; \
+    for (long k = 0; k < 16 / sizeof (TYPE); k++) \
+      v[k] = 0xaa; \
+    for (long k = 0; k < 16 / sizeof (TYPE); k++) \
+      { \
+        u = test##num (v, 254, k); \
+        if (u[k] != (TYPE) 254) \
+          __builtin_abort (); \
+      } \
+  }
+
+int
+main (void)
+{
+  TEST_VEC_INSERT_ALL (run_test)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/pr79251.h b/gcc/testsuite/gcc.target/powerpc/pr79251.h
new file mode 100644
index 00000000000..addb067f9ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr79251.h
@@ -0,0 +1,19 @@
+
+#define test(TYPE, num) \
+  __attribute__ ((noinline, noclone)) \
+  vector TYPE test##num (vector TYPE v, TYPE i, signed int n) \
+  { \
+    return vec_insert (i, v, n); \
+  }
+
+#define TEST_VEC_INSERT_ALL(T) \
+  T (char, 0) \
+  T (unsigned char, 1) \
+  T (short, 2) \
+  T (unsigned short, 3) \
+  T (int, 4) \
+  T (unsigned int, 5) \
+  T (long long, 6) \
+  T (unsigned long long, 7) \
+  T (float, 8) \
+  T (double, 9)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr79251.p9.c b/gcc/testsuite/gcc.target/powerpc/pr79251.p9.c
new file mode 100644
index 00000000000..ec1cb255888
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr79251.p9.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -maltivec" } */
+
+#include
+#include
+#include "pr79251.h"
+
+TEST_VEC_INSERT_ALL (test)
+
+/* { dg-final { scan-assembler-not {\mstxw\M} } } */
+/* { dg-final { scan-assembler-times {\mlvsl\M} 10 } } */
+/* { dg-final { scan-assembler-times {\mlvsr\M} 10 } } */
+/* { dg-final { scan-assembler-times {\mxxperm\M} 20 } } */
+/* { dg-final { scan-assembler-times {\mxxinsertw\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mvinserth\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvinsertb\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 3 } } */
-- 
2.25.1
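The rs6000-c.c hunk replaces a store through a pointer into the temporary vector with an array reference on that temporary. The C sketch below writes the two shapes as roughly equivalent source; it is our own illustration under stated assumptions (GCC/Clang generic vectors, hypothetical function names `insert_via_pointer`, `insert_via_subscript` and `v4si_equal`) and only shows that both compute the same vector. In the pointer form the element is reached through the vector's address, while the subscript form stays an element access that, per the commit message, later passes can expand through vec_set_optab; what code a given compiler actually emits is up to its optimizer.

```c
#include <assert.h>
#include <string.h>

typedef int v4si __attribute__ ((vector_size (16)));

static_assert (sizeof (v4si) / sizeof (int) == 4,
               "the n & 3 mask assumes four int lanes");

/* Old shape, roughly: *(((int *) &tmp) + (n & 3)) = i;
   the element is reached through the vector's address.  */
static v4si
insert_via_pointer (v4si v, int i, int n)
{
  v4si tmp = v;
  *((int *) &tmp + (n & 3)) = i;
  return tmp;
}

/* New shape, roughly: an ARRAY_REF of a VIEW_CONVERT_EXPR of the
   temporary, written here as a generic-vector subscript.  */
static v4si
insert_via_subscript (v4si v, int i, int n)
{
  v4si tmp = v;
  tmp[n & 3] = i;
  return tmp;
}

/* Bitwise comparison of two vectors.  */
static int
v4si_equal (v4si a, v4si b)
{
  return memcmp (&a, &b, sizeof a) == 0;
}
```

Both helpers mask the index the same way, which mirrors the `_1 = n & 3;` step shown in the new comment in altivec_resolve_overloaded_builtin.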