Subject: Re: [PATCHv2, rs6000] Add V1TI into vector comparison expand [PR103316]
From: will schmidt
To: HAO CHEN GUI, gcc-patches
Cc: Peter Bergner, David, Segher Boessenkool
Date: Thu, 17 Mar 2022 17:03:32 -0500
In-Reply-To: <3f35ec32-cb71-d827-02da-e4042091b8e5@linux.ibm.com>
References: <3f35ec32-cb71-d827-02da-e4042091b8e5@linux.ibm.com>
List-Id: Gcc-patches mailing list

On Thu, 2022-03-17 at 13:35 +0800, HAO CHEN GUI via Gcc-patches wrote:
> Hi,
>    This patch adds V1TI mode into a new mode iterator used in vector
> comparison expands. With the patch, both built-ins and direct comparison
> could generate the new P10 V1TI comparison instructions.

Hi,

-      /* We deliberately omit RS6000_BIF_CMPGE_1TI ...
-         for now, because gimple folding produces worse code for 128-bit
-         compares.  */

I assume the 'worse code' situation is now resolved, but I don't see a
before/after example to confirm it. A clear statement that adding the
TI modes to the iterators resolves it would be good.

Otherwise lgtm. :-)

Thanks,
-Will

>
>    Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
>
> ChangeLog
> 2022-03-16 Haochen Gui
>
> gcc/
>         PR target/103316
>         * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Enable
>         gimple folding for RS6000_BIF_VCMPEQUT, RS6000_BIF_VCMPNET,
>         RS6000_BIF_CMPGE_1TI, RS6000_BIF_CMPGE_U1TI, RS6000_BIF_VCMPGTUT,
>         RS6000_BIF_VCMPGTST, RS6000_BIF_CMPLE_1TI, RS6000_BIF_CMPLE_U1TI.
>         * config/rs6000/vector.md (VEC_IC): Define. Add support for new Power10
>         V1TI instructions.
>         (vec_cmp): Set mode iterator to VEC_IC.
>         (vec_cmpu): Likewise.
>
> gcc/testsuite/
>         PR target/103316
>         * gcc.target/powerpc/pr103316.c: New.
>         * gcc.target/powerpc/fold-vec-cmp-int128.c: New cases for vector
>         __int128.
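
For anyone reading along, here is a minimal sketch in portable C of what
the folded comparisons listed in the ChangeLog compute for the single
128-bit element of a V1TI vector. It models the result value only and
says nothing about the code GCC generates before or after the patch. It
assumes a compiler that provides __int128 (GCC or Clang on a 64-bit
target); cmp_v1ti, enum cmp_code and the u128 typedef are hypothetical
names for illustration, not GCC internals.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the tree codes passed to
   fold_compare_helper (EQ_EXPR, NE_EXPR, GT_EXPR, GE_EXPR, LE_EXPR),
   plus LT for completeness.  */
enum cmp_code { CMP_EQ, CMP_NE, CMP_GT, CMP_GE, CMP_LT, CMP_LE };

/* Assumption: the compiler supports the __int128 extension.  */
typedef unsigned __int128 u128;

/* Model of one V1TI comparison: the result element is all ones when
   the relation holds and all zeros otherwise.  */
u128
cmp_v1ti (enum cmp_code code, u128 a, u128 b, bool is_signed)
{
  if (is_signed)
    {
      /* Flipping the sign bit maps signed order onto unsigned order,
         so one set of unsigned compares serves both cases.  */
      const u128 sign = (u128) 1 << 127;
      a ^= sign;
      b ^= sign;
    }

  bool r = false;
  switch (code)
    {
    case CMP_EQ: r = a == b; break;
    case CMP_NE: r = a != b; break;
    case CMP_GT: r = a > b; break;
    case CMP_GE: r = a >= b; break;
    case CMP_LT: r = a < b; break;
    case CMP_LE: r = a <= b; break;
    }
  return r ? ~(u128) 0 : (u128) 0;
}
```

The same bit pattern can compare differently depending on signedness:
all ones is greater than 1 as an unsigned value but less than 1 as a
signed value, which is why the patch keeps separate signed and unsigned
built-ins for GT, GE and LE.
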
>
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
> index 5d34c1bcfc9..fac7f43f438 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -1994,16 +1994,14 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>      case RS6000_BIF_VCMPEQUH:
>      case RS6000_BIF_VCMPEQUW:
>      case RS6000_BIF_VCMPEQUD:
> -      /* We deliberately omit RS6000_BIF_VCMPEQUT for now, because gimple
> -         folding produces worse code for 128-bit compares.  */
> +    case RS6000_BIF_VCMPEQUT:
>        fold_compare_helper (gsi, EQ_EXPR, stmt);
>        return true;
>
>      case RS6000_BIF_VCMPNEB:
>      case RS6000_BIF_VCMPNEH:
>      case RS6000_BIF_VCMPNEW:
> -      /* We deliberately omit RS6000_BIF_VCMPNET for now, because gimple
> -         folding produces worse code for 128-bit compares.  */
> +    case RS6000_BIF_VCMPNET:
>        fold_compare_helper (gsi, NE_EXPR, stmt);
>        return true;
>
> @@ -2015,9 +2013,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>      case RS6000_BIF_CMPGE_U4SI:
>      case RS6000_BIF_CMPGE_2DI:
>      case RS6000_BIF_CMPGE_U2DI:
> -      /* We deliberately omit RS6000_BIF_CMPGE_1TI and RS6000_BIF_CMPGE_U1TI
> -         for now, because gimple folding produces worse code for 128-bit
> -         compares.  */
> +    case RS6000_BIF_CMPGE_1TI:
> +    case RS6000_BIF_CMPGE_U1TI:
>        fold_compare_helper (gsi, GE_EXPR, stmt);
>        return true;
>
> @@ -2029,9 +2026,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>      case RS6000_BIF_VCMPGTUW:
>      case RS6000_BIF_VCMPGTUD:
>      case RS6000_BIF_VCMPGTSD:
> -      /* We deliberately omit RS6000_BIF_VCMPGTUT and RS6000_BIF_VCMPGTST
> -         for now, because gimple folding produces worse code for 128-bit
> -         compares.  */
> +    case RS6000_BIF_VCMPGTUT:
> +    case RS6000_BIF_VCMPGTST:
>        fold_compare_helper (gsi, GT_EXPR, stmt);
>        return true;
>
> @@ -2043,9 +2039,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>      case RS6000_BIF_CMPLE_U4SI:
>      case RS6000_BIF_CMPLE_2DI:
>      case RS6000_BIF_CMPLE_U2DI:
> -      /* We deliberately omit RS6000_BIF_CMPLE_1TI and RS6000_BIF_CMPLE_U1TI
> -         for now, because gimple folding produces worse code for 128-bit
> -         compares.  */
> +    case RS6000_BIF_CMPLE_1TI:
> +    case RS6000_BIF_CMPLE_U1TI:
>        fold_compare_helper (gsi, LE_EXPR, stmt);
>        return true;
>
> diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
> index b87a742cca8..d88869cc8d0 100644
> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md
> @@ -26,6 +26,9 @@
>  ;; Vector int modes
>  (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
>
> +;; Vector int modes for comparison
> +(define_mode_iterator VEC_IC [V16QI V8HI V4SI V2DI (V1TI "TARGET_POWER10")])
> +
>  ;; 128-bit int modes
>  (define_mode_iterator VEC_TI [V1TI TI])
>
> @@ -533,10 +536,10 @@ (define_expand "vcond_mask_"
>
>  ;; For signed integer vectors comparison.
>  (define_expand "vec_cmp"
> -  [(set (match_operand:VEC_I 0 "vint_operand")
> +  [(set (match_operand:VEC_IC 0 "vint_operand")
>          (match_operator 1 "signed_or_equality_comparison_operator"
> -          [(match_operand:VEC_I 2 "vint_operand")
> -           (match_operand:VEC_I 3 "vint_operand")]))]
> +          [(match_operand:VEC_IC 2 "vint_operand")
> +           (match_operand:VEC_IC 3 "vint_operand")]))]
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
>  {
>    enum rtx_code code = GET_CODE (operands[1]);
> @@ -573,10 +576,10 @@ (define_expand "vec_cmp"
>
>  ;; For unsigned integer vectors comparison.
>  (define_expand "vec_cmpu"
> -  [(set (match_operand:VEC_I 0 "vint_operand")
> +  [(set (match_operand:VEC_IC 0 "vint_operand")
>          (match_operator 1 "unsigned_or_equality_comparison_operator"
> -          [(match_operand:VEC_I 2 "vint_operand")
> -           (match_operand:VEC_I 3 "vint_operand")]))]
> +          [(match_operand:VEC_IC 2 "vint_operand")
> +           (match_operand:VEC_IC 3 "vint_operand")]))]
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
>  {
>    enum rtx_code code = GET_CODE (operands[1]);
> diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-cmp-int128.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-cmp-int128.c
> new file mode 100644
> index 00000000000..1a4db0f45d4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-cmp-int128.c
> @@ -0,0 +1,86 @@
> +/* Verify that overloaded built-ins for vec_cmp with __int128
> +   inputs produce the right code.  */
> +
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +
> +#include <altivec.h>
> +
> +vector bool __int128
> +test3_eq (vector signed __int128 x, vector signed __int128 y)
> +{
> +  return vec_cmpeq (x, y);
> +}
> +
> +vector bool __int128
> +test6_eq (vector unsigned __int128 x, vector unsigned __int128 y)
> +{
> +  return vec_cmpeq (x, y);
> +}
> +
> +vector bool __int128
> +test3_ge (vector signed __int128 x, vector signed __int128 y)
> +{
> +  return vec_cmpge (x, y);
> +}
> +
> +vector bool __int128
> +test6_ge (vector unsigned __int128 x, vector unsigned __int128 y)
> +{
> +  return vec_cmpge (x, y);
> +}
> +
> +vector bool __int128
> +test3_gt (vector signed __int128 x, vector signed __int128 y)
> +{
> +  return vec_cmpgt (x, y);
> +}
> +
> +vector bool __int128
> +test6_gt (vector unsigned __int128 x, vector unsigned __int128 y)
> +{
> +  return vec_cmpgt (x, y);
> +}
> +
> +vector bool __int128
> +test3_le (vector signed __int128 x, vector signed __int128 y)
> +{
> +  return vec_cmple (x, y);
> +}
> +
> +vector bool __int128
> +test6_le (vector unsigned __int128 x, vector unsigned __int128 y)
> +{
> +  return vec_cmple (x, y);
> +}
> +
> +vector bool __int128
> +test3_lt (vector signed __int128 x, vector signed __int128 y)
> +{
> +  return vec_cmplt (x, y);
> +}
> +
> +vector bool __int128
> +test6_lt (vector unsigned __int128 x, vector unsigned __int128 y)
> +{
> +  return vec_cmplt (x, y);
> +}
> +
> +vector bool __int128
> +test3_ne (vector signed __int128 x, vector signed __int128 y)
> +{
> +  return vec_cmpne (x, y);
> +}
> +
> +vector bool __int128
> +test6_ne (vector unsigned __int128 x, vector unsigned __int128 y)
> +{
> +  return vec_cmpne (x, y);
> +}
> +
> +/* { dg-final { scan-assembler-times "vcmpequq" 4 } } */
> +/* { dg-final { scan-assembler-times "vcmpgtsq" 4 } } */
> +/* { dg-final { scan-assembler-times "vcmpgtuq" 4 } } */
> +/* { dg-final { scan-assembler-times "xxlnor" 6 } } */
> +
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103316.c b/gcc/testsuite/gcc.target/powerpc/pr103316.c
> new file mode 100644
> index 00000000000..02f7dc5ca1b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr103316.c
> @@ -0,0 +1,80 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +
> +vector bool __int128
> +test_eq (vector signed __int128 a, vector signed __int128 b)
> +{
> +  return a == b;
> +}
> +
> +vector bool __int128
> +test_ne (vector signed __int128 a, vector signed __int128 b)
> +{
> +  return a != b;
> +}
> +
> +vector bool __int128
> +test_gt (vector signed __int128 a, vector signed __int128 b)
> +{
> +  return a > b;
> +}
> +
> +vector bool __int128
> +test_ge (vector signed __int128 a, vector signed __int128 b)
> +{
> +  return a >= b;
> +}
> +
> +vector bool __int128
> +test_lt (vector signed __int128 a, vector signed __int128 b)
> +{
> +  return a < b;
> +}
> +
> +vector bool __int128
> +test_le (vector signed __int128 a, vector signed __int128 b)
> +{
> +  return a <= b;
> +}
> +
> +vector bool __int128
> +testu_eq (vector unsigned __int128 a, vector unsigned __int128 b)
> +{
> +  return a == b;
> +}
> +
> +vector bool __int128
> +testu_ne (vector unsigned __int128 a, vector unsigned __int128 b)
> +{
> +  return a != b;
> +}
> +
> +vector bool __int128
> +testu_gt (vector unsigned __int128 a, vector unsigned __int128 b)
> +{
> +  return a > b;
> +}
> +
> +vector bool __int128
> +testu_ge (vector unsigned __int128 a, vector unsigned __int128 b)
> +{
> +  return a >= b;
> +}
> +
> +vector bool __int128
> +testu_lt (vector unsigned __int128 a, vector unsigned __int128 b)
> +{
> +  return a < b;
> +}
> +
> +vector bool __int128
> +testu_le (vector unsigned __int128 a, vector unsigned __int128 b)
> +{
> +  return a <= b;
> +}
> +
> +/* { dg-final { scan-assembler-times "vcmpequq" 4 } } */
> +/* { dg-final { scan-assembler-times "vcmpgtsq" 4 } } */
> +/* { dg-final { scan-assembler-times "vcmpgtuq" 4 } } */
> +/* { dg-final { scan-assembler-times "xxlnor" 6 } } */
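
As a sanity check on the scan-assembler-times counts, the expected
totals can be derived with a small tally. This is a sketch under an
assumed lowering that the patch itself does not spell out: EQ and GT
each map to one compare instruction, LT is GT with the operands
swapped, and NE, GE and LE are the complement (one xxlnor) of EQ, LT
and GT respectively. The names lower_one, tally_test_file, struct
tally and enum rel are hypothetical.

```c
#include <stdbool.h>

enum rel { REL_EQ, REL_NE, REL_GT, REL_GE, REL_LT, REL_LE };

/* One counter per instruction the dg-final directives scan for.  */
struct tally { int vcmpequq, vcmpgtsq, vcmpgtuq, xxlnor; };

/* Count the instructions for one comparison under the assumed
   lowering described above.  */
void
lower_one (enum rel r, bool is_signed, struct tally *t)
{
  switch (r)
    {
    case REL_NE:
      t->xxlnor++;          /* NE = NOT (EQ) */
      /* fall through */
    case REL_EQ:
      t->vcmpequq++;
      break;
    case REL_GE:            /* GE = NOT (LT) */
    case REL_LE:            /* LE = NOT (GT) */
      t->xxlnor++;
      /* fall through */
    case REL_GT:
    case REL_LT:            /* LT = GT with operands swapped */
      if (is_signed)
        t->vcmpgtsq++;
      else
        t->vcmpgtuq++;
      break;
    }
}

/* Each test file has one signed and one unsigned function for each
   of the six relations.  */
struct tally
tally_test_file (void)
{
  struct tally t = { 0, 0, 0, 0 };
  for (int r = REL_EQ; r <= REL_LE; r++)
    {
      lower_one ((enum rel) r, true, &t);
      lower_one ((enum rel) r, false, &t);
    }
  return t;
}
```

Under that assumption the tally comes to 4 vcmpequq, 4 vcmpgtsq,
4 vcmpgtuq and 6 xxlnor per file, which agrees with the dg-final
directives in both new tests.
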