Date: Mon, 22 May 2023 17:40:41 +0800
Subject: Re: [PATCH, rs6000] Split TImode for logical operations in expand pass [PR100694]
From: "Kewen.Lin"
To: HAO CHEN GUI
Cc: Segher Boessenkool, David, Peter Bergner, gcc-patches
Message-ID: <926c008b-3156-074b-ade4-c671cee131af@linux.ibm.com>
In-Reply-To: <740e9ed6-8730-1dec-ca78-a002df8d431a@linux.ibm.com>
Hi Haochen,

on 2023/2/8 13:08, HAO CHEN GUI wrote:
> Hi,
>   The logical operations for TImode are split after the reload pass right now.
> Some potential optimizations are missed because the split happens too late.
> This patch removes TImode from the "AND", "IOR", "XOR" and "NOT" expanders so
> that these logical operations can be split at the expand pass.  The new test
> case illustrates the optimization.
>
>   Two test cases of pr92398 are merged into one, as all sub-targets generate
> the same sequence of instructions with the patch.

IIUC, this can also help PR target/93123.  Add it to the PR marker too if so.

This patch aligns with what the other ports do, and I think it's good, but
note that it can regress cases like:

```
#include <altivec.h>  /* Provides the "vector" keyword; needs VSX enabled.  */

vector unsigned __int128
test (unsigned __int128 *a, unsigned __int128 *b,
      unsigned __int128 *c, unsigned __int128 *d)
{
  unsigned __int128 t1 = *a | *b;
  unsigned __int128 t2 = *c & *d;
  unsigned __int128 t3 = t1 ^ t2;
  return (vector unsigned __int128) t3;
}
```

Without the proposed patch:

  lxv 32,0(5)
  lxv 0,0(6)
  lxv 45,0(3)
  lxv 33,0(4)
  xxland 32,32,0
  vor 2,1,13
  vxor 2,2,0

vs. with this patch:

  ld 9,8(6)
  ld 8,0(5)
  ld 10,8(5)
  ld 0,0(6)
  ld 11,0(3)
  ld 6,8(3)
  ld 5,0(4)
  ld 7,8(4)
  and 8,8,0
  and 10,10,9
  or 9,5,11
  xor 9,9,8
  or 8,7,6
  xor 8,8,10
  mtvsrdd 34,8,9

Without the patch we get the optimal insn sequence; with the proposed patch
we don't.
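To make the regressed sequence easier to read: once the split happens at expand time, each 128-bit logical operation is carried out as two independent 64-bit operations on the high and low doublewords, which is where the two `and`, two `or` and two `xor` in GPRs come from before `mtvsrdd` rebuilds the vector. Below is a minimal source-level model of that decomposition (not the RTL the expander actually emits), assuming a compiler that supports `unsigned __int128` (GCC or Clang on a 64-bit target). The `u128_halves` type and the `*_halves` helpers are hypothetical names for illustration, not GCC internals.

```c
#include <stdint.h>

/* Hypothetical model of a TImode value as two DImode halves.
   Illustrative names only; this is not how GCC represents it.  */
typedef struct
{
  uint64_t hi;
  uint64_t lo;
} u128_halves;

static u128_halves
split_u128 (unsigned __int128 x)
{
  /* Truncating to uint64_t keeps the low doubleword.  */
  u128_halves h = { (uint64_t) (x >> 64), (uint64_t) x };
  return h;
}

static unsigned __int128
join_u128 (u128_halves h)
{
  return ((unsigned __int128) h.hi << 64) | h.lo;
}

/* Bitwise operations never move bits between the halves, so each
   128-bit operation becomes two independent 64-bit operations.  */
static u128_halves
ior_halves (u128_halves a, u128_halves b)
{
  u128_halves r = { a.hi | b.hi, a.lo | b.lo };
  return r;
}

static u128_halves
and_halves (u128_halves a, u128_halves b)
{
  u128_halves r = { a.hi & b.hi, a.lo & b.lo };
  return r;
}

static u128_halves
xor_halves (u128_halves a, u128_halves b)
{
  u128_halves r = { a.hi ^ b.hi, a.lo ^ b.lo };
  return r;
}

/* The regressed test case computed half by half: two "or", two "and"
   and two "xor" on 64-bit values, like the GPR sequence above.  */
static unsigned __int128
test_split (unsigned __int128 a, unsigned __int128 b,
            unsigned __int128 c, unsigned __int128 d)
{
  u128_halves t1 = ior_halves (split_u128 (a), split_u128 (b));
  u128_halves t2 = and_halves (split_u128 (c), split_u128 (d));
  return join_u128 (xor_halves (t1, t2));
}
```

For any inputs, `test_split (a, b, c, d)` equals `(a | b) ^ (c & d)` computed directly on `unsigned __int128`. The split is always correct; the regression is only about where the halves live (GPRs instead of one VSX register).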
Apparently we currently lack support for moving the operations back into
vector registers when that is beneficial.  I guess the cases in PR100694 and
PR93123 are the dominant ones and the regressed case is a corner case, so we
can probably install this patch first and open a bug for further enhancement.

Segher, what do you think of this?

BR,
Kewen

>
> Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
>
> Thanks
> Gui Haochen
>
>
> ChangeLog
> 2023-02-08  Haochen Gui
>
> gcc/
>         PR target/100694
>         * config/rs6000/rs6000.md (BOOL_128_V): New mode iterator for 128-bit
>         vector types.
>         (and3): Replace BOOL_128 with BOOL_128_V.
>         (ior3): Likewise.
>         (xor3): Likewise.
>         (one_cmpl2 expander): New expander with BOOL_128_V.
>         (one_cmpl2 insn_and_split): Rename to ...
>         (*one_cmpl2): ... this.
>
> gcc/testsuite/
>         PR target/100694
>         * gcc.target/powerpc/pr100694.c: New.
>         * gcc.target/powerpc/pr92398.c: New.
>         * gcc.target/powerpc/pr92398.h: Remove.
>         * gcc.target/powerpc/pr92398.p9-.c: Remove.
>         * gcc.target/powerpc/pr92398.p9+.c: Remove.
>
>
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 4bd1dfd3da9..455b7329643 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -743,6 +743,15 @@ (define_mode_iterator BOOL_128 [TI
>                                 (V2DF  "TARGET_ALTIVEC")
>                                 (V1TI  "TARGET_ALTIVEC")])
>
> +;; Mode iterator for logical operations on 128-bit vector types
> +(define_mode_iterator BOOL_128_V [(V16QI "TARGET_ALTIVEC")
> +                                  (V8HI  "TARGET_ALTIVEC")
> +                                  (V4SI  "TARGET_ALTIVEC")
> +                                  (V4SF  "TARGET_ALTIVEC")
> +                                  (V2DI  "TARGET_ALTIVEC")
> +                                  (V2DF  "TARGET_ALTIVEC")
> +                                  (V1TI  "TARGET_ALTIVEC")])
> +
>  ;; For the GPRs we use 3 constraints for register outputs, two that are the
>  ;; same as the output register, and a third where the output register is an
>  ;; early clobber, so we don't have to deal with register overlaps.  For the
> @@ -7135,23 +7144,23 @@ (define_expand "subti3"
>  ;; 128-bit logical operations expanders
>
>  (define_expand "and3"
> -  [(set (match_operand:BOOL_128 0 "vlogical_operand")
> -        (and:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
> -                      (match_operand:BOOL_128 2 "vlogical_operand")))]
> +  [(set (match_operand:BOOL_128_V 0 "vlogical_operand")
> +        (and:BOOL_128_V (match_operand:BOOL_128_V 1 "vlogical_operand")
> +                        (match_operand:BOOL_128_V 2 "vlogical_operand")))]
>    ""
>    "")
>
>  (define_expand "ior3"
> -  [(set (match_operand:BOOL_128 0 "vlogical_operand")
> -        (ior:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
> -                      (match_operand:BOOL_128 2 "vlogical_operand")))]
> +  [(set (match_operand:BOOL_128_V 0 "vlogical_operand")
> +        (ior:BOOL_128_V (match_operand:BOOL_128_V 1 "vlogical_operand")
> +                        (match_operand:BOOL_128_V 2 "vlogical_operand")))]
>    ""
>    "")
>
>  (define_expand "xor3"
> -  [(set (match_operand:BOOL_128 0 "vlogical_operand")
> -        (xor:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
> -                      (match_operand:BOOL_128 2 "vlogical_operand")))]
> +  [(set (match_operand:BOOL_128_V 0 "vlogical_operand")
> +        (xor:BOOL_128_V (match_operand:BOOL_128_V 1 "vlogical_operand")
> +                        (match_operand:BOOL_128_V 2 "vlogical_operand")))]
>    ""
>    "")
>
> @@ -7449,7 +7458,14 @@ (define_insn_and_split "*eqv3_internal2"
>           (const_string "16")))])
>
>  ;; 128-bit one's complement
> -(define_insn_and_split "one_cmpl2"
> +(define_expand "one_cmpl2"
> +[(set (match_operand:BOOL_128_V 0 "vlogical_operand" "=")
> +        (not:BOOL_128_V
> +          (match_operand:BOOL_128_V 1 "vlogical_operand" "")))]
> +  ""
> +  "")
> +
> +(define_insn_and_split "*one_cmpl2"
>    [(set (match_operand:BOOL_128 0 "vlogical_operand" "=")
>          (not:BOOL_128
>            (match_operand:BOOL_128 1 "vlogical_operand" "")))]
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr100694.c b/gcc/testsuite/gcc.target/powerpc/pr100694.c
> new file mode 100644
> index 00000000000..96a895d6c44
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr100694.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target int128 } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 3 } } */
> +
> +/* It just needs two std and one blr.  */
> +void foo (unsigned __int128* res, unsigned long long hi, unsigned long long lo)
> +{
> +   unsigned __int128 i = hi;
> +   i <<= 64;
> +   i |= lo;
> +   *res = i;
> +}
> +
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr92398.c b/gcc/testsuite/gcc.target/powerpc/pr92398.c
> new file mode 100644
> index 00000000000..7d6201cc5bb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr92398.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target int128 } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-times {\mnot\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mstd\M} 2 } } */
> +
> +/* All platforms should generate the same instructions: not;not;std;std.  */
> +void bar (__int128_t *dst, __int128_t src)
> +{
> +  *dst = ~src;
> +}
> +
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr92398.h b/gcc/testsuite/gcc.target/powerpc/pr92398.h
> deleted file mode 100644
> index 5a4a8bcab80..00000000000
> --- a/gcc/testsuite/gcc.target/powerpc/pr92398.h
> +++ /dev/null
> @@ -1,17 +0,0 @@
> -/* This test code is included into pr92398.p9-.c and pr92398.p9+.c.
> -   The two files have the tests for the number of instructions generated for
> -   P9- versus P9+.
> -
> -   store generates difference instructions as below:
> -   P9+: mtvsrdd;xxlnot;stxv.
> -   P8/P7/P6 LE: not;not;std;std.
> -   P8 BE: mtvsrd;mtvsrd;xxpermdi;xxlnor;stxvd2x.
> -   P7/P6 BE: std;std;addi;lxvd2x;xxlnor;stxvd2x.
> -   P9+ and P9- LE are expected, P6/P7/P8 BE are unexpected.  */
> -
> -void
> -bar (__int128_t *dst, __int128_t src)
> -{
> -  *dst = ~src;
> -}
> -
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr92398.p9+.c b/gcc/testsuite/gcc.target/powerpc/pr92398.p9+.c
> deleted file mode 100644
> index 72dd1d9a274..00000000000
> --- a/gcc/testsuite/gcc.target/powerpc/pr92398.p9+.c
> +++ /dev/null
> @@ -1,12 +0,0 @@
> -/* { dg-do compile { target { lp64 && has_arch_pwr9 } } } */
> -/* { dg-require-effective-target powerpc_vsx_ok } */
> -/* { dg-options "-O2 -mvsx" } */
> -
> -/* { dg-final { scan-assembler-times {\mmtvsrdd\M} 1 } } */
> -/* { dg-final { scan-assembler-times {\mxxlnor\M} 1 } } */
> -/* { dg-final { scan-assembler-times {\mstxv\M} 1 } } */
> -/* { dg-final { scan-assembler-not {\mld\M} } } */
> -/* { dg-final { scan-assembler-not {\mnot\M} } } */
> -
> -/* Source code for the test in pr92398.h */
> -#include "pr92398.h"
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr92398.p9-.c b/gcc/testsuite/gcc.target/powerpc/pr92398.p9-.c
> deleted file mode 100644
> index bd7fa98af51..00000000000
> --- a/gcc/testsuite/gcc.target/powerpc/pr92398.p9-.c
> +++ /dev/null
> @@ -1,10 +0,0 @@
> -/* { dg-do compile { target { lp64 && {! has_arch_pwr9} } } } */
> -/* { dg-require-effective-target powerpc_vsx_ok } */
> -/* { dg-options "-O2 -mvsx" } */
> -
> -/* { dg-final { scan-assembler-times {\mnot\M} 2 { xfail be } } } */
> -/* { dg-final { scan-assembler-times {\mstd\M} 2 { xfail { { {! has_arch_pwr9} && has_arch_pwr8 } && be } } } } */
> -
> -/* Source code for the test in pr92398.h */
> -#include "pr92398.h"
> -
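The two new tests rely on the same property of the split. In pr100694.c, once the TImode IOR is split at expand time, the high half is `hi | 0` and the low half is `0 | lo`, so both operations fold away and only the two `std` (plus `blr`) remain. In pr92398.c, `~src` becomes one `not` per half followed by two `std`. Below is a minimal source-level sketch of the per-half values. The `halves` type and the `foo_halves`/`bar_halves` helpers are hypothetical names for illustration and are not part of the patch.

```c
#include <stdint.h>

/* Hypothetical per-half view of the values stored by foo () in
   pr100694.c and bar () in pr92398.c; illustrative names only.  */
typedef struct
{
  uint64_t hi;
  uint64_t lo;
} halves;

/* foo: i = hi; i <<= 64; i |= lo;  tracked one DImode half at a time.  */
static halves
foo_halves (uint64_t hi, uint64_t lo)
{
  halves i = { 0, hi };   /* Zero-extend hi: the high half is 0.  */
  i.hi = i.lo;            /* Shift left by 64: the low half moves up ...  */
  i.lo = 0;               /* ... and the low half becomes 0.  */
  i.hi |= 0;              /* IOR with zero-extended lo: high half ORs
                             with 0, which is a no-op ...  */
  i.lo |= lo;             /* ... and the low half is simply lo.  */
  return i;               /* So the store is just two std: hi and lo.  */
}

/* bar: *dst = ~src;  one "not" per half.  */
static halves
bar_halves (uint64_t src_hi, uint64_t src_lo)
{
  halves r = { ~src_hi, ~src_lo };
  return r;
}
```

Before the patch these TImode operations stayed whole until after reload, so the early RTL passes could not see that the high-half IOR is with zero.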