From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 699353858C54 for ; Tue, 9 Aug 2022 03:14:26 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 699353858C54 Received: from pps.filterd (m0098421.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27938iBZ009930; Tue, 9 Aug 2022 03:14:26 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3huer9977c-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 09 Aug 2022 03:14:25 +0000 Received: from m0098421.ppops.net (m0098421.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 2792oCXi020149; Tue, 9 Aug 2022 03:14:24 GMT Received: from ppma02fra.de.ibm.com (47.49.7a9f.ip4.static.sl-reverse.com [159.122.73.71]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3huer9976w-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 09 Aug 2022 03:14:24 +0000 Received: from pps.filterd (ppma02fra.de.ibm.com [127.0.0.1]) by ppma02fra.de.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 27936L2v013050; Tue, 9 Aug 2022 03:14:23 GMT Received: from b06cxnps4075.portsmouth.uk.ibm.com (d06relay12.portsmouth.uk.ibm.com [9.149.109.197]) by ppma02fra.de.ibm.com with ESMTP id 3hsfx8tu44-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 09 Aug 2022 03:14:23 +0000 Received: from d06av24.portsmouth.uk.ibm.com (d06av24.portsmouth.uk.ibm.com [9.149.105.60]) by b06cxnps4075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 2793EJCj27394478 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 9 Aug 2022 03:14:19 GMT Received: from d06av24.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id C123E42042; Tue, 9 Aug 2022 03:14:19 +0000 (GMT) Received: from d06av24.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id E7A814203F; Tue, 9 Aug 2022 03:14:17 +0000 (GMT) Received: from [9.197.252.244] (unknown [9.197.252.244]) by d06av24.portsmouth.uk.ibm.com (Postfix) with ESMTP; Tue, 9 Aug 2022 03:14:17 +0000 (GMT) Message-ID: <3a98158e-a26a-8e40-151d-6910782b27da@linux.ibm.com> Date: Tue, 9 Aug 2022 11:14:16 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.6.1 Subject: Re: [PATCH v2, rs6000] Add multiply-add expand pattern [PR103109] Content-Language: en-US To: HAO CHEN GUI Cc: Segher Boessenkool , David , Peter Bergner , gcc-patches References: <9c7df6aa-c768-114e-4b1e-df6ce65a63b9@linux.ibm.com> From: "Kewen.Lin" In-Reply-To: <9c7df6aa-c768-114e-4b1e-df6ce65a63b9@linux.ibm.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-GUID: Olob64NrH4fpTjOFgI-PdD3SwEz9qtHA X-Proofpoint-ORIG-GUID: LUPPJUyBjHsPxAIg8iTksmzSjDoi4ue9 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-08_14,2022-08-08_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 bulkscore=0 phishscore=0 mlxlogscore=999 adultscore=0 spamscore=0 lowpriorityscore=0 suspectscore=0 clxscore=1015 mlxscore=0 malwarescore=0 impostorscore=0 priorityscore=1501 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2206140000 definitions=main-2208090013 X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, NICE_REPLY_A, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Aug 2022 03:14:28 -0000 Hi Haochen, Thanks for the patch. on 2022/8/8 14:04, HAO CHEN GUI wrote: > Hi, > This patch adds an expand and several insns for multiply-add with three > 64bit operands. > > Compared with last version, the main changes are: > 1 The "maddld" pattern is reused for the low-part generation. > 2 A runnable testcase replaces the original compiling case. > 3 Fixes indention problems. > > Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. > Is this okay for trunk? Any recommendations? Thanks a lot. > > ChangeLog > 2022-08-08 Haochen Gui > > gcc/ > PR target/103109 > * config/rs6000/rs6000.md (maddditi4): New pattern for multiply-add. > (madddi4_highpart): New. > (madddi4_highpart_le): New. > > gcc/testsuite/ > PR target/103109 > * gcc.target/powerpc/pr103109.c: New. > > > > patch.diff > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index c55ee7e171a..4c58023490a 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -3217,7 +3217,7 @@ (define_expand "mul3" > DONE; > }) > > -(define_insn "*maddld4" > +(define_insn "maddld4" > [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") > (plus:GPR (mult:GPR (match_operand:GPR 1 "gpc_reg_operand" "r") > (match_operand:GPR 2 "gpc_reg_operand" "r")) > @@ -3226,6 +3226,52 @@ (define_insn "*maddld4" > "maddld %0,%1,%2,%3" > [(set_attr "type" "mul")]) > > +(define_expand "maddditi4" > + [(set (match_operand:TI 0 "gpc_reg_operand") > + (plus:TI > + (mult:TI (any_extend:TI (match_operand:DI 1 "gpc_reg_operand")) > + (any_extend:TI (match_operand:DI 2 "gpc_reg_operand"))) > + (any_extend:TI (match_operand:DI 3 "gpc_reg_operand"))))] > + "TARGET_MADDLD && TARGET_POWERPC64" > +{ > + rtx op0_lo = gen_rtx_SUBREG (DImode, operands[0], BYTES_BIG_ENDIAN ? 8 : 0); > + rtx op0_hi = gen_rtx_SUBREG (DImode, operands[0], BYTES_BIG_ENDIAN ? 0 : 8); > + > + emit_insn (gen_maddlddi4 (op0_lo, operands[1], operands[2], operands[3])); > + > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_madddi4_highpart (op0_hi, operands[1], operands[2], > + operands[3])); > + else > + emit_insn (gen_madddi4_highpart_le (op0_hi, operands[1], operands[2], > + operands[3])); > + DONE; > +}) > + > +(define_insn "madddi4_highpart" > + [(set (match_operand:DI 0 "gpc_reg_operand" "=r") > + (subreg:DI > + (plus:TI > + (mult:TI (any_extend:TI (match_operand:DI 1 "gpc_reg_operand" "r")) > + (any_extend:TI (match_operand:DI 2 "gpc_reg_operand" "r"))) > + (any_extend:TI (match_operand:DI 3 "gpc_reg_operand" "r"))) > + 0))] > + "TARGET_MADDLD && BYTES_BIG_ENDIAN && TARGET_POWERPC64" > + "maddhd %0,%1,%2,%3" > + [(set_attr "type" "mul")]) > + > +(define_insn "madddi4_highpart_le" > + [(set (match_operand:DI 0 "gpc_reg_operand" "=r") > + (subreg:DI > + (plus:TI > + (mult:TI (any_extend:TI (match_operand:DI 1 "gpc_reg_operand" "r")) > + (any_extend:TI (match_operand:DI 2 "gpc_reg_operand" "r"))) > + (any_extend:TI (match_operand:DI 3 "gpc_reg_operand" "r"))) > + 8))] > + "TARGET_MADDLD && !BYTES_BIG_ENDIAN && TARGET_POWERPC64" > + "maddhd %0,%1,%2,%3" > + [(set_attr "type" "mul")]) > + > (define_insn "udiv3" > [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") > (udiv:GPR (match_operand:GPR 1 "gpc_reg_operand" "r") > diff --git a/gcc/testsuite/gcc.target/powerpc/pr103109.c b/gcc/testsuite/gcc.target/powerpc/pr103109.c > new file mode 100644 > index 00000000000..969b9751b21 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr103109.c > @@ -0,0 +1,110 @@ > +/* { dg-do run { target { has_arch_ppc64 } } } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power9 -save-temps" } */ > +/* { dg-require-effective-target int128 } */ > +/* { dg-require-effective-target p9modulo_hw } */ > +/* { dg-final { scan-assembler-times {\mmaddld\M} 2 } } */ > +/* { dg-final { scan-assembler-times {\mmaddhd\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mmaddhdu\M} 1 } } */ > + Maybe it's good to split this case into two, one for compiling and the other for running. Since the generated asm is a test point here, with one separated case for compiling, we can still have that part of test coverage on hosts which are unable to run this case. You can move functions multiply_add and multiply_addu into one common header file, then include it in both source files. > +union U { > + __int128 i128; > + struct { > + long l1; > + long l2; > + } s; > +}; > + > +__int128 > +create_i128 (long most_sig, long least_sig) > +{ > + union U u; > + > +#if __LITTLE_ENDIAN__ > + u.s.l1 = least_sig; > + u.s.l2 = most_sig; > +#else > + u.s.l1 = most_sig; > + u.s.l2 = least_sig; > +#endif > + return u.i128; > +} > + > + > +#define DEBUG 0 > + > +#if DEBUG > +#include > +#include > + > +void print_i128(__int128 val, int unsignedp) > +{ > + if (unsignedp) > + printf(" %llu ", (unsigned long long)(val >> 64)); > + else > + printf(" %lld ", (signed long long)(val >> 64)); > + > + printf("%llu (0x%llx %llx)", > + (unsigned long long)(val & 0xFFFFFFFFFFFFFFFF), > + (unsigned long long)(val >> 64), > + (unsigned long long)(val & 0xFFFFFFFFFFFFFFFF)); > +} > +#endif > + > +void abort (void); > + > +__attribute__((noinline)) > +__int128 multiply_add (long a, long b, long c) > +{ > + return (__int128) a * b + c; > +} > + > +__attribute__((noinline)) > +unsigned __int128 multiply_addu (unsigned long a, unsigned long b, > + unsigned long c) > +{ > + return (unsigned __int128) a * b + c; > +} > + > +int main () > +{ > + long a = 0xFEDCBA9876543210L; > + long b = 0x1000000L; > + long c = 0x123456L; > + __int128 expected_result = create_i128 (0xFFFFFFFFFFFEDCBAL, > + 0x9876543210123456L); > + > + __int128 result = multiply_add (a, b, c); > + > + if (result != expected_result) > + { > +#if DEBUG > + printf ("ERROR: multiply_add (%lld, %lld, %lld) = ", a, b, c); > + print_i128 (result, 0); > + printf ("\n does not match expected_result = "); > + print_i128 (expected_result, 0); > + printf ("\n\n"); > +#else > + abort(); > +#endif > + } > + > + unsigned long au = 0xFEDCBA9876543210UL; > + unsigned long bu = 0x1000000UL; > + unsigned long cu = 0x123456UL; > + unsigned __int128 expected_resultu = create_i128 (0x0000000000FEDCBAL, > + 0x9876543210123456L); > + > + unsigned __int128 resultu = multiply_addu (au, bu, cu); > + if (resultu != expected_resultu) > + { > +#if DEBUG > + printf ("ERROR: multiply_addu (%llu, %llu, %llu) = ", au, bu, cu); > + print_i128 (resultu, 1); > + printf ("\n does not match expected_result = "); > + print_i128 (expected_resultu, 1); > + printf ("\n\n"); > +#else > + abort(); > +#endif Nit: better to add one explicit "return 0;" to avoid possible warning. BR, Kewen > + } > +}