From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: <will_schmidt@vnet.ibm.com> Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 351B1396DC0A for <gcc-patches@gcc.gnu.org>; Fri, 13 May 2022 18:20:34 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 351B1396DC0A Received: from pps.filterd (m0098414.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 24DHKcln022294; Fri, 13 May 2022 18:20:33 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0b-001b2d01.pphosted.com (PPS) with ESMTPS id 3g1un8h0y9-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 13 May 2022 18:20:33 +0000 Received: from m0098414.ppops.net (m0098414.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 24DHqCp8007035; Fri, 13 May 2022 18:20:33 GMT Received: from ppma04wdc.us.ibm.com (1a.90.2fa9.ip4.static.sl-reverse.com [169.47.144.26]) by mx0b-001b2d01.pphosted.com (PPS) with ESMTPS id 3g1un8h0xx-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 13 May 2022 18:20:33 +0000 Received: from pps.filterd (ppma04wdc.us.ibm.com [127.0.0.1]) by ppma04wdc.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 24DI01nL018797; Fri, 13 May 2022 18:20:32 GMT Received: from b03cxnp08025.gho.boulder.ibm.com (b03cxnp08025.gho.boulder.ibm.com [9.17.130.17]) by ppma04wdc.us.ibm.com with ESMTP id 3fwgdac3vg-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 13 May 2022 18:20:32 +0000 Received: from b03ledav001.gho.boulder.ibm.com (b03ledav001.gho.boulder.ibm.com [9.17.130.232]) by b03cxnp08025.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 24DIKV6928442922 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 13 May 2022 18:20:31 GMT Received: from b03ledav001.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 653AA6E05B; Fri, 13 May 2022 18:20:31 +0000 (GMT) Received: from b03ledav001.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id A07836E052; Fri, 13 May 2022 18:20:30 +0000 (GMT) Received: from lexx (unknown [9.160.59.210]) by b03ledav001.gho.boulder.ibm.com (Postfix) with ESMTP; Fri, 13 May 2022 18:20:30 +0000 (GMT) Message-ID: <cdd45e5910a8ec59efc64db256b976f8277b118f.camel@vnet.ibm.com> Subject: Re: [PATCH] Optimize multiply/add of DImode extended to TImode, PR target/103109. From: will schmidt <will_schmidt@vnet.ibm.com> To: Michael Meissner <meissner@linux.ibm.com>, gcc-patches@gcc.gnu.org, Segher Boessenkool <segher@kernel.crashing.org>, "Kewen.Lin" <linkw@linux.ibm.com>, David Edelsohn <dje.gcc@gmail.com>, Peter Bergner <bergner@linux.ibm.com> Date: Fri, 13 May 2022 13:20:30 -0500 In-Reply-To: <Yn6Ek+CgdLQiXV73@toto.the-meissners.org> References: <Yn6Ek+CgdLQiXV73@toto.the-meissners.org> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.28.5 (3.28.5-18.el8) Mime-Version: 1.0 Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-GUID: cyRJhy-LwKRzvXUT0OTJeaJbl6DCtRz5 X-Proofpoint-ORIG-GUID: _8LfnFJh8B7v6JN_n-JsG1x3VjhhfVYX X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.858,Hydra:6.0.486,FMLib:17.11.64.514 definitions=2022-05-13_09,2022-05-13_01,2022-02-23_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 suspectscore=0 bulkscore=0 priorityscore=1501 impostorscore=0 mlxscore=0 phishscore=0 spamscore=0 clxscore=1015 adultscore=0 lowpriorityscore=0 mlxlogscore=999 malwarescore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2202240000 definitions=main-2205130075 X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> X-List-Received-Date: Fri, 13 May 2022 18:20:36 -0000 On Fri, 2022-05-13 at 12:17 -0400, Michael Meissner wrote: > Optimize multiply/add of DImode extended to TImode, PR target/103109. > > On power9 and power10 systems, we have instructions that support doing > 64-bit integers converted to 128-bit integers and producing 128-bit > results. This patch adds support to generate these instructions. > > Previously GCC had define_expands to handle conversion of the 64-bit > extend to 128-bit and multiply. This patch changes these define_expands > to define_insn_and_split and then it provides combiner patterns to > generate thes multiply/add instructions. > > To support using this optimization on power9, this patch extends the sign > extend DImode to TImode to also run on power9 (added for PR > target/104698). > > This patch needs the previous patch to add unsigned DImode to TImode > conversion so that the combiner can combine the extend, multiply, and add > instructions. > > I have built this patch on little endian power10, little endian power9, and big > endian power8 systems. There were no regressions when I ran it. Can I install > this patch into the GCC 13 master branch? > > 2022-05-13 Michael Meissner <meissner@linux.ibm.com> > > gcc/ > PR target/103109 > * config/rs6000/rs6000.md (su_int32): New code attribute. > (<u>mul<mode><dmode>3): Convert from define_expand to > define_insn_and_split. > (maddld<mode>4): Add generator function. -(define_insn "*maddld<mode>4" +(define_insn "maddld<mode>4" Is the removal of the "*" considering adding generator? (Thats terminology that I'm not immediately familiar with). > (<u>mulditi3_<u>adddi3): New insn. > (<u>mulditi3_add_const): New insn. > (<u>mulditi3_<u>adddi3_upper): New insn. > > gcc/testsuite/ > PR target/103109 > * gcc.target/powerpc/pr103109.c: New test. ok > --- > gcc/config/rs6000/rs6000.md | 128 +++++++++++++++++++- > gcc/testsuite/gcc.target/powerpc/pr103109.c | 62 ++++++++++ > 2 files changed, 184 insertions(+), 6 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103109.c > > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index 2aba70393d8..83eacec57ba 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -667,6 +667,9 @@ (define_code_attr uns [(fix "") > (float "") > (unsigned_float "uns")]) > > +(define_code_attr su_int32 [(sign_extend "s32bit_cint_operand") > + (zero_extend "c32bit_cint_operand")]) > + > ; Various instructions that come in SI and DI forms. > ; A generic w/d attribute, for things like cmpw/cmpd. > (define_mode_attr wd [(QI "b") > @@ -3190,13 +3193,16 @@ (define_insn "<su>mulsi3_highpart_64" > "mulhw<u> %0,%1,%2" > [(set_attr "type" "mul")]) > > -(define_expand "<u>mul<mode><dmode>3" > - [(set (match_operand:<DMODE> 0 "gpc_reg_operand") > +(define_insn_and_split "<u>mul<mode><dmode>3" > + [(set (match_operand:<DMODE> 0 "gpc_reg_operand" "=&r") > (mult:<DMODE> (any_extend:<DMODE> > - (match_operand:GPR 1 "gpc_reg_operand")) > + (match_operand:GPR 1 "gpc_reg_operand" "r")) > (any_extend:<DMODE> > - (match_operand:GPR 2 "gpc_reg_operand"))))] > + (match_operand:GPR 2 "gpc_reg_operand" "r"))))] > "!(<MODE>mode == SImode && TARGET_POWERPC64)" > + "#" > + "&& 1" > + [(pc)] > { > rtx l = gen_reg_rtx (<MODE>mode); > rtx h = gen_reg_rtx (<MODE>mode); > @@ -3205,9 +3211,10 @@ (define_expand "<u>mul<mode><dmode>3" > emit_move_insn (gen_lowpart (<MODE>mode, operands[0]), l); > emit_move_insn (gen_highpart (<MODE>mode, operands[0]), h); > DONE; > -}) > +} > + [(set_attr "length" "8")]) > ok > -(define_insn "*maddld<mode>4" > +(define_insn "maddld<mode>4" > [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") > (plus:GPR (mult:GPR (match_operand:GPR 1 "gpc_reg_operand" "r") > (match_operand:GPR 2 "gpc_reg_operand" "r")) ok > @@ -3216,6 +3223,115 @@ (define_insn "*maddld<mode>4" > "maddld %0,%1,%2,%3" > [(set_attr "type" "mul")]) > > +(define_insn_and_split "*<u>mulditi3_<u>adddi3" > + [(set (match_operand:TI 0 "gpc_reg_operand" "=&r") > + (plus:TI > + (mult:TI > + (any_extend:TI (match_operand:DI 1 "gpc_reg_operand" "r")) > + (any_extend:TI (match_operand:DI 2 "gpc_reg_operand" "r"))) > + (any_extend:TI (match_operand:DI 3 "gpc_reg_operand" "r"))))] > + "TARGET_MADDLD && TARGET_POWERPC64" > + "#" > + "&& 1" > + [(pc)] > +{ > + rtx dest = operands[0]; > + rtx dest_hi = gen_highpart (DImode, dest); > + rtx dest_lo = gen_lowpart (DImode, dest); > + rtx op1 = operands[1]; > + rtx op2 = operands[2]; > + rtx op3 = operands[3]; > + rtx tmp_hi, tmp_lo; > + > + if (can_create_pseudo_p ()) > + { > + tmp_hi = gen_reg_rtx (DImode); > + tmp_lo = gen_reg_rtx (DImode); > + } > + else > + { > + tmp_hi = dest_hi; > + tmp_lo = dest_lo; > + } > + > + emit_insn (gen_<u>mulditi3_<u>adddi3_upper (tmp_hi, op1, op2, op3)); > + emit_insn (gen_maddlddi4 (tmp_lo, op1, op2, op3)); > + > + if (can_create_pseudo_p ()) > + { > + emit_move_insn (dest_hi, tmp_hi); > + emit_move_insn (dest_lo, tmp_lo); > + } > + DONE; > +} > + [(set_attr "length" "8")]) Seems OK, I did not closely scrutinize. > + > +;; Optimize 128-bit multiply with zero/sign extend and adding a constant. We > +;; force the constant into a register to generate li, maddhd, and maddld, > +;; instead of mulld, mulhd, addic, and addze. We can't combine this pattern > +;; with the pattern that handles registers, since constants don't have a sign > +;; or zero extend around them. ok. > +(define_insn_and_split "*<u>mulditi3_add_const" > + [(set (match_operand:TI 0 "gpc_reg_operand" "=&r") > + (plus:TI > + (mult:TI > + (any_extend:TI (match_operand:DI 1 "gpc_reg_operand" "r")) > + (any_extend:TI (match_operand:DI 2 "gpc_reg_operand" "r"))) > + (match_operand 3 "<su_int32>" "r")))] > + "TARGET_MADDLD && TARGET_POWERPC64 > +" > + "#" > + "&& 1" > + [(pc)] > +{ > + rtx dest = operands[0]; > + rtx dest_hi = gen_highpart (DImode, dest); > + rtx dest_lo = gen_lowpart (DImode, dest); > + rtx op1 = operands[1]; > + rtx op2 = operands[2]; > + rtx op3 = force_reg (DImode, operands[3]); > + rtx tmp_hi, tmp_lo; > + > + if (can_create_pseudo_p ()) > + { > + tmp_hi = gen_reg_rtx (DImode); > + tmp_lo = gen_reg_rtx (DImode); > + } > + else > + { > + tmp_hi = dest_hi; > + tmp_lo = dest_lo; > + } > + > + emit_insn (gen_<u>mulditi3_<u>adddi3_upper (tmp_hi, op1, op2, op3)); > + emit_insn (gen_maddlddi4 (tmp_lo, op1, op2, op3)); > + > + if (can_create_pseudo_p ()) > + { > + emit_move_insn (dest_hi, tmp_hi); > + emit_move_insn (dest_lo, tmp_lo); > + } > + DONE; > +} > + [(set_attr "length" "8") > + (set_attr "type" "mul") > + (set_attr "size" "64")]) > + > +(define_insn "<u>mulditi3_<u>adddi3_upper" > + [(set (match_operand:DI 0 "gpc_reg_operand" "=r") > + (truncate:DI > + (lshiftrt:TI > + (plus:TI > + (mult:TI > + (any_extend:TI (match_operand:DI 1 "gpc_reg_operand" "r")) > + (any_extend:TI (match_operand:DI 2 "gpc_reg_operand" "r"))) > + (any_extend:TI (match_operand:DI 3 "gpc_reg_operand" "r"))) > + (const_int 64))))] > + "TARGET_MADDLD && TARGET_POWERPC64" > + "maddhd<u> %0,%1,%2,%3" > + [(set_attr "type" "mul") > + (set_attr "size" "64")]) > + > (define_insn "udiv<mode>3" > [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") > (udiv:GPR (match_operand:GPR 1 "gpc_reg_operand" "r") appears OK. I defer to others for scrutiny :-) > diff --git a/gcc/testsuite/gcc.target/powerpc/pr103109.c b/gcc/testsuite/gcc.target/powerpc/pr103109.c > new file mode 100644 > index 00000000000..ae2cfb9eda7 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr103109.c > @@ -0,0 +1,62 @@ > +/* { dg-require-effective-target int128 } */ > +/* { dg-require-effective-target power10_ok } */ > +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ > + > +/* This test makes sure that GCC generates the maddhd, maddhdu, and maddld > + power9 instructions when doing some forms of 64-bit integers converted to > + 128-bit integers and used with multiply/add operations. */ > + > +__int128_t > +s_mult_add (long long a, > + long long b, > + long long c) > +{ > + /* maddhd, maddld. */ > + return ((__int128_t)a * (__int128_t)b) + (__int128_t)c; > +} > + > +/* Test 32-bit constants that are loaded into GPRs instead of doing the > + mulld/mulhd and then addic/addime or addc/addze. */ addme ? Thanks, -Will > +__int128_t > +s_mult_add_m10 (long long a, > + long long b) > +{ > + /* maddhd, maddld. */ > + return ((__int128_t)a * (__int128_t)b) - 10; > +} > + > +__int128_t > +s_mult_add_70000 (long long a, > + long long b) > +{ > + /* maddhd, maddld. */ > + return ((__int128_t)a * (__int128_t)b) + 70000; > +} > + > +__uint128_t > +u_mult_add (unsigned long long a, > + unsigned long long b, > + unsigned long long c) > +{ > + /* maddhd, maddld. */ > + return ((__uint128_t)a * (__uint128_t)b) + (__uint128_t)c; > +} > + > +__uint128_t > +u_mult_add_0x80000000 (unsigned long long a, > + unsigned long long b) > +{ > + /* maddhd, maddld. */ > + return ((__uint128_t)a * (__uint128_t)b) + 0x80000000UL; > +} > + > +/* { dg-final { scan-assembler-not {\maddc\M} } } */ > +/* { dg-final { scan-assembler-not {\madde\M} } } */ > +/* { dg-final { scan-assembler-not {\maddid\M} } } */ > +/* { dg-final { scan-assembler-not {\maddme\M} } } */ > +/* { dg-final { scan-assembler-not {\maddze\M} } } */ > +/* { dg-final { scan-assembler-not {\mmulhd\M} } } */ > +/* { dg-final { scan-assembler-not {\mmulld\M} } } */ > +/* { dg-final { scan-assembler-times {\mmaddhd\M} 3 } } */ > +/* { dg-final { scan-assembler-times {\mmaddhdu\M} 2 } } */ > +/* { dg-final { scan-assembler-times {\mmaddld\M} 5 } } */ > -- > 2.35.3 > >