From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <will_schmidt@vnet.ibm.com>
Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com
 [148.163.158.5])
 by sourceware.org (Postfix) with ESMTPS id 351B1396DC0A
 for <gcc-patches@gcc.gnu.org>; Fri, 13 May 2022 18:20:34 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 351B1396DC0A
Received: from pps.filterd (m0098414.ppops.net [127.0.0.1])
 by mx0b-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 24DHKcln022294;
 Fri, 13 May 2022 18:20:33 GMT
Received: from pps.reinject (localhost [127.0.0.1])
 by mx0b-001b2d01.pphosted.com (PPS) with ESMTPS id 3g1un8h0y9-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Fri, 13 May 2022 18:20:33 +0000
Received: from m0098414.ppops.net (m0098414.ppops.net [127.0.0.1])
 by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 24DHqCp8007035;
 Fri, 13 May 2022 18:20:33 GMT
Received: from ppma04wdc.us.ibm.com (1a.90.2fa9.ip4.static.sl-reverse.com
 [169.47.144.26])
 by mx0b-001b2d01.pphosted.com (PPS) with ESMTPS id 3g1un8h0xx-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Fri, 13 May 2022 18:20:33 +0000
Received: from pps.filterd (ppma04wdc.us.ibm.com [127.0.0.1])
 by ppma04wdc.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 24DI01nL018797;
 Fri, 13 May 2022 18:20:32 GMT
Received: from b03cxnp08025.gho.boulder.ibm.com
 (b03cxnp08025.gho.boulder.ibm.com [9.17.130.17])
 by ppma04wdc.us.ibm.com with ESMTP id 3fwgdac3vg-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Fri, 13 May 2022 18:20:32 +0000
Received: from b03ledav001.gho.boulder.ibm.com
 (b03ledav001.gho.boulder.ibm.com [9.17.130.232])
 by b03cxnp08025.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id
 24DIKV6928442922
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
 Fri, 13 May 2022 18:20:31 GMT
Received: from b03ledav001.gho.boulder.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id 653AA6E05B;
 Fri, 13 May 2022 18:20:31 +0000 (GMT)
Received: from b03ledav001.gho.boulder.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id A07836E052;
 Fri, 13 May 2022 18:20:30 +0000 (GMT)
Received: from lexx (unknown [9.160.59.210])
 by b03ledav001.gho.boulder.ibm.com (Postfix) with ESMTP;
 Fri, 13 May 2022 18:20:30 +0000 (GMT)
Message-ID: <cdd45e5910a8ec59efc64db256b976f8277b118f.camel@vnet.ibm.com>
Subject: Re: [PATCH] Optimize multiply/add of DImode extended to TImode, PR
 target/103109.
From: will schmidt <will_schmidt@vnet.ibm.com>
To: Michael Meissner <meissner@linux.ibm.com>, gcc-patches@gcc.gnu.org,
 Segher Boessenkool <segher@kernel.crashing.org>, "Kewen.Lin"
 <linkw@linux.ibm.com>, David Edelsohn <dje.gcc@gmail.com>, Peter Bergner
 <bergner@linux.ibm.com>
Date: Fri, 13 May 2022 13:20:30 -0500
In-Reply-To: <Yn6Ek+CgdLQiXV73@toto.the-meissners.org>
References: <Yn6Ek+CgdLQiXV73@toto.the-meissners.org>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.28.5 (3.28.5-18.el8) 
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
X-TM-AS-GCONF: 00
X-Proofpoint-GUID: cyRJhy-LwKRzvXUT0OTJeaJbl6DCtRz5
X-Proofpoint-ORIG-GUID: _8LfnFJh8B7v6JN_n-JsG1x3VjhhfVYX
X-Proofpoint-Virus-Version: vendor=baseguard
 engine=ICAP:2.0.205,Aquarius:18.0.858,Hydra:6.0.486,FMLib:17.11.64.514
 definitions=2022-05-13_09,2022-05-13_01,2022-02-23_01
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
 suspectscore=0 bulkscore=0
 priorityscore=1501 impostorscore=0 mlxscore=0 phishscore=0 spamscore=0
 clxscore=1015 adultscore=0 lowpriorityscore=0 mlxlogscore=999
 malwarescore=0 classifier=spam adjust=0 reason=mlx scancount=1
 engine=8.12.0-2202240000 definitions=main-2205130075
X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_MSPIKE_H2,
 SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Fri, 13 May 2022 18:20:36 -0000

On Fri, 2022-05-13 at 12:17 -0400, Michael Meissner wrote:
> Optimize multiply/add of DImode extended to TImode, PR target/103109.
> 
> On power9 and power10 systems, we have instructions that support doing
> 64-bit integers converted to 128-bit integers and producing 128-bit
> results.  This patch adds support to generate these instructions.
> 
> Previously GCC had define_expands to handle conversion of the 64-bit
> extend to 128-bit and multiply.  This patch changes these define_expands
> to define_insn_and_split and then it provides combiner patterns to
> generate thes multiply/add instructions.
> 
> To support using this optimization on power9, this patch extends the sign
> extend DImode to TImode to also run on power9 (added for PR
> target/104698).
> 
> This patch needs the previous patch to add unsigned DImode to TImode
> conversion so that the combiner can combine the extend, multiply, and add
> instructions.
> 
> I have built this patch on little endian power10, little endian power9, and big
> endian power8 systems.  There were no regressions when I ran it.  Can I install
> this patch into the GCC 13 master branch?
> 
> 2022-05-13   Michael Meissner  <meissner@linux.ibm.com>
> 
> gcc/
> 	PR target/103109
> 	* config/rs6000/rs6000.md (su_int32): New code attribute.
> 	(<u>mul<mode><dmode>3): Convert from define_expand to
> 	define_insn_and_split.
> 	(maddld<mode>4): Add generator function.

-(define_insn "*maddld<mode>4"
+(define_insn "maddld<mode>4"

Is the removal of the "*" considering adding generator?  (Thats
terminology that I'm not immediately familiar with). 




> 	(<u>mulditi3_<u>adddi3): New insn.
> 	(<u>mulditi3_add_const): New insn.
> 	(<u>mulditi3_<u>adddi3_upper): New insn.
> 
> gcc/testsuite/
> 	PR target/103109
> 	* gcc.target/powerpc/pr103109.c: New test.


ok


> ---
>  gcc/config/rs6000/rs6000.md                 | 128 +++++++++++++++++++-
>  gcc/testsuite/gcc.target/powerpc/pr103109.c |  62 ++++++++++
>  2 files changed, 184 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103109.c
> 
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 2aba70393d8..83eacec57ba 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -667,6 +667,9 @@ (define_code_attr uns [(fix		"")
>  		       (float		"")
>  		       (unsigned_float	"uns")])
> 
> +(define_code_attr su_int32 [(sign_extend "s32bit_cint_operand")
> +			    (zero_extend "c32bit_cint_operand")])
> +
>  ; Various instructions that come in SI and DI forms.
>  ; A generic w/d attribute, for things like cmpw/cmpd.
>  (define_mode_attr wd [(QI    "b")
> @@ -3190,13 +3193,16 @@ (define_insn "<su>mulsi3_highpart_64"
>    "mulhw<u> %0,%1,%2"
>    [(set_attr "type" "mul")])
> 
> -(define_expand "<u>mul<mode><dmode>3"
> -  [(set (match_operand:<DMODE> 0 "gpc_reg_operand")
> +(define_insn_and_split "<u>mul<mode><dmode>3"
> +  [(set (match_operand:<DMODE> 0 "gpc_reg_operand" "=&r")
>  	(mult:<DMODE> (any_extend:<DMODE>
> -			(match_operand:GPR 1 "gpc_reg_operand"))
> +		       (match_operand:GPR 1 "gpc_reg_operand" "r"))
>  		      (any_extend:<DMODE>
> -			(match_operand:GPR 2 "gpc_reg_operand"))))]
> +		       (match_operand:GPR 2 "gpc_reg_operand" "r"))))]
>    "!(<MODE>mode == SImode && TARGET_POWERPC64)"
> +  "#"
> +  "&& 1"
> +  [(pc)]
>  {
>    rtx l = gen_reg_rtx (<MODE>mode);
>    rtx h = gen_reg_rtx (<MODE>mode);
> @@ -3205,9 +3211,10 @@ (define_expand "<u>mul<mode><dmode>3"
>    emit_move_insn (gen_lowpart (<MODE>mode, operands[0]), l);
>    emit_move_insn (gen_highpart (<MODE>mode, operands[0]), h);
>    DONE;
> -})
> +}
> +  [(set_attr "length" "8")])
> 

ok


> -(define_insn "*maddld<mode>4"
> +(define_insn "maddld<mode>4"
>    [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
>  	(plus:GPR (mult:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
>  			    (match_operand:GPR 2 "gpc_reg_operand" "r"))

ok

> @@ -3216,6 +3223,115 @@ (define_insn "*maddld<mode>4"
>    "maddld %0,%1,%2,%3"
>    [(set_attr "type" "mul")])
> 
> +(define_insn_and_split "*<u>mulditi3_<u>adddi3"
> +  [(set (match_operand:TI 0 "gpc_reg_operand" "=&r")
> +	(plus:TI
> +	 (mult:TI
> +	  (any_extend:TI (match_operand:DI 1 "gpc_reg_operand" "r"))
> +	  (any_extend:TI (match_operand:DI 2 "gpc_reg_operand" "r")))
> +	 (any_extend:TI (match_operand:DI 3 "gpc_reg_operand" "r"))))]
> +  "TARGET_MADDLD && TARGET_POWERPC64"
> +  "#"
> +  "&& 1"
> +  [(pc)]
> +{
> +  rtx dest = operands[0];
> +  rtx dest_hi = gen_highpart (DImode, dest);
> +  rtx dest_lo = gen_lowpart (DImode, dest);
> +  rtx op1 = operands[1];
> +  rtx op2 = operands[2];
> +  rtx op3 = operands[3];
> +  rtx tmp_hi, tmp_lo;
> +
> +  if (can_create_pseudo_p ())
> +    {
> +      tmp_hi = gen_reg_rtx (DImode);
> +      tmp_lo = gen_reg_rtx (DImode);
> +    }
> +  else
> +    {
> +      tmp_hi = dest_hi;
> +      tmp_lo = dest_lo;
> +    }
> +
> +  emit_insn (gen_<u>mulditi3_<u>adddi3_upper (tmp_hi, op1, op2, op3));
> +  emit_insn (gen_maddlddi4 (tmp_lo, op1, op2, op3));
> +
> +  if (can_create_pseudo_p ())
> +    {
> +      emit_move_insn (dest_hi, tmp_hi);
> +      emit_move_insn (dest_lo, tmp_lo);
> +    }
> +  DONE;
> +}
> +  [(set_attr "length" "8")])

Seems OK, I did not closely scrutinize.


> +
> +;; Optimize 128-bit multiply with zero/sign extend and adding a constant.  We
> +;; force the constant into a register to generate li, maddhd, and maddld,
> +;; instead of mulld, mulhd, addic, and addze.  We can't combine this pattern
> +;; with the pattern that handles registers, since constants don't have a sign
> +;; or zero extend around them.

ok.


> +(define_insn_and_split "*<u>mulditi3_add_const"
> +  [(set (match_operand:TI 0 "gpc_reg_operand" "=&r")
> +	(plus:TI
> +	 (mult:TI
> +	  (any_extend:TI (match_operand:DI 1 "gpc_reg_operand" "r"))
> +	  (any_extend:TI (match_operand:DI 2 "gpc_reg_operand" "r")))
> +	 (match_operand 3 "<su_int32>" "r")))]
> +  "TARGET_MADDLD && TARGET_POWERPC64
> +"
> +  "#"
> +  "&& 1"
> +  [(pc)]
> +{
> +  rtx dest = operands[0];
> +  rtx dest_hi = gen_highpart (DImode, dest);
> +  rtx dest_lo = gen_lowpart (DImode, dest);
> +  rtx op1 = operands[1];
> +  rtx op2 = operands[2];
> +  rtx op3 = force_reg (DImode, operands[3]);
> +  rtx tmp_hi, tmp_lo;
> +
> +  if (can_create_pseudo_p ())
> +    {
> +      tmp_hi = gen_reg_rtx (DImode);
> +      tmp_lo = gen_reg_rtx (DImode);
> +    }
> +  else
> +    {
> +      tmp_hi = dest_hi;
> +      tmp_lo = dest_lo;
> +    }
> +
> +  emit_insn (gen_<u>mulditi3_<u>adddi3_upper (tmp_hi, op1, op2, op3));
> +  emit_insn (gen_maddlddi4 (tmp_lo, op1, op2, op3));
> +
> +  if (can_create_pseudo_p ())
> +    {
> +      emit_move_insn (dest_hi, tmp_hi);
> +      emit_move_insn (dest_lo, tmp_lo);
> +    }
> +  DONE;
> +}
> +  [(set_attr "length" "8")
> +   (set_attr "type" "mul")
> +   (set_attr "size" "64")])
> +
> +(define_insn "<u>mulditi3_<u>adddi3_upper"
> +  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
> +	(truncate:DI
> +	 (lshiftrt:TI
> +	  (plus:TI
> +	   (mult:TI
> +	    (any_extend:TI (match_operand:DI 1 "gpc_reg_operand" "r"))
> +	    (any_extend:TI (match_operand:DI 2 "gpc_reg_operand" "r")))
> +	   (any_extend:TI (match_operand:DI 3 "gpc_reg_operand" "r")))
> +	  (const_int 64))))]
> +  "TARGET_MADDLD && TARGET_POWERPC64"
> +  "maddhd<u> %0,%1,%2,%3"
> +  [(set_attr "type" "mul")
> +   (set_attr "size" "64")])
> +
>  (define_insn "udiv<mode>3"
>    [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
>          (udiv:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")


appears OK.  I defer to others for scrutiny :-)


> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103109.c b/gcc/testsuite/gcc.target/powerpc/pr103109.c
> new file mode 100644
> index 00000000000..ae2cfb9eda7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr103109.c
> @@ -0,0 +1,62 @@
> +/* { dg-require-effective-target int128     } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +
> +/* This test makes sure that GCC generates the maddhd, maddhdu, and maddld
> +   power9 instructions when doing some forms of 64-bit integers converted to
> +   128-bit integers and used with multiply/add operations.  */
> +
> +__int128_t
> +s_mult_add (long long a,
> +	    long long b,
> +	    long long c)
> +{
> +  /* maddhd, maddld.  */
> +  return ((__int128_t)a * (__int128_t)b) + (__int128_t)c;
> +}
> +
> +/* Test 32-bit constants that are loaded into GPRs instead of doing the
> +   mulld/mulhd and then addic/addime or addc/addze.  */

addme ?



Thanks,
-Will



> +__int128_t
> +s_mult_add_m10 (long long a,
> +		long long b)
> +{
> +  /* maddhd, maddld.  */
> +  return ((__int128_t)a * (__int128_t)b) - 10;
> +}
> +
> +__int128_t
> +s_mult_add_70000 (long long a,
> +		  long long b)
> +{
> +  /* maddhd, maddld.  */
> +  return ((__int128_t)a * (__int128_t)b) + 70000;
> +}
> +
> +__uint128_t
> +u_mult_add (unsigned long long a,
> +	    unsigned long long b,
> +	    unsigned long long c)
> +{
> +  /* maddhd, maddld.  */
> +  return ((__uint128_t)a * (__uint128_t)b) + (__uint128_t)c;
> +}
> +
> +__uint128_t
> +u_mult_add_0x80000000 (unsigned long long a,
> +		       unsigned long long b)
> +{
> +  /* maddhd, maddld.  */
> +  return ((__uint128_t)a * (__uint128_t)b) + 0x80000000UL;
> +}
> +
> +/* { dg-final { scan-assembler-not   {\maddc\M}      } } */
> +/* { dg-final { scan-assembler-not   {\madde\M}      } } */
> +/* { dg-final { scan-assembler-not   {\maddid\M}     } } */
> +/* { dg-final { scan-assembler-not   {\maddme\M}     } } */
> +/* { dg-final { scan-assembler-not   {\maddze\M}     } } */
> +/* { dg-final { scan-assembler-not   {\mmulhd\M}     } } */
> +/* { dg-final { scan-assembler-not   {\mmulld\M}     } } */
> +/* { dg-final { scan-assembler-times {\mmaddhd\M}  3 } } */
> +/* { dg-final { scan-assembler-times {\mmaddhdu\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mmaddld\M}  5 } } */
> -- 
> 2.35.3
> 
>