Date: Tue, 13 Jun 2023 22:14:02 -0400
From: Michael Meissner
To: gcc-patches@gcc.gnu.org, Michael Meissner, Segher Boessenkool, "Kewen.Lin", David Edelsohn, Peter Bergner
Subject: [PATCH, V6] Fix power10 fusion and -fstack-protector, PR target/105325
This patch fixes an issue where, if you use the -fstack-protector and
-mcpu=power10 options and you have a large stack frame, GCC generates an
LWA instruction whose offset does not fit in the instruction's 16-bit
displacement field.

Unlike the previous versions of this patch, I dug into the problem, and I
found it was much more complex than I originally thought. The bug was
reported with -fstack-protector, but even without -fstack-protector it
could potentially happen with any fused load-compare of a stack location
when the stack frame is larger than 32K.

Here is the initial fused insn that was created. It refers to the stack
location based off of the virtual frame pointer:

(insn 6 5 7 2 (parallel [
            (set (reg:CC 119)
                (compare:CC (mem/c:SI (plus:DI (reg/f:DI 110 sfp)
                            (const_int -4))
                    (const_int 0 [0])))
            (clobber (scratch:DI))
        ]) (nil))

After the stack size is finalized, the frame pointer is removed, and the
post-reload phase is run, the insn is now:

(insn 6 5 7 2 (parallel [
            (set (reg:CC 100 0 [119])
                (compare:CC (mem/c:SI (plus:DI (reg/f:DI 1 1)
                            (const_int 40044))
                    (const_int 0 [0])))
            (clobber (reg:DI 9 9 [120]))
        ]) (nil))

When the split2 pass is run after reload has finished, the
ds_form_mem_operand predicate that was used for lwa and ld no longer
returns true. Because the operand predicate no longer matches, the insn
is not split, and it goes all of the way to final.

The automatic prefixed-instruction support was not run because the insn
type was changed from "load" to "fused_load_cmpi".
Since the insn was no longer typed as a load, the compiler assumed that it
was only 8 bytes long and that the lwa did not need to be prefixed with a
'p' (plwa).

The solution involves:

1) Don't use ds_form_mem_operand for ld and lwa; always use
   non_update_memory_operand.

2) Delete ds_form_mem_operand since it is no longer used.

3) Use the "YZ" constraints for ld/lwa instead of "m".

4) If we don't need to sign extend the lwa, convert it to lwz, and use
   cmpwi instead of cmpdi. Adjust the insn name to reflect the code
   generated.

5) Ensure that the insn using lwa will be recognized as having a prefixed
   operand (and hence the instruction length is 16 bytes instead of 8
   bytes).

   5a) Set the prefixed and maybe_prefixed attributes to know that
       fused_load_cmpi insns are also load insns;

   5b) In the case where we are just setting CC and not using the loaded
       value afterward, set the clobber to use a DI register, and put an
       explicit sign_extend operation in the split;

   5c) Set the sign_extend attribute to "yes".

   5d) 5a-5c are the things that prefixed_load_p in rs6000.cc checks to
       ensure that lwa is treated as a ds-form instruction and not as a
       d-form instruction (i.e. lwz).

6) Add a new test case for this case.

7) Adjust the insn counts in fusion-p10-ldcmpi.c.

Because we are no longer using ds_form_mem_operand, the ld and lwa
instructions will fuse x-form (reg+reg) addresses in addition to ds-form
(reg+offset or reg) addresses.

I have built bootstrap compilers and tested them on the following
environments. There were no regressions in any of the runs.

    Little endian power10, long double is IBM 128-bit
    Little endian power9, long double is IBM 128-bit
    Little endian power9, long double is IEEE 128-bit
    Big endian power8, long double is IBM 128-bit (32/64-bit tests run)

Can I check this patch into the master GCC branch? After a waiting
period, once the previous changes to genfusion.pl are checked in, can I
install this patch in the previous GCC release branches?
2023-06-12  Michael Meissner

gcc/

	* config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix problems that
	allowed prefixed lwa to be generated.
	* config/rs6000/fusion.md: Regenerate.
	* config/rs6000/predicates.md (ds_form_mem_operand): Delete.
	* config/rs6000/rs6000.md (prefixed attribute): Add support for load
	plus compare immediate fused insns.
	(maybe_prefixed): Likewise.

gcc/testsuite/

	* g++.target/powerpc/pr105325.C: New test.
	* gcc.target/powerpc/fusion-p10-ldcmpi.c: Update insn counts.

---
 gcc/config/rs6000/fusion.md                 | 27 +++++++-------
 gcc/config/rs6000/genfusion.pl              | 36 +++++++++++++++----
 gcc/config/rs6000/predicates.md             | 14 --------
 gcc/config/rs6000/rs6000.md                 |  4 +--
 gcc/testsuite/g++.target/powerpc/pr105325.C | 26 ++++++++++++++
 .../gcc.target/powerpc/fusion-p10-ldcmpi.c  | 16 +++++----
 6 files changed, 81 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr105325.C

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..e286bf526a2 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "non_update_memory_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -64,7 +64,7 @@ (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
 ;; load mode is DI result mode is DI compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "non_update_memory_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -85,7 +85,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
 ;; load mode is DI result mode is DI compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -104,17 +104,17 @@ (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is SI result mode is clobber compare mode is CC extend is none
-(define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
+(define_insn_and_split "*lwz_cmpwi_cr0_SI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:SI 0 "=r"))]
   "(TARGET_P10_FUSION)"
-  "lwa%X1 %0,%1\;cmpdi %2,%0,%3"
+  "lwz%X1 %0,%1\;cmpwi %2,%0,%3"
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
-                                      SImode, NON_PREFIXED_DS))"
+                                      SImode, NON_PREFIXED_D))"
   [(set (match_dup 0) (match_dup 1))
    (set (match_dup 2)
         (compare:CC (match_dup 0) (match_dup 3)))]
@@ -146,17 +146,17 @@ (define_insn_and_split "*lwz_cmpldi_cr0_SI_clobber_CCUNS_none"
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is SI result mode is SI compare mode is CC extend is none
-(define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
+(define_insn_and_split "*lwz_cmpwi_cr0_SI_SI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
-  "lwa%X1 %0,%1\;cmpdi %2,%0,%3"
+  "lwz%X1 %0,%1\;cmpwi %2,%0,%3"
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
-                                      SImode, NON_PREFIXED_DS))"
+                                      SImode, NON_PREFIXED_D))"
   [(set (match_dup 0) (match_dup 1))
    (set (match_dup 2)
         (compare:CC (match_dup 0) (match_dup 3)))]
@@ -190,7 +190,7 @@ (define_insn_and_split "*lwz_cmpldi_cr0_SI_SI_CCUNS_none"
 ;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "non_update_memory_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
   "(TARGET_P10_FUSION)"
@@ -205,6 +205,7 @@ (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 82e8f863b02..c8b232847a8 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -61,20 +61,30 @@ sub gen_ld_cmpi_p10_one
   my $mempred = "non_update_memory_operand";
   my $extend;
 
+  # We need to special case lwa. The prefixed_load_p function in rs6000.cc
+  # (which determines if a load instruction is prefixed) uses the fact that the
+  # register mode is different from the memory mode, and that the sign_extend
+  # attribute is set to use DS-form rules for the address instead of D-form.
+  # If the register size is the same, prefixed_load_p assumes we are doing a
+  # lwz. We change to use an lwz and word compare if we don't need to sign
+  # extend the SImode value. Otherwise if we need the value, we need to
+  # make sure the insn is marked as ds-form.
+  my $lwa_insn = ($lmode eq "SI" && $ccmode eq "CC");
+  my $cmp_size = ($lwa_insn && $result !~ /^EXT|^DI$/) ? "w" : "d";
+
   if ($ccmode eq "CC") {
     # ld and lwa are both DS-FORM.
-    ($lmode =~ /^[SD]I$/) and $np = "NON_PREFIXED_DS";
-    ($lmode =~ /^[SD]I$/) and $mempred = "ds_form_mem_operand";
+    ($lmode eq "DI") and $np = "NON_PREFIXED_DS";
+    ($lmode eq "SI" && $cmp_size eq "d") and $np = "NON_PREFIXED_DS";
   } else {
     if ($lmode eq "DI") {
       # ld is DS-form, but lwz is not.
       $np = "NON_PREFIXED_DS";
-      $mempred = "ds_form_mem_operand";
     }
   }
 
   my $cmpl = ($ccmode eq "CC") ? "" : "l";
-  my $echr = ($ccmode eq "CC") ? "a" : "z";
+  my $echr = ($ccmode eq "CC" && $cmp_size eq "d") ? "a" : "z";
   if ($lmode eq "DI") { $echr = ""; }
   my $constpred = ($ccmode eq "CC") ? "const_m1_to_1_operand"
                                     : "const_0_to_1_operand";
@@ -91,12 +101,15 @@ sub gen_ld_cmpi_p10_one
   }
 
   my $ldst = mode_to_ldst_char($lmode);
+
+  # DS-form addresses need YZ, and not m.
+  my $constraint = ($np eq "NON_PREFIXED_DS") ? "YZ" : "m";
   print <