From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 1005) id E25A73858CD1; Fri, 23 Jun 2023 12:29:58 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org E25A73858CD1 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1687523398; bh=CSK2VOd4nZm4IS6aXxgG25nCsl/8+1XEB9IenKx91gY=; h=From:To:Subject:Date:From; b=nGZr17RmEmI3+DTxkBbydHGUB/agFkvSYCfhgZU6qS5k4zBpqEjE8Qe1N1BMbMhy5 5RXU4gem5Mg9UjJnv7QDkP2aL0SIDQ9hTDAnO1SzdnpqakJl1Cqa6Xa8rRa30J7tFe ub9WN3uiFl986KWgp4yGTfnbo2Y0xaDOKByseMN4= Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: Michael Meissner To: gcc-cvs@gcc.gnu.org Subject: [gcc(refs/users/meissner/heads/work123)] Fix power10 fusion bug with prefixed loads, PR target/105325 X-Act-Checkin: gcc X-Git-Author: Michael Meissner X-Git-Refname: refs/users/meissner/heads/work123 X-Git-Oldrev: 11ab357ad492fb6a242385b7b87d711ee841f417 X-Git-Newrev: cc3db5e200c373b01b87a4387fc961fb3fb268d3 Message-Id: <20230623122958.E25A73858CD1@sourceware.org> Date: Fri, 23 Jun 2023 12:29:58 +0000 (GMT) List-Id: https://gcc.gnu.org/g:cc3db5e200c373b01b87a4387fc961fb3fb268d3 commit cc3db5e200c373b01b87a4387fc961fb3fb268d3 Author: Michael Meissner Date: Fri Jun 23 08:29:28 2023 -0400 Fix power10 fusion bug with prefixed loads, PR target/105325 This changes fixes PR target/105325. PR target/105325 is a bug where an invalid lwa instruction is generated due to power10 fusion of a load instruction to a GPR and an compare immediate instruction with the immediate being -1, 0, or 1. In some cases, when the load instruction is done, the GCC compiler would generate a load instruction with an offset that was too large to fit into the normal load instruction. In particular, loads from the stack might originally have a small offset, so that the load is not a prefixed load. However, after the stack is set up, and register allocation has been done, the offset now is large enough that we would have to use a prefixed load instruction. The support for prefixed loads did not consider that patterns with a fused load and compare might have a prefixed address. Without this support, the proper prefixed load won't be generated. In the original code, when the split2 pass is run after reload has finished the ds_form_mem_operand predicate that was used for lwa and ld no longer returns true. When the pattern was created, ds_form_mem_operand recognized the insn as being valid since the offset was small. But after register allocation, ds_form_mem_operand did not return true. Because it didn't return true, the insn could not be split. Since the insn was not split and the prefix support did not indicate a prefixed instruction was used, the wrong load is generated. The solution involves: 1) Don't use ds_form_mem_operand for ld and lwa, always use non_update_memory_operand. 2) Delete ds_form_mem_operand since it is no longer used. 3) Use the "YZ" constraints for ld/lwa instead of "m". 4) If we don't need to sign extend the lwa, convert it to lwz, and use cmpwi instead of cmpdi. Adjust the insn name to reflect the code generate. 5) Insure that the insn using lwa will be recognized as having a prefixed operand (and hence the insn length will be 16 bytes instead of 8 bytes). 5a) Set the prefixed and maybe_prefix attributes to know that fused_load_cmpi are also load insns; 5b) In the case where we are just setting CC and not using the memory afterward, set the clobber to use a DI register, and put an explicit sign_extend operation in the split; 5c) Set the sign_extend attribute to "yes" for lwa. 5d) 5a-5c are the things that prefixed_load_p in rs6000.cc checks to ensure that lwa is treated as a ds-form instruction and not as a d-form instruction (i.e. lwz). 6) Add a new test case for this case. 7) Adjust the insn counts in fusion-p10-ldcmpi.c. Because we are no longer using ds_form_mem_operand, the ld and lwa instructions will fuse x-form (reg+reg) addresses in addition ds-form (reg+offset or reg). 2023-06-22 Michael Meissner gcc/ * config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix problems that allowed prefixed lwa to be generated. * config/rs6000/fusion.md: Regenerate. * config/rs6000/predicates.md (ds_form_mem_operand): Delete. * config/rs6000/rs6000.md (prefixed attribute): Add support for load plus compare immediate fused insns. (maybe_prefixed): Likewise. gcc/testsuite/ * g++.target/powerpc/pr105325.C: New test. * gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c: Update insn counts. Diff: --- gcc/config/rs6000/fusion.md | 27 ++++++++-------- gcc/config/rs6000/genfusion.pl | 37 ++++++++++++++++++---- gcc/config/rs6000/predicates.md | 14 -------- gcc/config/rs6000/rs6000.md | 4 +-- gcc/testsuite/g++.target/powerpc/pr105325.C | 28 ++++++++++++++++ .../gcc.target/powerpc/fusion-p10-ldcmpi.c | 16 ++++++---- 6 files changed, 84 insertions(+), 42 deletions(-) diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md index d45fb138a70..e286bf526a2 100644 --- a/gcc/config/rs6000/fusion.md +++ b/gcc/config/rs6000/fusion.md @@ -22,7 +22,7 @@ ;; load mode is DI result mode is clobber compare mode is CC extend is none (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none" [(set (match_operand:CC 2 "cc_reg_operand" "=x") - (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m") + (compare:CC (match_operand:DI 1 "non_update_memory_operand" "YZ") (match_operand:DI 3 "const_m1_to_1_operand" "n"))) (clobber (match_scratch:DI 0 "=r"))] "(TARGET_P10_FUSION)" @@ -43,7 +43,7 @@ ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none" [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x") - (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m") + (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "YZ") (match_operand:DI 3 "const_0_to_1_operand" "n"))) (clobber (match_scratch:DI 0 "=r"))] "(TARGET_P10_FUSION)" @@ -64,7 +64,7 @@ ;; load mode is DI result mode is DI compare mode is CC extend is none (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none" [(set (match_operand:CC 2 "cc_reg_operand" "=x") - (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m") + (compare:CC (match_operand:DI 1 "non_update_memory_operand" "YZ") (match_operand:DI 3 "const_m1_to_1_operand" "n"))) (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))] "(TARGET_P10_FUSION)" @@ -85,7 +85,7 @@ ;; load mode is DI result mode is DI compare mode is CCUNS extend is none (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none" [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x") - (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m") + (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "YZ") (match_operand:DI 3 "const_0_to_1_operand" "n"))) (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))] "(TARGET_P10_FUSION)" @@ -104,17 +104,17 @@ ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10 ;; load mode is SI result mode is clobber compare mode is CC extend is none -(define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none" +(define_insn_and_split "*lwz_cmpwi_cr0_SI_clobber_CC_none" [(set (match_operand:CC 2 "cc_reg_operand" "=x") - (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m") + (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m") (match_operand:SI 3 "const_m1_to_1_operand" "n"))) (clobber (match_scratch:SI 0 "=r"))] "(TARGET_P10_FUSION)" - "lwa%X1 %0,%1\;cmpdi %2,%0,%3" + "lwz%X1 %0,%1\;cmpwi %2,%0,%3" "&& reload_completed && (cc_reg_not_cr0_operand (operands[2], CCmode) || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0), - SImode, NON_PREFIXED_DS))" + SImode, NON_PREFIXED_D))" [(set (match_dup 0) (match_dup 1)) (set (match_dup 2) (compare:CC (match_dup 0) (match_dup 3)))] @@ -146,17 +146,17 @@ ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10 ;; load mode is SI result mode is SI compare mode is CC extend is none -(define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none" +(define_insn_and_split "*lwz_cmpwi_cr0_SI_SI_CC_none" [(set (match_operand:CC 2 "cc_reg_operand" "=x") - (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m") + (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m") (match_operand:SI 3 "const_m1_to_1_operand" "n"))) (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))] "(TARGET_P10_FUSION)" - "lwa%X1 %0,%1\;cmpdi %2,%0,%3" + "lwz%X1 %0,%1\;cmpwi %2,%0,%3" "&& reload_completed && (cc_reg_not_cr0_operand (operands[2], CCmode) || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0), - SImode, NON_PREFIXED_DS))" + SImode, NON_PREFIXED_D))" [(set (match_dup 0) (match_dup 1)) (set (match_dup 2) (compare:CC (match_dup 0) (match_dup 3)))] @@ -190,7 +190,7 @@ ;; load mode is SI result mode is EXTSI compare mode is CC extend is sign (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign" [(set (match_operand:CC 2 "cc_reg_operand" "=x") - (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m") + (compare:CC (match_operand:SI 1 "non_update_memory_operand" "YZ") (match_operand:SI 3 "const_m1_to_1_operand" "n"))) (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))] "(TARGET_P10_FUSION)" @@ -205,6 +205,7 @@ "" [(set_attr "type" "fused_load_cmpi") (set_attr "cost" "8") + (set_attr "sign_extend" "yes") (set_attr "length" "8")]) ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10 diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl index 82e8f863b02..4d1f825ad90 100755 --- a/gcc/config/rs6000/genfusion.pl +++ b/gcc/config/rs6000/genfusion.pl @@ -61,20 +61,31 @@ sub gen_ld_cmpi_p10_one my $mempred = "non_update_memory_operand"; my $extend; + # We need to special case lwa. The prefixed_load_p function in rs6000.cc + # (which determines if a load instruction is prefixed) uses the fact that the + # register mode is different from the memory mode, and that the sign_extend + # attribute is set to use DS-form rules for the address instead of D-form. + # If the register size is the same, prefixed_load_p assumes we are doing a + # lwz. We change to use an lwz and word compare if we don't need to sign + # extend the SImode value. Otherwise if we need the value, we need to + # make sure the insn is marked as ds-form. + my $cmp_size_char = ($lmode eq "SI" + && $ccmode eq "CC" + && $result !~ /^EXT|^DI$/) ? "w" : "d"; + if ($ccmode eq "CC") { # ld and lwa are both DS-FORM. - ($lmode =~ /^[SD]I$/) and $np = "NON_PREFIXED_DS"; - ($lmode =~ /^[SD]I$/) and $mempred = "ds_form_mem_operand"; + ($lmode eq "DI") and $np = "NON_PREFIXED_DS"; + ($lmode eq "SI" && $cmp_size_char eq "d") and $np = "NON_PREFIXED_DS"; } else { if ($lmode eq "DI") { # ld is DS-form, but lwz is not. $np = "NON_PREFIXED_DS"; - $mempred = "ds_form_mem_operand"; } } my $cmpl = ($ccmode eq "CC") ? "" : "l"; - my $echr = ($ccmode eq "CC") ? "a" : "z"; + my $echr = ($ccmode eq "CC" && $cmp_size_char eq "d") ? "a" : "z"; if ($lmode eq "DI") { $echr = ""; } my $constpred = ($ccmode eq "CC") ? "const_m1_to_1_operand" : "const_0_to_1_operand"; @@ -91,12 +102,15 @@ sub gen_ld_cmpi_p10_one } my $ldst = mode_to_ldst_char($lmode); + + # DS-form addresses need YZ, and not m. + my $constraint = ($np eq "NON_PREFIXED_DS") ? "YZ" : "m"; print <