From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 259A63858C39 for ; Wed, 4 Aug 2021 21:16:52 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 259A63858C39 Received: from pps.filterd (m0098394.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 174L4hip013744; Wed, 4 Aug 2021 17:16:51 -0400 Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com with ESMTP id 3a7un34hxm-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 04 Aug 2021 17:16:50 -0400 Received: from m0098394.ppops.net (m0098394.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 174L4urp014937; Wed, 4 Aug 2021 17:16:50 -0400 Received: from ppma02dal.us.ibm.com (a.bd.3ea9.ip4.static.sl-reverse.com [169.62.189.10]) by mx0a-001b2d01.pphosted.com with ESMTP id 3a7un34hxd-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 04 Aug 2021 17:16:50 -0400 Received: from pps.filterd (ppma02dal.us.ibm.com [127.0.0.1]) by ppma02dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 174Kxxhl008902; Wed, 4 Aug 2021 21:16:49 GMT Received: from b03cxnp08027.gho.boulder.ibm.com (b03cxnp08027.gho.boulder.ibm.com [9.17.130.19]) by ppma02dal.us.ibm.com with ESMTP id 3a6j2gp7xf-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 04 Aug 2021 21:16:49 +0000 Received: from b03ledav006.gho.boulder.ibm.com (b03ledav006.gho.boulder.ibm.com [9.17.130.237]) by b03cxnp08027.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 174LGm9N18153970 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 4 Aug 2021 21:16:48 GMT Received: from b03ledav006.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id E8B8AC6055; Wed, 4 Aug 2021 21:16:47 +0000 (GMT) Received: from b03ledav006.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 2A3B2C6059; Wed, 4 Aug 2021 21:16:47 +0000 (GMT) Received: from [9.160.54.214] (unknown [9.160.54.214]) by b03ledav006.gho.boulder.ibm.com (Postfix) with ESMTPS; Wed, 4 Aug 2021 21:16:46 +0000 (GMT) Subject: Re: [PATCH, rs6000] Add store fusion support for Power10 To: wschmidt@linux.ibm.com, GCC Patches Cc: Peter Bergner , David Edelsohn , Segher Boessenkool References: <21335366-a061-b56f-8885-5cea7517d808@linux.ibm.com> From: Pat Haugen Message-ID: Date: Wed, 4 Aug 2021 16:16:45 -0500 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.12.0 MIME-Version: 1.0 In-Reply-To: <21335366-a061-b56f-8885-5cea7517d808@linux.ibm.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 X-Proofpoint-GUID: CN0tek1Vq6Mkus-oG-u-M9yHARdfaHo2 X-Proofpoint-ORIG-GUID: NxHco4Oi_MkVFEQ-Iz14H3VvVeDlp5a1 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-08-04_06:2021-08-04, 2021-08-04 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 lowpriorityscore=0 clxscore=1015 priorityscore=1501 spamscore=0 phishscore=0 mlxlogscore=999 adultscore=0 impostorscore=0 suspectscore=0 bulkscore=0 malwarescore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2107140000 definitions=main-2108040129 X-Spam-Status: No, score=-9.8 required=5.0 tests=BAYES_00, BODY_8BITS, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_NUMSUBJECT, NICE_REPLY_A, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 Aug 2021 21:16:53 -0000 On 8/4/21 9:23 AM, Bill Schmidt wrote: > Hi Pat, > > Good stuff!  Comments below. > > On 8/2/21 3:19 PM, Pat Haugen via Gcc-patches wrote: >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c >> index 279f00cc648..1460a0d7c5c 100644 >> --- a/gcc/config/rs6000/rs6000.c >> +++ b/gcc/config/rs6000/rs6000.c >> @@ -4490,6 +4490,10 @@ rs6000_option_override_internal (bool global_init_p) >>         && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2ADD) == 0) >>       rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2ADD; >> >> +  if (TARGET_POWER10 >> +      && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2STORE) == 0) >> +    rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2STORE; >> + >>     /* Turn off vector pair/mma options on non-power10 systems.  */ >>     else if (!TARGET_POWER10 && TARGET_MMA) >>       { >> @@ -18357,7 +18361,7 @@ is_load_insn1 (rtx pat, rtx *load_mem) >>     if (!pat || pat == NULL_RTX) >>       return false; >> >> -  if (GET_CODE (pat) == SET) >> +  if (GET_CODE (pat) == SET && REG_P (SET_DEST (pat))) >>       return find_mem_ref (SET_SRC (pat), load_mem); > Looks like this is just an optimization to quickly discard stores, right? Additional verification check to make sure destination is REG, yes. This will become a separate patch. >>     if (GET_CODE (pat) == PARALLEL) >> @@ -18394,7 +18398,8 @@ is_store_insn1 (rtx pat, rtx *str_mem) >>     if (!pat || pat == NULL_RTX) >>       return false; >> >> -  if (GET_CODE (pat) == SET) >> +  if (GET_CODE (pat) == SET >> +      && (REG_P (SET_SRC (pat)) || SUBREG_P (SET_SRC (pat)))) >>       return find_mem_ref (SET_DEST (pat), str_mem); > > > Similar question. Similar answer. :) > >> >>     if (GET_CODE (pat) == PARALLEL) >> @@ -18859,6 +18864,96 @@ power9_sched_reorder2 (rtx_insn **ready, int lastpos) >>     return cached_can_issue_more; >>   } >> >> +/* Determine if INSN is a store to memory that can be fused with a similar >> +   adjacent store.  */ >> + >> +static bool >> +is_fusable_store (rtx_insn *insn, rtx *str_mem) >> +{ >> +  /* Exit early if not doing store fusion.  */ >> +  if (!(TARGET_P10_FUSION && TARGET_P10_FUSION_2STORE)) >> +    return false; >> + >> +  /* Insn must be a non-prefixed base+disp form store.  */ >> +  if (is_store_insn (insn, str_mem) >> +      && get_attr_prefixed (insn) == PREFIXED_NO >> +      && get_attr_update (insn) == UPDATE_NO >> +      && get_attr_indexed (insn) == INDEXED_NO) >> +    { >> +      /* Further restictions by mode and size.  */ >> +      machine_mode mode = GET_MODE (*str_mem); >> +      HOST_WIDE_INT size; >> +      if MEM_SIZE_KNOWN_P (*str_mem) >> +    size = MEM_SIZE (*str_mem); >> +      else >> +    return false; >> + >> +      if INTEGRAL_MODE_P (mode) >> +    { >> +      /* Must be word or dword size.  */ >> +      return (size == 4 || size == 8); >> +    } >> +      else if FLOAT_MODE_P (mode) >> +    { >> +      /* Must be dword size.  */ >> +      return (size == 8); >> +    } >> +    } >> + >> +  return false; >> +} >> + >> +/* Do Power10 specific reordering of the ready list.  */ >> + >> +static int >> +power10_sched_reorder (rtx_insn **ready, int lastpos) >> +{ >> +  int pos; >> +  rtx mem1, mem2; >> + >> +  /* Do store fusion during sched2 only.  */ >> +  if (!reload_completed) >> +    return cached_can_issue_more; >> + >> +  /* If the prior insn finished off a store fusion pair then simply >> +     reset the counter and return, nothing more to do.  */ > > > Good comments throughout, thanks! > >> +  if (load_store_pendulum != 0) >> +    { >> +      load_store_pendulum = 0; >> +      return cached_can_issue_more; >> +    } >> + >> +  /* Try to pair certain store insns to adjacent memory locations >> +     so that the hardware will fuse them to a single operation.  */ >> +  if (is_fusable_store (last_scheduled_insn, &mem1)) >> +    { >> +      /* A fusable store was just scheduled.  Scan the ready list for another >> +     store that it can fuse with.  */ >> +      pos = lastpos; >> +      while (pos >= 0) >> +    { >> +      /* GPR stores can be ascending or descending offsets, FPR/VSR stores > VSR?  I don't see how that applies here. Scalar floating point store from VSX reg (i.e. stxsd). >> +         must be ascending only.  */ >> +      if (is_fusable_store (ready[pos], &mem2) >> +          && ((INTEGRAL_MODE_P (GET_MODE (mem1)) >> +           && adjacent_mem_locations (mem1, mem2)) >> +          || (FLOAT_MODE_P (GET_MODE (mem1)) >> +           && (adjacent_mem_locations (mem1, mem2) == mem1)))) >> +        { >> +          /* Found a fusable store.  Move it to the end of the ready list >> +         so it is scheduled next.  */ >> +          move_to_end_of_ready (ready, pos, lastpos); >> + >> +          load_store_pendulum = -1; >> +          break; >> +        } >> +      pos--; >> +    } >> +    } >> + >> +  return cached_can_issue_more; >> +} >> + >>   /* We are about to begin issuing insns for this clock cycle. */ >> >>   static int >> @@ -18885,6 +18980,10 @@ rs6000_sched_reorder (FILE *dump ATTRIBUTE_UNUSED, int sched_verbose, >>     if (rs6000_tune == PROCESSOR_POWER6) >>       load_store_pendulum = 0; >> >> +  /* Do Power10 dependent reordering.  */ >> +  if (rs6000_tune == PROCESSOR_POWER10 && last_scheduled_insn) >> +    power10_sched_reorder (ready, *pn_ready - 1); >> + > > > I happened to notice that pn_ready is marked as ATTRIBUTE_UNUSED.  This predates your patch, but maybe you could clean that up too. Will do as a separate cleanup. > >>     return rs6000_issue_rate (); >>   } >> >> @@ -18906,6 +19005,10 @@ rs6000_sched_reorder2 (FILE *dump, int sched_verbose, rtx_insn **ready, >>         && recog_memoized (last_scheduled_insn) >= 0) >>       return power9_sched_reorder2 (ready, *pn_ready - 1); >> >> +  /* Do Power10 dependent reordering.  */ >> +  if (rs6000_tune == PROCESSOR_POWER10 && last_scheduled_insn) >> +    return power10_sched_reorder (ready, *pn_ready - 1); >> + >>     return cached_can_issue_more; >>   } >> >> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt >> index 0538db387dc..3753de19557 100644 >> --- a/gcc/config/rs6000/rs6000.opt >> +++ b/gcc/config/rs6000/rs6000.opt >> @@ -514,6 +514,10 @@ mpower10-fusion-2add >>   Target Undocumented Mask(P10_FUSION_2ADD) Var(rs6000_isa_flags) >>   Fuse dependent pairs of add or vaddudm instructions for better performance on power10. >> >> +mpower10-fusion-2store >> +Target Undocumented Mask(P10_FUSION_2STORE) Var(rs6000_isa_flags) >> +Fuse certain store operations together for better performance on power10. >> + >>   mcrypto >>   Target Mask(CRYPTO) Var(rs6000_isa_flags) >>   Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2 instructions. > > Can you think of any test cases we can use to demonstrate store fusion? Yeah, should be able to come up with something to verify two adjacent stores. Thanks for the review. > > Otherwise looks good to me.  Thanks! > > Bill >