From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf1-x134.google.com (mail-lf1-x134.google.com [IPv6:2a00:1450:4864:20::134]) by sourceware.org (Postfix) with ESMTPS id 5F533385B81D for ; Tue, 17 Oct 2023 09:22:26 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 5F533385B81D Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 5F533385B81D Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::134 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1697534548; cv=none; b=Sb6a1iwRrlLdrr5JR9+/XzY7x1fT2ZKrGjW1QG9PJCAap+O7HnkZ/oUcSVLT097Mul6SeqpwIFRKZ9MwZP7WZdxb7fk9D/VYazH7Y/hc3nN8/bowMgsM9jstYBVfkLIafdTYVP2Fr/jS03nHDW1DLUiW3TTG3cCWCLRQA6edYyM= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1697534548; c=relaxed/simple; bh=0MfQ42LdpWCXZzxB1QPzVrpsB3OvZ0uYNMWxlVgaYzo=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=uNqxBc+koiGQVfUlGevOjAg+yCZgY0omRWa7ESpVrwBUCZKzAmgMYhPSJzk8KftRHssbFV57yRI4SHX+Wd+rKPWyWG9cML21zuw8wqC67e9q++iSWZdOnYDj/MtMLHsY5A+p81bqEKrq/IDazZRJA1KPLZQ3nHBni3dv9orfzOY= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-lf1-x134.google.com with SMTP id 2adb3069b0e04-507a29c7eefso3659842e87.1 for ; Tue, 17 Oct 2023 02:22:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1697534545; x=1698139345; darn=gcc.gnu.org; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=RtHxnYGrwPa9Xf/YQX1VJlQq3iuT6GQIfAIhbEOkSa8=; b=JRVKE0tO19L44Q0uog1F0jZFv1DXw/dTfj38cbsx9zSEVTYj/vRGhPKdGh2kvOVpgV CdaSJlEncJT0rTNI50R/CKBqOkbIJNO7BPtJw4C7/LYLnXR+KW6PFvYEFcH2xj6p+Jqj o/ysJq3kxPvEFrQCvoRo2S6GAhTV1Ce78xNKSnJYWoT4EPVp0j2pnIlh4TLD+W5CPRCZ EO5JFMeb8rL7a8NDLgi41OWujIahLMc5ATVlFR9pmTwUYiN2Aosxf+HCyhCRZEZTH7/p 7s5CWI/MusHRWQtJBouqUolVCgUVWgubCLWua/nGZJU2JgUI+KO2VVDc5leAso1DO5rQ ONag== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697534545; x=1698139345; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=RtHxnYGrwPa9Xf/YQX1VJlQq3iuT6GQIfAIhbEOkSa8=; b=QiqBQl2raj6UYGUI+NYjXK3WCFKRNRbIgIYPRB9w93hs6zUu8YkywphtklSRmc5lMO xZRYPlV8thz3nb/LX04+NrVpxfxLK0zchyoX7dzHY6ZFEVBZSHfk7kU7lPXgeiUrE+Qt B8VyIA0/N90BOk93X6FIiP44CpoOHRTSeBIvHbnrRJU/XxEYHVWjFU3i+BfzV1rq9dsm Z3nkaPkq4kjcU9VhdiEOzvhgbyZ7+ODxUWH60/fiI24Y879sL44o9YLC5rsjssmQYFrj q8f6h7zcWN0doeFRGv6bA1iZID8QxR6pYgVKMlfR4VQZCpVIRs8Y5SFUkPw1flNvHTjK SkJQ== X-Gm-Message-State: AOJu0Yy37XpiSXRzIRNbwb+eU3oxmmxqwhjVORPRm2ZOjhil8SctQv96 kvL/2cQEaYWP/sTvCcRMjkzMTzVo+U8K2+RWib7VD0VD X-Google-Smtp-Source: AGHT+IH4Rmm47PUUdken9O3H9J1jqGyHYGSuuCNru2ZzU90B40eNcPJpf3Z8iFsz0c49uPE0K7YbUhfa5eOPho8ZfvE= X-Received: by 2002:ac2:4c83:0:b0:507:9787:6779 with SMTP id d3-20020ac24c83000000b0050797876779mr1191452lfl.3.1697534544581; Tue, 17 Oct 2023 02:22:24 -0700 (PDT) MIME-Version: 1.0 References: <103b7db0ce64fb9f2757e1a7d98cb7fa3103c4f2.1692699125.git.szabolcs.nagy@arm.com> In-Reply-To: From: Richard Biener Date: Tue, 17 Oct 2023 11:19:27 +0200 Message-ID: Subject: Re: [PATCH 02/11] Handle epilogues that contain jumps To: Richard Biener , Szabolcs Nagy , gcc-patches@gcc.gnu.org, Kyrylo.Tkachov@arm.com, richard.earnshaw@arm.com, richard.sandiford@arm.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-7.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Thu, Oct 12, 2023 at 10:15=E2=80=AFAM Richard Sandiford wrote: > > Richard Biener writes: > > On Tue, Aug 22, 2023 at 12:42=E2=80=AFPM Szabolcs Nagy via Gcc-patches > > wrote: > >> > >> From: Richard Sandiford > >> > >> The prologue/epilogue pass allows the prologue sequence > >> to contain jumps. The sequence is then partitioned into > >> basic blocks using find_many_sub_basic_blocks. > >> > >> This patch treats epilogues in the same way. It's needed for > >> a follow-on aarch64 patch that adds conditional code to both > >> the prologue and the epilogue. > >> > >> Tested on aarch64-linux-gnu (including with a follow-on patch) > >> and x86_64-linux-gnu. OK to install? > >> > >> Richard > >> > >> gcc/ > >> * function.cc (thread_prologue_and_epilogue_insns): Handle > >> epilogues that contain jumps. > >> --- > >> > >> This is a previously approved patch that was not committed > >> because it was not needed at the time, but i'd like to commit > >> it as it is needed for the followup aarch64 eh_return changes: > >> > >> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605769.html > >> > >> --- > >> gcc/function.cc | 10 ++++++++++ > >> 1 file changed, 10 insertions(+) > >> > >> diff --git a/gcc/function.cc b/gcc/function.cc > >> index dd2c1136e07..70d1cd65303 100644 > >> --- a/gcc/function.cc > >> +++ b/gcc/function.cc > >> @@ -6120,6 +6120,11 @@ thread_prologue_and_epilogue_insns (void) > >> && returnjump_p (BB_END (e->src))) > >> e->flags &=3D ~EDGE_FALLTHRU; > >> } > >> + > >> + auto_sbitmap blocks (last_basic_block_for_fn (cfun)); > >> + bitmap_clear (blocks); > >> + bitmap_set_bit (blocks, BLOCK_FOR_INSN (epilogue_seq)->ind= ex); > >> + find_many_sub_basic_blocks (blocks); > >> } > >> else if (next_active_insn (BB_END (exit_fallthru_edge->src))) > >> { > >> @@ -6218,6 +6223,11 @@ thread_prologue_and_epilogue_insns (void) > >> set_insn_locations (seq, epilogue_location); > >> > >> emit_insn_before (seq, insn); > >> + > >> + auto_sbitmap blocks (last_basic_block_for_fn (cfun)); > >> + bitmap_clear (blocks); > >> + bitmap_set_bit (blocks, BLOCK_FOR_INSN (insn)->index); > >> + find_many_sub_basic_blocks (blocks); > > > > I'll note that clearing a full sbitmap to pass down a single basic bloc= k > > to find_many_sub_basic_blocks is a quite expensive operation. May I su= ggest > > to add an overload operating on a single basic block? It's only > > > > FOR_EACH_BB_FN (bb, cfun) > > SET_STATE (bb, > > bitmap_bit_p (blocks, bb->index) ? BLOCK_TO_SPLIT : > > BLOCK_ORIGINAL); > > > > using the bitmap, so factoring the rest of the function and customizing= this > > walk would do the trick. Note that the whole function could be refacto= red to > > handle single blocks more efficiently. > > Sorry for the late reply, but does this look OK? Tested on > aarch64-linux-gnu and x86_64-linux-gnu. LGTM, not sure if I'm qualified enough to approve though (I think you are more qualified here, so ..) Thanks, Richard. > Thanks, > Richard > > --- > > The prologue/epilogue pass allows the prologue sequence to contain > jumps. The sequence is then partitioned into basic blocks using > find_many_sub_basic_blocks. > > This patch treats epilogues in a similar way. Since only one block > might need to be split, the patch (re)introduces a find_sub_basic_blocks > routine to handle a single block. > > The new routine hard-codes the assumption that split_block will chain > the new block immediately after the original block. The routine doesn't > try to replicate the fix for PR81030, since that was specific to > gimple->rtl expansion. > > The patch is needed for follow-on aarch64 patches that add conditional > code to the epilogue. The tests are part of those patches. > > gcc/ > * cfgbuild.h (find_sub_basic_blocks): Declare. > * cfgbuild.cc (update_profile_for_new_sub_basic_block): New funct= ion, > split out from... > (find_many_sub_basic_blocks): ...here. > (find_sub_basic_blocks): New function. > * function.cc (thread_prologue_and_epilogue_insns): Handle > epilogues that contain jumps. > --- > gcc/cfgbuild.cc | 95 +++++++++++++++++++++++++++++++++---------------- > gcc/cfgbuild.h | 1 + > gcc/function.cc | 4 +++ > 3 files changed, 70 insertions(+), 30 deletions(-) > > diff --git a/gcc/cfgbuild.cc b/gcc/cfgbuild.cc > index 15ed4deb5f7..9a6b34fb4b1 100644 > --- a/gcc/cfgbuild.cc > +++ b/gcc/cfgbuild.cc > @@ -693,6 +693,43 @@ compute_outgoing_frequencies (basic_block b) > } > } > > +/* Update the profile information for BB, which was created by splitting > + an RTL block that had a non-final jump. */ > + > +static void > +update_profile_for_new_sub_basic_block (basic_block bb) > +{ > + edge e; > + edge_iterator ei; > + > + bool initialized_src =3D false, uninitialized_src =3D false; > + bb->count =3D profile_count::zero (); > + FOR_EACH_EDGE (e, ei, bb->preds) > + { > + if (e->count ().initialized_p ()) > + { > + bb->count +=3D e->count (); > + initialized_src =3D true; > + } > + else > + uninitialized_src =3D true; > + } > + /* When some edges are missing with read profile, this is > + most likely because RTL expansion introduced loop. > + When profile is guessed we may have BB that is reachable > + from unlikely path as well as from normal path. > + > + TODO: We should handle loops created during BB expansion > + correctly here. For now we assume all those loop to cycle > + precisely once. */ > + if (!initialized_src > + || (uninitialized_src > + && profile_status_for_fn (cfun) < PROFILE_GUESSED)) > + bb->count =3D profile_count::uninitialized (); > + > + compute_outgoing_frequencies (bb); > +} > + > /* Assume that some pass has inserted labels or control flow > instructions within a basic block. Split basic blocks as needed > and create edges. */ > @@ -744,40 +781,15 @@ find_many_sub_basic_blocks (sbitmap blocks) > if (profile_status_for_fn (cfun) !=3D PROFILE_ABSENT) > FOR_BB_BETWEEN (bb, min, max->next_bb, next_bb) > { > - edge e; > - edge_iterator ei; > - > if (STATE (bb) =3D=3D BLOCK_ORIGINAL) > continue; > if (STATE (bb) =3D=3D BLOCK_NEW) > { > - bool initialized_src =3D false, uninitialized_src =3D false; > - bb->count =3D profile_count::zero (); > - FOR_EACH_EDGE (e, ei, bb->preds) > - { > - if (e->count ().initialized_p ()) > - { > - bb->count +=3D e->count (); > - initialized_src =3D true; > - } > - else > - uninitialized_src =3D true; > - } > - /* When some edges are missing with read profile, this is > - most likely because RTL expansion introduced loop. > - When profile is guessed we may have BB that is reachable > - from unlikely path as well as from normal path. > - > - TODO: We should handle loops created during BB expansion > - correctly here. For now we assume all those loop to cycle > - precisely once. */ > - if (!initialized_src > - || (uninitialized_src > - && profile_status_for_fn (cfun) < PROFILE_GUESSED)) > - bb->count =3D profile_count::uninitialized (); > + update_profile_for_new_sub_basic_block (bb); > + continue; > } > - /* If nothing changed, there is no need to create new BBs. */ > - else if (EDGE_COUNT (bb->succs) =3D=3D n_succs[bb->index]) > + /* If nothing changed, there is no need to create new BBs. */ > + if (EDGE_COUNT (bb->succs) =3D=3D n_succs[bb->index]) > { > /* In rare occassions RTL expansion might have mistakely assi= gned > a probabilities different from what is in CFG. This happe= ns > @@ -788,10 +800,33 @@ find_many_sub_basic_blocks (sbitmap blocks) > update_br_prob_note (bb); > continue; > } > - > compute_outgoing_frequencies (bb); > } > > FOR_EACH_BB_FN (bb, cfun) > SET_STATE (bb, 0); > } > + > +/* Like find_many_sub_basic_blocks, but look only within BB. */ > + > +void > +find_sub_basic_blocks (basic_block bb) > +{ > + basic_block end_bb =3D bb->next_bb; > + find_bb_boundaries (bb); > + if (bb->next_bb =3D=3D end_bb) > + return; > + > + /* Re-scan and wire in all edges. This expects simple (conditional) > + jumps at the end of each new basic blocks. */ > + make_edges (bb, end_bb->prev_bb, 1); > + > + /* Update branch probabilities. Expect only (un)conditional jumps > + to be created with only the forward edges. */ > + if (profile_status_for_fn (cfun) !=3D PROFILE_ABSENT) > + { > + compute_outgoing_frequencies (bb); > + for (bb =3D bb->next_bb; bb !=3D end_bb; bb =3D bb->next_bb) > + update_profile_for_new_sub_basic_block (bb); > + } > +} > diff --git a/gcc/cfgbuild.h b/gcc/cfgbuild.h > index 51d3eccb1ae..4191fb3fcba 100644 > --- a/gcc/cfgbuild.h > +++ b/gcc/cfgbuild.h > @@ -24,5 +24,6 @@ extern bool inside_basic_block_p (const rtx_insn *); > extern bool control_flow_insn_p (const rtx_insn *); > extern void rtl_make_eh_edge (sbitmap, basic_block, rtx); > extern void find_many_sub_basic_blocks (sbitmap); > +extern void find_sub_basic_blocks (basic_block); > > #endif /* GCC_CFGBUILD_H */ > diff --git a/gcc/function.cc b/gcc/function.cc > index 336af28fb22..afb0b33da9e 100644 > --- a/gcc/function.cc > +++ b/gcc/function.cc > @@ -6112,6 +6112,8 @@ thread_prologue_and_epilogue_insns (void) > && returnjump_p (BB_END (e->src))) > e->flags &=3D ~EDGE_FALLTHRU; > } > + > + find_sub_basic_blocks (BLOCK_FOR_INSN (epilogue_seq)); > } > else if (next_active_insn (BB_END (exit_fallthru_edge->src))) > { > @@ -6210,6 +6212,8 @@ thread_prologue_and_epilogue_insns (void) > set_insn_locations (seq, epilogue_location); > > emit_insn_before (seq, insn); > + > + find_sub_basic_blocks (BLOCK_FOR_INSN (insn)); > } > } > > -- > 2.25.1 >