From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ej1-x62f.google.com (mail-ej1-x62f.google.com [IPv6:2a00:1450:4864:20::62f]) by sourceware.org (Postfix) with ESMTPS id 7E907385770A for ; Wed, 24 May 2023 10:07:57 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 7E907385770A Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-ej1-x62f.google.com with SMTP id a640c23a62f3a-96fd3a658eeso98254966b.1 for ; Wed, 24 May 2023 03:07:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1684922876; x=1687514876; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=mRcSNyDBthra9d7ENvp/t+Xf6X6JWdbuTTBo7krbz6E=; b=vOEDNG4FOUEWqAPGfJltsuOwUeRLCpvPl0kMg+u154g/LRYecHZZhmeUXFbvyS3Haw nUi33sChOQq/fIXsjHdlpyGqE39oZcBDi3vIXk3j2wRo0kfz3zXagv9wGUhD6H3E+Kg7 MgAErJ+87xWnoSCcJvxGxOagSEbp0SG1b2yz43gzhsrzm7+TRgY0KIuro74Kdjs8MtXI 1kdcfKKn66+rz1/iZIuc5XwB0uYxN04hYDVGxTpR6KbI1YjQ7hgQ6qYPNJ+EylZM5+za gFqWftk0NOFDuepUSwvkLCgM5TYEQVla54cmXfzY4TLHaRj80iXzre5+7ioCHpKYZKRq Zb8A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684922876; x=1687514876; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=mRcSNyDBthra9d7ENvp/t+Xf6X6JWdbuTTBo7krbz6E=; b=Df92ClgJaMlVFwuc3xzMCFPM3GmtiQ/2+Q3ZL7b24wEECwDmwsNOCTeuD7V4bSouYr jEJ4+YIbL5m89aohcOskqatMkmd6llLi+O8HP3s2/W84w/HLg9xayRwm1bvt27WrZfij sIOPIjt2pA8CHe72gls//+PuYp4G0k8y/qQ2eO8/BnLMU9IKXVWTNO1WlOjsZf8aNYv0 jtK/8ZIPSV8geXSH90xEUNGkuLnCfla912sfxDWFSTMOFOgy2Fgy/LLFp6L8SSZMuZdC TJY8z0JaqRLJX950ZojVyxw8raEir3QmDwatopcKubunJuVCohN8wyxhx1fLLT23Hj8X hBOA== X-Gm-Message-State: AC+VfDzckekSrmsbo7dxBjaG6hi1VObZUKJte8XQbeG1YuVruHtjNQoR vMmo8yAyKeo93L1Wju7wh9v56L3Hs056EJsMk8A14Q== X-Google-Smtp-Source: ACHHUZ4f3YbqqPlJYu2mYDJBktTyznqh2Sgk7fYHP3KWjG/6gWBub+kWwrf46IyYqJP7G/pYM1luydolBdfi+HFuF8M= X-Received: by 2002:a17:907:3f16:b0:94f:a292:20cc with SMTP id hq22-20020a1709073f1600b0094fa29220ccmr17690806ejc.41.1684922875808; Wed, 24 May 2023 03:07:55 -0700 (PDT) MIME-Version: 1.0 References: <20230523095533.B1C0713588@imap2.suse-dmz.suse.de> In-Reply-To: <20230523095533.B1C0713588@imap2.suse-dmz.suse.de> From: Christophe Lyon Date: Wed, 24 May 2023 12:07:48 +0200 Message-ID: Subject: Re: [PATCH] tree-optimization/109849 - missed code hoisting To: Richard Biener Cc: gcc-patches@gcc.gnu.org, "richard.sandiford@arm.com" Content-Type: multipart/alternative; boundary="0000000000007ca3f305fc6dadb9" X-Spam-Status: No, score=-10.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,HTML_MESSAGE,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: --0000000000007ca3f305fc6dadb9 Content-Type: text/plain; charset="UTF-8" Hi Richard, On Tue, 23 May 2023 at 11:55, Richard Biener via Gcc-patches < gcc-patches@gcc.gnu.org> wrote: > The following fixes code hoisting to properly consider ANTIC_OUT instead > of ANTIC_IN. That's a bit expensive to re-compute but since we no > longer iterate we're doing this only once per BB which should be > acceptable. This avoids missing hoistings to the end of blocks where > something in the block clobbers the hoisted value. > > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. > > PR tree-optimization/109849 > * tree-ssa-pre.cc (do_hoist_insertion): Compute ANTIC_OUT > and use that to determine what to hoist. > > * gcc.dg/tree-ssa/ssa-hoist-8.c: New testcase. > This patch causes a regression on aarch64: gcc.target/aarch64/sve/fmla_2.c: \\tst1d found 3 times FAIL: gcc.target/aarch64/sve/fmla_2.c scan-assembler-times \\tst1d 2 We used to generate: mov x6, 0 mov w7, 55 whilelo p7.d, wzr, w7 .p2align 3,,7 .L2: ld1d z30.d, p7/z, [x5, x6, lsl 3] ld1d z31.d, p7/z, [x4, x6, lsl 3] cmpne p6.d, p7/z, z30.d, #0 ld1d z30.d, p7/z, [x3, x6, lsl 3] ld1d z29.d, p6/z, [x2, x6, lsl 3] movprfx z28, z30 fmla z28.d, p6/m, z31.d, z29.d fmla z31.d, p6/m, z30.d, z29.d st1d z28.d, p7, [x1, x6, lsl 3] st1d z31.d, p7, [x0, x6, lsl 3] incd x6 whilelo p7.d, w6, w7 b.any .L2 But now: mov x6, 0 mov w7, 55 ptrue p4.b, all whilelo p7.d, wzr, w7 .p2align 3,,7 .L2: ld1d z30.d, p7/z, [x5, x6, lsl 3] ld1d z31.d, p7/z, [x4, x6, lsl 3] cmpne p6.d, p7/z, z30.d, #0 cmpeq p5.d, p7/z, z30.d, #0 ld1d z29.d, p6/z, [x2, x6, lsl 3] ld1d z28.d, p6/z, [x3, x6, lsl 3] ld1d z30.d, p5/z, [x3, x6, lsl 3] movprfx z27, z31 fmla z27.d, p4/m, z29.d, z28.d movprfx z30.d, p6/m, z28.d fmla z30.d, p6/m, z31.d, z29.d st1d z27.d, p6, [x0, x6, lsl 3] st1d z30.d, p7, [x1, x6, lsl 3] st1d z31.d, p5, [x0, x6, lsl 3] incd x6 whilelo p7.d, w6, w7 b.any .L2 Christophe --- > gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-8.c | 22 ++++++++++ > gcc/tree-ssa-pre.cc | 48 ++++++++++++++++++--- > 2 files changed, 64 insertions(+), 6 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-8.c > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-8.c > b/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-8.c > new file mode 100644 > index 00000000000..66bb48e0dc1 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-8.c > @@ -0,0 +1,22 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-pre-stats" } */ > + > +int mem; > +void foo (); > +int bar (int flag) > +{ > + int res; > + foo (); > + /* Hoist the load of mem here even though foo () clobbers it. */ > + if (flag) > + res = mem; > + else > + { > + res = mem; > + mem = 2; > + } > + return res; > +} > + > +/* { dg-final { scan-tree-dump "HOIST inserted: 1" "pre" } } */ > +/* { dg-final { scan-tree-dump-times " = mem;" 1 "pre" } } */ > diff --git a/gcc/tree-ssa-pre.cc b/gcc/tree-ssa-pre.cc > index 1f7eea93c16..d56431b4145 100644 > --- a/gcc/tree-ssa-pre.cc > +++ b/gcc/tree-ssa-pre.cc > @@ -3622,19 +3622,51 @@ do_hoist_insertion (basic_block block) > && stmt_ends_bb_p (gsi_stmt (last))) > return false; > > - /* Compute the set of hoistable expressions from ANTIC_IN. First > compute > + /* We have multiple successors, compute ANTIC_OUT by taking the > intersection > + of all of ANTIC_IN translating through PHI nodes. Note we do not > have to > + worry about iteration stability here so just intersect the > expression sets > + as well. This is a simplification of what we do in > compute_antic_aux. */ > + bitmap_set_t ANTIC_OUT = bitmap_set_new (); > + bool first = true; > + FOR_EACH_EDGE (e, ei, block->succs) > + { > + if (first) > + { > + phi_translate_set (ANTIC_OUT, ANTIC_IN (e->dest), e); > + first = false; > + } > + else if (!gimple_seq_empty_p (phi_nodes (e->dest))) > + { > + bitmap_set_t tmp = bitmap_set_new (); > + phi_translate_set (tmp, ANTIC_IN (e->dest), e); > + bitmap_and_into (&ANTIC_OUT->values, &tmp->values); > + bitmap_and_into (&ANTIC_OUT->expressions, &tmp->expressions); > + bitmap_set_free (tmp); > + } > + else > + { > + bitmap_and_into (&ANTIC_OUT->values, &ANTIC_IN > (e->dest)->values); > + bitmap_and_into (&ANTIC_OUT->expressions, > + &ANTIC_IN (e->dest)->expressions); > + } > + } > + > + /* Compute the set of hoistable expressions from ANTIC_OUT. First > compute > hoistable values. */ > bitmap_set hoistable_set; > > - /* A hoistable value must be in ANTIC_IN(block) > + /* A hoistable value must be in ANTIC_OUT(block) > but not in AVAIL_OUT(BLOCK). */ > bitmap_initialize (&hoistable_set.values, &grand_bitmap_obstack); > bitmap_and_compl (&hoistable_set.values, > - &ANTIC_IN (block)->values, &AVAIL_OUT (block)->values); > + &ANTIC_OUT->values, &AVAIL_OUT (block)->values); > > /* Short-cut for a common case: hoistable_set is empty. */ > if (bitmap_empty_p (&hoistable_set.values)) > - return false; > + { > + bitmap_set_free (ANTIC_OUT); > + return false; > + } > > /* Compute which of the hoistable values is in AVAIL_OUT of > at least one of the successors of BLOCK. */ > @@ -3652,11 +3684,14 @@ do_hoist_insertion (basic_block block) > > /* Short-cut for a common case: availout_in_some is empty. */ > if (bitmap_empty_p (&availout_in_some)) > - return false; > + { > + bitmap_set_free (ANTIC_OUT); > + return false; > + } > > /* Hack hoitable_set in-place so we can use > sorted_array_from_bitmap_set. */ > bitmap_move (&hoistable_set.values, &availout_in_some); > - hoistable_set.expressions = ANTIC_IN (block)->expressions; > + hoistable_set.expressions = ANTIC_OUT->expressions; > > /* Now finally construct the topological-ordered expression set. */ > vec exprs = sorted_array_from_bitmap_set (&hoistable_set); > @@ -3731,6 +3766,7 @@ do_hoist_insertion (basic_block block) > } > > exprs.release (); > + bitmap_set_free (ANTIC_OUT); > > return new_stuff; > } > -- > 2.35.3 > --0000000000007ca3f305fc6dadb9--