From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 24 May 2023 10:41:38 +0000 (UTC)
From: Richard Biener
To: Christophe Lyon
cc: gcc-patches@gcc.gnu.org, "richard.sandiford@arm.com"
Subject: Re: [PATCH] tree-optimization/109849 - missed code hoisting
References: <20230523095533.B1C0713588@imap2.suse-dmz.suse.de>

On Wed, 24 May 2023, Christophe Lyon wrote:

> Hi Richard,
>
> On Tue, 23 May 2023 at 11:55, Richard Biener via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
>
> > The following fixes code hoisting to properly consider ANTIC_OUT instead
> > of ANTIC_IN. That's a bit expensive to re-compute but since we no
> > longer iterate we're doing this only once per BB which should be
> > acceptable. This avoids missing hoistings to the end of blocks where
> > something in the block clobbers the hoisted value.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
> >
> >         PR tree-optimization/109849
> >         * tree-ssa-pre.cc (do_hoist_insertion): Compute ANTIC_OUT
> >         and use that to determine what to hoist.
> >
> >         * gcc.dg/tree-ssa/ssa-hoist-8.c: New testcase.
> >
>
> This patch causes a regression on aarch64:
> gcc.target/aarch64/sve/fmla_2.c: \\tst1d found 3 times
> FAIL: gcc.target/aarch64/sve/fmla_2.c scan-assembler-times \\tst1d 2
>
>
> We used to generate:
>         mov x6, 0
>         mov w7, 55
>         whilelo p7.d, wzr, w7
>         .p2align 3,,7
> .L2:
>         ld1d z30.d, p7/z, [x5, x6, lsl 3]
>         ld1d z31.d, p7/z, [x4, x6, lsl 3]
>         cmpne p6.d, p7/z, z30.d, #0
>         ld1d z30.d, p7/z, [x3, x6, lsl 3]
>         ld1d z29.d, p6/z, [x2, x6, lsl 3]
>         movprfx z28, z30
>         fmla z28.d, p6/m, z31.d, z29.d
>         fmla z31.d, p6/m, z30.d, z29.d
>         st1d z28.d, p7, [x1, x6, lsl 3]
>         st1d z31.d, p7, [x0, x6, lsl 3]
>         incd x6
>         whilelo p7.d, w6, w7
>         b.any .L2
>
>
> But now:
>         mov x6, 0
>         mov w7, 55
>         ptrue p4.b, all
>         whilelo p7.d, wzr, w7
>         .p2align 3,,7
> .L2:
>         ld1d z30.d, p7/z, [x5, x6, lsl 3]
>         ld1d z31.d, p7/z, [x4, x6, lsl 3]
>         cmpne p6.d, p7/z, z30.d, #0
>         cmpeq p5.d, p7/z, z30.d, #0
>         ld1d z29.d, p6/z, [x2, x6, lsl 3]
>         ld1d z28.d, p6/z, [x3, x6, lsl 3]
>         ld1d z30.d, p5/z, [x3, x6, lsl 3]
>         movprfx z27, z31
>         fmla z27.d, p4/m, z29.d, z28.d
>         movprfx z30.d, p6/m, z28.d
>         fmla z30.d, p6/m, z31.d, z29.d
>         st1d z27.d, p6, [x0, x6, lsl 3]
>         st1d z30.d, p7, [x1, x6, lsl 3]
>         st1d z31.d, p5, [x0, x6, lsl 3]
>         incd x6
>         whilelo p7.d, w6, w7
>         b.any .L2

Thanks for reporting. I'm testing the following together with a
testcase that's not architecture specific.

Richard.

>From 2340df60dd9192b30b02de5b34f9cb7c16806430 Mon Sep 17 00:00:00 2001
From: Richard Biener
Date: Wed, 24 May 2023 12:36:28 +0200
Subject: [PATCH] tree-optimization/109849 - fix fallout of PRE hoisting change
To: gcc-patches@gcc.gnu.org

The PR109849 fix made us no longer hoist some memory loads because
of the expression set intersection. We can still avoid computing the
union by simply taking the first set's expressions and leaving the
pruning of expressions with values not suitable for hoisting to
sorted_array_from_bitmap_set.

	PR tree-optimization/109849
	* tree-ssa-pre.cc (do_hoist_insertion): Do not intersect
	expressions but take the first set's.

	* gcc.dg/tree-ssa/ssa-hoist-9.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-9.c | 20 ++++++++++++++++++++
 gcc/tree-ssa-pre.cc                         | 12 ++++--------
 2 files changed, 24 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-9.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-9.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-9.c
new file mode 100644
index 00000000000..388f79fd80f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-9.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+
+int foo (int flag, int * __restrict a, int * __restrict b)
+{
+  int res;
+  if (flag)
+    res = *a + *b;
+  else
+    {
+      res = *a;
+      *a = 1;
+      res += *b;
+    }
+  return res;
+}
+
+/* { dg-final { scan-tree-dump "HOIST inserted: 3" "pre" } } */
+/* { dg-final { scan-tree-dump-times " = \\\*" 2 "pre" } } */
+/* { dg-final { scan-tree-dump-times " = \[^\r\n\]* \\\+ \[^\r\n\]*;" 1 "pre" } } */
diff --git a/gcc/tree-ssa-pre.cc b/gcc/tree-ssa-pre.cc
index b1ceea90a8e..7bbfa5ac43d 100644
--- a/gcc/tree-ssa-pre.cc
+++ b/gcc/tree-ssa-pre.cc
@@ -3625,8 +3625,9 @@ do_hoist_insertion (basic_block block)
 
   /* We have multiple successors, compute ANTIC_OUT by taking the intersection
      of all of ANTIC_IN translating through PHI nodes.  Note we do not have to
-     worry about iteration stability here so just intersect the expression sets
-     as well.  This is a simplification of what we do in compute_antic_aux.  */
+     worry about iteration stability here so just use the expression set
+     from the first set and prune that by sorted_array_from_bitmap_set.
+     This is a simplification of what we do in compute_antic_aux.  */
   bitmap_set_t ANTIC_OUT = bitmap_set_new ();
   bool first = true;
   FOR_EACH_EDGE (e, ei, block->succs)
@@ -3641,15 +3642,10 @@ do_hoist_insertion (basic_block block)
 	  bitmap_set_t tmp = bitmap_set_new ();
 	  phi_translate_set (tmp, ANTIC_IN (e->dest), e);
 	  bitmap_and_into (&ANTIC_OUT->values, &tmp->values);
-	  bitmap_and_into (&ANTIC_OUT->expressions, &tmp->expressions);
 	  bitmap_set_free (tmp);
 	}
       else
-	{
-	  bitmap_and_into (&ANTIC_OUT->values, &ANTIC_IN (e->dest)->values);
-	  bitmap_and_into (&ANTIC_OUT->expressions,
-			   &ANTIC_IN (e->dest)->expressions);
-	}
+	bitmap_and_into (&ANTIC_OUT->values, &ANTIC_IN (e->dest)->values);
     }
 
   /* Compute the set of hoistable expressions from ANTIC_OUT.  First compute
-- 
2.35.3