From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ed1-x536.google.com (mail-ed1-x536.google.com [IPv6:2a00:1450:4864:20::536]) by sourceware.org (Postfix) with ESMTPS id C26A83858433 for ; Wed, 24 May 2023 12:45:48 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C26A83858433 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-ed1-x536.google.com with SMTP id 4fb4d7f45d1cf-510ede0f20aso1749081a12.3 for ; Wed, 24 May 2023 05:45:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1684932347; x=1687524347; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=i9PQGwP9xHVQEy35UBgSi4A7cF6Bo97KRNhdm+Z/I8s=; b=ev2Z3s1UxQ/H6ZXLXRVczXfcrTTIfYuiAHemuzK4iSbMqZiDvEln1wQf5mZJMVKtTO H7JIzScUasWIHtocgPXx+KYQQvV9lb59nT7zz7tTgYYQXCl8QksaDm4dZxJGZ9hshCP1 Hq+3e+eVvadT5aWUW50Zce5UUYPS71b+EaSHtkQ/BhGSReNDnhCugGth2xrJtPHsLGsC TqkOJWa/3i3He7AjK1t2/KMcPTRa5Qr7604xi2dQKqUaLleIyYWQw2IuYj8qaBvzyO8Q wspnU1tPnqWKFZyGfMXghu8z7QhlqDDVDuLBkFPkDRqD5h54Aop3zT0pjDcH+oMCY+bm Rt+w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684932347; x=1687524347; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=i9PQGwP9xHVQEy35UBgSi4A7cF6Bo97KRNhdm+Z/I8s=; b=dgcbelRjV0UV7npXqA+XVDPYjmA+CD82rnyA77GFXOnV3t5L/3WpJUubwWJELrQm3Y INOeh6uS2o9X8hOPx8N/c12SWcrMIvlKrzcPOzA6g8f2vOwCGbXSruVDGYvyAAPc2MAa nFEvLCq17TcxayLW4K9H6sEHAdIVtbK6f+RCYu+JE32/HG4FKIWHxj9h07dMAXxA/K46 zROaN340YD0KXiZC7tOPYp801jGAsZu5sjJykfTuDKGyOUOFfs+i6KfZCBByXfdw+Z6K 79C+tUMzMnWFNMb4sVpuuGtgq0Qo2KWbJB9Q8RUy2Pikwu2M0TcdL9UXdBzkGQXQnfJi zflQ== X-Gm-Message-State: AC+VfDyzg5HPveWO069yFfJnOR3ua10c4y69HEmpZ9SOn5gIjw0oPML/ CW23lVPsZx+JzYTVUBKuuJZqsVxlXbZjP2IhQiGuXw== X-Google-Smtp-Source: ACHHUZ4t0tRNt5RwL4zpJr7kBUgGTFA5Y2jTFj3/+sE7WadT/REdhBKELTBs+9qeyzYSioJVNqTq7weDsEj6qqdJ4BQ= X-Received: by 2002:aa7:d515:0:b0:50b:c9d4:8804 with SMTP id y21-20020aa7d515000000b0050bc9d48804mr2038294edq.4.1684932347373; Wed, 24 May 2023 05:45:47 -0700 (PDT) MIME-Version: 1.0 References: <20230523095533.B1C0713588@imap2.suse-dmz.suse.de> In-Reply-To: From: Christophe Lyon Date: Wed, 24 May 2023 14:45:40 +0200 Message-ID: Subject: Re: [PATCH] tree-optimization/109849 - missed code hoisting To: Richard Biener Cc: gcc-patches@gcc.gnu.org, "richard.sandiford@arm.com" Content-Type: multipart/alternative; boundary="0000000000000940e205fc6fe225" X-Spam-Status: No, score=-10.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,HTML_MESSAGE,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: --0000000000000940e205fc6fe225 Content-Type: text/plain; charset="UTF-8" On Wed, 24 May 2023 at 12:41, Richard Biener wrote: > On Wed, 24 May 2023, Christophe Lyon wrote: > > > Hi Richard, > > > > On Tue, 23 May 2023 at 11:55, Richard Biener via Gcc-patches < > > gcc-patches@gcc.gnu.org> wrote: > > > > > The following fixes code hoisting to properly consider ANTIC_OUT > instead > > > of ANTIC_IN. That's a bit expensive to re-compute but since we no > > > longer iterate we're doing this only once per BB which should be > > > acceptable. This avoids missing hoistings to the end of blocks where > > > something in the block clobbers the hoisted value. > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. > > > > > > PR tree-optimization/109849 > > > * tree-ssa-pre.cc (do_hoist_insertion): Compute ANTIC_OUT > > > and use that to determine what to hoist. > > > > > > * gcc.dg/tree-ssa/ssa-hoist-8.c: New testcase. > > > > > > > This patch causes a regression on aarch64: > > gcc.target/aarch64/sve/fmla_2.c: \\tst1d found 3 times > > FAIL: gcc.target/aarch64/sve/fmla_2.c scan-assembler-times \\tst1d 2 > > > > > > We used to generate: > > mov x6, 0 > > mov w7, 55 > > whilelo p7.d, wzr, w7 > > .p2align 3,,7 > > .L2: > > ld1d z30.d, p7/z, [x5, x6, lsl 3] > > ld1d z31.d, p7/z, [x4, x6, lsl 3] > > cmpne p6.d, p7/z, z30.d, #0 > > ld1d z30.d, p7/z, [x3, x6, lsl 3] > > ld1d z29.d, p6/z, [x2, x6, lsl 3] > > movprfx z28, z30 > > fmla z28.d, p6/m, z31.d, z29.d > > fmla z31.d, p6/m, z30.d, z29.d > > st1d z28.d, p7, [x1, x6, lsl 3] > > st1d z31.d, p7, [x0, x6, lsl 3] > > incd x6 > > whilelo p7.d, w6, w7 > > b.any .L2 > > > > > > But now: > > mov x6, 0 > > mov w7, 55 > > ptrue p4.b, all > > whilelo p7.d, wzr, w7 > > .p2align 3,,7 > > .L2: > > ld1d z30.d, p7/z, [x5, x6, lsl 3] > > ld1d z31.d, p7/z, [x4, x6, lsl 3] > > cmpne p6.d, p7/z, z30.d, #0 > > cmpeq p5.d, p7/z, z30.d, #0 > > ld1d z29.d, p6/z, [x2, x6, lsl 3] > > ld1d z28.d, p6/z, [x3, x6, lsl 3] > > ld1d z30.d, p5/z, [x3, x6, lsl 3] > > movprfx z27, z31 > > fmla z27.d, p4/m, z29.d, z28.d > > movprfx z30.d, p6/m, z28.d > > fmla z30.d, p6/m, z31.d, z29.d > > st1d z27.d, p6, [x0, x6, lsl 3] > > st1d z30.d, p7, [x1, x6, lsl 3] > > st1d z31.d, p5, [x0, x6, lsl 3] > > incd x6 > > whilelo p7.d, w6, w7 > > b.any .L2 > > Thanks for reporting. I'm testing the following together with > a testcase that's not architecture specific. > > Richard. > > From 2340df60dd9192b30b02de5b34f9cb7c16806430 Mon Sep 17 00:00:00 2001 > From: Richard Biener > Date: Wed, 24 May 2023 12:36:28 +0200 > Subject: [PATCH] tree-optimization/109849 - fix fallout of PRE hoisting > change > To: gcc-patches@gcc.gnu.org > > The PR109849 fix made us no longer hoist some memory loads because > of the expression set intersection. We can still avoid to compute > the union by simply taking the first sets expressions and leave > the pruning of expressions with values not suitable for hoisting > to sorted_array_from_bitmap_set. > > PR tree-optimization/109849 > * tree-ssa-pre.cc (do_hoist_insertion): Do not intersect > expressions but take the first sets. > > * gcc.dg/tree-ssa/ssa-hoist-9.c: New testcase. > Thanks, I confirm that this fixes the problem I reported on aarch64. Christophe > --- > gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-9.c | 20 ++++++++++++++++++++ > gcc/tree-ssa-pre.cc | 12 ++++-------- > 2 files changed, 24 insertions(+), 8 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-9.c > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-9.c > b/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-9.c > new file mode 100644 > index 00000000000..388f79fd80f > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-9.c > @@ -0,0 +1,20 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-pre-stats" } */ > + > +int foo (int flag, int * __restrict a, int * __restrict b) > +{ > + int res; > + if (flag) > + res = *a + *b; > + else > + { > + res = *a; > + *a = 1; > + res += *b; > + } > + return res; > +} > + > +/* { dg-final { scan-tree-dump "HOIST inserted: 3" "pre" } } */ > +/* { dg-final { scan-tree-dump-times " = \\\*" 2 "pre" } } */ > +/* { dg-final { scan-tree-dump-times " = \[^\r\n\]* \\\+ \[^\r\n\]*;" 1 > "pre" } } */ > diff --git a/gcc/tree-ssa-pre.cc b/gcc/tree-ssa-pre.cc > index b1ceea90a8e..7bbfa5ac43d 100644 > --- a/gcc/tree-ssa-pre.cc > +++ b/gcc/tree-ssa-pre.cc > @@ -3625,8 +3625,9 @@ do_hoist_insertion (basic_block block) > > /* We have multiple successors, compute ANTIC_OUT by taking the > intersection > of all of ANTIC_IN translating through PHI nodes. Note we do not > have to > - worry about iteration stability here so just intersect the > expression sets > - as well. This is a simplification of what we do in > compute_antic_aux. */ > + worry about iteration stability here so just use the expression set > + from the first set and prune that by sorted_array_from_bitmap_set. > + This is a simplification of what we do in compute_antic_aux. */ > bitmap_set_t ANTIC_OUT = bitmap_set_new (); > bool first = true; > FOR_EACH_EDGE (e, ei, block->succs) > @@ -3641,15 +3642,10 @@ do_hoist_insertion (basic_block block) > bitmap_set_t tmp = bitmap_set_new (); > phi_translate_set (tmp, ANTIC_IN (e->dest), e); > bitmap_and_into (&ANTIC_OUT->values, &tmp->values); > - bitmap_and_into (&ANTIC_OUT->expressions, &tmp->expressions); > bitmap_set_free (tmp); > } > else > - { > - bitmap_and_into (&ANTIC_OUT->values, &ANTIC_IN > (e->dest)->values); > - bitmap_and_into (&ANTIC_OUT->expressions, > - &ANTIC_IN (e->dest)->expressions); > - } > + bitmap_and_into (&ANTIC_OUT->values, &ANTIC_IN (e->dest)->values); > } > > /* Compute the set of hoistable expressions from ANTIC_OUT. First > compute > -- > 2.35.3 > > --0000000000000940e205fc6fe225--