From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.guenther@gmail.com>
Received: from mail-ed1-x52c.google.com (mail-ed1-x52c.google.com
 [IPv6:2a00:1450:4864:20::52c])
 by sourceware.org (Postfix) with ESMTPS id 694153858400
 for <gcc-patches@gcc.gnu.org>; Thu, 11 Nov 2021 07:15:19 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 694153858400
Received: by mail-ed1-x52c.google.com with SMTP id r12so20457243edt.6
 for <gcc-patches@gcc.gnu.org>; Wed, 10 Nov 2021 23:15:19 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc:content-transfer-encoding;
 bh=mkKU/wQK4670eJ17QY3wQZnUhPHAACwklv/umxMColk=;
 b=AqujjnGGpjCeTw51MGHC/j/wwV1xaax7MzAGB1slmqPkPG53sTltpR5m3wP40Fy+Rx
 +WecrZx+KSgZ2Lrg9Pjo8/rz+P9EgLwxscsRqQ5LJcuk0JtiG1BIR8XTrTPJU9lqfs06
 3Cfutt72e1WiXyKqRxAglLhndUS28yrFZwxAqzEfxBX7zT7C7Z+6AA4h63pSrt5ztzd3
 L7thiu3vjt+pmlcvN3hTDfgLEkBwdcudPONP8/ue03ASbW2JwXP8wfyW6ZUWQmAAb/NN
 90bN/vaJg61R8ECypsDhZnaAMMB1Q6LdCjKWq7sTvIpDzU+2DrbA17IWEOhdW71/KG/0
 F3kA==
X-Gm-Message-State: AOAM5301ph4FmpfT4yG28JA4Xoodp2++z65JDRvkREOqXsgOGW4bWR66
 x09Dy91XcfEwZH7ZpSX1DFVNXeN222tjwps3qfc=
X-Google-Smtp-Source: ABdhPJxtFmhuCerIZRilEgYVKI9//Tjh4cVkLEG6RHYmH4vI2e8Uq6brV2ghSDYuhP+8Vh8eIKpL9rvacqDluBjRZ/0=
X-Received: by 2002:a50:e0c3:: with SMTP id j3mr7132314edl.97.1636614918007;
 Wed, 10 Nov 2021 23:15:18 -0800 (PST)
MIME-Version: 1.0
References: <f1eb9f80-fdc5-8f57-327a-d35db67c1a6c@suse.cz>
 <CAFiYyc0dd8Amz_cvXOOAjteLCX8j=waOmoz6pgmuMDeYVdx=iw@mail.gmail.com>
 <7791e850-f74d-8c1c-f67c-e02f3f6e007d@redhat.com>
 <5c6c91d4-ed8b-8d98-2cd9-bafc84e6f2a4@suse.cz>
 <8da24825-19ec-56a6-a68c-5c37c7acc3e1@redhat.com>
 <CAFiYyc1Bx66N4MvvtX8qF+DPQX_+dkDYo+78c_wA4CUoe9-OoA@mail.gmail.com>
 <59763e1a-8432-5f23-c399-a9b4dd6c6dff@suse.cz>
 <CAFiYyc3x9o4zgEn4aDRd4n__uJKTSG2WtpfUXfLyLr51v=mBgA@mail.gmail.com>
 <fc7be232-5932-0d6b-6344-cbf71dbbe5c7@suse.cz>
In-Reply-To: <fc7be232-5932-0d6b-6344-cbf71dbbe5c7@suse.cz>
From: Richard Biener <richard.guenther@gmail.com>
Date: Thu, 11 Nov 2021 08:15:06 +0100
Message-ID: <CAFiYyc0bhqJs4LMap7+M7dVyTrV-eQEjTac0hu4TfLaHWjiQNg@mail.gmail.com>
Subject: Re: [PATCH] Loop unswitching: support gswitch statements.
To: =?UTF-8?Q?Martin_Li=C5=A1ka?= <mliska@suse.cz>
Cc: Andrew MacLeod <amacleod@redhat.com>, GCC Patches <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-2.4 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,
 SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 11 Nov 2021 07:15:21 -0000

On Wed, Nov 10, 2021 at 2:29 PM Martin Li=C5=A1ka <mliska@suse.cz> wrote:
>
> On 11/10/21 09:59, Richard Biener wrote:
> > On Tue, Nov 9, 2021 at 5:44 PM Martin Li=C5=A1ka <mliska@suse.cz> wrote=
:
> >>
> >> On 11/9/21 14:37, Richard Biener wrote:
> >>> On Mon, Nov 8, 2021 at 8:45 PM Andrew MacLeod <amacleod@redhat.com> w=
rote:
> >>>>
> >>>> On 11/8/21 10:05 AM, Martin Li=C5=A1ka wrote:
> >>>>> On 9/28/21 22:39, Andrew MacLeod wrote:
> >>>>>> In Theory, modifying the IL should be fine, it happens already in
> >>>>>> places, but its not extensively tested under those conditions yet.
> >>>>>
> >>>>> Hello Andrew.
> >>>>>
> >>>>> I've just tried using a global gimple_ranger and it crashes when lo=
op
> >>>>> unswitching duplicates
> >>>>> some BBs.
> >>>>>
> >>>>> Please try the attached patch for:
> >>>>
> >>>> hey Martin,
> >>>>
> >>>> try using this in your tree.  Since nothing else is using a growing =
BB
> >>>> right now, I'll let you work with it and see if everything works as
> >>>> expected before checking it in, just in case we need more tweaking.
> >>>> With this,
> >>>>
> >>>> make RUNTESTFLAGS=3Ddg.exp=3Dloop-unswitch*.c check-gcc
> >>>>
> >>>> runs clean.
> >>>>
> >>>>
> >>>> basically, I tried to grow it by either a factor of 10% for the curr=
ent
> >>>> BB size when the grow is requested, or some double the needed extra
> >>>> size, or 128... whichever value is "maximum"    That means it shoudn=
t be
> >>>> asking for tooo much each time, but also not a minimum amount.
> >>>>
> >>>> Im certainly open to suggestion on how much to grow it each time.
> >>>> Note the vector being grown is ONLY fo the SSA_NAme being asked for.=
. so
> >>>> it really an on-demand thing just for specific names, in your case,
> >>>> mostly just the switch index.
> >>>>
> >>>> Let me know how this works for you, and if you have any other issues=
.
> >>>
> >>> So I think in the end we shouldn't need the growing.  Ideally we'd do=
 all
> >>> the analysis before the first transform, but for that we'd need range=
r to
> >>> be able to "simplify" conditions based on a known true/false predicat=
e
> >>> that's not yet in the IL.  Consider
> >>>
> >>>    for (;;)
> >>>      {
> >>>           if (invariant < 3) // A
> >>>             {
> >>> ...
> >>>             }
> >>>           if (invariant < 5) // B
> >>>             {
> >>> ...
> >>>             }
> >>>      }
> >>>
> >>> unswitch analysis will run into the condition 'A' and determine the l=
oop
> >>> can be unswitched with the condition 'invariant < 3'.  To be able to
> >>> perform cost assessment and to avoid redundant unswitching we
> >>> want to determine that if we unswitch with 'invariant < 3' being
> >>> true then the condition at 'B' is true as well before actually insert=
ing
> >>> the if (invariant < 3) outside of the loop.
> >>>
> >>> So I'm thinking of assigning a gimple_uid to each condition we want t=
o
> >>> unswitch on and have an array indexed by the uid with meta-data on
> >>> the unswitch opportunity, the "related" conditions could be marked wi=
th
> >>> the same uid (or some other), and the folding result recorded so that
> >>> at transform time we can just do the appropriate replacement without
> >>> invoking ranger again.
> >>
> >> Calculating all this before transformation is quite ambitious based on=
 the code
> >> we have now.
> >>
> >> Note one can have in a loop:
> >>
> >> if (a > 100)
> >>      ...
> >>
> >> switch (a)
> >>      case 1000:
> >>        ...
> >>      case 20:
> >>        ...
> >>      case 200:
> >>        ...
> >>
> >> which means the first predicate effectively makes some cases unreachab=
le. Moreover
> >> one can have
> >>
> >> if (a > 100 && b < 300)
> >>      ...
> >>
> >> and more complex conditions.
> >
> > True - I guess we should do two things.
>
> All right.
>
> >
> >   1) keep simplify_using_entry_checks like code for symbolic conditions
> >   2) add integer ranges for unswitch conditions producing them, that
> >       includes all unswitching of switch stmts - we might be able to us=
e
> >       the ranger queries (with global ranges) to simplify stmts with th=
e
> >       known ranges as noted by Andrew
> >
> > I do think that pre-computing the simplifications is what we should do
> > to be able to make the cost modeling sane.  What we can avoid
> > trying is evaluating multiple unswitch possibilities to pick the "best"=
.
>
> So the first step would be taking all unswitching candidates (gconds basi=
cally)
> and grouping them (all items in a group would fold to true edge in the un=
switched loop).
> Is it something we want to do combining simplify_using_entry_checks and f=
old_range ranger
> capability?

The first step is to, after finding the first condition to unswitch on, con=
tinue
scanning the loop body for related conditions that will simplify to be able=
 to
assess the true cost of the unswitching more readily and to prepare the
simplification step after the loop copying.

That is, the first step is to "fix" the cost modeling.

I wouldn't go as far as trying to discover all unswitching opportunities at=
 once
yet - even though the far goal would be to maybe do that to discover the mo=
st
profitable / cheapest.

If you look at simplify_using_entry_checks then this is really really simpl=
e,
so I'd try to abstract this, recording sth like a unswitch_predicate where
we store the condition we unswitch on plus maybe cache the constant
range of a VAR cmp CST variable condition on the true/false edge.  We
can then try to simplify each gcond/gswitch based on such an unswitch_predi=
cate
(when we ever scan the loop once to discover all opportunities we'd have a
set of unswitch_predicates to try simplifying against).  As said the intege=
r
range thing would be an improvement over the current state so even that
can be done as followup but I guess for gswitch support that's going to be
the thing to use.

So I'd try to do no functional change first, improving the costing and
setting up the transform to simply pick up the stmts to "fold" as discovere=
d
during analysis (as I hinted you possibly can use gimple_uid to mark
the stmts that simplify, IIRC gimple_uid is preserved during copying.
gimple_uid would also scale better than gimple_plf in case we do
the analysis for all candidates at once).

As for costing in the end we should probably assign a budget based on
the original loop and eat from that for all unswitchings done on it.

> >
> > I think changing the code do to the analysis first should be done
> > before wiring in gcond support, even adding the additional 'range'
>
> s/gcond/switch, right?

Eh, yes - obviously ;)

>
> > capability will be useful without that since the current code
> > wont figure out a > 5 is true when we unswitch on a > 3.
>
> Agree that gswitch support should be added later.
>
> Martin
>
> >
> >>>
> >>> Now, but how do we arrange for the ranger analysis here?
> >>
> >> That's likely something we need support from ranger, yes.
> >>
> >>>
> >>> We might also somehow want to remember that on the
> >>> 'invariant < 3' =3D=3D false copy of the loop there's still the
> >>> unswitching opportunity on 'invariant < 5', but not on the
> >>> 'invariant < 5' =3D=3D true copy.
> >>>
> >>> Currently unswitching uses a custom simplify_using_entry_checks
> >>> which tries to do simplification only after the fact (and so costing
> >>> also is far from costing the true cost and ordering of the opportunit=
ies
> >>> to do the best first is not implemented either).
> >>
> >> I'm sending updated version of the patch where I changed:
> >> - simplify_using_entry_checks is put back for the floating point expre=
ssions
> >> - all scans utilize scan-tree-dump-times
> >> - some new tests were added
> >> - global ranger is used (I rely on the growing patch provided by Andre=
w)
> >> - better ranger API is used for gcond expression: ranger.range_of_stmt=
 (r, stmt) && r.singleton_p (&result))
> >> - auto_edge_flag is used now
> >>
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >>
> >> Thoughts?
> >> Martin
> >>
> >>>
> >>> Richard.
> >>>
> >>>> Andrew
> >>>>
>