From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <aldyh@redhat.com>
Received: from us-smtp-delivery-124.mimecast.com
 (us-smtp-delivery-124.mimecast.com [170.10.129.124])
 by sourceware.org (Postfix) with ESMTPS id A81113858C50
 for <gcc-patches@gcc.gnu.org>; Wed, 17 Aug 2022 08:46:28 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A81113858C50
Received: from mail-oa1-f72.google.com (mail-oa1-f72.google.com
 [209.85.160.72]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id
 us-mta-557-fQmFFQvhONai-0JpdsVO_g-1; Wed, 17 Aug 2022 04:46:25 -0400
X-MC-Unique: fQmFFQvhONai-0JpdsVO_g-1
Received: by mail-oa1-f72.google.com with SMTP id
 586e51a60fabf-11c1fdd025cso1448676fac.0
 for <gcc-patches@gcc.gnu.org>; Wed, 17 Aug 2022 01:46:24 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=cc:to:subject:message-id:date:from:in-reply-to:references
 :mime-version:x-gm-message-state:from:to:cc;
 bh=cgOLwUFvSjNayz6PqeGWW3NWga+fq7FdS5OCkUJDD/Q=;
 b=lsw8Y8LYH9GrxPSAbB6OHFsBv/Corb4wD8Nr2RbY12IPdS3sBMNk3YkFoBHLZCQohY
 18cLEkIaILTZsNMUA+N1b8TRK7wvGiC+bwICOYDspqE/mq9pgn5nt69tL/jEVb930EHM
 6UXNReH0K1WXQcvH9s9208V/MJRMNnd1Y0d+ECpXksVEnd3alF5H0W9EW1415Jo4DNpy
 utQQs9LBIu/MV8OkVVzksfBTXkHfib4qPB3iooZ1Py9JcwYGQG1VsLBy+T18ir0yDIfq
 xUsUZiUa7zv9FSu0Y8Zb1E2TaeFrH4V/mXswsiEAV87HUQk28kAvLniP+Uan+AA6GFVN
 UpoA==
X-Gm-Message-State: ACgBeo3jMMjpymIVmLCsi+uJ95coOUSCMRbz/AwPWeAy8875JvvX4jwN
 uxNXx1nZ4yyhec9phYSXWVIn5ygnzGcUkEuewYujQeB1CjFBANiY9B7rR4oeIrzQo2hX1XFlnXq
 t2EtyuRv5te2iSfIpiw/IY78Agi8YN45nVw==
X-Received: by 2002:a05:6870:210b:b0:10b:ed11:4e2d with SMTP id
 f11-20020a056870210b00b0010bed114e2dmr1208967oae.265.1660725984175; 
 Wed, 17 Aug 2022 01:46:24 -0700 (PDT)
X-Google-Smtp-Source: AA6agR4p855qCKBxK87j8AJGfYnxCtBvMMUAnN+GDc7WFY253J9x7sbpFm7tXjoFy/iOZATNZXm90gBQQ5jGAor127A=
X-Received: by 2002:a05:6870:210b:b0:10b:ed11:4e2d with SMTP id
 f11-20020a056870210b00b0010bed114e2dmr1208957oae.265.1660725983896; Wed, 17
 Aug 2022 01:46:23 -0700 (PDT)
MIME-Version: 1.0
References: <81970.122081610045900133@us-mta-26.us.mimecast.lan>
 <CAGm3qMWV3RoEKA3vRChHunqLp+_GjW28Rd052ELSe8ZzyHeRFQ@mail.gmail.com>
 <nycvar.YFH.7.77.849.2208170746280.13569@jbgna.fhfr.qr>
 <CAGm3qMUmAbAd5aGNERR_BX6Vi9MrrhK95qRg3tD6dXTKVKumBQ@mail.gmail.com>
 <nycvar.YFH.7.77.849.2208170817090.13569@jbgna.fhfr.qr>
In-Reply-To: <nycvar.YFH.7.77.849.2208170817090.13569@jbgna.fhfr.qr>
From: Aldy Hernandez <aldyh@redhat.com>
Date: Wed, 17 Aug 2022 10:46:12 +0200
Message-ID: <CAGm3qMX5eW6ZT+WacH4bf0Hd50+1Zh+tZCFSgBjd60FOubTJiA@mail.gmail.com>
Subject: Re: [PATCH] Refactor back_threader_profitability
To: Richard Biener <rguenther@suse.de>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset="UTF-8"
X-Spam-Status: No, score=-5.8 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_LOW,
 SPF_HELO_NONE, SPF_NONE, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Aug 2022 08:46:31 -0000

On Wed, Aug 17, 2022 at 10:38 AM Richard Biener <rguenther@suse.de> wrote:
>
> On Wed, 17 Aug 2022, Aldy Hernandez wrote:
>
> > On Wed, Aug 17, 2022 at 9:54 AM Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Wed, 17 Aug 2022, Aldy Hernandez wrote:
> > >
> > > > I just have a few high level comments.
> > > >
> > > > On Tue, Aug 16, 2022 at 4:05 PM Richard Biener <rguenther@suse.de> wrote:
> > > > >
> > > > > The following refactors profitable_path_p in the backward threader,
> > > > > splitting out parts that can be computed once the exit block is known,
> > > > > parts that contiguously update and that can be checked allowing
> > > > > for the path to be later identified as FSM with larger limits,
> > > > > possibly_profitable_path_p, and final checks done when the whole
> > > > > path is known, profitable_path_p.
> > > >
> > > > I thought we were removing references to FSM, as they were leftovers
> > > > from some previous incarnation.  For that matter, I don't think I ever
> > > > understood what they are, so if we're gonna keep them, could you
> > > > comment what makes FSM threads different from other threads?
> > >
> > > I don't know exactly what 'FSM' stands for but the FSM threader
> > > specifically tried to cover
> >
> > It probably stands for finite state machine then?
> >
> > >
> > > for (;;)
> > >  {
> > >    switch (state)
> > >     {
> > >     case 1:
> > >       /* sth */
> > >       state = 3;
> > >       break;
> > >     ...
> > >     case 3:
> > >       ...
> > >     }
> > > }
> > >
> > > so optimizing state machine transitions.  That is, these are all
> > > multiway (switch or goto, but goto support has been removed with the
> > > ranger rewrite it seems) and the thread path going through the
> >
> > ISTR going through the computed goto path in the old code, and it
> > never got triggered.  I just didn't get around to removing the
> > references to GIMPLE_GOTO.  At least, the old threader not once got a
> > thread we didn't get with the new code, even through a full Fedora
> > build.  I could be wrong though, but that's my recollection.
>
> When one massages the threader to even consider gotos we eventually
> run into find_taken_edge not handling them because range_of_expr
> computes the range of 'state' as
>
> [irange] void * [1, +INF]$4 = void
>
> rather than &&E.
>
> A testcase would be
>
> unsigned g;
> void FSM (int start)
> {
>   void *states[] = { &&A, &&B, &&C, &&E };
>   void *state = states[start];
>
>   do {
>   goto *state;
>
> A:
>   g += 1;
>   state = g & 1 ? &&B : &&E;
>   continue;
>
> B:
>   g += 2;
>   state = &&C;
>   continue;
>
> C:
>   g += 3;
>   state = states[g & 3];
>   continue;
>
> E:
>   break;
>   } while (1);
> }
>
> where we'd expect to thread the B -> C state transition.  I checked
> gcc 7 and it doesn't do that - not sure if it broke somewhen
> on the way or if it was just never working.  I'll file a bugreport,
> OTOH &label and goto *p are both GNU extensions, so not sure if it
> is worth optimizing.  I'll attach the "simple" enabler I have.

Yeah, that's what I thought.  I seem to remember tracing through the
old code and never finding a path that would handle computed gotos.
Either way, thanks for distilling a testcase.  If you could file a PR
and CC me on it, that'd be great.

When I contribute prange (irange for pointers), the plan is to have it
handle pointer equivalences, and we should be able to get the address
from the prange itself.

Aldy

>
> > > loop backedge.  This is why FSM has different size limits because
> > > it was thought those threadings are especially worthwhile and
> > > they tend to be larger.  To make discovery cheaper a TODO item
> > > would be to skip to the loop backedge when we reach the regular
> > > thread limit and only continue the real search there (and
> > > pick the "smallest" ways through any diamonds on the way).
> >
> > Ah, thanks.  If you could, add some similar blurb, it'd be great.  I'm
> > sure we'll all forget it at some time (well, I will :)).
>
> Sure ;)
>
> Richard.
>
> > Aldy
> >
> > >
> > > > In your possibly_profitable_path_p function, could you document a bit
> > > > better what's the difference between profitable_path_p and
> > > > possibly_profitable_path_p?
> > >
> > > Sure.
> > >
> > > > >
> > > > > I've removed the back_threader_profitability instance from the
> > > > > back_threader class and instead instantiate it once per path
> > > > > discovery.  I've kept the size compute non-incremental to simplify
> > > > > the patch and not worry about unwinding.
> > > > >
> > > > > There's key changes to previous behavior - namely we apply
> > > > > the param_max_jump_thread_duplication_stmts early only when
> > > > > we know the path cannot become an FSM one (multiway + thread through
> > > > > latch) but make sure to elide the path query when we we didn't
> > > > > yet discover that but are over this limit.  Similarly the
> > > > > speed limit is now used even when we did not yet discover a
> > > > > hot BB on the path.  Basically the idea is to only stop path
> > > > > discovery when we know the path will never become profitable
> > > > > but avoid the expensive path range query when we know it's
> > > > > currently not.
> > > > >
> > > > > I've done a few cleanups, merging functions, on the way.
> > > > >
> > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > > >
> > > > > Statistics show an overall slight increase in threading but
> > > > > looking at different files theres noise up and down.  That's
> > > > > somewhat expected since we now are applying the "more correct"
> > > > > limits in the end.  Unless I made big mistakes of course.
> > > > >
> > > > > The next thing cost-wise would be to raise the backwards
> > > > > threading limit to the limit of DOM so we don't get
> > > > > artificial high counts for that.
> > > >
> > > > The DOM threader has limits?  I thought most of those limits were just
> > > > due to the fact that it couldn't determine long enough paths?  Either
> > > > way, I like that we're merging the necessary forward threader bits
> > > > here, in preparation for its demise ;-).
> > >
> > > Both use param_max_jump_thread_duplication_stmts, but the backwards
> > > threader applies this limit to non-FSM threads with
> > >
> > >   /* The generic copier used by the backthreader does not re-use an
> > >      existing threading path to reduce code duplication.  So for that
> > >      case, drastically reduce the number of statements we are allowed
> > >      to copy.  */
> > >   if (!(threaded_through_latch && threaded_multiway_branch)
> > >       && (n_insns * param_fsm_scale_path_stmts
> > >           >= param_max_jump_thread_duplication_stmts))
> > >
> > > and param_fsm_scale_path_stmts happens to be two.  I have no idea
> > > why we apply this scaling, the scaling is otherwise used in
> > >
> > >   /* We avoid creating irreducible inner loops unless we thread through
> > >      a multiway branch, in which case we have deemed it worth losing
> > >      other loop optimizations later.
> > >
> > >      We also consider it worth creating an irreducible inner loop if
> > >      the number of copied statement is low relative to the length of
> > >      the path -- in that case there's little the traditional loop
> > >      optimizer would have done anyway, so an irreducible loop is not
> > >      so bad.  */
> > >   if (!threaded_multiway_branch
> > >       && creates_irreducible_loop
> > >       && *creates_irreducible_loop
> > >       && (n_insns * (unsigned) param_fsm_scale_path_stmts
> > >           > (m_path.length () *
> > >              (unsigned) param_fsm_scale_path_blocks)))
> > >
> > > so my idea is to drop the scaling from applying the
> > > param_max_jump_thread_duplication_stmts limit which raises the
> > > effective default limit from 15 / 2 to 15, just like what DOM
> > > applies (where DOM also estimates some optimization on the path,
> > > reducing the stmt count).
> > >
> > > > Looks good.
> > >
> > > Thanks - I'll adjust the commentary and push.
> > >
> > > Richard.
> > >
> >
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> HRB 36809 (AG Nuernberg)
>