From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Sender: gcc-patches-owner@gcc.gnu.org
Date: Tue, 11 Feb 2020 13:58:00 -0000
From: Richard Biener
To: Roman Zhuykov
cc: Segher Boessenkool, "Kewen.Lin", GCC Patches, Bill Schmidt, "bin.cheng"
Subject: Re: [PATCH 0/4 GCC11] IVOPTs consider step cost for different forms when unrolling
In-Reply-To: <1ac98132-734e-0ee3-5ea2-7ec256ee92d2@ispras.ru>
References: <20200120123332.GV3191@gate.crashing.org> <52c8eecc-3383-81ad-70ce-27c149d7a103@linux.ibm.com> <20200210212910.GL22482@gate.crashing.org> <20200211074859.GV22482@gate.crashing.org> <1ac98132-734e-0ee3-5ea2-7ec256ee92d2@ispras.ru>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="-1609908220-1264932560-1581429527=:18835"

This message is in MIME format. The first part should be readable text,
while the remaining parts are likely unreadable without MIME-aware tools.

---1609908220-1264932560-1581429527=:18835
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8BIT
Content-length: 4298

On Tue, 11 Feb 2020, Roman Zhuykov wrote:

> 11.02.2020 11:01, Richard Biener wrote:
> > On Tue, 11 Feb 2020, Segher Boessenkool wrote:
> >
> >> On Tue, Feb 11, 2020 at 08:34:15AM +0100, Richard Biener wrote:
> >>> On Mon, 10 Feb 2020, Segher Boessenkool wrote:
> >>>> Yes, we should decide how often we want to unroll things somewhere before
> >>>> ivopts already, and just use that info here.
> >>>>
> >>>> Or are there advantages to doing it *in* ivopts?  It sounds like doing
> >>>> it there is probably expensive, but maybe not, and we need to do similar
> >>>> analysis there anyway.
> >>> Well, if the only benefit of doing the unrolling is that IVs get
> >>> cheaper then yes, IVOPTs should drive it.
> >> We need to know much earlier in the pass pipeline how often a loop will
> >> be unrolled.  We don't have to *do* it early.
> >>
> >> If we want to know it before ivopts, then obviously it has to be done
> >> earlier.  Otherwise, maybe it is a good idea to do it in ivopts itself.
> >> Or maybe not.  It's just an idea :-)
> >>
> >> We know we do not want it *later*, ivopts needs to know this to make
> >> good decisions of its own.
> >>
> >>> But usually unrolling exposes redundancies (caught by predictive
> >>> commoning which drives some unrolling) or it enables better use
> >>> of CPU resources via scheduling (only caught later in RTL).
> >>> For scheduling we have the additional complication that the RTL
> >>> side doesn't have as much of a fancy data dependence analysis
> >>> framework as on the GIMPLE side.  So I'd put my bet on trying to
> >>> move something like SMS to GIMPLE and combine it with unrolling
> >>> (IIRC SMS at most interleaves 1 1/2 loop iterations).
> To clarify, without specifying -fmodulo-sched-allow-regmoves it only
> interleaves 2 iterations.  With register moves enabled more iterations
> can be considered.
> > SMS on RTL always was quite disappointing...
> Hmm, even when trying to move it just a few passes earlier many years ago,
> got another opinion:
> https://gcc.gnu.org/ml/gcc-patches/2011-10/msg01526.html
> Although without such a move we still have annoying issues which RTL
> folks can't solve, see e.g.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93264#c2
> > It originally came with "data dependence export from GIMPLE to RTL"
> > that never materialized so I'm not surprised ;)  It also relies
> > on doloop detection.
> My current attempt to drop doloop dependency is still WIP, hopefully
> I'll create a branch in refs/users/ in a month or so.  But older (gcc-7
> and earlier) versions are available, see
> https://gcc.gnu.org/ml/gcc-patches/2017-02/msg01647.html
> Doloops are still supported for some kind of backward compatibility, but
> many more loops (which loop-iv can analyze) are considered in new SMS.
> >> Do you expect it will be more useful on Gimple?  Moving it there is a
> >> good idea in any case ;-)
> >>
> >> I don't quite see the synergy between SMS and loop unrolling, but maybe
> >> I need to look harder.
> > As said elsewhere I don't believe in actual unrolling doing much good
> > but in removing data dependences in the CPU pipeline.  SMS rotates
> > the loop, peeling N iterations (and somehow I think for N > 1 that
> > should better mean unrolling the loop body).
> Yes, this is what theory tells us.
> > Of course doing "scheduling" on GIMPLE is "interesting" in its own
> > but OTOH our pipeline DFAs are imprecise enough that one could even
> > devise some basic GIMPLE <-> "RTL" mapping to make use of it.  But
> > then scheduling without IVs or register pressure in mind is somewhat
> > pointless as well.
> Unfortunately, even with -fmodulo-sched-allow-regmoves it doesn't
> interact much with register pressure.
> > That said - if I had enough time I'd still think that investigating
> > "scheduling on GIMPLE" as replacement for sched1 is an interesting
> > thing to do.
> Sounds good, but IMHO the modulo scheduler is not the best choice to be the
> first step implementing such a concept.

True ;)  But since the context of this thread is unrolling ...
Not sure how you'd figure the unroll factor to apply if you want
to do unrolling within a classical scheduling framework?  Maybe
unroll as much as you can fill slots until the last instruction
of the first iteration retires?

Richard.
---1609908220-1264932560-1581429527=:18835--
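
The closing suggestion (unroll until the issue slots are filled up to the
point where the last instruction of the first iteration retires) could be
read roughly as below.  This is only a back-of-the-envelope sketch, not GCC
code: the function name, its parameters, the cap of 8 and the simple
"slots = issue width * latency" machine model are all assumptions made for
illustration.

```python
def estimate_unroll_factor(insns_per_iter, first_iter_latency,
                           issue_width, max_unroll=8):
    """Hypothetical heuristic: how many iterations fit into the issue
    slots available before the first iteration's last insn retires.

    insns_per_iter      -- instructions in one loop iteration (assumed known)
    first_iter_latency  -- cycles until the last instruction of the first
                           iteration retires (assumed known)
    issue_width         -- instructions the CPU can issue per cycle
    max_unroll          -- arbitrary cap, an assumption of this sketch
    """
    if insns_per_iter <= 0 or first_iter_latency <= 0 or issue_width <= 0:
        raise ValueError("all machine/loop parameters must be positive")

    # Total issue slots available while the first iteration is in flight.
    slots = issue_width * first_iter_latency

    # Number of whole iterations whose instructions fit into those slots.
    factor = slots // insns_per_iter

    # Never go below "no unrolling", never above the cap.
    return max(1, min(factor, max_unroll))


if __name__ == "__main__":
    # 8 insns per iteration, first iteration retires after 6 cycles,
    # 4-wide issue: 24 slots, so three iterations' worth of insns fit.
    print(estimate_unroll_factor(8, 6, 4))
```

A real implementation would have to account for data dependences, register
pressure and the actual pipeline description, none of which this sketch
models.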