From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-519337-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 33912 invoked by alias); 11 Feb 2020 13:58:52 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 33899 invoked by uid 89); 11 Feb 2020 13:58:52 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-6.2 required=5.0 tests=AWL,BAYES_00,KAM_SHORT,SPF_PASS autolearn=ham version=3.3.1 spammy=H*MI:sk:1ac9813, H*f:sk:1ac9813, H*i:sk:1ac9813, classical
X-HELO: mx2.suse.de
Received: from mx2.suse.de (HELO mx2.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 11 Feb 2020 13:58:50 +0000
Received: from relay2.suse.de (unknown [195.135.220.254])	by mx2.suse.de (Postfix) with ESMTP id EAF0CBCC3;	Tue, 11 Feb 2020 13:58:47 +0000 (UTC)
Date: Tue, 11 Feb 2020 13:58:00 -0000
From: Richard Biener <rguenther@suse.de>
To: Roman Zhuykov <zhroma@ispras.ru>
cc: Segher Boessenkool <segher@kernel.crashing.org>,     "Kewen.Lin" <linkw@linux.ibm.com>, GCC Patches <gcc-patches@gcc.gnu.org>,     Bill Schmidt <wschmidt@linux.ibm.com>,     "bin.cheng" <bin.cheng@linux.alibaba.com>
Subject: Re: [PATCH 0/4 GCC11] IVOPTs consider step cost for different forms when unrolling
In-Reply-To: <1ac98132-734e-0ee3-5ea2-7ec256ee92d2@ispras.ru>
Message-ID: <nycvar.YFH.7.76.2002111456590.18835@zhemvz.fhfr.qr>
References: <ddd8c186-fc88-96df-b1c0-f99edec654f2@linux.ibm.com> <20200120123332.GV3191@gate.crashing.org> <52c8eecc-3383-81ad-70ce-27c149d7a103@linux.ibm.com> <20200210212910.GL22482@gate.crashing.org> <nycvar.YFH.7.76.2002110829340.18835@zhemvz.fhfr.qr> <20200211074859.GV22482@gate.crashing.org> <nycvar.YFH.7.76.2002110850570.18835@zhemvz.fhfr.qr> <1ac98132-734e-0ee3-5ea2-7ec256ee92d2@ispras.ru>
User-Agent: Alpine 2.21 (LSU 202 2017-01-01)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="-1609908220-1264932560-1581429527=:18835"
X-SW-Source: 2020-02/txt/msg00647.txt.bz2

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

---1609908220-1264932560-1581429527=:18835
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8BIT
Content-length: 4298

On Tue, 11 Feb 2020, Roman Zhuykov wrote:

> 11.02.2020 11:01, Richard Biener wrote:
> > On Tue, 11 Feb 2020, Segher Boessenkool wrote:
> >
> >> On Tue, Feb 11, 2020 at 08:34:15AM +0100, Richard Biener wrote:
> >>> On Mon, 10 Feb 2020, Segher Boessenkool wrote:
> >>>> Yes, we should decide how often we want to unroll things somewhere before
> >>>> ivopts already, and just use that info here.
> >>>>
> >>>> Or is there an advantage to doing it *in* ivopts?  It sounds like doing
> >>>> it there is probably expensive, but maybe not, and we need to do similar
> >>>> analysis there anyway.
> >>> Well, if the only benefit of doing the unrolling is that IVs get
> >>> cheaper then yes, IVOPTs should drive it.
> >> We need to know much earlier in the pass pipeline how often a loop will
> >> be unrolled.  We don't have to *do* it early.
> >>
> >> If we want to know it before ivopts, then obviously it has to be done
> >> earlier.  Otherwise, maybe it is a good idea to do it in ivopts itself.
> >> Or maybe not.  It's just an idea :-)
> >>
> >> We know we do not want it *later*, ivopts needs to know this to make
> >> good decisions of its own.
> >>
> >>> But usually unrolling exposes redundancies (caught by predictive
> >>> commoning which drives some unrolling) or it enables better use
> >>> of CPU resources via scheduling (only caught later in RTL).
> >>> For scheduling we have the additional complication that the RTL
> >>> side doesn't have as much of a fancy data dependence analysis
> >>> framework as on the GIMPLE side.  So I'd put my bet on trying to
> >>> move something like SMS to GIMPLE and combine it with unrolling
> >>> (IIRC SMS at most interleaves 1 1/2 loop iterations).
> To clarify, without specifying -fmodulo-sched-allow-regmoves it only
> interleaves 2 iterations.  With register moves enabled more iterations
> can be considered.
> > SMS on RTL always was quite disappointing...
> Hmm, even when trying to move it just a few passes earlier many years
> ago, I got another opinion:
> https://gcc.gnu.org/ml/gcc-patches/2011-10/msg01526.html
> Although without such a move we still have annoying issues which RTL
> folks can't solve, see e.g.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93264#c2
> > It originally came with "data dependence export from GIMPLE to RTL"
> > that never materialized so I'm not surprised ;)  It also relies
> > on doloop detection.
> My current attempt to drop the doloop dependency is still WIP, hopefully
> I'll create a branch in refs/users/ in a month or so.  But older (gcc-7
> and earlier) versions are available, see
> https://gcc.gnu.org/ml/gcc-patches/2017-02/msg01647.html
> Doloops are still supported for backward compatibility, but many more
> loops (any that loop-iv can analyze) are considered in the new SMS.
> >> Do you expect it will be more useful on Gimple?  Moving it there is a 
> >> good idea in any case ;-)
> >>
> >> I don't quite see the synergy between SMS and loop unrolling, but maybe
> >> I need to look harder.
> > As said elsewhere I don't believe in actual unrolling doing much good
> > but in removing data dependences in the CPU pipeline.  SMS rotates
> > the loop, peeling N iterations (and somehow I think for N > 1 that
> > should better mean unrolling the loop body).
> Yes, this is what theory tells us.
> > Of course doing "scheduling" on GIMPLE is "interesting" in its own
> > but OTOH our pipeline DFAs are imprecise enough that one could even
> > devise some basic GIMPLE <-> "RTL" mapping to make use of it.  But
> > then scheduling without IVs or register pressure in mind is somewhat
> > pointless as well.
> Unfortunately, even with -fmodulo-sched-allow-regmoves it doesn't
> interact much with register pressure.
> > That said - if I had enough time I'd still think that investigating
> > "scheduling on GIMPLE" as replacement for sched1 is an interesting
> > thing to do.
> Sounds good, but IMHO the modulo scheduler is not the best choice to be
> the first step in implementing such a concept.

True ;)   But since the context of this thread is unrolling ...
Not sure how you'd figure out the unroll factor to apply if you want
to do unrolling within a classical scheduling framework?  Maybe
unroll as far as there are issue slots to fill before the last
instruction of the first iteration retires?
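
A very rough sketch of that last idea, just to make it concrete.  This
is not GCC code: estimate_unroll_factor and all of its parameters are
made-up names, and it assumes a trivial machine model where every
instruction takes exactly one issue slot and the latency of the first
iteration is already known.

```c
/* Hypothetical helper, not a GCC API.  Estimate how many copies of the
   loop body fit into the issue slots available before the last
   instruction of the first iteration retires.

   insns_per_iter      instructions in one loop iteration
   issue_width         instructions the CPU can issue per cycle
   first_iter_latency  cycles until the first iteration's last
                       instruction retires
   max_unroll          upper bound on the returned factor  */
static unsigned
estimate_unroll_factor (unsigned insns_per_iter, unsigned issue_width,
			unsigned first_iter_latency, unsigned max_unroll)
{
  /* Degenerate inputs: do not unroll.  */
  if (insns_per_iter == 0 || issue_width == 0)
    return 1;

  /* Issue slots available while the first iteration is in flight.  */
  unsigned slots = issue_width * first_iter_latency;

  /* Whole iterations that fit into those slots.  */
  unsigned factor = slots / insns_per_iter;

  /* Apply the cap first, then make sure we return at least 1.  */
  if (factor > max_unroll)
    factor = max_unroll;
  if (factor < 1)
    factor = 1;
  return factor;
}
```

With 6 instructions per iteration, a 2-wide machine and 12 cycles for
the first iteration there are 24 slots, so this gives a factor of 4.
A real heuristic would also have to look at dependences and register
pressure, which this ignores completely.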

Richard.
---1609908220-1264932560-1581429527=:18835--