On Tue, 11 Feb 2020, Roman Zhuykov wrote:

> 11.02.2020 11:01, Richard Biener wrote:
> > On Tue, 11 Feb 2020, Segher Boessenkool wrote:
> >
> >> On Tue, Feb 11, 2020 at 08:34:15AM +0100, Richard Biener wrote:
> >>> On Mon, 10 Feb 2020, Segher Boessenkool wrote:
> >>>> Yes, we should decide how often we want to unroll things somewhere before
> >>>> ivopts already, and just use that info here.
> >>>>
> >>>> Or is there an advantage to doing it *in* ivopts?  It sounds like doing
> >>>> it there is probably expensive, but maybe not, and we need to do similar
> >>>> analysis there anyway.
> >>> Well, if the only benefit of doing the unrolling is that IVs get
> >>> cheaper then yes, IVOPTs should drive it.
> >> We need to know much earlier in the pass pipeline how often a loop will
> >> be unrolled.  We don't have to *do* it early.
> >>
> >> If we want to know it before ivopts, then obviously it has to be done
> >> earlier.  Otherwise, maybe it is a good idea to do it in ivopts itself.
> >> Or maybe not.  It's just an idea :-)
> >>
> >> We know we do not want it *later*, ivopts needs to know this to make
> >> good decisions of its own.
> >>
> >>> But usually unrolling exposes redundancies (caught by predictive
> >>> commoning which drives some unrolling) or it enables better use
> >>> of CPU resources via scheduling (only caught later in RTL).
> >>> For scheduling we have the additional complication that the RTL
> >>> side doesn't have as much of a fancy data dependence analysis
> >>> framework as on the GIMPLE side.  So I'd put my bet on trying to
> >>> move something like SMS to GIMPLE and combine it with unrolling
> >>> (IIRC SMS at most interleaves 1 1/2 loop iterations).
> To clarify, without specifying -fmodulo-sched-allow-regmoves it only
> interleaves 2 iterations.  With register moves enabled more iterations
> can be considered.
> > SMS on RTL always was quite disappointing...
> Hmm, even when trying to move it just a few passes earlier many years ago,
> I got another opinion:
> https://gcc.gnu.org/ml/gcc-patches/2011-10/msg01526.html
> Although without such a move we still have annoying issues which RTL
> folks can't solve, see e.g.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93264#c2
> > It originally came with "data dependence export from GIMPLE to RTL"
> > that never materialized so I'm not surprised ;)  It also relies
> > on doloop detection.
> My current attempt to drop the doloop dependency is still WIP, hopefully
> I'll create a branch in refs/users/ in a month or so.  But older (gcc-7
> and earlier) versions are available, see
> https://gcc.gnu.org/ml/gcc-patches/2017-02/msg01647.html
> Doloops are still supported for some kind of backward compatibility, but
> many more loops (which loop-iv can analyze) are considered in the new SMS.
> >> Do you expect it will be more useful on Gimple?  Moving it there is a 
> >> good idea in any case ;-)
> >>
> >> I don't quite see the synergy between SMS and loop unrolling, but maybe
> >> I need to look harder.
> > As said elsewhere I don't believe in actual unrolling doing much good
> > but in removing data dependences in the CPU pipeline.  SMS rotates
> > the loop, peeling N iterations (and somehow I think for N > 1 that
> > should better mean unrolling the loop body).
> Yes, this is what theory tells us.
> > Of course doing "scheduling" on GIMPLE is "interesting" in its own right
> > but OTOH our pipeline DFAs are imprecise enough that one could even
> > devise some basic GIMPLE <-> "RTL" mapping to make use of it.  But
> > then scheduling without IVs or register pressure in mind is somewhat
> > pointless as well.
> Unfortunately, even with -fmodulo-sched-allow-regmoves it doesn't
> interact much with register pressure.
> > That said - if I had enough time I'd still think that investigating
> > "scheduling on GIMPLE" as replacement for sched1 is an interesting
> > thing to do.
> Sounds good, but IMHO the modulo scheduler is not the best choice to be the
> first step implementing such a concept.

True ;)   But since the context of this thread is unrolling ...
Not sure how you'd figure the unroll factor to apply if you want
to do unrolling within a classical scheduling framework?  Maybe
unroll as much as you can fill slots until the last instruction
of the first iteration retires?
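
To make that last idea concrete, here is a rough sketch only, under a
deliberately crude machine model: a fixed issue width, and a known latency
in cycles until the last instruction of the first iteration retires.  The
function name, its parameters and the cap are all hypothetical; nothing
here is an existing GCC interface, and it ignores register pressure and
dependences between iterations.

```c
/* Hypothetical helper, not part of GCC.  Assumed model: the machine
   issues ISSUE_WIDTH instructions per cycle, and the last instruction
   of the first iteration retires FIRST_ITER_LATENCY cycles after the
   loop body starts issuing.  */
static unsigned
estimate_unroll_factor (unsigned insns_per_iter, unsigned issue_width,
                        unsigned first_iter_latency, unsigned max_unroll)
{
  /* Degenerate inputs: do not unroll.  */
  if (insns_per_iter == 0 || issue_width == 0)
    return 1;

  /* Issue slots available before the first iteration has retired.  */
  unsigned slots = issue_width * first_iter_latency;

  /* Number of whole loop bodies that fit into those slots.  */
  unsigned factor = slots / insns_per_iter;

  /* Clamp to the range [1, max_unroll].  */
  if (factor < 1)
    factor = 1;
  if (factor > max_unroll)
    factor = max_unroll;
  return factor;
}
```

With 4 instructions per iteration, an issue width of 2 and a first-iteration
latency of 10 cycles there are 20 slots, so this would suggest unrolling 5
times (subject to the cap).  A real heuristic would have to take the
numbers from the pipeline description rather than from constants.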

Richard.