Re: [PATCH/RFC] Add a new memory gathering optimization for loop (PR98598)

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Biener <richard.guenther@gmail.com>
To: Feng Xue OS <fxue@os.amperecomputing.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH/RFC] Add a new memory gathering optimization for loop (PR98598)
Date: Fri, 30 Apr 2021 09:37:56 +0200	[thread overview]
Message-ID: <CAFiYyc1Y5xb3cvjU-8yK+X8sz7FO3eoznTSdkvQaoKRtm6ddQg@mail.gmail.com> (raw)
In-Reply-To: <MWHPR01MB2304C068CB1CBA463DB2AA9BF7A10@MWHPR01MB2304.prod.exchangelabs.com>

On Thu, Jan 21, 2021 at 7:28 AM Feng Xue OS via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> This patch implements a new loop optimization according to the proposal
> in RFC given at https://gcc.gnu.org/pipermail/gcc/2021-January/234682.html.
> So do not repeat the idea in this mail. Hope your comments on it.
>
> Bootstrapped/regtested on x86_64-linux and aarch64-linux.
>
> Thanks,
> Feng
>
> ----
> 2020-12-25  Feng Xue  <fxue@os.amperecomputing.com>
>
> gcc/
>         PR tree-optimization/98598
>         * Makefile.in (OBJS): Add tree-ssa-loop-mgo.o.
>         * common.opt (-ftree-loop-mgo): New option.

Just a quick comment - -ftree-loop-mgo is user-facing and it isn't really a good
name.  -floop-mgo would be better but still I'd have no idea what this would do.

I don't have a good suggestion here other than to expand it to
-floop-gather-memory (?!).

The option documentation isn't informative either.

From:

  outer-loop ()
    {
      inner-loop (iter, iter_count)
        {
          Type1 v1 = LOAD (iter);
          Type2 v2 = LOAD (v1);
          Type3 v3 = LOAD (v2);
          ...
          iter = NEXT (iter);
        }
    }

To:

  typedef struct cache_elem
    {
      bool   init;
      Type1  c_v1;
      Type2  c_v2;
      Type3  c_v3;
    } cache_elem;

  cache_elem *cache_arr = calloc (iter_count, sizeof (cache_elem));

  outer-loop ()
    {
      size_t cache_idx = 0;

      inner-loop (iter, iter_count)
        {
          if (!cache_arr[cache_idx]->init)
            {
              v1 = LOAD (iter);
              v2 = LOAD (v1);
              v3 = LOAD (v2);

              cache_arr[cache_idx]->init = true;
              cache_arr[cache_idx]->c_v1 = v1;
              cache_arr[cache_idx]->c_v2 = v2;
              cache_arr[cache_idx]->c_v3 = v3;
            }
          else
            {
              v1 = cache_arr[cache_idx]->c_v1;
              v2 = cache_arr[cache_idx]->c_v2;
              v3 = cache_arr[cache_idx]->c_v3;
            }
          ...
          cache_idx++;
          iter = NEXT (iter);
        }
    }

  free (cache_arr);

This is a _very_ special transform.  What it seems to do is
optimize the dependent loads for outer loop iteration n > 1
by caching the result(s).  If that's possible then you should
be able to distribute the outer loop to one doing the caching
and one using the cache.  Then this transform would be more
like a tradidional array expansion of scalars?  In some cases
also loop interchange could remove the need for the caching.

Doing MGO as the very first loop pass thus looks bad, I think
MGO should be much later, for example after interchange.
I also think that MGO should work in concert with loop
distribution (which could have an improved cost model)
rather than being a separate pass.

Your analysis phase looks quite expensive, building sth
like a on-the side representation very closely matching SSA.
It seems to work from PHI defs to uses, which looks backwards.

You seem to roll your own dependence analysis code :/  Please
have a look at loop distribution.

Also you build an actual structure type for reasons that escape
me rather than simply accessing the allocated storage at
appropriate offsets.

I think simply calling 'calloc' isn't OK because you might need
aligned storage and because calloc might not be available.
Please at least use 'malloc' and make sure MALLOC_ABI_ALIGNMENT
is large enough for the data you want to place (or perform
dynamic re-alignment yourself).  We probably want some generic
middle-end utility to obtain aligned allocated storage at some
point.

As said above I think you want to re-do this transform as
a loop distribution transform.  I think if caching works then
the loads should be distributable and the loop distribution
transform should be enhanced to expand the scalars to arrays.

Richard.



>         * opts.c (default_options_table): Enable -ftree-loop-mgo at -O3+.
>         * params.opt (mgo-max-dep-load-level): New parameter.
>         (mgo-max-cache-elem-size): Likewise.
>         (mgo-min-cache-array-length): Likewise.
>         (mgo-max-cache-array-length): Likewise.
>         * doc/invoke.texi (mgo-max-dep-load-level): Document new parameter.
>         (mgo-max-cache-elem-size): Likewise.
>         (mgo-min-cache-array-length): Likewise.
>         (mgo-max-cache-array-length): Likewise.
>         * passes.def: Add new pass_loop_mgo pass.
>         * timevar.def (TV_LOOP_MGO): New timevar.
>         * tree-pass.h (make_pass_loop_mgo): New declaration.
>         * tree-ssa-loop-mgo.c: New file.
>
> gcc/testsuite/
>         PR tree-optimization/98598
>         * gcc.dg/tree-ssa/mgo/mgo.exp: New file.
>         * gcc.dg/tree-ssa/mgo/mgo-common.h: New test header.
>         * gcc.dg/tree-ssa/mgo/list/list-1.c: New test.
>         * gcc.dg/tree-ssa/mgo/array/simple-1.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/simple-2.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/simple-3.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/param-1.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/param-2.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/param-3.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/dep-load-1.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/dep-load-2.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/dep-load-3.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/dep-load-4.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/dep-load-5.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/dep-load-6.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/dep-load-7.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/dep-load-8.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/dep-load-9.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/dep-load-10.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/indx-iter-1.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/indx-iter-2.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/indx-iter-3.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/indx-iter-4.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/indx-iter-5.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/indx-iter-6.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/outer-loop-1.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/outer-loop-2.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/outer-loop-3.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/outer-loop-4.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/outer-loop-5.c: Likewise.
> ---

next prev parent reply	other threads:[~2021-04-30  7:38 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-21  6:27 Feng Xue OS
2021-02-02  1:45 ` Ping: " Feng Xue OS
2021-04-02  5:44 ` PING^2 " Feng Xue OS
2021-04-29  2:50   ` PING^3 " Feng Xue OS
2021-04-29 13:46 ` David Malcolm
2021-04-30  4:02   ` Feng Xue OS
2021-05-08  1:49     ` Bin.Cheng
2021-04-30  7:37 ` Richard Biener [this message]
2021-05-07  4:29   ` Feng Xue OS
2021-05-10 11:56     ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAFiYyc1Y5xb3cvjU-8yK+X8sz7FO3eoznTSdkvQaoKRtm6ddQg@mail.gmail.com \
    --to=richard.guenther@gmail.com \
    --cc=fxue@os.amperecomputing.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).