RE: [PATCH] [tree-optimization/110279] swap operands in reassoc to reduce cross backedge FMA

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Di Zhao OS <dizhao@os.amperecomputing.com>
To: Richard Biener <richard.guenther@gmail.com>,
	Jeff Law <jeffreyalaw@gmail.com>, Martin Jambor <mjambor@suse.cz>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: RE: [PATCH] [tree-optimization/110279] swap operands in reassoc to reduce cross backedge FMA
Date: Tue, 29 Aug 2023 07:49:01 +0000	[thread overview]
Message-ID: <SN6PR01MB4240BDF99B9DAE32C7E1346DE8E7A@SN6PR01MB4240.prod.exchangelabs.com> (raw)
In-Reply-To: <CAFiYyc0bPqj_wiZSYPL-a3uHV+wg-=faGQJZPkgFLRV53x8P6A@mail.gmail.com>

Hi,

> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Tuesday, August 29, 2023 3:41 PM
> To: Jeff Law <jeffreyalaw@gmail.com>; Martin Jambor <mjambor@suse.cz>
> Cc: Di Zhao OS <dizhao@os.amperecomputing.com>; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] [tree-optimization/110279] swap operands in reassoc to
> reduce cross backedge FMA
> 
> On Tue, Aug 29, 2023 at 1:23 AM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> >
> >
> > On 8/28/23 02:17, Di Zhao OS via Gcc-patches wrote:
> > > This patch tries to fix the 2% regression in 510.parest_r on
> > > ampere1 in the tracker. (Previous discussion is here:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624893.html)
> > >
> > > 1. Add testcases for the problem. For an op list in the form of
> > > "acc = a * b + c * d + acc", currently reassociation doesn't
> > > Swap the operands so that more FMAs can be generated.
> > > After widening_mul the result looks like:
> > >
> > >     _1 = .FMA(a, b, acc_0);
> > >     acc_1 = .FMA(c, d, _1);
> > >
> > > While previously (before the "Handle FMA friendly..." patch),
> > > widening_mul's result was like:
> > >
> > >     _1 = a * b;
> > >     _2 = .FMA (c, d, _1);
> > >     acc_1 = acc_0 + _2;
> 
> How can we execute the multiply and the FMA in parallel?  They
> depend on each other.  Or is it the uarch can handle dependence
> on the add operand but only when it is with a multiplication and
> not a FMA in some better ways?  (I'd doubt so much complexity)
> 
> Can you explain in more detail how the uarch executes one vs. the
> other case?
> 
> > > If the code fragment is in a loop, some architecture can execute
> > > the latter in parallel, so the performance can be much faster than
> > > the former. For the small testcase, the performance gap is over
> > > 10% on both ampere1 and neoverse-n1. So the point here is to avoid
> > > turning the last statement into FMA, and keep it a PLUS_EXPR as
> > > much as possible. (If we are rewriting the op list into parallel,
> > > no special treatment is needed, since the last statement after
> > > rewrite_expr_tree_parallel will be PLUS_EXPR anyway.)
> > >
> > > 2. Function result_feeds_back_from_phi_p is to check for cross
> > > backedge dependency. Added new enum fma_state to describe the
> > > state of FMA candidates.
> > >
> > > With this patch, there's a 3% improvement in 510.parest_r 1-copy
> > > run on ampere1. The compile options are:
> > > "-Ofast -mcpu=ampere1 -flto --param avoid-fma-max-bits=512".
> > >
> > > Best regards,
> > > Di Zhao
> > >
> > > ----
> > >
> > >          PR tree-optimization/110279
> > >
> > > gcc/ChangeLog:
> > >
> > >          * tree-ssa-reassoc.cc (enum fma_state): New enum to
> > >          describe the state of FMA candidates for an op list.
> > >          (rewrite_expr_tree_parallel): Changed boolean
> > >          parameter to enum type.
> > >          (result_feeds_back_from_phi_p): New function to check
> > >          for cross backedge dependency.
> > >          (rank_ops_for_fma): Return enum fma_state. Added new
> > >          parameter.
> > >          (reassociate_bb): If there's backedge dependency in an
> > >          op list, swap the operands before rewrite_expr_tree.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >          * gcc.dg/pr110279.c: New test.
> > Not a review, but more of a question -- isn't this transformation's
> > profitability uarch sensitive.  ie, just because it's bad for a set of
> > aarch64 uarches, doesn't mean it's bad everywhere.
> >
> > And in general we shy away from trying to adjust gimple code based on
> > uarch preferences.
> >
> > It seems the right place to do this is gimple->rtl expansion.
> 
> Another comment is that FMA forming has this deferring code which I
> think deals exactly with this kind of thing?  CCing Martin who did this
> work based on AMD uarchs also not wanting cross-loop dependences
> on FMAs (or so).  In particular I see
> 
>   if (fma_state.m_deferring_p
>       && fma_state.m_initial_phi)
>     {
>       gcc_checking_assert (fma_state.m_last_result);
>       if (!last_fma_candidate_feeds_initial_phi (&fma_state,
>                                                  &m_last_result_set))
>         cancel_fma_deferring (&fma_state);
> 
> and I think code to avoid FMAs in other/related cases should be here
> as well, like avoid forming back-to-back FMAs.

The changes in this patch is controlled by "param_avoid_fma_max_bits", so
I think it should only affect architectures with similar behavior. (The
parameter was added in a previous patch "Deferring FMA transformations
in tight loops", which seems to be dealing with the same issue.)

> Richard.
> 
> > Jeff

Thanks,
Di Zhao

next prev parent reply	other threads:[~2023-08-29  7:49 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-28  8:17 Di Zhao OS
2023-08-28 23:22 ` Jeff Law
2023-08-29  7:41   ` Richard Biener
2023-08-29  7:49     ` Di Zhao OS [this message]
2023-08-29  8:09       ` Richard Biener
2023-08-29  8:58         ` Di Zhao OS
2023-08-29 11:11           ` Richard Biener
2023-08-30  9:33             ` Di Zhao OS
2023-08-31 12:22               ` Richard Biener
2023-09-04 10:33                 ` Di Zhao OS
2023-08-29 12:34     ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=SN6PR01MB4240BDF99B9DAE32C7E1346DE8E7A@SN6PR01MB4240.prod.exchangelabs.com \
    --to=dizhao@os.amperecomputing.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jeffreyalaw@gmail.com \
    --cc=mjambor@suse.cz \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).