RE: [PATCH 1/2] PR gcc/98350:Add a param to control the length of the chain with FMA in reassoc pass

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: "Cui, Lili" <lili.cui@intel.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: RE: [PATCH 1/2] PR gcc/98350:Add a param to control the length of the chain with FMA in reassoc pass
Date: Wed, 17 May 2023 13:05:10 +0000	[thread overview]
Message-ID: <SJ0PR11MB56006591A169D43098C628699E7E9@SJ0PR11MB5600.namprd11.prod.outlook.com> (raw)
In-Reply-To: <CAFiYyc3GhaVARNhxWCVT+QobZS9pnCgy5xRBSN=Jf8QFkUM9Jg@mail.gmail.com>

> I think to make a difference you need to hit the number of parallel fadd/fmul
> the pipeline can perform.  I don't think issue width is ever a problem for
> chains w/o fma and throughput of fma vs fadd + fmul should be similar.
> 

Yes, for x86 backend, fadd , fmul and fma have the same TP meaning they should have the same width. 
The current implementation is reasonable  /* reassoc int, fp, vec_int, vec_fp.  */.

> That said, I think iff then we should try to improve
> rewrite_expr_tree_parallel rather than adding a new function.  For example
> for the case with equal rank operands we can try to sort adds first.  I can't
> convince myself that rewrite_expr_tree_parallel honors ranks properly
> quickly.
> 

I rewrite this patch, there are mainly two changes:
1. I made some changes to rewrite_expr_tree_parallel_for_fma and used it instead of rewrite_expr_tree_parallel. The following example shows that the sequence generated by the this patch is better.
2. Put no-mult ops and mult ops alternately at the end of the queue, which is conducive to generating more fma and reducing the loss of FMA when breaking the chain.
  
With these two changes, GCC can break the chain with width = 2 and generates 6 FMAs for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98350  without any params.

------------------------------------------------------------------------------------------------------------------
Source code： g + h + j + s + m + n+a+b +e  (https://godbolt.org/z/G8sb86n84)
Compile options: -Ofast -mfpmath=sse -mfma
Width = 3 was chosen for reassociation
-----------------------------------------------------------------------------------------------------------------
Old rewrite_expr_tree_parallel generates:
  _6 = g_8(D) + h_9(D);       ------> parallel 0
  _3 = s_11(D) + m_12(D);  ------> parallel 1
  _5 = _3 + j_10(D);
  _2 = n_13(D) + a_14(D);   ------> parallel 2
  _1 = b_15(D) + e_16(D);  -----> Parallel 3, This is not necessary, and it is not friendly to FMA.
  _4 = _1 + _2;        
  _7 = _4 + _5;        
  _17 = _6 + _7;      
  return _17;

When the width = 3,  we need 5 cycles here.
---------------------------------------------first end---------------------------------------------------------
Rewrite the old rewrite_expr_tree_parallel (3 sets in parallel) generates:

  _3 = s_11(D) + m_12(D);  ------> parallel 0
  _5 = _3 + j_10(D);
  _2 = n_13(D) + a_14(D);   ------> parallel 1
  _1 = b_15(D) + e_16(D);   ------> parallel 2
  _4 = _1 + _2;
  _6 = _4 + _5;
  _7 = _6 + h_9(D);
  _17 = _7 + g_8(D); 
  return _17;

When the width = 3, we need 5 cycles here.
---------------------------------------------second end-------------------------------------------------------
Use rewrite_expr_tree_parallel_for_fma instead of rewrite_expr_tree_parallel generates:

  _3 = s_11(D) + m_12(D);
  _6 = _3 + g_8(D);
  _2 = n_13(D) + a_14(D);
  _5 = _2 + h_9(D);
  _1 = b_15(D) + e_16(D);
  _4 = _1 + j_10(D);
  _7 = _4 + _5;
  _17 = _7 + _6;
  return _17;

When the width = 3, we need 4 cycles here.
--------------------------------------------third end-----------------------------------------------------------

Thanks,
Lili.

next prev parent reply	other threads:[~2023-05-17 13:06 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-11 10:12 Cui, Lili
2023-05-11 10:12 ` [PATCH 2/2] Add a tune option to control the length of the chain with FMA Cui, Lili
2023-05-11 10:56   ` Richard Biener
2023-05-11 10:52 ` [PATCH 1/2] PR gcc/98350:Add a param to control the length of the chain with FMA in reassoc pass Richard Biener
2023-05-11 15:18   ` Cui, Lili
2023-05-12  6:04     ` Richard Biener
2023-05-12  9:04       ` Cui, Lili
2023-05-15 13:12         ` Richard Biener
2023-05-17 13:05           ` Cui, Lili [this message]
2023-05-22 13:15             ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=SJ0PR11MB56006591A169D43098C628699E7E9@SJ0PR11MB5600.namprd11.prod.outlook.com \
    --to=lili.cui@intel.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).