[Bug target/97127] FMA3 code transformation leads to slowdown on Skylake

From: "already5chosen at yahoo dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/97127] FMA3 code transformation leads to slowdown on Skylake
Date: Thu, 24 Sep 2020 12:38:49 +0000
Message-ID: <bug-97127-4-KwHxP1zONW@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-97127-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127

--- Comment #13 from Michael_S <already5chosen at yahoo dot com> ---
(In reply to Hongtao.liu from comment #11)
> (In reply to Michael_S from comment #10)
> > (In reply to Hongtao.liu from comment #9)
> > > (In reply to Michael_S from comment #8)
> > > > What are the values of gcc's "loop" costs for the relevant instructions now?
> > > > 1. AVX256 Load
> > > > 2. FMA3 ymm,ymm,ymm
> > > > 3. AVX256 Regmove
> > > > 4. FMA3 mem,ymm,ymm
> > > 
> > > For Skylake, outside of register allocation, they are:
> > > 1. AVX256 Load  ---- 10
> > > 2. FMA3 ymm,ymm,ymm --- 16
> > > 3. AVX256 Regmove  --- 2
> > > 4. FMA3 mem,ymm,ymm --- 32
> > > 
> > > In RA, there is no direct cost for FMA instructions, but we can disparage
> > > the memory alternative in FMA instructions. Again, though, that may hurt
> > > performance in some cases.
> > > 
> > > 1. AVX256 Load  ---- 10
> > > 3. AVX256 Regmove  --- 2
> > > 
> > > BTW: we have done a lot of experiments with different cost models and saw
> > > no significant performance impact on SPEC2017.
> > 
> > Thank you.
> > With relative costs like these, gcc should generate 'FMA3 mem,ymm,ymm' only
> > under heavy register pressure. So why does it generate it in my loop,
> > where register pressure in the innermost loop is light, and even at
> > the next outer level the pressure isn't heavy?
> > What am I missing?
> 
> The actual transformation gcc did is:
> 
>                                  pass_combine
> vmovuxx   (mem1), %ymmA
> vmovuxx   (mem), %ymmD             ---->     vmovuxx   (mem1), %ymmA
> vfmadd213 %ymmD,%ymmC,%ymmA                  vfmadd213 (mem),%ymmC,%ymmA
> 
> Then RA works like:
>                                       RA
> vmovuxx   (mem1), %ymmA              ---->   vmovaps   %ymmB, %ymmA
> vfmadd213 (mem),%ymmC,%ymmA                  vfmadd213 (mem),%ymmC,%ymmA
> 
> It "looks like" the following, but it is actually not this one:
> 
>  vmovuxx      (mem), %ymmA
>  vfnmadd231xx %ymmB, %ymmC, %ymmA
> transformed to
>  vmovaxx      %ymmB, %ymmA
>  vfnmadd213xx (mem), %ymmC, %ymmA
> 
> ymmB is allocated for (mem1), not (mem).

Thank you.
Now the compiler's reasoning is starting to make more sense.
Still, I don't understand why the compiler does not compare the cost of the
full loop body after combining with the cost before combining, and conclude
that combining increased the cost.
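
For illustration, here is a minimal sketch of the comparison asked about above, using the Skylake costs quoted from comment #9. It assumes the cost of a sequence is simply the sum of its per-instruction costs. That additive model and the function names are hypothetical and are not taken from GCC's combine or RA code.

```c
/* Skylake costs as quoted in comment #9 (outside register allocation). */
enum {
    COST_LOAD_256    = 10, /* AVX256 load          */
    COST_FMA_REG     = 16, /* FMA3 ymm,ymm,ymm     */
    COST_REGMOVE_256 = 2,  /* AVX256 register move */
    COST_FMA_MEM     = 32  /* FMA3 mem,ymm,ymm     */
};

/* Hypothetical additive model: a sequence costs the sum of its insns. */

/* Before pass_combine: two loads, then a register-only FMA. */
static int cost_before_combine(void)
{
    return 2 * COST_LOAD_256 + COST_FMA_REG;
}

/* After pass_combine: one load, then an FMA with a memory operand. */
static int cost_after_combine(void)
{
    return COST_LOAD_256 + COST_FMA_MEM;
}

/* The "looks like" variant: (mem1) is already held in ymmB, so the
   original form needs one load plus a register-only FMA. */
static int cost_load_plus_reg_fma(void)
{
    return COST_LOAD_256 + COST_FMA_REG;
}

/* Its transformed form: a register move plus a memory-operand FMA. */
static int cost_regmove_plus_mem_fma(void)
{
    return COST_REGMOVE_256 + COST_FMA_MEM;
}
```

Under this simplified model the combined forms come out more expensive in both cases (42 versus 36, and 34 versus 26), which is the increase the question refers to.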
