[Bug tree-optimization/97832] AoSoA complex caxpy-like loops: AVX2+FMA -Ofast 7 times slower than -O3

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

From: "already5chosen at yahoo dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/97832] AoSoA complex caxpy-like loops: AVX2+FMA -Ofast 7 times slower than -O3
Date: Sat, 26 Nov 2022 18:36:09 +0000	[thread overview]
Message-ID: <bug-97832-4-roxWRcU4J0@http.gcc.gnu.org/bugzilla/> (raw)
In-Reply-To: <bug-97832-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97832

--- Comment #20 from Michael_S <already5chosen at yahoo dot com> ---
(In reply to Richard Biener from comment #17)
> (In reply to Michael_S from comment #16)
> > On unrelated note, why loop overhead uses so many instructions?
> > Assuming that I am as misguided as gcc about load-op combining, I would
> > write it as:
> >   sub %rax, %rdx
> > .L3:
> >   vmovupd   (%rdx,%rax), %ymm1
> >   vmovupd 32(%rdx,%rax), %ymm0
> >   vfmadd213pd    32(%rax), %ymm3, %ymm1
> >   vfnmadd213pd     (%rax), %ymm2, %ymm0
> >   vfnmadd231pd   32(%rdx,%rax), %ymm3, %ymm0
> >   vfnmadd231pd     (%rdx,%rax), %ymm2, %ymm1
> >   vmovupd %ymm0,   (%rax)
> >   vmovupd %ymm1, 32(%rax)
> >   addq    $64, %rax
> >   decl    %esi
> >   jb      .L3
> >   
> > The loop overhead in my variant is 3 x86 instructions==2 macro-ops,
> > vs 5 x86 instructions==4 macro-ops in gcc variant.
> > Also, in gcc variant all memory accesses have displacement that makes them
> > 1 byte longer. In my variant only half of accesses have displacement.
> > 
> > I think, in the past I had seen cases where gcc generates optimal or
> > near-optimal
> > code sequences for loop overhead. I wonder why it can not do it here.
> 
> I don't think we currently consider IVs based on the difference of two
> addresses.  

It seems to me that I had seen you doing it.
But, may be, I confuse gcc with clang.

> The cost benefit of no displacement is only size, 

Size is pretty important in high-IPC SIMD loops. Esp. on Intel and when # of
iterations is small, because Intel has 16-byte fetch out of L1I cache. SIMD
instructions tend to be long and not many instructions fit within 16 bytes even
when memory accesses have no offsets. Offset adds impact to the injury.

> otherwise
> I have no idea why we have biased the %rax accesses by -32.  Why we
> fail to consider decrement-to-zero for the counter IV is probably because
> IVCANON would add such IV but the vectorizer replaces that and IVOPTs
> doesn't consider re-adding that.

Sorry, I have no idea about the meaning of IVCANON.

next prev parent reply	other threads:[~2022-11-26 18:36 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-14 20:44 [Bug target/97832] New: " already5chosen at yahoo dot com
2020-11-16  7:21 ` [Bug target/97832] " rguenth at gcc dot gnu.org
2020-11-16 11:11 ` rguenth at gcc dot gnu.org
2020-11-16 20:11 ` already5chosen at yahoo dot com
2020-11-17  9:21 ` [Bug tree-optimization/97832] " rguenth at gcc dot gnu.org
2020-11-17 10:18 ` rguenth at gcc dot gnu.org
2020-11-18  8:53 ` rguenth at gcc dot gnu.org
2020-11-18  9:15 ` rguenth at gcc dot gnu.org
2020-11-18 13:23 ` rguenth at gcc dot gnu.org
2020-11-18 13:39 ` rguenth at gcc dot gnu.org
2020-11-19 19:55 ` already5chosen at yahoo dot com
2020-11-20  7:10 ` rguenth at gcc dot gnu.org
2021-06-09 12:41 ` cvs-commit at gcc dot gnu.org
2021-06-09 12:54 ` rguenth at gcc dot gnu.org
2022-01-21  0:16 ` pinskia at gcc dot gnu.org
2022-11-24 23:22 ` already5chosen at yahoo dot com
2022-11-25  8:16 ` rguenth at gcc dot gnu.org
2022-11-25 13:19 ` already5chosen at yahoo dot com
2022-11-25 20:46 ` rguenth at gcc dot gnu.org
2022-11-25 21:27 ` amonakov at gcc dot gnu.org
2022-11-26 18:27 ` already5chosen at yahoo dot com
2022-11-26 18:36 ` already5chosen at yahoo dot com [this message]
2022-11-26 19:36 ` amonakov at gcc dot gnu.org
2022-11-26 22:00 ` already5chosen at yahoo dot com
2022-11-28  6:29 ` crazylht at gmail dot com
2022-11-28  6:42 ` crazylht at gmail dot com
2022-11-28  7:21 ` rguenther at suse dot de
2022-11-28  7:24 ` crazylht at gmail dot com

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-97832-4-roxWRcU4J0@http.gcc.gnu.org/bugzilla/ \
    --to=gcc-bugzilla@gcc.gnu.org \
    --cc=gcc-bugs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).