public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/99912] Unnecessary / inefficient spilling of AVX2 ymm registers
Date: Wed, 07 Apr 2021 09:36:12 +0000	[thread overview]
Message-ID: <bug-99912-4-mehr20EFnN@http.gcc.gnu.org/bugzilla/> (raw)
In-Reply-To: <bug-99912-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99912

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
   Last reconfirmed|                            |2021-04-07
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |ASSIGNED

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Erik Schnetter from comment #5)
> As you suggested, the problem is probably not caused by register spills, but
> by stores into a struct that are not optimized away. In this case, the
> respective struct elements are unused in the code.
> 
> I traced the results of the first __builtin_ia32_maskloadpd256:
> 
>   _63940 = __builtin_ia32_maskloadpd256 (_63955, prephitmp_86203);
>   MEM <const vector(4) double> [(struct mat3 *)&vars + 992B] = _63940;
>   _178613 = .FMA (_63940, _64752, _178609);
>   MEM <const vector(4) double> [(struct mat3 *)&vars + 1312B] = _63940;
> 
> The respective struct locations (+ 992B, + 1312B) are indeed not used
> anywhere else.
> 
> The struct is of type z4c_vars. It (and its parent) are defined in lines
> 279837 to 280818. It is large.
> 
> Is there e.g. a parameter I could set to make GCC try harder avoid
> unnecessary stores?

Yes, there's --param dse-max-alias-queries-per-store=N where setting N
to 10000 seems to help quite a bit (it's default is 256 and it's used to
limit the quadratic complexity of DSE to constant-time).

There are still some other temporaries left, notably we have aggregate
copies like

  MEM[(struct vect *)&D.662088 + 96B clique 23 base 1].elts._M_elems[0] =
MEM[(const struct simd &)&D.662080 clique 23 base 0];

and later used as

  SR.3537_174107 = MEM <simd_vector> [(const struct vec3 &)&D.662088];

the aggregate copying keeps the stores into D.662080 live (instead of
directly storing into D.662088).  At SRA time we still take the address
of both so they are not considered for decomposition.  That's from
dead stores to some std::initializer_list.  Thus there's a pass ordering
issue and SRA would benefit from another DSE pass before it.

For the fun I did

diff --git a/gcc/passes.def b/gcc/passes.def
index e9ed3c7bc57..0c8a50e7a07 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -221,6 +221,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_tail_recursion);
       NEXT_PASS (pass_ch);
       NEXT_PASS (pass_lower_complex);
+      NEXT_PASS (pass_dse);
       NEXT_PASS (pass_sra);
       /* The dom pass will also resolve all __builtin_constant_p calls
          that are still there to 0.  This has to be done after some
@@ -236,7 +237,6 @@ along with GCC; see the file COPYING3.  If not see
       /* Identify paths that should never be executed in a conforming
         program and isolate those paths.  */
       NEXT_PASS (pass_isolate_erroneous_paths);
-      NEXT_PASS (pass_dse);
       NEXT_PASS (pass_reassoc, true /* insert_powi_p */);
       NEXT_PASS (pass_dce);
       NEXT_PASS (pass_forwprop);

which helps SRA in this case.  It does need quite some magic to remove all
the C++ abstraction in this code.  There's also lots of stray CLOBBERS
of otherwise unused variables in the code which skews the DSE alias
query parameter limit, we should look at doing TODO_remove_unused_locals
more often (for example after DSE) - that alone is enough to get almost
all the improvements without increasing any of the DSE walking limits.

  parent reply	other threads:[~2021-04-07  9:36 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-05  1:53 [Bug target/99912] New: " schnetter at gmail dot com
2021-04-05  1:54 ` [Bug target/99912] " schnetter at gmail dot com
2021-04-05  2:03 ` schnetter at gmail dot com
2021-04-06  8:25 ` rguenth at gcc dot gnu.org
2021-04-06 14:42 ` schnetter at gmail dot com
2021-04-06 16:33 ` schnetter at gmail dot com
2021-04-07  9:36 ` rguenth at gcc dot gnu.org [this message]
2021-04-07 16:48 ` rguenth at gcc dot gnu.org
2021-04-27 13:17 ` cvs-commit at gcc dot gnu.org
2021-04-27 13:17 ` cvs-commit at gcc dot gnu.org
2021-04-27 14:02 ` rguenth at gcc dot gnu.org
2021-04-27 16:03 ` schnetter at gmail dot com
2021-04-29  6:32 ` cvs-commit at gcc dot gnu.org

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-99912-4-mehr20EFnN@http.gcc.gnu.org/bugzilla/ \
    --to=gcc-bugzilla@gcc.gnu.org \
    --cc=gcc-bugs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).