public inbox for gcc-bugs@sourceware.org help / color / mirror / Atom feed
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> To: gcc-bugs@gcc.gnu.org Subject: [Bug target/99912] Unnecessary / inefficient spilling of AVX2 ymm registers Date: Wed, 07 Apr 2021 09:36:12 +0000 [thread overview] Message-ID: <bug-99912-4-mehr20EFnN@http.gcc.gnu.org/bugzilla/> (raw) In-Reply-To: <bug-99912-4@http.gcc.gnu.org/bugzilla/> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99912 Richard Biener <rguenth at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Last reconfirmed| |2021-04-07 Ever confirmed|0 |1 Status|UNCONFIRMED |ASSIGNED --- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> --- (In reply to Erik Schnetter from comment #5) > As you suggested, the problem is probably not caused by register spills, but > by stores into a struct that are not optimized away. In this case, the > respective struct elements are unused in the code. > > I traced the results of the first __builtin_ia32_maskloadpd256: > > _63940 = __builtin_ia32_maskloadpd256 (_63955, prephitmp_86203); > MEM <const vector(4) double> [(struct mat3 *)&vars + 992B] = _63940; > _178613 = .FMA (_63940, _64752, _178609); > MEM <const vector(4) double> [(struct mat3 *)&vars + 1312B] = _63940; > > The respective struct locations (+ 992B, + 1312B) are indeed not used > anywhere else. > > The struct is of type z4c_vars. It (and its parent) are defined in lines > 279837 to 280818. It is large. > > Is there e.g. a parameter I could set to make GCC try harder avoid > unnecessary stores? Yes, there's --param dse-max-alias-queries-per-store=N where setting N to 10000 seems to help quite a bit (it's default is 256 and it's used to limit the quadratic complexity of DSE to constant-time). There are still some other temporaries left, notably we have aggregate copies like MEM[(struct vect *)&D.662088 + 96B clique 23 base 1].elts._M_elems[0] = MEM[(const struct simd &)&D.662080 clique 23 base 0]; and later used as SR.3537_174107 = MEM <simd_vector> [(const struct vec3 &)&D.662088]; the aggregate copying keeps the stores into D.662080 live (instead of directly storing into D.662088). At SRA time we still take the address of both so they are not considered for decomposition. That's from dead stores to some std::initializer_list. Thus there's a pass ordering issue and SRA would benefit from another DSE pass before it. For the fun I did diff --git a/gcc/passes.def b/gcc/passes.def index e9ed3c7bc57..0c8a50e7a07 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -221,6 +221,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_tail_recursion); NEXT_PASS (pass_ch); NEXT_PASS (pass_lower_complex); + NEXT_PASS (pass_dse); NEXT_PASS (pass_sra); /* The dom pass will also resolve all __builtin_constant_p calls that are still there to 0. This has to be done after some @@ -236,7 +237,6 @@ along with GCC; see the file COPYING3. If not see /* Identify paths that should never be executed in a conforming program and isolate those paths. */ NEXT_PASS (pass_isolate_erroneous_paths); - NEXT_PASS (pass_dse); NEXT_PASS (pass_reassoc, true /* insert_powi_p */); NEXT_PASS (pass_dce); NEXT_PASS (pass_forwprop); which helps SRA in this case. It does need quite some magic to remove all the C++ abstraction in this code. There's also lots of stray CLOBBERS of otherwise unused variables in the code which skews the DSE alias query parameter limit, we should look at doing TODO_remove_unused_locals more often (for example after DSE) - that alone is enough to get almost all the improvements without increasing any of the DSE walking limits.
next prev parent reply other threads:[~2021-04-07 9:36 UTC|newest] Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-04-05 1:53 [Bug target/99912] New: " schnetter at gmail dot com 2021-04-05 1:54 ` [Bug target/99912] " schnetter at gmail dot com 2021-04-05 2:03 ` schnetter at gmail dot com 2021-04-06 8:25 ` rguenth at gcc dot gnu.org 2021-04-06 14:42 ` schnetter at gmail dot com 2021-04-06 16:33 ` schnetter at gmail dot com 2021-04-07 9:36 ` rguenth at gcc dot gnu.org [this message] 2021-04-07 16:48 ` rguenth at gcc dot gnu.org 2021-04-27 13:17 ` cvs-commit at gcc dot gnu.org 2021-04-27 13:17 ` cvs-commit at gcc dot gnu.org 2021-04-27 14:02 ` rguenth at gcc dot gnu.org 2021-04-27 16:03 ` schnetter at gmail dot com 2021-04-29 6:32 ` cvs-commit at gcc dot gnu.org
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=bug-99912-4-mehr20EFnN@http.gcc.gnu.org/bugzilla/ \ --to=gcc-bugzilla@gcc.gnu.org \ --cc=gcc-bugs@gcc.gnu.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).