public inbox for gcc-bugs@sourceware.org help / color / mirror / Atom feed
From: "tschwinge at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> To: gcc-bugs@gcc.gnu.org Subject: [Bug target/104345] [12 Regression] "nvptx: Transition nvptx backend to STORE_FLAG_VALUE = 1" patch made some code generation worse Date: Tue, 08 Feb 2022 11:57:27 +0000 [thread overview] Message-ID: <bug-104345-4-D0WF6BdCU1@http.gcc.gnu.org/bugzilla/> (raw) In-Reply-To: <bug-104345-4@http.gcc.gnu.org/bugzilla/> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104345 --- Comment #7 from Thomas Schwinge <tschwinge at gcc dot gnu.org> --- All your three patches combined still doesn't help resolve the problem. And, what I realized: they don't even change the Nvidia/CUDA Driver reported "used [...] registers". Does that mean that the Driver "doesn't care" that we feed into it simple PTX code, using less PTX registers -- it seems able to do these optimizations all by itself? :-O (At least regarding number of registers used -- I didn't verify the SASS code generated.) (Emitting cleaner, "de-cluttered" code in GCC/nvptx is still very much valuable, of course: if only for our own visual consumption! ... at least as long as it doesn't make 'nvptx.md' etc. much more complex...) For "good" vs. "bad"/"not-so-good" (before vs. after "nvptx: Transition nvptx backend to STORE_FLAG_VALUE = 1"), the only code generation difference is in the '__muldc3' function ('nvptx-none/libgcc/_muldc3.o'), and that one is the only '.extern' dependency aside from the '__reduction_lock' global variable ('nvptx-none/libgcc/reduction.o'). (In the following, working around <https://gcc.gnu.org/PR104416> "'lto-wrapper' invoking 'mkoffload's with duplicated command-line options".) This means, I can conveniently manually create a minimal nvptx 'libgcc.a': $ cp build-gcc-offload-nvptx-none/nvptx-none/libgcc/_muldc3.o ./ $ rm -f libgcc.a && ar q libgcc.a _muldc3.o build-gcc-offload-nvptx-none/nvptx-none/libgcc/reduction.o ..., and compile 'libgomp.oacc-c-c++-common/reduction-cplx-dbl.c' with '-foffload=nvptx-none=-Wl,-L.'. (Via 'GOMP_DEBUG=1' verified that identical PTX code is loaded to GPU.) Then, hand-modify '_muldc3.o', re-create 'libgcc.a', re-compile, re-execute. Verified that before "nvptx: Transition nvptx backend to STORE_FLAG_VALUE = 1" '__muldc3' (attached '_muldc3-good.o') works fine, and after "nvptx: Transition nvptx backend to STORE_FLAG_VALUE = 1" '__muldc3' (attached '_muldc3-bad.o') does show the problem originally reported here. I then gradually morphed the former into the latter (beginning with eliding simple changes like renumbered registers etc.), until only one last change was necessary to turn "good" into "bad" (attached '_muldc3-WIP.o'); showing the "still-good" state: [...] @@ -1716,8 +1718,16 @@ cvt.rn.f64.s32 %r84,%r27; copysign.f64 %r61,%r61,%r84; .loc 2 1981 32 +// Current/after "nvptx: Transition nvptx backend to STORE_FLAG_VALUE = 1": +/* selp.u32 %r86,1,0,%r138; mov.u32 %r85,%r86; +*/ +// Before "nvptx: Transition nvptx backend to STORE_FLAG_VALUE = 1": +set.u32.leu.f64 %r86,%r57,0d7fefffffffffffff; +neg.s32 %r87,%r86; +mov.u32 %r85,%r87; +// cvt.u16.u8 %r140,%r85; mov.u16 %r89,%r140; xor.b16 %r88,%r89,1; @@ -1770,8 +1780,16 @@ [...] That is, things go "bad" if we here use the '%r138' that was computed (and used) earlier in the code, and things are "good" if we re-compute locally. Same for '%r139' in the second code block. (Interestingly, it is tolerated if one of the long-lived registers are used, but things go "bad" only if *both* are used. But that's probably just a distraction? And again, I have not inspected the actual SASS code, but just looked at the JIT-time "used [...] registers".) Do we thus conclude that what happens is that the "nvptx: Transition nvptx backend to STORE_FLAG_VALUE = 1" changes here enable an "optimization" in GCC such that values that have previously been compute may be re-used later in the code, without re-computing them. But on the flip side, of course, this means that the values have to kept live in (SASS) registers. (That's just my theory; I haven't verified the actual SASS.) In other words: at least in this case here, it seems preferrable to re-compute instead of keeping registers occupied. (But I'm of course not claiming that to be a simple yes/no decision...) It seem we're now in territory of tuning CPU vs. GPU code generation? Certainly, GCC has not seen much care for the latter (GPU code generation). I mean: verify GCC pass pipeline generally, and parameterization of individual passes for GPU code generation. Impossible to get "right", of course, but maybe some heuristics for CPU vs. GPU may be discovered and implemented? I'm sure there must be some literature on that topic? All that complicated by the fact the with the (several different versions of the) Nvidia/CUDA Driver's PTX -> SASS translation/optimization we have another moving part...
next prev parent reply other threads:[~2022-02-08 11:57 UTC|newest] Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top 2022-02-02 14:10 [Bug target/104345] New: nvptx: "regression" after "nvptx: Transition nvptx backend to STORE_FLAG_VALUE = 1" tschwinge at gcc dot gnu.org 2022-02-02 14:26 ` [Bug target/104345] " roger at nextmovesoftware dot com 2022-02-02 22:37 ` [Bug target/104345] [12 Regression] "nvptx: Transition nvptx backend to STORE_FLAG_VALUE = 1" patch made some code generation worse pinskia at gcc dot gnu.org 2022-02-03 14:17 ` vries at gcc dot gnu.org 2022-02-03 21:09 ` roger at nextmovesoftware dot com 2022-02-08 11:53 ` tschwinge at gcc dot gnu.org 2022-02-08 11:54 ` tschwinge at gcc dot gnu.org 2022-02-08 11:55 ` tschwinge at gcc dot gnu.org 2022-02-08 11:57 ` tschwinge at gcc dot gnu.org [this message] 2022-02-08 12:33 ` tschwinge at gcc dot gnu.org 2022-02-08 15:36 ` tschwinge at gcc dot gnu.org 2022-02-10 8:02 ` cvs-commit at gcc dot gnu.org 2022-02-10 8:02 ` cvs-commit at gcc dot gnu.org 2022-03-02 12:10 ` roger at nextmovesoftware dot com
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=bug-104345-4-D0WF6BdCU1@http.gcc.gnu.org/bugzilla/ \ --to=gcc-bugzilla@gcc.gnu.org \ --cc=gcc-bugs@gcc.gnu.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).