public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/101693] New: Terrible SIMD register allocation with a tight loop operating on 8 registers.
From: ts.tomeksopel at gmail dot com @ 2021-07-30 14:04 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101693

            Bug ID: 101693
           Summary: Terrible SIMD register allocation with a tight loop
                    operating on 8 registers.
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ts.tomeksopel at gmail dot com
  Target Milestone: ---

There are a few issues regarding unnecessary register spilling, but this one
also exhibits a lot of unnecessary juggling between registers.

See https://godbolt.org/z/da76fY1n7 and
https://www.reddit.com/r/cpp_questions/comments/oui5tc/simd_what_to_do_when_your_compiler_forgets_how_to/

The gist is that there's a tight loop, executed a constant number of times
(~64 times), that accumulates into 8 ymm registers, and only those 8 registers
are used outside of the loop. Before the loop the registers are assigned zeros,
and after the loop a horizontal addition is performed. GCC generates suboptimal
code, whereas clang gets it right. GCC seems to perform unnecessary movs in a
pattern following a -> b -> vpdpbusd to b -> a.

All versions on godbolt >=8.1 seem to exhibit the issue, including trunk.
* [Bug rtl-optimization/101693] Terrible SIMD register allocation with a tight loop operating on 8 registers.
From: ts.tomeksopel at gmail dot com @ 2021-07-30 14:05 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101693

--- Comment #1 from Tomasz Sobczyk <ts.tomeksopel at gmail dot com> ---
PS. When #define USE_VNNI is commented out, it exhibits similar behaviour to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283
* [Bug rtl-optimization/101693] Terrible SIMD register allocation with a tight loop operating on 8 registers.
From: rguenth at gcc dot gnu.org @ 2021-08-02 8:12 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101693

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 51238
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51238&action=edit
your testcase

Attached the testcase for reference. The odd thing is there's nothing
apparently wrong with what we feed to the RA, but eventually we're confused
about the UNSPEC way of encoding vpmaddubswaccd, so the RA doesn't see it can
coalesce the accumulator and its result.

(insn 43 42 45 3 (set (reg:V8SI 164)
        (unspec:V8SI [
                (subreg:V8SI (reg:V4DI 89 [ regs__I_lsm.13 ]) 0)
                (reg:V8SI 116 [ _127 ])
                (mem:V8SI (plus:DI (reg:DI 84 [ ivtmp.26 ])
                        (const_int 6144 [0x1800])) [0 MEM[(const __m256i * {ref-all})_147 + 6144B]+0 S32 A256])
            ] UNSPEC_VPMADDUBSWACCD)) "Compiler Explorer C++ Editor #[object Object] Code.cpp":13:11 6082 {vpdpbusd_v8si}
     (expr_list:REG_DEAD (reg:V4DI 89 [ regs__I_lsm.13 ])
        (nil)))
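To make the coalescing point concrete, here is a small scalar model. It is not GCC code. It contrasts the three-input form seen in the insn (a fresh result pseudo computed from the accumulator and two sources) with the destructive form of the hardware instruction, where the accumulator register is also the destination. All type and function names are hypothetical.

```cpp
#include <array>
#include <cstdint>

using V8SI  = std::array<std::int32_t, 8>;   // models 8 x int32 in one ymm register
using V32U8 = std::array<std::uint8_t, 32>;  // unsigned byte source
using V32S8 = std::array<std::int8_t, 32>;   // signed byte source

// Per 32-bit lane, add the four u8 * s8 products belonging to that lane.
// This models the non-saturating vpdpbusd; inputs are assumed small enough
// that the int32 sums do not overflow.
static void add_products(V8SI& acc, const V32U8& a, const V32S8& b) {
    for (int lane = 0; lane < 8; ++lane)
        for (int k = 0; k < 4; ++k)
            acc[lane] += std::int32_t(a[4 * lane + k]) *
                         std::int32_t(b[4 * lane + k]);
}

// Shape of the RTL pattern: the result is a new value, distinct from acc.
// If result and acc end up in different registers, the copy below becomes
// a real vector move.
V8SI dpbusd_three_operand(const V8SI& acc, const V32U8& a, const V32S8& b) {
    V8SI result = acc;
    add_products(result, a, b);
    return result;
}

// Shape of the hardware instruction: the accumulator is updated in place,
// so no copy is needed when the old accumulator value is dead afterwards.
void dpbusd_destructive(V8SI& acc, const V32U8& a, const V32S8& b) {
    add_products(acc, a, b);
}
```

In the insn above, the accumulator (reg 89) dies at the instruction (REG_DEAD), so assigning reg 164 and reg 89 to the same hard register would correspond to the in-place form and avoid the extra move.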
* [Bug target/101693] Terrible SIMD register allocation with a tight loop operating on 8 registers.
From: rguenth at gcc dot gnu.org @ 2021-08-02 8:12 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101693

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-* i?86-*-*
          Component|rtl-optimization            |target
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-08-02
           Keywords|                            |ra
     Ever confirmed|0                           |1