public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/101693] New: Terrible SIMD register allocation with a tight loop operating on 8 registers.
@ 2021-07-30 14:04 ts.tomeksopel at gmail dot com
2021-07-30 14:05 ` [Bug rtl-optimization/101693] " ts.tomeksopel at gmail dot com
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: ts.tomeksopel at gmail dot com @ 2021-07-30 14:04 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101693
Bug ID: 101693
Summary: Terrible SIMD register allocation with a tight loop
operating on 8 registers.
Product: gcc
Version: 11.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ts.tomeksopel at gmail dot com
Target Milestone: ---
There are a few issues regarding unnecessary register spilling, but this also
exhibits a lot of unnecessary juggling between registers.
See https://godbolt.org/z/da76fY1n7 and
https://www.reddit.com/r/cpp_questions/comments/oui5tc/simd_what_to_do_when_your_compiler_forgets_how_to/
The gist is that there's a tight loop, executed a constant number of times (~64
times), in which values are accumulated into 8 ymm registers, and only those 8
registers are used outside of the loop. Before the loop the registers are
zeroed, and after the loop a horizontal addition is performed. GCC generates
suboptimal code, whereas clang gets it right. GCC seems to emit unnecessary
movs in the following pattern: copy a to b, vpdpbusd into b, copy b back to a.
All GCC versions on godbolt >= 8.1 seem to exhibit the issue, including trunk.
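For readers without the godbolt link at hand, the shape of the loop can be sketched in portable scalar C++. This is not the reporter's code (that lives behind the link above); it is a minimal model, assuming eight accumulators of eight 32-bit lanes each, with a hypothetical `dpbusd` helper standing in for one vpdpbusd instruction. All names (`Acc`, `Bytes`, `dpbusd`, `accumulate`) and the input/weight layout are illustrative assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kLanes = 8;   // 32-bit lanes in one 256-bit register
constexpr std::size_t kAccs  = 8;   // accumulators kept live across the loop
constexpr std::size_t kIters = 64;  // roughly the trip count from the report

using Acc   = std::array<std::int32_t, kLanes>;      // models one ymm accumulator
using Bytes = std::array<std::uint8_t, 4 * kLanes>;  // models 32 raw bytes

// Scalar model of vpdpbusd (hypothetical helper): per 32-bit lane, multiply
// four unsigned bytes of `a` by four signed bytes of `b` and add the sum of
// the products to the accumulator lane. Note that `acc` is both input and
// output, which is why the destination can share a register with it.
inline void dpbusd(Acc& acc, const Bytes& a, const Bytes& b) {
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        std::int32_t sum = 0;
        for (std::size_t k = 0; k < 4; ++k) {
            const std::size_t i = 4 * lane + k;
            sum += static_cast<std::int32_t>(a[i]) * static_cast<std::int8_t>(b[i]);
        }
        acc[lane] += sum;
    }
}

// Loop shape described in the report: zero the accumulators, update all of
// them in every iteration, then reduce each one horizontally.
inline std::array<std::int32_t, kAccs> accumulate(const std::vector<Bytes>& input,
                                                  const std::vector<Bytes>& weights) {
    assert(input.size() == kIters);
    assert(weights.size() == kIters * kAccs);

    std::array<Acc, kAccs> regs{};  // zero-initialised before the loop
    for (std::size_t i = 0; i < kIters; ++i)
        for (std::size_t r = 0; r < kAccs; ++r)
            dpbusd(regs[r], input[i], weights[i * kAccs + r]);

    std::array<std::int32_t, kAccs> out{};  // horizontal sums after the loop
    for (std::size_t r = 0; r < kAccs; ++r)
        for (std::size_t lane = 0; lane < kLanes; ++lane)
            out[r] += regs[r][lane];
    return out;
}
```

In the real testcase each `regs[r]` would be a `__m256i` and each `dpbusd` call a single instruction, so ideally the eight accumulators stay in eight fixed ymm registers for the whole loop with no copies between them.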
* [Bug rtl-optimization/101693] Terrible SIMD register allocation with a tight loop operating on 8 registers.
From: ts.tomeksopel at gmail dot com @ 2021-07-30 14:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101693
--- Comment #1 from Tomasz Sobczyk <ts.tomeksopel at gmail dot com> ---
PS. when
#define USE_VNNI
is commented out it exhibits similar behaviour to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283
* [Bug rtl-optimization/101693] Terrible SIMD register allocation with a tight loop operating on 8 registers.
From: rguenth at gcc dot gnu.org @ 2021-08-02 8:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101693
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 51238
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51238&action=edit
your testcase
Attached the testcase for reference. The odd thing is there's nothing
apparently wrong with what we feed to the RA, but eventually we're confused
about the UNSPEC way of encoding vpmaddubswaccd, so the RA doesn't see that it
can coalesce the accumulator and its result.
(insn 43 42 45 3 (set (reg:V8SI 164)
        (unspec:V8SI [
                (subreg:V8SI (reg:V4DI 89 [ regs__I_lsm.13 ]) 0)
                (reg:V8SI 116 [ _127 ])
                (mem:V8SI (plus:DI (reg:DI 84 [ ivtmp.26 ])
                        (const_int 6144 [0x1800])) [0 MEM[(const __m256i * {ref-all})_147 + 6144B]+0 S32 A256])
            ] UNSPEC_VPMADDUBSWACCD)) "Compiler Explorer C++ Editor #[object Object] Code.cpp":13:11 6082 {vpdpbusd_v8si}
     (expr_list:REG_DEAD (reg:V4DI 89 [ regs__I_lsm.13 ])
        (nil)))
* [Bug target/101693] Terrible SIMD register allocation with a tight loop operating on 8 registers.
From: rguenth at gcc dot gnu.org @ 2021-08-02 8:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101693
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-* i?86-*-*
          Component|rtl-optimization            |target
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-08-02
           Keywords|                            |ra
     Ever confirmed|0                           |1