[Bug rtl-optimization/101693] New: Terrible SIMD register allocation with a tight loop operating on 8 registers.

public inbox for gcc-bugs@sourceware.org

* [Bug rtl-optimization/101693] New: Terrible SIMD register allocation with a tight loop operating on 8 registers.
@ 2021-07-30 14:04 ts.tomeksopel at gmail dot com
  2021-07-30 14:05 ` [Bug rtl-optimization/101693] " ts.tomeksopel at gmail dot com
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: ts.tomeksopel at gmail dot com @ 2021-07-30 14:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101693

            Bug ID: 101693
           Summary: Terrible SIMD register allocation with a tight loop
                    operating on 8 registers.
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ts.tomeksopel at gmail dot com
  Target Milestone: ---

There are a few existing issues about unnecessary register spilling, but this
case also exhibits a lot of unnecessary shuffling between registers.

See https://godbolt.org/z/da76fY1n7 and
https://www.reddit.com/r/cpp_questions/comments/oui5tc/simd_what_to_do_when_your_compiler_forgets_how_to/

The gist is that there is a tight loop, executed a constant number of times
(~64), that accumulates into 8 ymm registers; only those 8 registers are used
outside of the loop. Before the loop they are assigned zeros, and after the
loop a horizontal addition is performed. GCC generates suboptimal code,
whereas clang gets it right. GCC seems to perform unnecessary movs in the
pattern: a -> b, vpdpbusd into b, b -> a. All GCC versions on godbolt >= 8.1
seem to exhibit the issue, including trunk.
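
For readers who cannot open the godbolt link, the loop described above can be
modelled in portable scalar C++. This is only a rough sketch, not the
reporter's testcase: the names Lanes32, Bytes32, dpbusd, run_loop and
horizontal_sum, the iteration count of exactly 64, and the data layout are all
assumptions made for illustration. It shows just the dataflow: each of the 8
accumulators is both an input to and the output of the vpdpbusd step, so in
principle no register-to-register copies are needed inside the loop.

```cpp
#include <array>
#include <cstdint>

// Scalar model of one 256-bit ymm register viewed as 8 x int32 lanes.
using Lanes32 = std::array<std::int32_t, 8>;
// The same 256 bits viewed as 32 individual bytes.
using Bytes32 = std::array<std::uint8_t, 32>;

constexpr int kAccumulators = 8;   // the 8 ymm accumulators from the report
constexpr int kIterations   = 64;  // assumed; the report only says "~64"

// Models the non-saturating vpdpbusd: per 32-bit lane, multiply four unsigned
// bytes of `a` by the corresponding four signed bytes of `b`, sum the
// products, and add that sum to the accumulator lane (wrapping on overflow).
Lanes32 dpbusd(Lanes32 acc, const Bytes32& a, const Bytes32& b) {
    for (int lane = 0; lane < 8; ++lane) {
        std::int32_t sum = 0;
        for (int k = 0; k < 4; ++k) {
            const int ua = a[4 * lane + k];                            // unsigned byte
            const int sb = static_cast<std::int8_t>(b[4 * lane + k]);  // signed byte
            sum += ua * sb;
        }
        // Add through uint32_t so the wraparound is well defined.
        acc[lane] = static_cast<std::int32_t>(
            static_cast<std::uint32_t>(acc[lane]) + static_cast<std::uint32_t>(sum));
    }
    return acc;
}

// Hypothetical loop shape: zero 8 accumulators, then update each one in place
// on every iteration. The accumulator is read and written by the same step.
std::array<Lanes32, kAccumulators> run_loop(
    const std::array<Bytes32, kIterations>& input,
    const std::array<std::array<Bytes32, kAccumulators>, kIterations>& weights) {
    std::array<Lanes32, kAccumulators> acc{};  // zeros assigned before the loop
    for (int i = 0; i < kIterations; ++i) {
        for (int r = 0; r < kAccumulators; ++r) {
            acc[r] = dpbusd(acc[r], input[i], weights[i][r]);
        }
    }
    return acc;
}

// Horizontal addition performed after the loop: sum the 8 lanes of one register.
std::int32_t horizontal_sum(const Lanes32& v) {
    std::int32_t total = 0;
    for (std::int32_t x : v) total += x;
    return total;
}
```

In the intrinsics version each `acc[r] = dpbusd(acc[r], ...)` line would
correspond to a single vpdpbusd with `acc[r]` as the destination, which is the
form clang is reported to emit.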


* [Bug rtl-optimization/101693] Terrible SIMD register allocation with a tight loop operating on 8 registers.
  2021-07-30 14:04 [Bug rtl-optimization/101693] New: Terrible SIMD register allocation with a tight loop operating on 8 registers ts.tomeksopel at gmail dot com
@ 2021-07-30 14:05 ` ts.tomeksopel at gmail dot com
  2021-08-02  8:12 ` rguenth at gcc dot gnu.org
  2021-08-02  8:12 ` [Bug target/101693] " rguenth at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: ts.tomeksopel at gmail dot com @ 2021-07-30 14:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101693

--- Comment #1 from Tomasz Sobczyk <ts.tomeksopel at gmail dot com> ---
PS. when 

#define USE_VNNI

is commented out it exhibits similar behaviour to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283


* [Bug rtl-optimization/101693] Terrible SIMD register allocation with a tight loop operating on 8 registers.
  2021-07-30 14:04 [Bug rtl-optimization/101693] New: Terrible SIMD register allocation with a tight loop operating on 8 registers ts.tomeksopel at gmail dot com
  2021-07-30 14:05 ` [Bug rtl-optimization/101693] " ts.tomeksopel at gmail dot com
@ 2021-08-02  8:12 ` rguenth at gcc dot gnu.org
  2021-08-02  8:12 ` [Bug target/101693] " rguenth at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-08-02  8:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101693

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 51238
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51238&action=edit
your testcase

Attached the testcase for reference.  The odd thing is that there's nothing
apparently wrong with what we feed to the RA, but eventually we're confused
about the UNSPEC way of encoding vpmaddubswaccd, so the RA doesn't see that it
can coalesce the accumulator with its result.

(insn 43 42 45 3 (set (reg:V8SI 164)
        (unspec:V8SI [
                (subreg:V8SI (reg:V4DI 89 [ regs__I_lsm.13 ]) 0)
                (reg:V8SI 116 [ _127 ])
                (mem:V8SI (plus:DI (reg:DI 84 [ ivtmp.26 ])
                        (const_int 6144 [0x1800])) [0 MEM[(const __m256i * {ref-all})_147 + 6144B]+0 S32 A256])
            ] UNSPEC_VPMADDUBSWACCD)) "Compiler Explorer C++ Editor #[object Object] Code.cpp":13:11 6082 {vpdpbusd_v8si}
     (expr_list:REG_DEAD (reg:V4DI 89 [ regs__I_lsm.13 ])
        (nil)))


* [Bug target/101693] Terrible SIMD register allocation with a tight loop operating on 8 registers.
  2021-07-30 14:04 [Bug rtl-optimization/101693] New: Terrible SIMD register allocation with a tight loop operating on 8 registers ts.tomeksopel at gmail dot com
  2021-07-30 14:05 ` [Bug rtl-optimization/101693] " ts.tomeksopel at gmail dot com
  2021-08-02  8:12 ` rguenth at gcc dot gnu.org
@ 2021-08-02  8:12 ` rguenth at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-08-02  8:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101693

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-* i?86-*-*
          Component|rtl-optimization            |target
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-08-02
           Keywords|                            |ra
     Ever confirmed|0                           |1
