From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id E8A593864818; Fri, 30 Jul 2021 14:04:23 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org E8A593864818
From: "ts.tomeksopel at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/101693] New: Terrible SIMD register allocation
 with a tight loop operating on 8 registers.
Date: Fri, 30 Jul 2021 14:04:23 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 11.2.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: ts.tomeksopel at gmail dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status
 bug_severity priority component assigned_to reporter target_milestone
Message-ID: <bug-101693-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Fri, 30 Jul 2021 14:04:24 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D101693

            Bug ID: 101693
           Summary: Terrible SIMD register allocation with a tight loop
                    operating on 8 registers.
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ts.tomeksopel at gmail dot com
  Target Milestone: ---

There are a few issues regarding unnecessary register spilling, but this al=
so
exhibits a lot of unnecessary juggling between registers.

See https://godbolt.org/z/da76fY1n7 and
https://www.reddit.com/r/cpp_questions/comments/oui5tc/simd_what_to_do_when=
_your_compiler_forgets_how_to/

The gist is that there's a tight loop, executed a constant number of times =
(~64
times) where accumulation happens to 8 ymm registers, and only those 8
registers are used from outside of the loop. Before the loop zeros are
assinged, and after the loop horizontal addition is performed. GCC generates
suboptimal code, whereas clang gets it right. It seems to perform unnecessa=
ry
movs in a pattern following a -> b -> vpdpbusd to b -> a. All versions on
godbolt >=3D8.1 seem to exhibit the issue, including trunk.=