From: "ntukanov at cmu dot edu"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug inline-asm/102264] Macro Intrinsics fail to use all the registers on the machine
Date: Fri, 10 Sep 2021 03:27:51 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: inline-asm
X-Bugzilla-Version: 9.1.0
X-Bugzilla-Keywords: missed-optimization, ra
X-Bugzilla-Severity: normal
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102264

--- Comment #2 from Nicholai Tukanov ---
(In reply to Andrew Pinski from comment #1)
> There seems to be some extra moves the register allocator cannot remove and
> that is causing some extra spilling.
>
> Your loop has 32 live variables and that is just at the limit.

Can the register allocator be modified to use the other registers
(zmm16-zmm31) as well? The problem seems limited to the compute instruction
(vpdpwssd in this case).

I specifically chose 32 live variables to max out the registers. Since the
compute instruction gets limited to half of that (zmm0-zmm15), the extra moves
are killing the performance.
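
For anyone following along who has not used the compute instruction named above, here is a rough scalar sketch of what vpdpwssd computes. It is only an illustration and is not the code from the report. The function names `vpdpwssd_lane` and `vpdpwssd_zmm` are made up for this example. It models the non-saturating form: each 32-bit lane holds two signed 16-bit words, the two word products are added together, and the result is accumulated into the destination dword with wraparound.

```c
#include <stdint.h>

/* Scalar model of ONE 32-bit lane of vpdpwssd (non-saturating form).
   The function name is hypothetical, chosen for this sketch only.
   a and b each pack two signed 16-bit words; acc is the running dword sum. */
static int32_t vpdpwssd_lane(int32_t acc, uint32_t a, uint32_t b)
{
    int16_t a_lo = (int16_t)(a & 0xFFFFu);
    int16_t a_hi = (int16_t)(a >> 16);
    int16_t b_lo = (int16_t)(b & 0xFFFFu);
    int16_t b_hi = (int16_t)(b >> 16);

    /* Accumulate in unsigned arithmetic so overflow wraps instead of
       being undefined behavior. */
    uint32_t sum = (uint32_t)acc
                 + (uint32_t)((int32_t)a_lo * b_lo)
                 + (uint32_t)((int32_t)a_hi * b_hi);
    return (int32_t)sum;
}

/* Model of one full 512-bit operation: 16 independent dword lanes. */
static void vpdpwssd_zmm(int32_t acc[16],
                         const uint32_t a[16],
                         const uint32_t b[16])
{
    for (int i = 0; i < 16; ++i)
        acc[i] = vpdpwssd_lane(acc[i], a[i], b[i]);
}
```

The accumulator is both read and written by every call. In a kernel that keeps many such accumulators live at once, the operands of this instruction are therefore the ones that need to land in the full zmm0-zmm31 range.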