From: "zero at smallinteger dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/100391] New: 128 bit arithmetic --- many unnecessary instructions when extracting smaller parts
Date: Mon, 03 May 2021 08:43:55 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100391

            Bug ID: 100391
           Summary: 128 bit arithmetic --- many unnecessary instructions
                    when extracting smaller parts
           Product: gcc
           Version: 11.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zero at smallinteger dot com
  Target Milestone: ---

Created attachment 50738
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50738&action=edit
Sample code

Consider the attached code, compiled with -O2.
The return value of both functions is just the low 32 bits of num.  Whether
the top 4 bits of kt were zero, or became zero because of the shifts in the
if statement, is irrelevant.

So, both functions should have resulted in something like

twostep(unsigned __int128):             # @twostep(unsigned __int128)
        mov     rax, rdi
        ret
onestep(unsigned __int128):             # @onestep(unsigned __int128)
        mov     rax, rdi
        ret

Instead, gcc added many unnecessary instructions to twostep(), as shown
below.

twostep(unsigned __int128):
        mov     rcx, rdi
        mov     rax, rdi
        shr     rcx, 60
        je      .L2
        movabs  rdx, 1152921504606846975
        and     rax, rdx
.L2:
        ret
onestep(unsigned __int128):
        mov     rax, rdi
        ret

This particular behavior was isolated while examining the output of gcc
9.3.0 on Ubuntu 20.04 LTS, then verified for the stated versions (and a few
others) using Godbolt.

Incidentally, it might be worth checking whether movabs + and is indeed
faster than shl + shr, assuming the masking was necessary in the first
place.  If too many movabs instructions are generated for bit masking like
this, it will run against the Intel optimization manual's recommendation not
to include too many full-size literals in code.
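
For reference, the closing remark about movabs + and versus shl + shr can be
sketched in C as below. This is not the code from attachment 50738; the
function names are hypothetical and chosen only for illustration. It shows
the two source-level ways of clearing the top 4 bits of a 64-bit value, using
the same constant that appears in the generated code (1152921504606846975,
which is 2^60 - 1). Which instruction sequence a given compiler emits for
either form, and which is faster on a given CPU, is not something this sketch
establishes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not taken from the bug's attachment. */

/* Clear the top 4 bits with a full 64-bit mask.
 * 0x0FFFFFFFFFFFFFFF == 1152921504606846975 == 2^60 - 1. */
static uint64_t clear_top4_mask(uint64_t x)
{
    return x & UINT64_C(0x0FFFFFFFFFFFFFFF);
}

/* Clear the top 4 bits with a shift pair: shift the 4 bits out on the
 * left, then shift zeros back in. Well defined for unsigned types. */
static uint64_t clear_top4_shift(uint64_t x)
{
    return (x << 4) >> 4;
}

/* Check that both forms agree on a few representative inputs. */
static void check_equivalence(void)
{
    const uint64_t samples[] = {
        0,
        1,
        UINT64_C(0x0FFFFFFFFFFFFFFF),
        UINT64_C(0xF000000000000000),
        UINT64_C(0x123456789ABCDEF0),
        UINT64_MAX,
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(clear_top4_mask(samples[i]) == clear_top4_shift(samples[i]));
}
```

Compiling both helpers with -O2 and comparing the assembly (for example on
Godbolt) would be one way to see which form a given gcc version picks for
each.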