From mboxrd@z Thu Jan 1 00:00:00 1970
Received: by sourceware.org (Postfix, from userid 48) id 4C666386103A; Sat, 6 Mar 2021 21:36:42 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 4C666386103A
From: "unlvsur at live dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/99434] std::bit_cast generates more instructions than __builtin_bit_cast and memcpy with -march=native
Date: Sat, 06 Mar 2021 21:36:42 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization, ra
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: unlvsur at live dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
Content-Type: text/plain; charset="UTF-8"
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list
X-List-Received-Date: Sat, 06 Mar 2021 21:36:42 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99434

--- Comment #2 from cqwrteur ---
(In reply to Andrew Pinski from comment #1)
> This is just a register allocation issue dealing with mulx and TImode.
>
> If mulq was used instead (that is without -march=native), all of the
> functions are done correctly.

I do not think so. I think GCC generally handles code like this poorly. I have even found a way to reproduce different suboptimal results deterministically. For example:
https://godbolt.org/z/PbobYG

Any time it deals with things like >>32 or >>64, it produces slower code.
This happens even when compiling without -march=native. Clang, by contrast, generates exactly the same assembly, which means my result is correct. GCC handles this case wrong. It looks like we need more optimizations on trees for these patterns.
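
For readers who cannot open the Compiler Explorer link, here is a minimal sketch of the kind of pattern under discussion: a 64x64 -> 128-bit multiply whose halves are extracted either with a >>64 shift or by copying the bytes with memcpy. This is not the code behind the link, which is not reproduced here. The names u128, u64x2, mul_shift, mul_memcpy, and self_check are hypothetical, and the sketch assumes a GCC or Clang target that provides unsigned __int128 and stores it little-endian (for example x86-64).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// GCC/Clang extension type; __extension__ silences pedantic warnings.
__extension__ typedef unsigned __int128 u128;

// Hypothetical helper type: the two 64-bit halves of a 128-bit product.
// Assumes a little-endian layout (low half first), as on x86-64.
struct u64x2 {
    std::uint64_t low;
    std::uint64_t high;
};

// Variant 1: extract the halves with a truncating cast and a >>64 shift.
inline u64x2 mul_shift(std::uint64_t a, std::uint64_t b) noexcept {
    u128 p = static_cast<u128>(a) * b;
    return {static_cast<std::uint64_t>(p), static_cast<std::uint64_t>(p >> 64)};
}

// Variant 2: reinterpret the bytes of the product with memcpy.
inline u64x2 mul_memcpy(std::uint64_t a, std::uint64_t b) noexcept {
    u128 p = static_cast<u128>(a) * b;
    static_assert(sizeof(u64x2) == sizeof(u128), "u64x2 must be exactly 128 bits");
    u64x2 r;
    std::memcpy(&r, &p, sizeof r);
    return r;
}

// Usage example: both variants should yield the same halves.
inline void self_check() {
    // (2^64 - 1)^2 = 2^128 - 2^65 + 1, so low = 1 and high = 2^64 - 2.
    u64x2 s = mul_shift(UINT64_MAX, UINT64_MAX);
    assert(s.low == 1 && s.high == 0xFFFFFFFFFFFFFFFEull);
    u64x2 m = mul_memcpy(UINT64_MAX, UINT64_MAX);
    assert(m.low == s.low && m.high == s.high);
}
```

Both variants compute the same value, so a compiler could in principle emit identical code for them. Compiling with something like g++ -O2 -S, with and without -march=native, and doing the same with clang++, is one way to compare the generated assembly; no particular output is claimed here.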