From mboxrd@z Thu Jan  1 00:00:00 1970
From: "moncef.mechri at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/110551] [11/12/13/14 Regression] an extra mov when doing 128bit multiply
Date: Sun, 29 Oct 2023 18:01:00 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization, ra
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 11.5
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110551

--- Comment #6 from Moncef Mechri ---
I confirm the extra mov disappears thanks to Roger's patch.
However, even with Roger's patch, the codegen still seems suboptimal to me
when using -march=haswell or newer:

    uint64_t mulx64(uint64_t x)
    {
        __uint128_t r = (__uint128_t)x * 0x9E3779B97F4A7C15ull;
        return (uint64_t)r ^ (uint64_t)(r >> 64);
    }

With -O2:

    mulx64(unsigned long):
            movabs  rax, -7046029254386353131
            mul     rdi
            xor     rax, rdx
            ret

With -O2 -march=haswell:

    mulx64(unsigned long):
            movabs  rdx, -7046029254386353131
            mulx    rdi, rsi, rdi
            mov     rax, rdi
            xor     rax, rsi
            ret

So it looks like there is still one extra mov, since I think the optimal
codegen using mulx should be:

    mulx64(unsigned long):
            movabs  rdx, -7046029254386353131
            mulx    rax, rsi, rdi
            xor     rax, rsi
            ret