From mboxrd@z Thu Jan  1 00:00:00 1970
From: "moncef.mechri at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/110551] [11/12/13/14 Regression] an extra mov when doing 128bit multiply
Date: Sun, 29 Oct 2023 18:01:00 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization, ra
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 11.5
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110551

--- Comment #6 from Moncef Mechri ---
I confirm the extra mov disappears thanks to Roger's patch.
However, even with Roger's patch, the codegen still seems suboptimal to me
when using -march=haswell or newer:

    uint64_t mulx64(uint64_t x)
    {
        __uint128_t r = (__uint128_t)x * 0x9E3779B97F4A7C15ull;
        return (uint64_t)r ^ (uint64_t)(r >> 64);
    }

With -O2:

    mulx64(unsigned long):
            movabs  rax, -7046029254386353131
            mul     rdi
            xor     rax, rdx
            ret

With -O2 -march=haswell:

    mulx64(unsigned long):
            movabs  rdx, -7046029254386353131
            mulx    rdi, rsi, rdi
            mov     rax, rdi
            xor     rax, rsi
            ret

So it looks like there is still one extra mov, since I think the optimal
codegen using mulx should be:

    mulx64(unsigned long):
            movabs  rdx, -7046029254386353131
            mulx    rax, rsi, rdi
            xor     rax, rsi
            ret