From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id E985A3858D39; Wed,  1 Nov 2023 22:35:01 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org E985A3858D39
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1698878101;
	bh=/3aPSTCFRLfc2l9WgG9gudTm8v5eYKbxLezWcHKUW5U=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=TBj3+/z/XeHRcMD+tPt8lm9XLg2AutsPlNhkqbDJsfd8aWJrsefjPxn60JhcZ/PZ1
	 jc/RVt4e0tX+qP5BcvnqUFThr/1U6Ibg4gkzbU7aIFfWnYaHGpElVASfbmaGJbWa7D
	 T+WKwxznxIoDB0tPOMv2bB6ygUFvwYxctFIBzQqQ=
From: "cvs-commit at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/110551] [11/12/13/14 Regression] an extra mov when doing
 128bit multiply
Date: Wed, 01 Nov 2023 22:35:00 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization, ra
X-Bugzilla-Severity: normal
X-Bugzilla-Who: cvs-commit at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 11.5
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-110551-4-KBlf4VOdX7@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-110551-4@http.gcc.gnu.org/bugzilla/>
References: <bug-110551-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110551
--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:80b1a371008c31982d35cff9b85ca6affd3ac949

commit r14-5063-g80b1a371008c31982d35cff9b85ca6affd3ac949
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Wed Nov 1 22:33:45 2023 +0000

    PR target/110551: Tweak mulx register allocation using peephole2.

    This patch is a follow-up to my previous PR target/110551 patch, this
    time addressing the additional move after mulx seen on TARGET_BMI2
    architectures (such as -march=haswell).  The complication here is
    that the flexible multiple-set mulx instruction is introduced into
    RTL after reload, by split2, and therefore can't benefit from register
    preferencing.  This results in RTL like the following:

    (insn 32 31 17 2 (parallel [
                (set (reg:DI 4 si [orig:101 r ] [101])
                    (mult:DI (reg:DI 1 dx [109])
                        (reg:DI 5 di [109])))
                (set (reg:DI 5 di [ r+8 ])
                    (umul_highpart:DI (reg:DI 1 dx [109])
                        (reg:DI 5 di [109])))
            ]) "pr110551-2.c":8:17 -1
         (nil))

    (insn 17 32 9 2 (set (reg:DI 0 ax [107])
            (reg:DI 5 di [ r+8 ])) "pr110551-2.c":9:40 90 {*movdi_internal}
         (expr_list:REG_DEAD (reg:DI 5 di [ r+8 ])
            (nil)))

    Here insn 32, the mulx instruction, places its results in si and di,
    and the very next insn (17) moves di to ax, after which di is dead.
    This can be trivially cleaned up by a peephole2.  I've added an
    additional constraint that the two SET_DESTs can't be the same
    register, to avoid confusing the middle-end, even though that case has
    well-defined behaviour on x86_64/BMI2, where it encodes a umul_highpart.
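
    The rewrite can be sketched as a pattern match over a toy instruction
    list in C.  The `struct insn` layout, the `reg` and `opcode` enums and
    `peephole_mulx_mov` are all hypothetical and exist only for
    illustration.  This is not GCC's RTL, and it is not the define_peephole2
    the patch adds to config/i386/i386.md.  The sketch models only the three
    conditions described above: the move copies mulx's high-part result,
    that register is dead afterwards, and the move's destination differs
    from mulx's low-part destination.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy IR for illustration only (not GCC RTL). */
enum reg { AX, DX, SI, DI };
enum opcode { OP_MULX, OP_MOV, OP_XOR, OP_RET };

struct insn {
    enum opcode op;
    enum reg dst;    /* MULX: low-part destination; MOV/XOR: destination */
    enum reg dst_hi; /* MULX only: high-part destination */
    enum reg src;    /* MULX: explicit multiplicand (the other is DX);
                        MOV/XOR: source */
    bool src_dead;   /* MOV only: src is dead after this insn */
};

/* Rewrite "mulx lo, hi; mov dst <- hi (hi dead)" into "mulx lo, dst",
   deleting the move.  Returns the new number of insns. */
size_t peephole_mulx_mov(struct insn *v, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        struct insn *mulx = &v[i];
        struct insn *mov = &v[i + 1];

        if (mulx->op != OP_MULX || mov->op != OP_MOV)
            continue;
        /* The move must copy the high part, which must then be dead. */
        if (mov->src != mulx->dst_hi || !mov->src_dead)
            continue;
        /* Keep the two mulx destinations distinct. */
        if (mov->dst == mulx->dst)
            continue;

        mulx->dst_hi = mov->dst;  /* write the high part directly */
        /* Delete the move by shifting the remaining insns down. */
        memmove(mov, mov + 1, (n - i - 2) * sizeof *v);
        n--;
    }
    return n;
}
```

    When this is applied to the si/di/ax sequence from insns 32 and 17,
    the move disappears and the mulx writes its high half straight to ax.
    If the move targets si instead, the sequence is left unchanged.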

    For the new test case, compiled on x86_64 with -O2 -march=haswell:

    Before:
    mulx64: movabsq $-7046029254386353131, %rdx
            mulx    %rdi, %rsi, %rdi
            movq    %rdi, %rax
            xorq    %rsi, %rax
            ret

    After:
    mulx64: movabsq $-7046029254386353131, %rdx
            mulx    %rdi, %rsi, %rax
            xorq    %rsi, %rax
            ret
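
    The listing corresponds to a 64x64->128-bit multiply by a constant,
    followed by an XOR of the low and high halves.  Below is a C function
    of that shape.  It is an assumption reconstructed from the assembly and
    is not the actual contents of gcc.target/i386/pr110551-2.c.  The
    constant 0x9E3779B97F4A7C15 is the unsigned reading of the movabsq
    immediate -7046029254386353131.  `unsigned __int128` is a GCC/Clang
    extension.

```c
#include <stdint.h>

/* Hypothetical reconstruction; not the actual pr110551-2.c. */
uint64_t mulx64(uint64_t x)
{
    /* Full 128-bit product.  On BMI2 targets GCC may select mulx for
       this widening multiply, with the constant loaded into %rdx. */
    unsigned __int128 r = (unsigned __int128)x * 0x9E3779B97F4A7C15ULL;
    /* XOR the low 64 bits with the high 64 bits. */
    return (uint64_t)r ^ (uint64_t)(r >> 64);
}
```

    Compiling a function like this with gcc -O2 -march=haswell -S and
    inspecting the output is one way to check whether the extra movq
    after mulx is present.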

    2023-11-01  Roger Sayle  <roger@nextmovesoftware.com>

    gcc/ChangeLog
            PR target/110551
            * config/i386/i386.md (*bmi2_umul<mode><dwi>3_1): Tidy condition
            as operands[2] with predicate register_operand must be !MEM_P.
            (peephole2): Optimize a mulx followed by a register-to-register
            move, to place result in the correct destination if possible.

    gcc/testsuite/ChangeLog
            PR target/110551
            * gcc.target/i386/pr110551-2.c: New test case.