From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Roger Sayle"
To:
Cc: "'Uros Bizjak'"
Subject: [x86 PATCH] PR target/110511: Fix reg allocation for widening multiplications.
Date: Tue, 17 Oct 2023 20:05:49 +0100
Message-ID: <022301da012c$f1f00a00$d5d01e00$@nextmovesoftware.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_0224_01DA0135.53B47200"

This is a multipart message in MIME format.

------=_NextPart_000_0224_01DA0135.53B47200
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit


This patch contains clean-ups of the widening multiplication patterns in
i386.md, and provides variants of the existing highpart multiplication
peephole2 transformations (that tidy up register allocation after
reload), and thereby fixes PR target/110511, which reports a superfluous
move instruction.

For the new test case, compiled on x86_64 with -O2:

Before:
mulx64: movabsq $-7046029254386353131, %rcx
        movq    %rcx, %rax
        mulq    %rdi
        xorq    %rdx, %rax
        ret

After:
mulx64: movabsq $-7046029254386353131, %rax
        mulq    %rdi
        xorq    %rdx, %rax
        ret

The clean-ups are (i) that operand 1 is consistently made register_operand
and operand 2 becomes nonimmediate_operand, so that predicates match the
constraints, (ii) the representation of the BMI2 mulx instruction is
updated to use the new umul_highpart RTX, and (iii) because operands
0 and 1 have different modes in widening multiplications, "a" is a more
appropriate constraint than "0" (which avoids spills/reloads containing
SUBREGs).  The new peephole2 transformations are based upon those at
around line 9951 of i386.md, which begin with the comment

;; Highpart multiplication peephole2s to tweak register allocation.
;; mov imm,%rdx; mov %rdi,%rax; imulq %rdx  ->  mov imm,%rax; imulq %rdi

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-10-17  Roger Sayle

gcc/ChangeLog
        PR target/110511
        * config/i386/i386.md (<u>mul<mode><dwi>3): Make operands 1 and
        2 take "register_operand" and "nonimmediate_operand" respectively.
        (<u>mulqihi3): Likewise.
        (*bmi2_umul<mode><dwi>3_1): Operand 2 needs to be register_operand
        matching the %d constraint.  Use umul_highpart RTX to represent
        the highpart multiplication.
        (*umul<mode><dwi>3_1): Operand 2 should use register_operand
        predicate, and "a" rather than "0" as operands 0 and 2 have
        different modes.
        (define_split): For mul to mulx conversion, use the new
        umul_highpart RTX representation.
        (*mul<mode><dwi>3_1): Operand 1 should be register_operand
        and the constraint %a as operands 0 and 1 have different modes.
        (*<u>mulqihi3_1): Operand 1 should be register_operand matching
        the constraint %0.
        (define_peephole2): Providing widening multiplication variants
        of the peephole2s that tweak highpart multiplication register
        allocation.

gcc/testsuite/ChangeLog
        PR target/110511
        * gcc.target/i386/pr110511.c: New test case.


Thanks in advance,
Roger

------=_NextPart_000_0224_01DA0135.53B47200
Content-Type: text/plain; name="patchmt.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="patchmt.txt"

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 2a60df5..22f18c2 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9710,33 +9710,29 @@
   [(parallel [(set (match_operand:<DWI> 0 "register_operand")
                    (mult:<DWI>
                      (any_extend:<DWI>
-                       (match_operand:DWIH 1 "nonimmediate_operand"))
+                       (match_operand:DWIH 1 "register_operand"))
                      (any_extend:<DWI>
-                       (match_operand:DWIH 2 "register_operand"))))
+                       (match_operand:DWIH 2 "nonimmediate_operand"))))
               (clobber (reg:CC FLAGS_REG))])])
 
 (define_expand "<u>mulqihi3"
   [(parallel [(set (match_operand:HI 0 "register_operand")
                    (mult:HI
                      (any_extend:HI
-                       (match_operand:QI 1 "nonimmediate_operand"))
+                       (match_operand:QI 1 "register_operand"))
                      (any_extend:HI
-                       (match_operand:QI 2 "register_operand"))))
+                       (match_operand:QI 2 "nonimmediate_operand"))))
               (clobber (reg:CC FLAGS_REG))])]
   "TARGET_QIMODE_MATH")
 
 (define_insn "*bmi2_umul<mode><dwi>3_1"
   [(set (match_operand:DWIH 0 "register_operand" "=r")
         (mult:DWIH
-          (match_operand:DWIH 2 "nonimmediate_operand" "%d")
+          (match_operand:DWIH 2 "register_operand" "%d")
           (match_operand:DWIH 3 "nonimmediate_operand" "rm")))
    (set (match_operand:DWIH 1 "register_operand" "=r")
-        (truncate:DWIH
-          (lshiftrt:<DWI>
-            (mult:<DWI> (zero_extend:<DWI> (match_dup 2))
-                        (zero_extend:<DWI> (match_dup 3)))
-            (match_operand:QI 4 "const_int_operand"))))]
-  "TARGET_BMI2 && INTVAL (operands[4]) == <MODE_SIZE> * BITS_PER_UNIT
+        (umul_highpart:DWIH (match_dup 2) (match_dup 3)))]
+  "TARGET_BMI2
    && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "mulx\t{%3, %0, %1|%1, %0, %3}"
   [(set_attr "type" "imulx")
@@ -9747,7 +9743,7 @@
   [(set (match_operand:<DWI> 0 "register_operand" "=r,A")
         (mult:<DWI>
           (zero_extend:<DWI>
-            (match_operand:DWIH 1 "nonimmediate_operand" "%d,0"))
+            (match_operand:DWIH 1 "register_operand" "%d,a"))
           (zero_extend:<DWI>
             (match_operand:DWIH 2 "nonimmediate_operand" "rm,rm"))))
    (clobber (reg:CC FLAGS_REG))]
@@ -9783,11 +9779,7 @@
   [(parallel [(set (match_dup 3)
                    (mult:DWIH (match_dup 1) (match_dup 2)))
               (set (match_dup 4)
-                   (truncate:DWIH
-                     (lshiftrt:<DWI>
-                       (mult:<DWI> (zero_extend:<DWI> (match_dup 1))
-                                   (zero_extend:<DWI> (match_dup 2)))
-                       (match_dup 5))))])]
+                   (umul_highpart:DWIH (match_dup 1) (match_dup 2)))])]
 {
   split_double_mode (<DWI>mode, &operands[0], 1, &operands[3], &operands[4]);
 
@@ -9798,7 +9790,7 @@
   [(set (match_operand:<DWI> 0 "register_operand" "=A")
         (mult:<DWI>
           (sign_extend:<DWI>
-            (match_operand:DWIH 1 "nonimmediate_operand" "%0"))
+            (match_operand:DWIH 1 "register_operand" "%a"))
           (sign_extend:<DWI>
             (match_operand:DWIH 2 "nonimmediate_operand" "rm"))))
    (clobber (reg:CC FLAGS_REG))]
@@ -9818,7 +9810,7 @@
   [(set (match_operand:HI 0 "register_operand" "=a")
         (mult:HI
           (any_extend:HI
-            (match_operand:QI 1 "nonimmediate_operand" "%0"))
+            (match_operand:QI 1 "register_operand" "%0"))
           (any_extend:HI
             (match_operand:QI 2 "nonimmediate_operand" "qm"))))
    (clobber (reg:CC FLAGS_REG))]
@@ -9835,6 +9827,51 @@
    (set_attr "bdver1_decode" "direct")
    (set_attr "mode" "QI")])
 
+;; Widening multiplication peephole2s to tweak register allocation.
+;; mov imm,%rdx; mov %rdi,%rax; mulq %rdx  ->  mov imm,%rax; mulq %rdi
+(define_peephole2
+  [(set (match_operand:DWIH 0 "general_reg_operand")
+        (match_operand:DWIH 1 "immediate_operand"))
+   (set (match_operand:DWIH 2 "general_reg_operand")
+        (match_operand:DWIH 3 "general_reg_operand"))
+   (parallel [(set (match_operand:<DWI> 4 "general_reg_operand")
+                   (mult:<DWI> (zero_extend:<DWI> (match_dup 2))
+                               (zero_extend:<DWI> (match_dup 0))))
+              (clobber (reg:CC FLAGS_REG))])]
+  "REGNO (operands[3]) != AX_REG
+   && REGNO (operands[0]) != REGNO (operands[2])
+   && REGNO (operands[0]) != REGNO (operands[3])
+   && (REGNO (operands[0]) == REGNO (operands[4])
+       || REGNO (operands[0]) == DX_REG
+       || peep2_reg_dead_p (3, operands[0]))"
+  [(set (match_dup 2) (match_dup 1))
+   (parallel [(set (match_dup 4)
+                   (mult:<DWI> (zero_extend:<DWI> (match_dup 2))
+                               (zero_extend:<DWI> (match_dup 3))))
+              (clobber (reg:CC FLAGS_REG))])])
+
+;; mov imm,%rax; mov %rdi,%rdx; mulx %rax  ->  mov imm,%rdx; mulx %rdi
+(define_peephole2
+  [(set (match_operand:DWIH 0 "general_reg_operand")
+        (match_operand:DWIH 1 "immediate_operand"))
+   (set (match_operand:DWIH 2 "general_reg_operand")
+        (match_operand:DWIH 3 "general_reg_operand"))
+   (parallel [(set (match_operand:DWIH 4 "general_reg_operand")
+                   (mult:DWIH (match_dup 2) (match_dup 0)))
+              (set (match_operand:DWIH 5 "general_reg_operand")
+                   (umul_highpart:DWIH (match_dup 2) (match_dup 0)))])]
+  "REGNO (operands[3]) != DX_REG
+   && REGNO (operands[0]) != REGNO (operands[2])
+   && REGNO (operands[0]) != REGNO (operands[3])
+   && (REGNO (operands[0]) == REGNO (operands[4])
+       || REGNO (operands[0]) == REGNO (operands[5])
+       || peep2_reg_dead_p (3, operands[0]))"
+  [(set (match_dup 2) (match_dup 1))
+   (parallel [(set (match_dup 4)
+                   (mult:DWIH (match_dup 2) (match_dup 3)))
+              (set (match_dup 5)
+                   (umul_highpart:DWIH (match_dup 2) (match_dup 3)))])])
+
 ;; Highpart multiplication patterns
 (define_insn "<s>mul<mode>3_highpart"
   [(set (match_operand:DWIH 0 "register_operand" "=d")
diff --git a/gcc/testsuite/gcc.target/i386/pr110511.c b/gcc/testsuite/gcc.target/i386/pr110511.c
new file mode 100644
index 0000000..142b808
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr110511.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+
+typedef unsigned long long uint64_t;
+
+uint64_t mulx64(uint64_t x)
+{
+    __uint128_t r = (__uint128_t)x * 0x9E3779B97F4A7C15ull;
+    return (uint64_t)r ^ (uint64_t)( r >> 64 );
+}
+
+/* { dg-final { scan-assembler-not "movq" } } */

------=_NextPart_000_0224_01DA0135.53B47200--
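
Illustrative sketch only, not part of the patch: the new test case computes
a 64x64->128 bit widening product and XORs its low and high halves, which
is exactly the pair of values that a single mulq leaves in %rax and %rdx.
The C below restates that computation twice so the two can be compared:
once with the 128-bit type (as in the test case), and once built from
32-bit partial products.  The names MULX64_K, mulx64_ref, mulx64_portable
and mulx64_selfcheck are hypothetical and do not appear in GCC or its
testsuite; unsigned __int128 is assumed to be available (GCC or Clang on a
64-bit target).

```c
#include <assert.h>
#include <stdint.h>

/* Multiplier taken from the pr110511.c test case. */
#define MULX64_K 0x9E3779B97F4A7C15ull

/* Reference: same arithmetic as the test case, via the 128-bit type. */
uint64_t mulx64_ref(uint64_t x)
{
    unsigned __int128 r = (unsigned __int128)x * MULX64_K;
    return (uint64_t)r ^ (uint64_t)(r >> 64);
}

/* Portable variant: assemble the 128-bit product from 32-bit halves. */
uint64_t mulx64_portable(uint64_t x)
{
    uint64_t xl = (uint32_t)x, xh = x >> 32;
    uint64_t kl = (uint32_t)MULX64_K, kh = MULX64_K >> 32;

    uint64_t ll = xl * kl;   /* contributes at bit 0   */
    uint64_t lh = xl * kh;   /* contributes at bit 32  */
    uint64_t hl = xh * kl;   /* contributes at bit 32  */
    uint64_t hh = xh * kh;   /* contributes at bit 64  */

    /* Sum of three 32-bit quantities, so it cannot overflow 64 bits. */
    uint64_t mid = (ll >> 32) + (uint32_t)lh + (uint32_t)hl;

    uint64_t lo = (mid << 32) | (uint32_t)ll;                 /* low half  */
    uint64_t hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32); /* high half */
    return lo ^ hi;
}

/* Cross-check the two variants on a few inputs. */
void mulx64_selfcheck(void)
{
    static const uint64_t inputs[] = {
        0, 1, 2, 0xFFFFFFFFull, 0x100000000ull, 0xFFFFFFFFFFFFFFFFull
    };
    for (unsigned i = 0; i < sizeof inputs / sizeof inputs[0]; i++)
        assert(mulx64_ref(inputs[i]) == mulx64_portable(inputs[i]));
}
```

Calling mulx64_selfcheck() from a small driver exercises both variants.
Compiling mulx64_ref with -O2 -S on x86_64 is one way to inspect the
register allocation discussed above.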