From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69]) by sourceware.org (Postfix) with ESMTPS id 04B013858D33 for ; Wed, 25 Oct 2023 14:41:53 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 04B013858D33 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 04B013858D33 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=162.254.253.69 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698244915; cv=none; b=F3L6Os71ZpTk9KvGu7sdZXVtYnsqFgZmvsLfzYC+jc/oHDGj/kkgxM6OVhetd/c4xW6rgNxtKLhK1Ajlf8nVXFAts12PHu8x0RAKDjA+s+vH7bZdk3L/xm7aDtPItCFGzS6mb48Rb8PDI8JXzdJlZSxa47IPv0kZmzYLxSTk4Q8= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698244915; c=relaxed/simple; bh=BNImx5pOuhXSsoWXdBbTzrjgMoJJuXnBxAwc2cT6bFs=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=Sr6yy0c9jaq2RZG+KcyUQpzEVexT98+9vO5FtuoXe+MFKEyHNmAnVn/ZJzVknXctKZQOZlWhv5FAtxkYCMJ+Jl7Xk/lexMfPKbVx6KKsj0HSVaK4mKO/usw5Kgh+oC+4a+nX/v28yvEqUyMhh39GjVBxsjBt+13wmaFOTtXLOkE= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Transfer-Encoding:Content-Type: MIME-Version:Message-ID:Date:Subject:In-Reply-To:References:Cc:To:From:Sender :Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help: List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=y9wWEL+cXVU/PHC2dP6iFyP/MvfjE2Sq8Jl2zkc1evk=; b=r0AQxAQ2AnM0T81Dntf9STZhr3 aEdxxN6tYPN7TUT58TA2IwxhMTvbBWUENOOoFcEswaG/GlBFkSrxQRfWzhndQq6I42FoSmwjv7iMP 20UHmnPnp8kXuRGrTTpzjbDiVvZPx8Ax41mtKMMoA1/L8zspdxoYw/saChvjlPFuZm1x72OcaGQpu j4f5jrUTb71402ucrUdpOOP1QA29UZl5Z+nguSjToLGp8U3v9Ci5ZwVmpslIAOClJpvhFlvrET2cW Y+hsEeLJ1E83rEAoir9qFf3Uy3YBFi2trOYQnlO6nSikfddcTUdbam+BE+Z5NxBIpM6qe5W6zGpLI UyDIRs/g==; Received: from [185.62.158.67] (port=58305 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96.2) (envelope-from ) id 1qvf4y-00013I-1P; Wed, 25 Oct 2023 10:41:52 -0400 From: "Roger Sayle" To: "'Uros Bizjak'" Cc: References: <022301da012c$f1f00a00$d5d01e00$@nextmovesoftware.com> In-Reply-To: Subject: RE: [x86 PATCH] PR target/110511: Fix reg allocation for widening multiplications. Date: Wed, 25 Oct 2023 15:41:50 +0100 Message-ID: <003a01da0751$64764970$2d62dc50$@nextmovesoftware.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Mailer: Microsoft Outlook 16.0 Thread-Index: AQHcqqWhY7bkR06OwVktejMxn/hmhgFY33pwsEp/0BA= Content-Language: en-gb X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-6.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,KAM_SHORT,SPF_HELO_NONE,SPF_PASS,TXREP,WEIRD_PORT autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi Uros, I've tried your suggestions to see what would happen. Alas, allowing both operands to (i386's) widening multiplications to be nonimmediate_operand results in 90 additional testsuite unexpected failures", and 41 unresolved testcase, around things like: gcc.c-torture/compile/di.c:6:1: error: unrecognizable insn: (insn 14 13 15 2 (parallel [ (set (reg:DI 98 [ _3 ]) (mult:DI (zero_extend:DI (mem/c:SI (plus:SI (reg/f:SI 93 = virtual-stack-vars) (const_int -8 [0xfffffffffffffff8])) [1 = a+0 S4 A64])) (zero_extend:DI (mem/c:SI (plus:SI (reg/f:SI 93 = virtual-stack-vars) (const_int -16 [0xfffffffffffffff0])) [1 = b+0 S4 A64])))) (clobber (reg:CC 17 flags)) ]) "gcc.c-torture/compile/di.c":5:12 -1 (nil)) during RTL pass: vregs gcc.c-torture/compile/di.c:6:1: internal compiler error: in = extract_insn, at recog.cc:2791 In my experiments, I've used nonimmediate_operand instead of = general_operand, as a zero_extend of an immediate_operand, like const_int, would be = non-canonical. In short, it's ok (common) for '%' to apply to operands with different = predicates; reload will only swap things if the operand's predicates/constraints = remain consistent. For example, see i386.c's *add_1 pattern. And as shown above it = can't be left to (until) reload to decide which "mem" gets loaded into a = register (which would be nice), as some passes before reload check both predicates and = constraints. My original patch fixes PR 110511, using the same peephole2 idiom as = already used elsewhere in i386.md. Ok for mainline? > -----Original Message----- > From: Uros Bizjak > Sent: 19 October 2023 18:02 > To: Roger Sayle > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [x86 PATCH] PR target/110511: Fix reg allocation for = widening > multiplications. >=20 > On Tue, Oct 17, 2023 at 9:05=E2=80=AFPM Roger Sayle = > wrote: > > > > > > This patch contains clean-ups of the widening multiplication = patterns > > in i386.md, and provides variants of the existing highpart > > multiplication > > peephole2 transformations (that tidy up register allocation after > > reload), and thereby fixes PR target/110511, which is a superfluous > > move instruction. > > > > For the new test case, compiled on x86_64 with -O2. > > > > Before: > > mulx64: movabsq $-7046029254386353131, %rcx > > movq %rcx, %rax > > mulq %rdi > > xorq %rdx, %rax > > ret > > > > After: > > mulx64: movabsq $-7046029254386353131, %rax > > mulq %rdi > > xorq %rdx, %rax > > ret > > > > The clean-ups are (i) that operand 1 is consistently made > > register_operand and operand 2 becomes nonimmediate_operand, so that > > predicates match the constraints, (ii) the representation of the = BMI2 > > mulx instruction is updated to use the new umul_highpart RTX, and > > (iii) because operands > > 0 and 1 have different modes in widening multiplications, "a" is a > > more appropriate constraint than "0" (which avoids spills/reloads > > containing SUBREGs). The new peephole2 transformations are based = upon > > those at around line 9951 of i386.md, that begins with the comment = ;; > > Highpart multiplication peephole2s to tweak register allocation. > > ;; mov imm,%rdx; mov %rdi,%rax; imulq %rdx -> mov imm,%rax; imulq > > %rdi > > > > > > This patch has been tested on x86_64-pc-linux-gnu with make = bootstrap > > and make -k check, both with and without --target_board=3Dunix{-m32} > > with no new failures. Ok for mainline? > > > > > > 2023-10-17 Roger Sayle > > > > gcc/ChangeLog > > PR target/110511 > > * config/i386/i386.md (mul3): Make operands 1 = and > > 2 take "regiser_operand" and "nonimmediate_operand" = respectively. > > (mulqihi3): Likewise. > > (*bmi2_umul3_1): Operand 2 needs to be = register_operand > > matching the %d constraint. Use umul_highpart RTX to = represent > > the highpart multiplication. > > (*umul3_1): Operand 2 should use regiser_operand > > predicate, and "a" rather than "0" as operands 0 and 2 have > > different modes. > > (define_split): For mul to mulx conversion, use the new > > umul_highpart RTX representation. > > (*mul3_1): Operand 1 should be register_operand > > and the constraint %a as operands 0 and 1 have different = modes. > > (*mulqihi3_1): Operand 1 should be register_operand = matching > > the constraint %0. > > (define_peephole2): Providing widening multiplication = variants > > of the peephole2s that tweak highpart multiplication = register > > allocation. > > > > gcc/testsuite/ChangeLog > > PR target/110511 > > * gcc.target/i386/pr110511.c: New test case. > > >=20 > (define_insn "*bmi2_umul3_1" > [(set (match_operand:DWIH 0 "register_operand" "=3Dr") > (mult:DWIH > - (match_operand:DWIH 2 "nonimmediate_operand" "%d") > + (match_operand:DWIH 2 "register_operand" "%d") > (match_operand:DWIH 3 "nonimmediate_operand" "rm"))) > (set (match_operand:DWIH 1 "register_operand" "=3Dr") >=20 > This will fail. Because of %, both predicates must allow = nonimmediate_operand, > since RA can swap operand constraints due to %. >=20 > @@ -9747,7 +9743,7 @@ > [(set (match_operand: 0 "register_operand" "=3Dr,A") > (mult: > (zero_extend: > - (match_operand:DWIH 1 "nonimmediate_operand" "%d,0")) > + (match_operand:DWIH 1 "register_operand" "%d,a")) > (zero_extend: > (match_operand:DWIH 2 "nonimmediate_operand" "rm,rm")))) > (clobber (reg:CC FLAGS_REG))] >=20 > The same here, although changing "0" to "a" is correct, but the = oversight is > benign. "A" reads as "ad", and the first alternative already takes = "d". >=20 > +;; Widening multiplication peephole2s to tweak register allocation. > +;; mov imm,%rdx; mov %rdi,%rax; mulq %rdx -> mov imm,%rax; mulq = %rdi >=20 > Maybe instead of peephole2s, we allow general_operands in the = instruction (both > inputs) and rely on the RA to fulfill the constraint? Would this work? >=20 > Uros.