From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ed1-x535.google.com (mail-ed1-x535.google.com [IPv6:2a00:1450:4864:20::535]) by sourceware.org (Postfix) with ESMTPS id F3F263858D28 for ; Wed, 1 Nov 2023 19:22:44 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org F3F263858D28 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org F3F263858D28 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::535 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698866575; cv=none; b=tjoU8pnPSbZLBx9g3jIUnxl06r14+XQz1YffXqnGOKh4GLNjWcVRewEINHE0vuX4Y6AOLj5KUt2qY9el1DaoWTFE7ufrFrm2trbOpQxzR+fDnaA+KgAWsqqk4bIPoYvj3imSfxGa7d/1tAPFVVhhJaHC5k/0FdFNAXJK5HXlvqM= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698866575; c=relaxed/simple; bh=d9dNjrP2s+EJz4W+PURGUE7blDmr94ZiJaYUPwSKfro=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=f4eyZxxdniSMUJcFzHStSFYCD6BRHCpPj6fQqsW++/LpiZezfeHTpVdEjj1lNOH4njGRiRSl2YqAF8SX+oExtOp6O+3VTt+Tz+EazZeVZ+JO0LsVlnyCtYAEHbmpKucsIC3U2RDCTFPf/0ZT+o6mOtRuROMGH4fA4aCyBe0HMbg= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-ed1-x535.google.com with SMTP id 4fb4d7f45d1cf-53f9af41444so214915a12.1 for ; Wed, 01 Nov 2023 12:22:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1698866563; x=1699471363; darn=gcc.gnu.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=LyF+Lj64sHV+V6OuJ8pelbSAZZLVwMKvyvUUILhNhzA=; b=aMuZeoOS1COcDNM3igkGIE6AzJUrmcvicbgGX4r3lEfjwuitn8Rb67stfQHEXRmnDS RUfTSuVPI1UXx+4Abtrxgtqeayy/OTgXo7GeF8cKCUCkEw5WvI66hysUTnk2BESg2mBS XPZ3FiezhXDSGZU4QnoUazfBLmJ6MrBP3I8zSJprzMf70Ya27B1Xca+a+0o8bPuju6DV 3ejJdhSeaNwfyi2oCs9gfCAG5xc4PRUdYApkxr3U8FjqPEZgVPH0ZfTRkl02kHlffP6o a+aXnWzlBjlD8svoQrKEL6d+zvj6xKfmbEW6IIe2FdCE2iYt+tbJN8a15VI+jY+i5o5F KIAg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698866563; x=1699471363; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=LyF+Lj64sHV+V6OuJ8pelbSAZZLVwMKvyvUUILhNhzA=; b=SG14LrF51AZYH5A1bhq2IWaxCjJqo77se30O271wqXK9FoCUc6w25oQZwj1CxfBJqK YVTkCZPzSzsGsYCgZTyi3gVBnFsW8mtjcK5nfoPFGeBbyU4jwh0fqSAzj2bNQAWuTb/L KSf9ZuVth2FGqKxsJ45MGWwNjuoCwWWRr6Vgqy3z7gPjyA3FLYwo1lg+w1g5f+SLfXDa H22yM0Tbe420EuXbbi5NDc6nm1JRf6gugFo0F2zSuK1V0rqRCxIzQOq/AgBVwsZdxuCB ReZ5XNwSq/htNFRoaYmdQaKdGtAdY8h/bHQ7a2gS/NFXIOflE0gA3k4mxVXpReL2fYN8 01BA== X-Gm-Message-State: AOJu0YzRJqW/QsoKPb62eqBDJNTfqCehoCV3KO08hmyDULywe4V7zrn9 W8yeu+gpUWMRoyvMSkvI2Eu+DzV4Xl5FYLvulBRiVwJoUoooHw== X-Google-Smtp-Source: AGHT+IHlSDfuK9qnN0snGPpH86AJq+0xnEzzJ/A4TZvdy76sSBNeSmbyqbzFXyp4dg7NY59BEKFZljBfMBc6AnYybC8= X-Received: by 2002:a50:fb1a:0:b0:53f:bb1e:ce39 with SMTP id d26-20020a50fb1a000000b0053fbb1ece39mr13819966edq.34.1698866562463; Wed, 01 Nov 2023 12:22:42 -0700 (PDT) MIME-Version: 1.0 References: <00c901da0b56$4a3ff750$debfe5f0$@nextmovesoftware.com> <003c01da0cc3$1f0ca790$5d25f6b0$@nextmovesoftware.com> In-Reply-To: <003c01da0cc3$1f0ca790$5d25f6b0$@nextmovesoftware.com> From: Uros Bizjak Date: Wed, 1 Nov 2023 20:22:30 +0100 Message-ID: Subject: Re: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation using peephole2. To: Roger Sayle Cc: gcc-patches@gcc.gnu.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-2.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Wed, Nov 1, 2023 at 1:58=E2=80=AFPM Roger Sayle wrote: > > > Hi Uros, > > > From: Uros Bizjak > > Sent: 01 November 2023 10:05 > > Subject: Re: [x86_64 PATCH] PR target/110551: Tweak mulx register alloc= ation > > using peephole2. > > > > On Mon, Oct 30, 2023 at 6:27=E2=80=AFPM Roger Sayle > > wrote: > > > > > > > > > This patch is a follow-up to my previous PR target/110551 patch, this > > > time to address the additional move after mulx, seen on TARGET_BMI2 > > > architectures (such as -march=3Dhaswell). The complication here is t= hat > > > the flexible multiple-set mulx instruction is introduced into RTL > > > after reload, by split2, and therefore can't benefit from register > > > preferencing. This results in RTL like the following: > > > > > > (insn 32 31 17 2 (parallel [ > > > (set (reg:DI 4 si [orig:101 r ] [101]) > > > (mult:DI (reg:DI 1 dx [109]) > > > (reg:DI 5 di [109]))) > > > (set (reg:DI 5 di [ r+8 ]) > > > (umul_highpart:DI (reg:DI 1 dx [109]) > > > (reg:DI 5 di [109]))) > > > ]) "pr110551-2.c":8:17 -1 > > > (nil)) > > > > > > (insn 17 32 9 2 (set (reg:DI 0 ax [107]) > > > (reg:DI 5 di [ r+8 ])) "pr110551-2.c":9:40 90 {*movdi_interna= l} > > > (expr_list:REG_DEAD (reg:DI 5 di [ r+8 ]) > > > (nil))) > > > > > > Here insn 32, the mulx instruction, places its results in si and di, > > > and then immediately after decides to move di to ax, with di now dead= . > > > This can be trivially cleaned up by a peephole2. I've added an > > > additional constraint that the two SET_DESTs can't be the same > > > register to avoid confusing the middle-end, but this has well-defined > > > behaviour on x86_64/BMI2, encoding a umul_highpart. > > > > > > For the new test case, compiled on x86_64 with -O2 -march=3Dhaswell: > > > > > > Before: > > > mulx64: movabsq $-7046029254386353131, %rdx > > > mulx %rdi, %rsi, %rdi > > > movq %rdi, %rax > > > xorq %rsi, %rax > > > ret > > > > > > After: > > > mulx64: movabsq $-7046029254386353131, %rdx > > > mulx %rdi, %rsi, %rax > > > xorq %rsi, %rax > > > ret > > > > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > > and make -k check, both with and without --target_board=3Dunix{-m32} > > > with no new failures. Ok for mainline? > > > > It looks that your previous PR110551 patch regressed -march=3Dcascadela= ke [1]. > > Let's fix these regressions first. > > > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634660.html > > > > Uros. > > This patch fixes that "regression". Originally, the test case in PR11055= 1 contained > one unnecessary mov on "default" x86_targets, but two extra movs on BMI2 > targets, including -march=3Dhaswell and -march=3Dcascadelake. The first = patch > eliminated one of these MOVs, this patch eliminates the second. I'm not = sure > that you can call it a regression, the added test failed when run with a = non-standard > -march setting. The good news is that test case doesn't have to be chang= ed with > this patch applied, i.e. the correct intended behaviour is no MOVs on all= architectures. I was not worried about the extra mov, but more about [2], also referred from [1], but it looks that that regression was wrongly attributed to your patch. [2] https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078391.html > I'll admit the timing is unusual; I had already written and was regressio= n testing a > patch for the BMI2 issue, when the -march=3Dcascadelake regression tester= let me > know it was required for folks that helpfully run the regression suite wi= th non > standard settings. i.e. a long standing bug that wasn't previously teste= d for by > the testsuite. > > > > 2023-10-30 Roger Sayle > > > > > > gcc/ChangeLog > > > PR target/110551 > > > * config/i386/i386.md (*bmi2_umul3_1): Tidy condit= ion > > > as operands[2] with predicate register_operand must be !MEM_P= . > > > (peephole2): Optimize a mulx followed by a register-to-regist= er > > > move, to place result in the correct destination if possible. > > > > > > gcc/testsuite/ChangeLog > > > PR target/110551 > > > * gcc.target/i386/pr110551-2.c: New test case. The patch is OK. Thanks, Uros.