From: "Roger Sayle"
To: "'GCC Patches'"
Subject: [x86_64 PATCH] Decrement followed by cmov improvements.
Date: Mon, 26 Jul 2021 12:27:18 +0100
Message-ID: <02a101d78211$326e66f0$974b34d0$@nextmovesoftware.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_02A2_01D78219.94353FF0"
List-Id: Gcc-patches mailing list

This is a multipart message in MIME format.

------=_NextPart_000_02A2_01D78219.94353FF0
Content-Type: text/plain; charset="iso-8859-2"
Content-Transfer-Encoding: 7bit

The following patch to the x86_64 backend improves the code generated
for a decrement followed by a conditional move.  The primary change is
to recognize that, after subtracting one, checking whether the result
is -1 (or equivalently, whether the original value was zero) can be
implemented using the borrow/carry flag, rather than requiring an
explicit test instruction.
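
As a rough illustration of the equivalence described above (not part of
the patch), the borrow flag can be modelled in plain C.  The helper
names below are hypothetical and exist only for this sketch; the model
assumes the carry flag after "sub $1" is set exactly when the unsigned
subtraction borrows.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model of the x86 carry (borrow) flag after "sub $1, x":
   the unsigned subtraction x - 1 borrows exactly when x < 1.  */
bool borrow_after_dec(unsigned int x)
{
    return x < 1u;
}

/* Reference semantics of the foo() example from this message.  */
int foo_reference(int x)
{
    if (x == 0)
        x = 16;
    else
        x--;
    return x;
}

/* Flag-based formulation: the three predicates "borrow occurred",
   "result is all ones (-1)" and "original value was zero" agree, so the
   select can key off the borrow alone.  INT_MIN is excluded by callers
   because x - 1 would overflow in C.  */
int foo_with_borrow(int x)
{
    unsigned int ux = (unsigned int) x;
    bool carry = borrow_after_dec(ux);
    bool result_is_minus_one = (ux - 1u) == UINT_MAX;

    assert(carry == result_is_minus_one);
    assert(carry == (x == 0));

    return carry ? 16 : x - 1;
}
```

For any input other than INT_MIN, foo_with_borrow(x) should agree with
foo_reference(x), which is the property the cmovnc sequence relies on.
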
The borrow-flag test is achieved by a new define_insn_and_split that
allows combine to split the desired composite sequence into a *subsi_3
and a *movsicc_noc.

The other change in this patch is a pair of peephole2 optimizations
that eliminate register-to-register moves generated during register
allocation.  During reload, the compiler doesn't know that inverting
the condition of a conditional move can sometimes reduce register
pressure, but this is easy to tidy up during the peephole2 pass (where
swapping the order of the insn's operands performs the required logic
inversion).

Both improvements are demonstrated by the case below:

int foo(int x)
{
  if (x == 0)
    x = 16;
  else
    x--;
  return x;
}

Before:
foo:    leal    -1(%rdi), %eax
        testl   %edi, %edi
        movl    $16, %edx
        cmove   %edx, %eax
        ret

After:
foo:    subl    $1, %edi
        movl    $16, %eax
        cmovnc  %edi, %eax
        ret

And the value of the peephole2 clean-up can be seen on its own in:

int bar(int x)
{
  x--;
  if (x == 0)
    x = 16;
  return x;
}

Before:
bar:    movl    %edi, %eax
        movl    $16, %edx
        subl    $1, %eax
        cmove   %edx, %eax
        ret

After:
bar:    subl    $1, %edi
        movl    $16, %eax
        cmovne  %edi, %eax
        ret

These idioms were inspired by the source code of NIST SciMark4's
Random_nextDouble function, where the tweaks above result in
a ~1% improvement in the MonteCarlo benchmark kernel.

This patch has been tested on x86_64-pc-linux-gnu with a
"make bootstrap" and "make -k check" with no new failures.

Ok for mainline?


2021-07-26  Roger Sayle

gcc/ChangeLog
        * config/i386/i386.md (*dec_cmov): New define_insn_and_split
        to generate a conditional move using the carry flag after sub $1.
        (peephole2): Eliminate a register-to-register move by inverting
        the condition of a conditional move.

gcc/testsuite/ChangeLog
        * gcc.target/i386/dec-cmov-1.c: New test.
        * gcc.target/i386/dec-cmov-2.c: New test.

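
As an aside on the peephole2 clean-up described earlier, the condition
inversion can be sketched in C by modelling the two bar() instruction
sequences register by register.  This is only an illustrative model
under the assumption that cmov keeps the destination when the condition
is false; the function names are hypothetical and none of this is part
of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cmov model: yields src when cond holds, else keeps dst.  */
int cmov(bool cond, int src, int dst)
{
    return cond ? src : dst;
}

/* Model of the "Before" sequence for bar():
   movl %edi,%eax; movl $16,%edx; subl $1,%eax; cmove %edx,%eax  */
int bar_before(int edi)
{
    int eax = edi;              /* the reg-reg move to be eliminated */
    int edx = 16;
    eax -= 1;
    bool zf = (eax == 0);       /* zero flag set by the subtraction */
    eax = cmov(zf, edx, eax);
    return eax;
}

/* Model of the "After" sequence for bar():
   subl $1,%edi; movl $16,%eax; cmovne %edi,%eax
   The mov of the constant does not alter the flags, so it may sit
   between the sub and the cmov.  */
int bar_after(int edi)
{
    edi -= 1;
    bool zf = (edi == 0);
    int eax = 16;
    eax = cmov(!zf, edi, eax);  /* inverted condition, swapped operands */
    return eax;
}

/* Counts inputs in [lo, hi] on which the two models disagree;
   hi must be below INT_MAX so the loop terminates.  */
int count_mismatches(int lo, int hi)
{
    int bad = 0;
    for (int x = lo; x <= hi; x++)
        if (bar_before(x) != bar_after(x))
            bad++;
    return bad;
}
```

Swapping the cmov operands while inverting its condition selects the
same value, which is why the extra register copy becomes unnecessary.
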
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

------=_NextPart_000_02A2_01D78219.94353FF0
Content-Type: text/plain; name="patchd2.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="patchd2.txt"

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 8b809c4..a4f512f 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6755,6 +6755,29 @@
                              ? GEU : LTU, VOIDmode, cc, const0_rtx);
 })
 
+;; Help combine use borrow flag to test for -1 after dec (add $-1).
+(define_insn_and_split "*dec_cmov"
+  [(set (match_operand:SWI248 0 "register_operand" "=r")
+        (if_then_else:SWI248
+         (match_operator 1 "bt_comparison_operator"
+          [(match_operand:SWI248 2 "register_operand" "0") (const_int 0)])
+         (plus:SWI248 (match_dup 2) (const_int -1))
+         (match_operand:SWI248 3 "nonimmediate_operand" "rm")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_CMOVE"
+  "#"
+  "&& reload_completed"
+  [(parallel [(set (reg:CC FLAGS_REG)
+                   (compare:CC (match_dup 2) (const_int 1)))
+              (set (match_dup 0) (minus:SWI248 (match_dup 2) (const_int 1)))])
+   (set (match_dup 0)
+        (if_then_else:SWI248 (match_dup 4) (match_dup 0) (match_dup 3)))]
+{
+  rtx cc = gen_rtx_REG (CCCmode, FLAGS_REG);
+  operands[4] = gen_rtx_fmt_ee (GET_CODE (operands[1]) == NE
+                                ? GEU : LTU, VOIDmode, cc, const0_rtx);
+})
+
 (define_insn "*subsi_3_zext"
   [(set (reg FLAGS_REG)
         (compare (match_operand:SI 1 "register_operand" "0")
@@ -19068,6 +19091,70 @@
   gcc_unreachable ();
 })
 
+;; Eliminate a reg-reg mov by inverting the condition of a cmov (#1).
+;; mov r0,r1; dec r0; mov r2,r3; cmov r0,r2 -> dec r1; mov r0,r3; cmov r0, r1
+(define_peephole2
+ [(set (match_operand:SWI248 0 "register_operand")
+       (match_operand:SWI248 1 "register_operand"))
+  (parallel [(set (reg FLAGS_REG) (match_operand 5))
+             (set (match_dup 0) (match_operand:SWI248 6))])
+  (set (match_operand:SWI248 2 "register_operand")
+       (match_operand:SWI248 3))
+  (set (match_dup 0)
+       (if_then_else:SWI248 (match_operator 4 "ix86_comparison_operator"
+                             [(reg FLAGS_REG) (const_int 0)])
+        (match_dup 0)
+        (match_dup 2)))]
+ "TARGET_CMOVE
+  && REGNO (operands[2]) != REGNO (operands[0])
+  && REGNO (operands[2]) != REGNO (operands[1])
+  && peep2_reg_dead_p (1, operands[1])
+  && peep2_reg_dead_p (4, operands[2])
+  && !reg_overlap_mentioned_p (operands[0], operands[3])"
+ [(parallel [(set (match_dup 7) (match_dup 8))
+             (set (match_dup 1) (match_dup 9))])
+  (set (match_dup 0) (match_dup 3))
+  (set (match_dup 0) (if_then_else:SWI248 (match_dup 4)
+                                          (match_dup 1)
+                                          (match_dup 0)))]
+{
+  operands[7] = SET_DEST (XVECEXP (PATTERN (peep2_next_insn (1)), 0, 0));
+  operands[8] = replace_rtx (operands[5], operands[0], operands[1]);
+  operands[9] = replace_rtx (operands[6], operands[0], operands[1]);
+})
+
+;; Eliminate a reg-reg mov by inverting the condition of a cmov (#2).
+;; mov r2,r3; mov r0,r1; dec r0; cmov r0,r2 -> dec r1; mov r0,r3; cmov r0, r1
+(define_peephole2
+ [(set (match_operand:SWI248 2 "register_operand")
+       (match_operand:SWI248 3))
+  (set (match_operand:SWI248 0 "register_operand")
+       (match_operand:SWI248 1 "register_operand"))
+  (parallel [(set (reg FLAGS_REG) (match_operand 5))
+             (set (match_dup 0) (match_operand:SWI248 6))])
+  (set (match_dup 0)
+       (if_then_else:SWI248 (match_operator 4 "ix86_comparison_operator"
+                             [(reg FLAGS_REG) (const_int 0)])
+        (match_dup 0)
+        (match_dup 2)))]
+ "TARGET_CMOVE
+  && REGNO (operands[2]) != REGNO (operands[0])
+  && REGNO (operands[2]) != REGNO (operands[1])
+  && peep2_reg_dead_p (2, operands[1])
+  && peep2_reg_dead_p (4, operands[2])
+  && !reg_overlap_mentioned_p (operands[0], operands[3])"
+ [(parallel [(set (match_dup 7) (match_dup 8))
+             (set (match_dup 1) (match_dup 9))])
+  (set (match_dup 0) (match_dup 3))
+  (set (match_dup 0) (if_then_else:SWI248 (match_dup 4)
+                                          (match_dup 1)
+                                          (match_dup 0)))]
+{
+  operands[7] = SET_DEST (XVECEXP (PATTERN (peep2_next_insn (2)), 0, 0));
+  operands[8] = replace_rtx (operands[5], operands[0], operands[1]);
+  operands[9] = replace_rtx (operands[6], operands[0], operands[1]);
+})
+
 (define_expand "movcc"
   [(set (match_operand:X87MODEF 0 "register_operand")
         (if_then_else:X87MODEF

------=_NextPart_000_02A2_01D78219.94353FF0
Content-Type: text/plain; name="dec-cmov-1.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="dec-cmov-1.c"

/* { dg-do compile { target { ! ia32 } } } */
/* { dg-options "-O2" } */

int foo_m1(int x)
{
  x--;
  if (x == 0)
    x = 16;
  return x;
}

int foo_m2(int x)
{
  x -= 2;
  if (x == 0)
    x = 16;
  return x;
}

int foo_p1(int x)
{
  x++;
  if (x == 0)
    x = 16;
  return x;
}

int foo_p2(int x)
{
  x += 2;
  if (x == 0)
    x = 16;
  return x;
}


long long fool_m1(long long x)
{
  x--;
  if (x == 0)
    x = 16;
  return x;
}

long long fool_m2(long long x)
{
  x -= 2;
  if (x == 0)
    x = 16;
  return x;
}

long long fool_p1(long long x)
{
  x++;
  if (x == 0)
    x = 16;
  return x;
}

long long fool_p2(long long x)
{
  x += 2;
  if (x == 0)
    x = 16;
  return x;
}


short foos_m1(short x)
{
  x--;
  if (x == 0)
    x = 16;
  return x;
}

short foos_m2(short x)
{
  x -= 2;
  if (x == 0)
    x = 16;
  return x;
}

short foos_p1(short x)
{
  x++;
  if (x == 0)
    x = 16;
  return x;
}

short foos_p2(short x)
{
  x += 2;
  if (x == 0)
    x = 16;
  return x;
}

/* { dg-final { scan-assembler-not "mov(l|q)\[ \\t\]*%(e|r)(cx|di), %(e|r)ax" } } */

------=_NextPart_000_02A2_01D78219.94353FF0
Content-Type: text/plain; name="dec-cmov-2.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="dec-cmov-2.c"

/* { dg-do compile { target { ! ia32 } } } */
/* { dg-options "-O2" } */

int foo(int x)
{
  x--;
  if (x == -1)
    x = 16;
  return x;
}

int bar(int x)
{
  if (x == 0)
    x = 16;
  else x--;
  return x;
}

long long fool(long long x)
{
  x--;
  if (x == -1)
    x = 16;
  return x;
}

long long barl(long long x)
{
  if (x == 0)
    x = 16;
  else x--;
  return x;
}

short foos(short x)
{
  x--;
  if (x == -1)
    x = 16;
  return x;
}

short bars(short x)
{
  if (x == 0)
    x = 16;
  else x--;
  return x;
}

/* { dg-final { scan-assembler-not "lea(l|q)" } } */
/* { dg-final { scan-assembler-not "test(l|q|w)" } } */

------=_NextPart_000_02A2_01D78219.94353FF0--