From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Roger Sayle"
To:
Cc: "'Uros Bizjak'"
Subject: [x86_64 PATCH] PR target/110104: Missing peephole2 for addcarry.
Date: Tue, 6 Jun 2023 23:31:42 +0100
Message-ID: <030901d998c6$ac062250$041266f0$@nextmovesoftware.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_030A_01D998CF.0DCA8A50"
Content-Language: en-gb

This is a multipart message in MIME format.

------=_NextPart_000_030A_01D998CF.0DCA8A50
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

This patch resolves PR target/110104, a missed optimization on x86 around
adc with memory operands.  In i386.md, there's a peephole2 after the
pattern for *add3_cc_overflow_1 that converts the sequence
reg = add(reg,mem); mem = reg [where the reg is dead afterwards] into the
equivalent mem = add(mem,reg).  The equivalent peephole2 for adc is
missing (after addcarry), and is added by this patch.

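To spell out why the rewrite is safe, the two instruction sequences can be
modeled in portable C.  This is only an illustrative sketch and not part of
the patch: the names adc64, adc_via_register, adc_to_memory and
check_adc_peephole_model are hypothetical, and the model assumes carry_in is
0 or 1.  Because addition is commutative, both forms store the same value
and produce the same carry, and the register result can be dropped only
because reg is dead after the store.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of x86 adc: returns a + b + carry_in (mod 2^64)
   and stores the carry out of bit 63 in *carry_out.
   Assumes carry_in is 0 or 1.  */
uint64_t adc64(uint64_t a, uint64_t b, unsigned carry_in, unsigned *carry_out)
{
    uint64_t partial = a + b;
    unsigned c1 = partial < a;        /* carry out of a + b */
    uint64_t sum = partial + carry_in;
    unsigned c2 = sum < partial;      /* carry out of adding carry_in */
    *carry_out = c1 | c2;             /* at most one of c1, c2 is set */
    return sum;
}

/* Sequence before the peephole: reg = adc(reg, mem); mem = reg.  */
unsigned adc_via_register(uint64_t *mem, uint64_t reg, unsigned carry_in)
{
    unsigned carry_out;
    reg = adc64(reg, *mem, carry_in, &carry_out);
    *mem = reg;                       /* reg is dead after this store */
    return carry_out;
}

/* Sequence after the peephole: mem = adc(mem, reg).  */
unsigned adc_to_memory(uint64_t *mem, uint64_t reg, unsigned carry_in)
{
    unsigned carry_out;
    *mem = adc64(*mem, reg, carry_in, &carry_out);
    return carry_out;
}

/* Compare both sequences on a few boundary inputs.  */
void check_adc_peephole_model(void)
{
    static const uint64_t samples[] = {
        0, 1, 0x8000000000000000ull, UINT64_MAX - 1, UINT64_MAX
    };
    const int n = (int) (sizeof samples / sizeof samples[0]);

    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (unsigned cin = 0; cin <= 1; cin++) {
                uint64_t m1 = samples[i], m2 = samples[i];
                unsigned c1 = adc_via_register(&m1, samples[j], cin);
                unsigned c2 = adc_to_memory(&m2, samples[j], cin);
                assert(m1 == m2);     /* same value stored to memory */
                assert(c1 == c2);     /* same carry flag afterwards */
            }
}
```

Calling check_adc_peephole_model() from a small driver exercises both forms
on the sampled inputs.  The model says nothing about the flags register
beyond the carry bit, which is the only part the CCC-mode pattern tracks.
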
For the example code provided in the bugzilla PR:

Before:
        movq    %rsi, %rax
        mulq    %rdx
        addq    %rax, (%rdi)
        movq    %rdx, %rax
        adcq    8(%rdi), %rax
        adcq    $0, 16(%rdi)
        movq    %rax, 8(%rdi)
        ret

After:
        movq    %rsi, %rax
        mulq    %rdx
        addq    %rax, (%rdi)
        adcq    %rdx, 8(%rdi)
        adcq    $0, 16(%rdi)
        ret

Note that the addq in this example has been transformed by the
existing peephole2 described above.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-06-07  Roger Sayle

gcc/ChangeLog
        PR target/110104
        * config/i386/i386.md (define_peephole2): Transform reg=adc(reg,mem)
        followed by mem=reg into mem=adc(mem,reg) when applicable.

gcc/testsuite/ChangeLog
        PR target/110104
        * gcc.target/i386/pr110104.c: New test case.


Thanks in advance,
Roger
--

------=_NextPart_000_030A_01D998CF.0DCA8A50
Content-Type: text/plain; name="patchac.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="patchac.txt"

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e6ebc46..33ec45f 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -7870,6 +7870,51 @@
    (set_attr "pent_pair" "pu")
    (set_attr "mode" "")])
 
+;; peephole2 for addcarry matching one for *add3_cc_overflow_1.
+;; reg = adc(reg,mem); mem = reg -> mem = adc(mem,reg).
+(define_peephole2
+  [(parallel
+    [(set (reg:CCC FLAGS_REG)
+          (compare:CCC
+            (zero_extend:
+              (plus:SWI48
+                (plus:SWI48
+                  (match_operator:SWI48 3 "ix86_carry_flag_operator"
+                    [(match_operand 2 "flags_reg_operand") (const_int 0)])
+                  (match_operand:SWI48 0 "general_reg_operand"))
+                (match_operand:SWI48 1 "memory_operand")))
+            (plus:
+              (zero_extend: (match_dup 1))
+              (match_operator: 4 "ix86_carry_flag_operator"
+                [(match_dup 2) (const_int 0)]))))
+     (set (match_dup 0)
+          (plus:SWI48 (plus:SWI48 (match_op_dup 3
+                                    [(match_dup 2) (const_int 0)])
+                                  (match_dup 0))
+                      (match_dup 1)))])
+   (set (match_dup 1) (match_dup 0))]
+  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+   && peep2_reg_dead_p (2, operands[0])
+   && !reg_overlap_mentioned_p (operands[0], operands[1])"
+  [(parallel
+    [(set (reg:CCC FLAGS_REG)
+          (compare:CCC
+            (zero_extend:
+              (plus:SWI48
+                (plus:SWI48
+                  (match_op_dup 3 [(match_dup 2) (const_int 0)])
+                  (match_dup 1))
+                (match_dup 0)))
+            (plus:
+              (zero_extend: (match_dup 0))
+              (match_op_dup 4
+                [(match_dup 2) (const_int 0)]))))
+     (set (match_dup 1)
+          (plus:SWI48 (plus:SWI48 (match_op_dup 3
+                                    [(match_dup 2) (const_int 0)])
+                                  (match_dup 1))
+                      (match_dup 0)))])])
+
 (define_expand "addcarry_0"
   [(parallel
      [(set (reg:CCC FLAGS_REG)
diff --git a/gcc/testsuite/gcc.target/i386/pr110104.c b/gcc/testsuite/gcc.target/i386/pr110104.c
new file mode 100644
index 0000000..bd814f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr110104.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+
+typedef unsigned long long u64;
+typedef unsigned __int128 u128;
+void testcase1(u64 *acc, u64 a, u64 b)
+{
+  u128 res = (u128)a*b;
+  u64 lo = res, hi = res >> 64;
+  unsigned char cf = 0;
+  cf = __builtin_ia32_addcarryx_u64 (cf, lo, acc[0], acc+0);
+  cf = __builtin_ia32_addcarryx_u64 (cf, hi, acc[1], acc+1);
+  cf = __builtin_ia32_addcarryx_u64 (cf, 0, acc[2], acc+2);
+}
+
+/* { dg-final { scan-assembler-times "movq" 1 } } */

------=_NextPart_000_030A_01D998CF.0DCA8A50--