From: "Roger Sayle"
To: "'GCC Patches'"
Subject: [x86 PATCH take #3] PR target/91681: zero_extendditi2 pattern for more optimizations.
Date: Sat, 23 Jul 2022 09:51:11 +0100
Message-ID: <06bb01d89e71$5d6283a0$18278ae0$@nextmovesoftware.com>

Hi Uros,
This is the next iteration of the zero_extendditi2 patch last reviewed here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596204.html

[1] The sse.md changes were split out, reviewed, approved and committed.
[2] The *concat splitters have been moved post-reload matching what we
now do for many/most of the double word functionality.
[3] As you recommend, these *concat splitters now use split_double_mode
to "subreg" operand[0] into parts, via a new helper function that can also
handle overlapping registers, and even use xchg for the rare case that
a double word is constructed from its high and low parts, but the wrong
way around.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without -target_board=unix{-m32},
with no new failures.  Ok for mainline?

2022-07-23  Roger Sayle
            Uroš Bizjak

gcc/ChangeLog
        PR target/91681
        * config/i386/i386-expand.cc (split_double_concat): A new helper
        function for setting a double word value from two word values.
        * config/i386/i386-protos.h (split_double_concat): Prototype here.
        * config/i386/i386.md (zero_extendditi2): New define_insn_and_split.
        (*add3_doubleword_zext): New define_insn_and_split.
        (*sub3_doubleword_zext): New define_insn_and_split.
        (*concat3_1): New define_insn_and_split replacing
        previous define_split for implementing DST = (HI<<32)|LO as
        pair of move instructions, setting lopart and hipart.
        (*concat3_2): Likewise.
        (*concat3_3): Likewise, where HI is zero_extended.
        (*concat3_4): Likewise, where HI is zero_extended.

gcc/testsuite/ChangeLog
        PR target/91681
        * g++.target/i386/pr91681.C: New test case (from the PR).
        * gcc.target/i386/pr91681-1.c: New int128 test case.
        * gcc.target/i386/pr91681-2.c: Likewise.
        * gcc.target/i386/pr91681-3.c: Likewise, but for ia32.
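
As an illustration of the ordering that [3] describes, here is a minimal
standalone C model.  It is a sketch, not GCC code: it assumes registers can
be modelled as slots in an array named by index, `model_concat` and its
parameters are hypothetical names, and plain index comparison stands in for
rtx_equal_p.

```c
#include <assert.h>

/* Hypothetical model of the move ordering in split_double_concat.
   regs[] stands in for the register file; dlo/dhi index the two halves
   of the destination, lo/hi index the two source words.  */
static void
model_concat (unsigned long regs[], int dlo, int dhi, int lo, int hi)
{
  /* Assumption: the two destination halves are distinct registers.  */
  assert (dlo != dhi);

  if (dlo != hi)
    {
      /* Writing dlo cannot clobber the pending hi source,
         so the low half may be written first.  */
      if (dlo != lo)
        regs[dlo] = regs[lo];
      if (dhi != hi)
        regs[dhi] = regs[hi];
    }
  else if (lo != dhi)
    {
      /* dlo currently holds hi: write the high half first.  */
      if (dhi != hi)
        regs[dhi] = regs[hi];
      if (dlo != lo)
        regs[dlo] = regs[lo];
    }
  else
    {
      /* dlo holds hi and dhi holds lo: the halves are the wrong way
         around, so exchange them (the xchg case).  */
      unsigned long tmp = regs[dlo];
      regs[dlo] = regs[dhi];
      regs[dhi] = tmp;
    }
}
```

With regs = {10, 20, 30, 40}, calling model_concat (regs, 0, 1, 1, 0) takes
the exchange branch and leaves regs[0] = 20, regs[1] = 10.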

Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 40f821e..66d8f28 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -165,6 +165,46 @@ split_double_mode (machine_mode mode, rtx operands[],
     }
 }
 
+/* Emit the double word assignment DST = { LO, HI }.  */
+
+void
+split_double_concat (machine_mode mode, rtx dst, rtx lo, rtx hi)
+{
+  rtx dlo, dhi;
+  int deleted_move_count = 0;
+  split_double_mode (mode, &dst, 1, &dlo, &dhi);
+  if (!rtx_equal_p (dlo, hi))
+    {
+      if (!rtx_equal_p (dlo, lo))
+        emit_move_insn (dlo, lo);
+      else
+        deleted_move_count++;
+      if (!rtx_equal_p (dhi, hi))
+        emit_move_insn (dhi, hi);
+      else
+        deleted_move_count++;
+    }
+  else if (!rtx_equal_p (lo, dhi))
+    {
+      if (!rtx_equal_p (dhi, hi))
+        emit_move_insn (dhi, hi);
+      else
+        deleted_move_count++;
+      if (!rtx_equal_p (dlo, lo))
+        emit_move_insn (dlo, lo);
+      else
+        deleted_move_count++;
+    }
+  else if (mode == TImode)
+    emit_insn (gen_swapdi (dlo, dhi));
+  else
+    emit_insn (gen_swapsi (dlo, dhi));
+
+  if (deleted_move_count == 2)
+    emit_note (NOTE_INSN_DELETED);
+}
+
+
 /* Generate either "mov $0, reg" or "xor reg, reg", as appropriate
    for the target.  */
 
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index cf84775..e27c14f 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -85,6 +85,7 @@ extern void print_reg (rtx, int, FILE*);
 extern void ix86_print_operand (FILE *, rtx, int);
 
 extern void split_double_mode (machine_mode, rtx[], int, rtx[], rtx[]);
+extern void split_double_concat (machine_mode, rtx, rtx lo, rtx);
 
 extern const char *output_set_got (rtx, rtx);
 extern const char *output_387_binary_op (rtx_insn *, rtx*);
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 9aaeb69..4560681 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -4379,6 +4379,16 @@
    (set_attr "type" "imovx,mskmov,mskmov")
    (set_attr "mode" "SI,QI,QI")])
 
+(define_insn_and_split "zero_extendditi2"
+  [(set (match_operand:TI 0 "nonimmediate_operand" "=r,o")
+        (zero_extend:TI (match_operand:DI 1 "nonimmediate_operand" "rm,r")))]
+  "TARGET_64BIT"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 3) (match_dup 1))
+   (set (match_dup 4) (const_int 0))]
+  "split_double_mode (TImode, &operands[0], 1, &operands[3], &operands[4]);")
+
 ;; Transform xorl; mov[bw] (set strict_low_part) into movz[bw]l.
 (define_peephole2
   [(parallel [(set (match_operand:SWI48 0 "general_reg_operand")
@@ -6512,6 +6522,31 @@
   [(set_attr "type" "alu")
    (set_attr "mode" "QI")])
 
+(define_insn_and_split "*add3_doubleword_zext"
+  [(set (match_operand: 0 "nonimmediate_operand" "=r,o")
+        (plus:
+          (zero_extend:
+            (match_operand:DWIH 2 "nonimmediate_operand" "rm,r"))
+          (match_operand: 1 "nonimmediate_operand" "0,0")))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (UNKNOWN, mode, operands)"
+  "#"
+  "&& reload_completed"
+  [(parallel [(set (reg:CCC FLAGS_REG)
+                   (compare:CCC
+                     (plus:DWIH (match_dup 1) (match_dup 2))
+                     (match_dup 1)))
+              (set (match_dup 0)
+                   (plus:DWIH (match_dup 1) (match_dup 2)))])
+   (parallel [(set (match_dup 3)
+                   (plus:DWIH
+                     (plus:DWIH
+                       (ltu:DWIH (reg:CC FLAGS_REG) (const_int 0))
+                       (match_dup 4))
+                     (const_int 0)))
+              (clobber (reg:CC FLAGS_REG))])]
+  "split_double_mode (mode, &operands[0], 2, &operands[0], &operands[3]);")
+
 ;; Like DWI, but use POImode instead of OImode.
 (define_mode_attr DPWI [(QI "HI") (HI "SI") (SI "DI") (DI "TI") (TI "POI")])
 
@@ -6962,6 +6997,29 @@
   }
 })
 
+(define_insn_and_split "*sub3_doubleword_zext"
+  [(set (match_operand: 0 "nonimmediate_operand" "=r,o")
+        (minus:
+          (match_operand: 1 "nonimmediate_operand" "0,0")
+          (zero_extend:
+            (match_operand:DWIH 2 "nonimmediate_operand" "rm,r"))))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (UNKNOWN, mode, operands)"
+  "#"
+  "&& reload_completed"
+  [(parallel [(set (reg:CC FLAGS_REG)
+                   (compare:CC (match_dup 1) (match_dup 2)))
+              (set (match_dup 0)
+                   (minus:DWIH (match_dup 1) (match_dup 2)))])
+   (parallel [(set (match_dup 3)
+                   (minus:DWIH
+                     (minus:DWIH
+                       (match_dup 4)
+                       (ltu:DWIH (reg:CC FLAGS_REG) (const_int 0)))
+                     (const_int 0)))
+              (clobber (reg:CC FLAGS_REG))])]
+  "split_double_mode (mode, &operands[0], 2, &operands[0], &operands[3]);")
+
 (define_insn "*sub_1"
   [(set (match_operand:SWI 0 "nonimmediate_operand" "=m,")
         (minus:SWI
@@ -11111,34 +11169,68 @@
 
 ;; Split DST = (HI<<32)|LO early to minimize register usage.
 (define_code_iterator any_or_plus [plus ior xor])
-(define_split
-  [(set (match_operand:DI 0 "register_operand")
-        (any_or_plus:DI
-          (ashift:DI (match_operand:DI 1 "register_operand")
-                     (const_int 32))
-          (zero_extend:DI (match_operand:SI 2 "register_operand"))))]
-  "!TARGET_64BIT"
-  [(set (match_dup 3) (match_dup 4))
-   (set (match_dup 5) (match_dup 2))]
+(define_insn_and_split "*concat3_1"
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro")
+        (any_or_plus:
+          (ashift: (match_operand: 1 "register_operand" "r")
+                   (match_operand: 2 "const_int_operand"))
+          (zero_extend: (match_operand:DWIH 3 "register_operand" "r"))))]
+  "INTVAL (operands[2]) == * BITS_PER_UNIT"
+  "#"
+  "&& reload_completed"
+  [(clobber (const_int 0))]
 {
-  operands[3] = gen_highpart (SImode, operands[0]);
-  operands[4] = gen_lowpart (SImode, operands[1]);
-  operands[5] = gen_lowpart (SImode, operands[0]);
+  split_double_concat (mode, operands[0], operands[3],
+                       gen_lowpart (mode, operands[1]));
+  DONE;
 })
 
-(define_split
-  [(set (match_operand:DI 0 "register_operand")
-        (any_or_plus:DI
-          (zero_extend:DI (match_operand:SI 1 "register_operand"))
-          (ashift:DI (match_operand:DI 2 "register_operand")
-                     (const_int 32))))]
-  "!TARGET_64BIT"
-  [(set (match_dup 3) (match_dup 4))
-   (set (match_dup 5) (match_dup 1))]
+(define_insn_and_split "*concat3_2"
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro")
+        (any_or_plus:
+          (zero_extend: (match_operand:DWIH 1 "register_operand" "r"))
+          (ashift: (match_operand: 2 "register_operand" "r")
+                   (match_operand: 3 "const_int_operand"))))]
+  "INTVAL (operands[3]) == * BITS_PER_UNIT"
+  "#"
+  "&& reload_completed"
+  [(clobber (const_int 0))]
+{
+  split_double_concat (mode, operands[0], operands[1],
+                       gen_lowpart (mode, operands[2]));
+  DONE;
+})
+
+(define_insn_and_split "*concat3_3"
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro")
+        (any_or_plus:
+          (ashift:
+            (zero_extend: (match_operand:DWIH 1 "register_operand" "r"))
+            (match_operand: 2 "const_int_operand"))
+          (zero_extend: (match_operand:DWIH 3 "register_operand" "r"))))]
+  "INTVAL (operands[2]) == * BITS_PER_UNIT"
+  "#"
+  "&& reload_completed"
+  [(clobber (const_int 0))]
 {
-  operands[3] = gen_highpart (SImode, operands[0]);
-  operands[4] = gen_lowpart (SImode, operands[2]);
-  operands[5] = gen_lowpart (SImode, operands[0]);
+  split_double_concat (mode, operands[0], operands[3], operands[1]);
+  DONE;
+})
+
+(define_insn_and_split "*concat3_4"
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro")
+        (any_or_plus:
+          (zero_extend: (match_operand:DWIH 1 "register_operand" "r"))
+          (ashift:
+            (zero_extend: (match_operand:DWIH 2 "register_operand" "r"))
+            (match_operand: 3 "const_int_operand"))))]
+  "INTVAL (operands[3]) == * BITS_PER_UNIT"
+  "#"
+  "&& reload_completed"
+  [(clobber (const_int 0))]
+{
+  split_double_concat (mode, operands[0], operands[1], operands[2]);
+  DONE;
 })
 
 ;; Negation instructions
diff --git a/gcc/testsuite/g++.target/i386/pr91681.C b/gcc/testsuite/g++.target/i386/pr91681.C
new file mode 100644
index 0000000..0271e43
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr91681.C
@@ -0,0 +1,20 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+
+void multiply128x64x2_3 (
+    const unsigned long a,
+    const unsigned long b,
+    const unsigned long c,
+    const unsigned long d,
+    __uint128_t o[2])
+{
+    __uint128_t B0 = (__uint128_t) b * c;
+    __uint128_t B2 = (__uint128_t) a * c;
+    __uint128_t B1 = (__uint128_t) b * d;
+    __uint128_t B3 = (__uint128_t) a * d;
+
+    o[0] = B2 + (B0 >> 64);
+    o[1] = B3 + (B1 >> 64);
+}
+
+/* { dg-final { scan-assembler-not "xor" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr91681-1.c b/gcc/testsuite/gcc.target/i386/pr91681-1.c
new file mode 100644
index 0000000..ab83cc4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr91681-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+unsigned __int128 m;
+
+unsigned __int128 foo(unsigned __int128 x, unsigned long long y)
+{
+  return x + y;
+}
+
+void bar(unsigned __int128 x, unsigned long long y)
+{
+  m = x + y;
+}
+
+void baz(unsigned long long y)
+{
+  m += y;
+}
+
+/* { dg-final { scan-assembler-not "xor" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr91681-2.c b/gcc/testsuite/gcc.target/i386/pr91681-2.c
new file mode 100644
index 0000000..ea52c72
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr91681-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+unsigned __int128 m;
+
+unsigned __int128 foo(unsigned __int128 x, unsigned long long y)
+{
+  return x - y;
+}
+
+void bar(unsigned __int128 x, unsigned long long y)
+{
+  m = x - y;
+}
+
+void baz(unsigned long long y)
+{
+  m -= y;
+}
+
+/* { dg-final { scan-assembler-not "xor" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr91681-3.c b/gcc/testsuite/gcc.target/i386/pr91681-3.c
new file mode 100644
index 0000000..22a03c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr91681-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2" } */
+
+unsigned long long m;
+
+unsigned long long foo(unsigned long long x, unsigned int y)
+{
+  return x - y;
+}
+
+void bar(unsigned long long x, unsigned int y)
+{
+  m = x - y;
+}
+
+/* { dg-final { scan-assembler-not "xor" } } */
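
The arithmetic that the *add3_doubleword_zext and *sub3_doubleword_zext
splits perform can be modelled in plain C roughly as below.  This is a
sketch under assumptions, not the generated code: `u128_model`, `add_zext`
and `sub_zext` are hypothetical names, and the two 64-bit fields stand in
for the low and high word of a TImode value.  The point is that the high
word only ever receives the carry or borrow, so no register has to be
zeroed to represent the upper half of the zero-extended operand, which is
what the scan-assembler-not "xor" checks in the tests look for.

```c
#include <stdint.h>

/* Hypothetical two-word model of a 128-bit value.  */
typedef struct
{
  uint64_t lo;
  uint64_t hi;
} u128_model;

/* x + (zero-extended y): add the low words, then propagate only the
   carry into the high word (an add-with-carry of zero).  */
static u128_model
add_zext (u128_model x, uint64_t y)
{
  u128_model r;
  r.lo = x.lo + y;
  r.hi = x.hi + (r.lo < y);   /* carry out of the low-word add */
  return r;
}

/* x - (zero-extended y): subtract the low words, then propagate only
   the borrow into the high word (a subtract-with-borrow of zero).  */
static u128_model
sub_zext (u128_model x, uint64_t y)
{
  u128_model r;
  r.lo = x.lo - y;
  r.hi = x.hi - (x.lo < y);   /* borrow out of the low-word subtract */
  return r;
}
```

For instance, adding 1 to a value whose low word is UINT64_MAX wraps the
low word to 0 and increments the high word by one.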