From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69]) by sourceware.org (Postfix) with ESMTPS id 65E533838007 for ; Sun, 5 Jun 2022 11:48:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 65E533838007 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID: Date:Subject:In-Reply-To:References:Cc:To:From:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=MqhAdarw6Fz43eG7smv6bf2caIcBW1ua458bRTeOPVg=; b=JJUEy8c7xf2tP2OBjqI0ZvFIFn y5kUqkxCnDQmwR0bES7AnuX6+WJvX9vpGdOckaxvl2OdyL11wuva9d3E/aXopsm1/WGYyoAO3/jWP fVdwsfhb7QcoYBopjJ48oOlUzJoNGnhUwW+mH4jPjmVrtob4/Mk3xH3hTbzlk/vbo8/iU/3MulCbK sIwWNR2EBq5/YVy6jqZRK8zyVXiiDQKlaFihZBn/RmAnClqOUtjnXunQAjdxmb4YVGVZBfPuUs3dn 4+S2B6rXVxE956puH6bycYbWs7tJcm+Q6c/GjNBxmUA+5DmLcfb2K0gLk/Ld41x+7n+nTRS00GF3Q jmZ/pBQw==; Received: from host109-154-46-241.range109-154.btcentralplus.com ([109.154.46.241]:61950 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1nxojx-0004Iq-RF; Sun, 05 Jun 2022 07:48:18 -0400 From: "Roger Sayle" To: "'Uros Bizjak'" Cc: "'GCC Patches'" References: <0af601d8772f$43ba22f0$cb2e68d0$@nextmovesoftware.com> In-Reply-To: Subject: RE: [x86 PATCH] PR target/91681: zero_extendditi2 pattern for more optimizations. 
Date: Sun, 5 Jun 2022 12:48:16 +0100 Message-ID: <009201d878d2$2591ff60$70b5fe20$@nextmovesoftware.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_NextPart_000_0093_01D878DA.87566760" X-Mailer: Microsoft Outlook 16.0 Thread-Index: AQIw0E5BZ8IWw+tei2liuEJavpNh4wFb+yb2rIUe/fA= Content-Language: en-gb X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 05 Jun 2022 11:48:21 -0000 This is a multipart message in MIME format. ------=_NextPart_000_0093_01D878DA.87566760 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Hi Uros, Many thanks for your speedy review. This revised patch implements all three of your recommended improvements; the use of ix86_binary_operator_ok with code UNKNOWN, the removal of "n" constraints from const_int_operand predicates, and the use of force_reg (for input operands, and REG_P for output operands) to ensure that it's always safe to call gen_lowpart/gen_highpart. 
[If we proceed with the recent proposal to split double word
addition, subtraction and other operations before reload,
then these new add/sub variants should be updated at the
same time, but for now this patch keeps double word patterns
consistent].

This revised patch has been retested on x86_64-pc-linux-gnu with
make bootstrap and make -k check, both with and without
--target_board=unix{-m32} with no new failures.  Ok for mainline?


2022-06-05  Roger Sayle
            Uroš Bizjak

gcc/ChangeLog
        PR target/91681
        * config/i386/i386.md (zero_extendditi2): New define_insn_and_split.
        (*add3_doubleword_zext): New define_insn_and_split.
        (*sub3_doubleword_zext): New define_insn_and_split.
        (*concat3_1): New define_insn_and_split replacing
        previous define_split for implementing DST = (HI<<32)|LO as
        pair of move instructions, setting lopart and hipart.
        (*concat3_2): Likewise.
        (*concat3_3): Likewise, where HI is zero_extended.
        (*concat3_4): Likewise, where HI is zero_extended.
        * config/i386/sse.md (kunpckhi): Add UNSPEC_MASKOP unspec.
        (kunpcksi): Likewise, add UNSPEC_MASKOP unspec.
        (kunpckdi): Likewise, add UNSPEC_MASKOP unspec.
        (vec_pack_trunc_qi): Update to specify required UNSPEC_MASKOP
        unspec.
        (vec_pack_trunc_): Likewise.

gcc/testsuite/ChangeLog
        PR target/91681
        * g++.target/i386/pr91681.C: New test case (from the PR).
        * gcc.target/i386/pr91681-1.c: New int128 test case.
        * gcc.target/i386/pr91681-2.c: Likewise.
        * gcc.target/i386/pr91681-3.c: Likewise, but for ia32.


Thanks again,
Roger
--

> -----Original Message-----
> From: Uros Bizjak
> Sent: 03 June 2022 11:08
> To: Roger Sayle
> Cc: GCC Patches
> Subject: Re: [x86 PATCH] PR target/91681: zero_extendditi2 pattern for more
> optimizations.
>
> On Fri, Jun 3, 2022 at 11:49 AM Roger Sayle wrote:
> >
> >
> > Technically, PR target/91681 has already been resolved; we now
> > recognize the highpart multiplication at the tree-level, we no longer
> > use the stack, and we currently generate the same number of
> > instructions as LLVM.  However, it is still possible to do better, the
> > current x86_64 code to generate a double word addition of a zero extended
> > operand, looks like:
> >
> >         xorl    %r11d, %r11d
> >         addq    %r10, %rax
> >         adcq    %r11, %rdx
> >
> > when it's possible (as LLVM does) to use an immediate constant:
> >
> >         addq    %r10, %rax
> >         adcq    $0, %rdx
> >
> > To do this, the backend required one or two simple changes, that then
> > themselves required one or two more obscure tweaks.
> >
> > The simple starting point is to define a zero_extendditi2 pattern, for
> > zero extension from DImode to TImode on TARGET_64BIT that is split
> > after reload.  Double word (TImode) addition/subtraction is split
> > after reload, so that constrains when things should happen.
> >
> > With zero extension now visible to combine, we add two new
> > define_insn_and_split that add/subtract a zero extended operand in
> > double word mode.  These apply to both 32-bit and 64-bit code
> > generation, to produce adc $0 and sbb $0.
> >
> > The first strange tweak is that these new patterns interfere with the
> > optimization that recognizes DW:DI = (HI:SI<<32)+LO:SI as a pair of
> > register moves, or more accurately the combine splitter no longer
> > triggers as we're now converting two instructions into two
> > instructions (not three instructions into two instructions).  This is
> > easily repaired (and extended to handle TImode) by changing from a
> > pair of define_split (that handle operand commutativity) to a set of
> > four define_insn_and_split (again to handle operand commutativity).
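
To illustrate the point about DW:DI = (HI:SI<<32)+LO:SI, here is a
minimal C sketch. It is not part of the patch, and the names
dword_parts, concat_two_moves, dword_value, concat_plus, concat_ior,
concat_xor and check_concat are made up for this example. It assumes a
double word can be modelled as two 32-bit halves. Because the shifted HI
and the zero-extended LO occupy disjoint bits, the plus, ior and xor
spellings all yield the same value, which is what allows the operation
to be done as one move into the low part and one into the high part:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a double word (DImode on ia32) as two SImode halves. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} dword_parts;

/* The split form: one move into the low part, one move into the high part. */
static dword_parts concat_two_moves(uint32_t hi, uint32_t lo)
{
    dword_parts dst;
    dst.lo = lo;
    dst.hi = hi;
    return dst;
}

/* Reassemble the 64-bit value so it can be compared with the source forms. */
static uint64_t dword_value(dword_parts d)
{
    return ((uint64_t)d.hi << 32) | d.lo;
}

/* The three source-level spellings: plus, ior and xor. */
static uint64_t concat_plus(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) + lo;
}

static uint64_t concat_ior(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

static uint64_t concat_xor(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) ^ lo;
}

/* Check that all three spellings agree with the two-move form. */
static void check_concat(uint32_t hi, uint32_t lo)
{
    uint64_t expect = dword_value(concat_two_moves(hi, lo));
    assert(concat_plus(hi, lo) == expect);
    assert(concat_ior(hi, lo) == expect);
    assert(concat_xor(hi, lo) == expect);
}
```

Calling check_concat with a few (hi, lo) pairs exercises the
equivalence; it says nothing about register allocation, which is what
the real splitters are concerned with.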
> >
> > The other/final strange tweak that the above splitters now interfere
> > with AVX512's kunpckdq instruction which is defined as identical RTL,
> > DW:DI = (HI:SI<<32)|zero_extend(LO:SI).  To distinguish this, and also
> > avoid AVX512 mask registers being used by reload to perform SImode
> > scalar shifts, I've added the explicit (unspec UNSPEC_MASKOP) to the
> > unpack mask operations, which matches what sse.md does for the other
> > mask specific (logic) operations.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32},
> > with no new failures.  Ok for mainline?
> >
> >
> > 2022-06-03  Roger Sayle
> >
> > gcc/ChangeLog
> >         PR target/91681
> >         * config/i386/i386.md (zero_extendditi2): New define_insn_and_split.
> >         (*add3_doubleword_zext): New define_insn_and_split.
> >         (*sub3_doubleword_zext): New define_insn_and_split.
> >         (*concat3_1): New define_insn_and_split replacing
> >         previous define_split for implementing DST = (HI<<32)|LO as
> >         pair of move instructions, setting lopart and hipart.
> >         (*concat3_2): Likewise.
> >         (*concat3_3): Likewise, where HI is zero_extended.
> >         (*concat3_4): Likewise, where HI is zero_extended.
> >         * config/i386/sse.md (kunpckhi): Add UNSPEC_MASKOP unspec.
> >         (kunpcksi): Likewise, add UNSPEC_MASKOP unspec.
> >         (kunpckdi): Likewise, add UNSPEC_MASKOP unspec.
> >         (vec_pack_trunc_qi): Update to specify required UNSPEC_MASKOP
> >         unspec.
> >         (vec_pack_trunc_): Likewise.
> >
> > gcc/testsuite/ChangeLog
> >         PR target/91681
> >         * g++.target/i386/pr91681.C: New test case (from the PR).
> >         * gcc.target/i386/pr91681-1.c: New int128 test case.
> >         * gcc.target/i386/pr91681-2.c: Likewise.
> >         * gcc.target/i386/pr91681-3.c: Likewise, but for ia32.
>
> +  "MEM_P (operands[0]) ?
rtx_equal_p (operands[0], operands[1]) &&
> +                        !MEM_P (operands[2])
> +                      : !MEM_P (operands[1])"
>
> Can we use ix86_binary_operator_ok (UNKNOWN, ...mode..., operands) here
> instead?
>
> (UNKNOWN RTX code is used to prevent unwanted optimization with
> commutative operands).
>
> Uros.

------=_NextPart_000_0093_01D878DA.87566760
Content-Type: text/plain; name="patchzt5.txt"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="patchzt5.txt"

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md=0A= index 602dfa7..b3d2c90 100644=0A= --- a/gcc/config/i386/i386.md=0A= +++ b/gcc/config/i386/i386.md=0A= @@ -4325,6 +4325,16 @@=0A= (set_attr "type" "imovx,mskmov,mskmov")=0A= (set_attr "mode" "SI,QI,QI")])=0A= =0A= +(define_insn_and_split "zero_extendditi2"=0A= + [(set (match_operand:TI 0 "nonimmediate_operand" "=3Dr,o")=0A= + (zero_extend:TI (match_operand:DI 1 "nonimmediate_operand" "rm,r")))]=0A= + "TARGET_64BIT"=0A= + "#"=0A= + "&& reload_completed"=0A= + [(set (match_dup 3) (match_dup 1))=0A= + (set (match_dup 4) (const_int 0))]=0A= + "split_double_mode (TImode, &operands[0], 1, &operands[3], = &operands[4]);")=0A= +=0A= ;; Transform xorl; mov[bw] (set strict_low_part) into movz[bw]l.=0A= (define_peephole2=0A= [(parallel [(set (match_operand:SWI48 0 "general_reg_operand")=0A= @@ -6453,6 +6463,31 @@=0A= [(set_attr "type" "alu")=0A= (set_attr "mode" "QI")])=0A= =0A= +(define_insn_and_split "*add3_doubleword_zext"=0A= + [(set (match_operand: 0 "nonimmediate_operand" "=3Dr,o")=0A= + (plus:=0A= + (zero_extend:=0A= + (match_operand:DWIH 2 "nonimmediate_operand" "rm,r")) =0A= + (match_operand: 1 "nonimmediate_operand" "0,0")))=0A= + (clobber (reg:CC FLAGS_REG))]=0A= + "ix86_binary_operator_ok (UNKNOWN, mode, operands)"=0A= + "#"=0A= + "&& reload_completed"=0A= + [(parallel [(set (reg:CCC FLAGS_REG)=0A= + (compare:CCC=0A= + (plus:DWIH (match_dup 1) (match_dup 2))=0A= + (match_dup 1)))=0A= + (set (match_dup 0)=0A= + (plus:DWIH 
(match_dup 1) (match_dup 2)))])=0A= + (parallel [(set (match_dup 3)=0A= + (plus:DWIH=0A= + (plus:DWIH=0A= + (ltu:DWIH (reg:CC FLAGS_REG) (const_int 0))=0A= + (match_dup 4))=0A= + (const_int 0)))=0A= + (clobber (reg:CC FLAGS_REG))])]=0A= + "split_double_mode (mode, &operands[0], 2, &operands[0], = &operands[3]);")=0A= +=0A= ;; Like DWI, but use POImode instead of OImode.=0A= (define_mode_attr DPWI [(QI "HI") (HI "SI") (SI "DI") (DI "TI") (TI = "POI")])=0A= =0A= @@ -6903,6 +6938,29 @@=0A= }=0A= })=0A= =0A= +(define_insn_and_split "*sub3_doubleword_zext"=0A= + [(set (match_operand: 0 "nonimmediate_operand" "=3Dr,o")=0A= + (minus:=0A= + (match_operand: 1 "nonimmediate_operand" "0,0")=0A= + (zero_extend:=0A= + (match_operand:DWIH 2 "nonimmediate_operand" "rm,r"))))=0A= + (clobber (reg:CC FLAGS_REG))]=0A= + "ix86_binary_operator_ok (UNKNOWN, mode, operands)"=0A= + "#"=0A= + "&& reload_completed"=0A= + [(parallel [(set (reg:CC FLAGS_REG)=0A= + (compare:CC (match_dup 1) (match_dup 2)))=0A= + (set (match_dup 0)=0A= + (minus:DWIH (match_dup 1) (match_dup 2)))])=0A= + (parallel [(set (match_dup 3)=0A= + (minus:DWIH=0A= + (minus:DWIH=0A= + (match_dup 4)=0A= + (ltu:DWIH (reg:CC FLAGS_REG) (const_int 0)))=0A= + (const_int 0)))=0A= + (clobber (reg:CC FLAGS_REG))])]=0A= + "split_double_mode (mode, &operands[0], 2, &operands[0], = &operands[3]);")=0A= +=0A= (define_insn "*sub_1"=0A= [(set (match_operand:SWI 0 "nonimmediate_operand" "=3Dm,")=0A= (minus:SWI=0A= @@ -10956,34 +11014,82 @@=0A= =0A= ;; Split DST =3D (HI<<32)|LO early to minimize register usage.=0A= (define_code_iterator any_or_plus [plus ior xor])=0A= -(define_split=0A= - [(set (match_operand:DI 0 "register_operand")=0A= - (any_or_plus:DI=0A= - (ashift:DI (match_operand:DI 1 "register_operand")=0A= - (const_int 32))=0A= - (zero_extend:DI (match_operand:SI 2 "register_operand"))))]=0A= - "!TARGET_64BIT"=0A= - [(set (match_dup 3) (match_dup 4))=0A= - (set (match_dup 5) (match_dup 2))]=0A= +(define_insn_and_split 
"*concat3_1"=0A= + [(set (match_operand: 0 "register_operand" "=3Dr")=0A= + (any_or_plus:=0A= + (ashift: (match_operand: 1 "register_operand" "r")=0A= + (match_operand: 2 "const_int_operand"))=0A= + (zero_extend: (match_operand:DWIH 3 "register_operand" "r"))))]=0A= + "INTVAL (operands[2]) =3D=3D * BITS_PER_UNIT=0A= + && REG_P (operands[0])=0A= + && ix86_pre_reload_split ()"=0A= + "#"=0A= + "&& 1"=0A= + [(set (match_dup 4) (match_dup 3))=0A= + (set (match_dup 5) (match_dup 6))]=0A= +{=0A= + operands[1] =3D force_reg (mode, operands[1]);=0A= + operands[4] =3D gen_lowpart (mode, operands[0]);=0A= + operands[5] =3D gen_highpart (mode, operands[0]);=0A= + operands[6] =3D gen_lowpart (mode, operands[1]);=0A= +})=0A= +=0A= +(define_insn_and_split "*concat3_2"=0A= + [(set (match_operand: 0 "register_operand" "=3Dr")=0A= + (any_or_plus:=0A= + (zero_extend: (match_operand:DWIH 1 "register_operand" "r"))=0A= + (ashift: (match_operand: 2 "register_operand" "r")=0A= + (match_operand: 3 "const_int_operand"))))]=0A= + "INTVAL (operands[3]) =3D=3D * BITS_PER_UNIT=0A= + && REG_P (operands[0])=0A= + && ix86_pre_reload_split ()"=0A= + "#"=0A= + "&& 1"=0A= + [(set (match_dup 4) (match_dup 1))=0A= + (set (match_dup 5) (match_dup 6))]=0A= +{=0A= + operands[2] =3D force_reg (mode, operands[2]);=0A= + operands[4] =3D gen_lowpart (mode, operands[0]);=0A= + operands[5] =3D gen_highpart (mode, operands[0]);=0A= + operands[6] =3D gen_lowpart (mode, operands[2]);=0A= +})=0A= +=0A= +(define_insn_and_split "*concat3_3"=0A= + [(set (match_operand: 0 "register_operand" "=3Dr")=0A= + (any_or_plus:=0A= + (ashift:=0A= + (zero_extend: (match_operand:DWIH 1 "register_operand" "r"))=0A= + (match_operand: 2 "const_int_operand"))=0A= + (zero_extend: (match_operand:DWIH 3 "register_operand" "r"))))]=0A= + "INTVAL (operands[2]) =3D=3D * BITS_PER_UNIT=0A= + && REG_P (operands[0])=0A= + && ix86_pre_reload_split ()"=0A= + "#"=0A= + "&& 1"=0A= + [(set (match_dup 4) (match_dup 3))=0A= + (set (match_dup 5) 
(match_dup 1))]=0A= {=0A= - operands[3] =3D gen_highpart (SImode, operands[0]);=0A= - operands[4] =3D gen_lowpart (SImode, operands[1]);=0A= - operands[5] =3D gen_lowpart (SImode, operands[0]);=0A= + operands[4] =3D gen_lowpart (mode, operands[0]);=0A= + operands[5] =3D gen_highpart (mode, operands[0]);=0A= })=0A= =0A= -(define_split=0A= - [(set (match_operand:DI 0 "register_operand")=0A= - (any_or_plus:DI=0A= - (zero_extend:DI (match_operand:SI 1 "register_operand"))=0A= - (ashift:DI (match_operand:DI 2 "register_operand")=0A= - (const_int 32))))]=0A= - "!TARGET_64BIT"=0A= - [(set (match_dup 3) (match_dup 4))=0A= - (set (match_dup 5) (match_dup 1))]=0A= +(define_insn_and_split "*concat3_4"=0A= + [(set (match_operand: 0 "register_operand" "=3Dr")=0A= + (any_or_plus:=0A= + (zero_extend: (match_operand:DWIH 1 "register_operand" "r"))=0A= + (ashift:=0A= + (zero_extend: (match_operand:DWIH 2 "register_operand" "r"))=0A= + (match_operand: 3 "const_int_operand"))))]=0A= + "INTVAL (operands[3]) =3D=3D * BITS_PER_UNIT=0A= + && REG_P (operands[0])=0A= + && ix86_pre_reload_split ()"=0A= + "#"=0A= + "&& 1"=0A= + [(set (match_dup 4) (match_dup 1))=0A= + (set (match_dup 5) (match_dup 2))]=0A= {=0A= - operands[3] =3D gen_highpart (SImode, operands[0]);=0A= - operands[4] =3D gen_lowpart (SImode, operands[2]);=0A= - operands[5] =3D gen_lowpart (SImode, operands[0]);=0A= + operands[4] =3D gen_lowpart (mode, operands[0]);=0A= + operands[5] =3D gen_highpart (mode, operands[0]);=0A= })=0A= =0C=0A= ;; Negation instructions=0A= diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md=0A= index 8b2602b..0198156 100644=0A= --- a/gcc/config/i386/sse.md=0A= +++ b/gcc/config/i386/sse.md=0A= @@ -2070,7 +2070,8 @@=0A= (ashift:HI=0A= (zero_extend:HI (match_operand:QI 1 "register_operand" "k"))=0A= (const_int 8))=0A= - (zero_extend:HI (match_operand:QI 2 "register_operand" "k"))))]=0A= + (zero_extend:HI (match_operand:QI 2 "register_operand" "k"))))=0A= + (unspec [(const_int 0)] 
UNSPEC_MASKOP)]=0A= "TARGET_AVX512F"=0A= "kunpckbw\t{%2, %1, %0|%0, %1, %2}"=0A= [(set_attr "mode" "HI")=0A= @@ -2083,7 +2084,8 @@=0A= (ashift:SI=0A= (zero_extend:SI (match_operand:HI 1 "register_operand" "k"))=0A= (const_int 16))=0A= - (zero_extend:SI (match_operand:HI 2 "register_operand" "k"))))]=0A= + (zero_extend:SI (match_operand:HI 2 "register_operand" "k"))))=0A= + (unspec [(const_int 0)] UNSPEC_MASKOP)]=0A= "TARGET_AVX512BW"=0A= "kunpckwd\t{%2, %1, %0|%0, %1, %2}"=0A= [(set_attr "mode" "SI")])=0A= @@ -2094,7 +2096,8 @@=0A= (ashift:DI=0A= (zero_extend:DI (match_operand:SI 1 "register_operand" "k"))=0A= (const_int 32))=0A= - (zero_extend:DI (match_operand:SI 2 "register_operand" "k"))))]=0A= + (zero_extend:DI (match_operand:SI 2 "register_operand" "k"))))=0A= + (unspec [(const_int 0)] UNSPEC_MASKOP)]=0A= "TARGET_AVX512BW"=0A= "kunpckdq\t{%2, %1, %0|%0, %1, %2}"=0A= [(set_attr "mode" "DI")])=0A= @@ -17398,21 +17401,26 @@=0A= })=0A= =0A= (define_expand "vec_pack_trunc_qi"=0A= - [(set (match_operand:HI 0 "register_operand")=0A= - (ior:HI (ashift:HI (zero_extend:HI (match_operand:QI 2 = "register_operand"))=0A= - (const_int 8))=0A= - (zero_extend:HI (match_operand:QI 1 "register_operand"))))]=0A= + [(parallel=0A= + [(set (match_operand:HI 0 "register_operand")=0A= + (ior:HI=0A= + (ashift:HI (zero_extend:HI (match_operand:QI 2 "register_operand"))=0A= + (const_int 8))=0A= + (zero_extend:HI (match_operand:QI 1 "register_operand"))))=0A= + (unspec [(const_int 0)] UNSPEC_MASKOP)])]=0A= "TARGET_AVX512F")=0A= =0A= (define_expand "vec_pack_trunc_"=0A= - [(set (match_operand: 0 "register_operand")=0A= - (ior:=0A= - (ashift:=0A= + [(parallel=0A= + [(set (match_operand: 0 "register_operand")=0A= + (ior:=0A= + (ashift:=0A= + (zero_extend:=0A= + (match_operand:SWI24 2 "register_operand"))=0A= + (match_dup 3))=0A= (zero_extend:=0A= - (match_operand:SWI24 2 "register_operand"))=0A= - (match_dup 3))=0A= - (zero_extend:=0A= - (match_operand:SWI24 1 "register_operand"))))]=0A= + 
(match_operand:SWI24 1 "register_operand"))))=0A= + (unspec [(const_int 0)] UNSPEC_MASKOP)])]=0A= "TARGET_AVX512BW"=0A= {=0A= operands[3] =3D GEN_INT (GET_MODE_BITSIZE (mode));=0A= diff --git a/gcc/testsuite/g++.target/i386/pr91681.C = b/gcc/testsuite/g++.target/i386/pr91681.C=0A= new file mode 100644=0A= index 0000000..0271e43=0A= --- /dev/null=0A= +++ b/gcc/testsuite/g++.target/i386/pr91681.C=0A= @@ -0,0 +1,20 @@=0A= +/* { dg-do compile { target int128 } } */=0A= +/* { dg-options "-O2" } */=0A= +=0A= +void multiply128x64x2_3 ( =0A= + const unsigned long a, =0A= + const unsigned long b, =0A= + const unsigned long c, =0A= + const unsigned long d, =0A= + __uint128_t o[2])=0A= +{=0A= + __uint128_t B0 =3D (__uint128_t) b * c;=0A= + __uint128_t B2 =3D (__uint128_t) a * c;=0A= + __uint128_t B1 =3D (__uint128_t) b * d;=0A= + __uint128_t B3 =3D (__uint128_t) a * d;=0A= +=0A= + o[0] =3D B2 + (B0 >> 64);=0A= + o[1] =3D B3 + (B1 >> 64);=0A= +}=0A= +=0A= +/* { dg-final { scan-assembler-not "xor" } } */=0A= diff --git a/gcc/testsuite/gcc.target/i386/pr91681-1.c = b/gcc/testsuite/gcc.target/i386/pr91681-1.c=0A= new file mode 100644=0A= index 0000000..ab83cc4=0A= --- /dev/null=0A= +++ b/gcc/testsuite/gcc.target/i386/pr91681-1.c=0A= @@ -0,0 +1,20 @@=0A= +/* { dg-do compile { target int128 } } */=0A= +/* { dg-options "-O2" } */=0A= +unsigned __int128 m;=0A= +=0A= +unsigned __int128 foo(unsigned __int128 x, unsigned long long y)=0A= +{=0A= + return x + y;=0A= +}=0A= +=0A= +void bar(unsigned __int128 x, unsigned long long y)=0A= +{=0A= + m =3D x + y;=0A= +}=0A= +=0A= +void baz(unsigned long long y)=0A= +{=0A= + m +=3D y;=0A= +}=0A= +=0A= +/* { dg-final { scan-assembler-not "xor" } } */=0A= diff --git a/gcc/testsuite/gcc.target/i386/pr91681-2.c = b/gcc/testsuite/gcc.target/i386/pr91681-2.c=0A= new file mode 100644=0A= index 0000000..ea52c72=0A= --- /dev/null=0A= +++ b/gcc/testsuite/gcc.target/i386/pr91681-2.c=0A= @@ -0,0 +1,20 @@=0A= +/* { dg-do compile { target int128 } } */=0A= +/* 
{ dg-options "-O2" } */=0A= +unsigned __int128 m;=0A= +=0A= +unsigned __int128 foo(unsigned __int128 x, unsigned long long y)=0A= +{=0A= + return x - y;=0A= +}=0A= +=0A= +void bar(unsigned __int128 x, unsigned long long y)=0A= +{=0A= + m =3D x - y;=0A= +}=0A= +=0A= +void baz(unsigned long long y)=0A= +{=0A= + m -=3D y;=0A= +}=0A= +=0A= +/* { dg-final { scan-assembler-not "xor" } } */=0A= diff --git a/gcc/testsuite/gcc.target/i386/pr91681-3.c = b/gcc/testsuite/gcc.target/i386/pr91681-3.c=0A= new file mode 100644=0A= index 0000000..22a03c2=0A= --- /dev/null=0A= +++ b/gcc/testsuite/gcc.target/i386/pr91681-3.c=0A= @@ -0,0 +1,16 @@=0A= +/* { dg-do compile { target ia32 } } */=0A= +/* { dg-options "-O2" } */=0A= +=0A= +unsigned long long m;=0A= +=0A= +unsigned long long foo(unsigned long long x, unsigned int y)=0A= +{=0A= + return x - y;=0A= +}=0A= +=0A= +void bar(unsigned long long x, unsigned int y)=0A= +{=0A= + m =3D x - y;=0A= +}=0A= +=0A= +/* { dg-final { scan-assembler-not "xor" } } */=0A= ------=_NextPart_000_0093_01D878DA.87566760--
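
As a companion to the pr91681 tests above, the following is a minimal C
sketch of what the new add/sub patterns compute once split. It is not
part of the patch, and dword_parts, dword_split, dword_join, add_zext,
sub_zext and check_zext_ops are hypothetical names used only here. It
assumes the ia32 case, where a 64-bit value is held as two 32-bit words
and a 32-bit operand is zero extended: the low words are added or
subtracted, and only the carry or borrow is propagated into the high
word, which corresponds to adc $0 and sbb $0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a double word as two 32-bit halves. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} dword_parts;

static dword_parts dword_split(uint64_t v)
{
    dword_parts d;
    d.lo = (uint32_t)v;
    d.hi = (uint32_t)(v >> 32);
    return d;
}

static uint64_t dword_join(dword_parts d)
{
    return ((uint64_t)d.hi << 32) | d.lo;
}

/* x + zero_extend(y): add on the low word, then adc $0 on the high word. */
static dword_parts add_zext(dword_parts x, uint32_t y)
{
    dword_parts r;
    uint32_t carry;
    r.lo = x.lo + y;
    carry = r.lo < x.lo;        /* carry out of the low-word addition */
    r.hi = x.hi + 0u + carry;   /* models adc $0 */
    return r;
}

/* x - zero_extend(y): sub on the low word, then sbb $0 on the high word. */
static dword_parts sub_zext(dword_parts x, uint32_t y)
{
    dword_parts r;
    uint32_t borrow = x.lo < y; /* borrow out of the low-word subtraction */
    r.lo = x.lo - y;
    r.hi = x.hi - 0u - borrow;  /* models sbb $0 */
    return r;
}

/* Compare both word-wise operations against plain 64-bit arithmetic. */
static void check_zext_ops(uint64_t x, uint32_t y)
{
    assert(dword_join(add_zext(dword_split(x), y)) == x + y);
    assert(dword_join(sub_zext(dword_split(x), y)) == x - y);
}
```

The high word of the zero-extended operand is known to be zero, so no
register needs to be cleared for it; that is the xor the
scan-assembler-not directives look for.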