From: "Roger Sayle"
To: "'GCC Patches'"
Subject: [PATCH] x86_64: Add insn patterns for V1TI mode logic operations.
Date: Fri, 22 Oct 2021 08:19:10 +0100

On x86_64, V1TI mode holds a 128-bit integer value in a (vector) SSE
register (where regular TI mode uses a pair of 64-bit general purpose
scalar registers).  This patch improves the implementation of AND, IOR,
XOR and NOT on these values.
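As a minimal sketch (not part of the patch), the C below shows why splitting a V1TI value into two 64-bit halves gives the right answer but does extra work.  Bitwise AND has no carries between bit positions, so a 128-bit AND is exactly two independent 64-bit ANDs.  The helpers and_by_halves and and_matches_halves are hypothetical names, and the code assumes GCC or Clang on a 64-bit target with __int128 support.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same typedef as the test program in this message (GCC/Clang vector extension). */
typedef unsigned __int128 v1ti __attribute__ ((__vector_size__ (16)));

/* Hypothetical helper: compute x & y one 64-bit half at a time, the way the
   unpatched code does with andq on general purpose registers.  The halves
   are independent because AND never carries between bit positions.  */
v1ti and_by_halves (v1ti x, v1ti y)
{
  uint64_t xs[2], ys[2], rs[2];
  v1ti r;

  memcpy (xs, &x, sizeof xs);   /* split x into two 64-bit halves */
  memcpy (ys, &y, sizeof ys);   /* split y into two 64-bit halves */
  rs[0] = xs[0] & ys[0];
  rs[1] = xs[1] & ys[1];
  memcpy (&r, rs, sizeof r);    /* reassemble the 128-bit result */
  return r;
}

/* Hypothetical check: returns 1 if the whole-register AND (a single pand
   with this patch) equals the half-by-half computation.  */
int and_matches_halves (v1ti x, v1ti y)
{
  v1ti whole = x & y;
  v1ti split = and_by_halves (x, y);
  return memcmp (&whole, &split, sizeof whole) == 0;
}
```

The same argument applies to IOR, XOR and NOT, which is why one full-width SSE instruction can replace the whole split-and-recombine sequence.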
The benefit is demonstrated by the following simple test program:

typedef unsigned __int128 v1ti __attribute__ ((__vector_size__ (16)));

v1ti and(v1ti x, v1ti y) { return x & y; }
v1ti ior(v1ti x, v1ti y) { return x | y; }
v1ti xor(v1ti x, v1ti y) { return x ^ y; }
v1ti not(v1ti x) { return ~x; }

for which GCC currently generates the following rather large code:

and:    movdqa  %xmm0, %xmm2
        movq    %xmm1, %rdx
        movq    %xmm0, %rax
        andq    %rdx, %rax
        movhlps %xmm2, %xmm3
        movhlps %xmm1, %xmm4
        movq    %rax, %xmm0
        movq    %xmm4, %rdx
        movq    %xmm3, %rax
        andq    %rdx, %rax
        movq    %rax, %xmm5
        punpcklqdq      %xmm5, %xmm0
        ret

ior:    movdqa  %xmm0, %xmm2
        movq    %xmm1, %rdx
        movq    %xmm0, %rax
        orq     %rdx, %rax
        movhlps %xmm2, %xmm3
        movhlps %xmm1, %xmm4
        movq    %rax, %xmm0
        movq    %xmm4, %rdx
        movq    %xmm3, %rax
        orq     %rdx, %rax
        movq    %rax, %xmm5
        punpcklqdq      %xmm5, %xmm0
        ret

xor:    movdqa  %xmm0, %xmm2
        movq    %xmm1, %rdx
        movq    %xmm0, %rax
        xorq    %rdx, %rax
        movhlps %xmm2, %xmm3
        movhlps %xmm1, %xmm4
        movq    %rax, %xmm0
        movq    %xmm4, %rdx
        movq    %xmm3, %rax
        xorq    %rdx, %rax
        movq    %rax, %xmm5
        punpcklqdq      %xmm5, %xmm0
        ret

not:    movdqa  %xmm0, %xmm1
        movq    %xmm0, %rax
        notq    %rax
        movhlps %xmm1, %xmm2
        movq    %rax, %xmm0
        movq    %xmm2, %rax
        notq    %rax
        movq    %rax, %xmm3
        punpcklqdq      %xmm3, %xmm0
        ret

With this patch, we now generate the much more efficient code:

and:    pand    %xmm1, %xmm0
        ret

ior:    por     %xmm1, %xmm0
        ret

xor:    pxor    %xmm1, %xmm0
        ret

not:    pcmpeqd %xmm1, %xmm1
        pxor    %xmm1, %xmm0
        ret

In my first few attempts at this patch, I tried adding V1TI to the
existing VI and VI12_AVX_512F mode iterators, but these iterators
depend on other iterators (and attributes), which in turn have their own
dependencies, until everything ties itself in a knot, because V1TI mode
isn't really a first-class vector mode on x86_64.  Hence I ultimately
opted to use simple stand-alone patterns (as used by the existing TF
mode support).

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures.  Ok for mainline?
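For readers more familiar with intrinsics than with machine descriptions, the patched sequences above map onto SSE2 intrinsics roughly as sketched below.  This is an illustration only, assuming an x86_64 target (where SSE2 is always available) and GCC or Clang; the function names and_sse2, ior_sse2, xor_sse2 and not_sse2 are hypothetical.  NOT has no single SSE2 instruction, so it is built from an all-ones register (pcmpeqd of a register with itself) followed by pxor, the same ~x == x ^ -1 identity the one_cmplv1ti2 expander relies on.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86_64 target */
#include <string.h>

typedef unsigned __int128 v1ti __attribute__ ((__vector_size__ (16)));

/* Both types are 16 bytes, so memcpy is a safe bit-for-bit reinterpretation. */
static __m128i to_m128i (v1ti x) { __m128i r; memcpy (&r, &x, sizeof r); return r; }
static v1ti from_m128i (__m128i x) { v1ti r; memcpy (&r, &x, sizeof r); return r; }

/* One full-width SSE2 instruction per operation (names are hypothetical). */
v1ti and_sse2 (v1ti x, v1ti y)  /* pand */
{ return from_m128i (_mm_and_si128 (to_m128i (x), to_m128i (y))); }

v1ti ior_sse2 (v1ti x, v1ti y)  /* por */
{ return from_m128i (_mm_or_si128 (to_m128i (x), to_m128i (y))); }

v1ti xor_sse2 (v1ti x, v1ti y)  /* pxor */
{ return from_m128i (_mm_xor_si128 (to_m128i (x), to_m128i (y))); }

/* NOT: compare a register with itself to get all-ones (pcmpeqd),
   then XOR the input with that mask (pxor).  */
v1ti not_sse2 (v1ti x)
{
  __m128i v = to_m128i (x);
  __m128i ones = _mm_cmpeq_epi32 (v, v);  /* every 32-bit lane equal: all bits set */
  return from_m128i (_mm_xor_si128 (v, ones));
}
```

Materializing all-ones with pcmpeqd avoids loading a 16-byte constant from memory, which is why the patched not needs only two instructions.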
2021-10-22  Roger Sayle

gcc/ChangeLog
	* config/i386/sse.md (v1ti3): New define_insn to implement
	V1TImode AND, IOR and XOR on TARGET_SSE2 (and above).
	(one_cmplv1ti2): New define_expand.

gcc/testsuite/ChangeLog
	* gcc.target/i386/sse2-v1ti-logic.c: New test case.
	* gcc.target/i386/sse2-v1ti-logic-2.c: New test case.

Thanks in advance,
Roger
--

[Attachment: patchv.txt]

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index fbf056b..f37c5c0 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -16268,6 +16268,31 @@
 	  ]
 	  (const_string "")))])
 
+(define_insn "v1ti3"
+  [(set (match_operand:V1TI 0 "register_operand" "=x,x,v")
+	(any_logic:V1TI
+	  (match_operand:V1TI 1 "register_operand" "%0,x,v")
+	  (match_operand:V1TI 2 "vector_operand" "xBm,xm,vm")))]
+  "TARGET_SSE2"
+  "@
+   p\t{%2, %0|%0, %2}
+   vp\t{%2, %1, %0|%0, %1, %2}
+   vp\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx,avx")
+   (set_attr "prefix" "orig,vex,evex")
+   (set_attr "prefix_data16" "1,*,*")
+   (set_attr "type" "sselog")
+   (set_attr "mode" "TI")])
+
+(define_expand "one_cmplv1ti2"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(xor:V1TI (match_operand:V1TI 1 "register_operand")
+		  (match_dup 2)))]
+  "TARGET_SSE2"
+{
+  operands[2] = force_reg (V1TImode, CONSTM1_RTX (V1TImode));
+})
+
 (define_mode_iterator AVX512ZEXTMASK
   [(DI "TARGET_AVX512BW") (SI "TARGET_AVX512BW") HI])
 

[Attachment: sse2-v1ti-logic.c]

/* { dg-do compile { target int128 } } */
/* { dg-options "-O2 -msse2" } */
/* { dg-require-effective-target sse2 } */

typedef unsigned __int128 v1ti __attribute__ ((__vector_size__ (16)));

v1ti and(v1ti x, v1ti y)
{
  return x & y;
}

v1ti ior(v1ti x, v1ti y)
{
  return x | y;
}

v1ti xor(v1ti x, v1ti y)
{
  return x ^ y;
}

v1ti not(v1ti x)
{
  return ~x;
}

/* { dg-final { scan-assembler "pand" } } */
/* { dg-final { scan-assembler "por" } } */
/* { dg-final { scan-assembler-times "pxor" 2 } } */

[Attachment: sse2-v1ti-logic-2.c]

/* { dg-do compile { target int128 } } */
/* { dg-options "-O2 -msse2" } */
/* { dg-require-effective-target sse2 } */

typedef unsigned __int128 v1ti __attribute__ ((__vector_size__ (16)));

v1ti x;
v1ti y;
v1ti z;

void and2()
{
  x &= y;
}

void and3()
{
  x = y & z;
}

void ior2()
{
  x |= y;
}

void ior3()
{
  x = y | z;
}

void xor2()
{
  x ^= y;
}

void xor3()
{
  x = y ^ z;
}

void not1()
{
  x = ~x;
}

void not2()
{
  x = ~y;
}

/* { dg-final { scan-assembler-times "pand" 2 } } */
/* { dg-final { scan-assembler-times "por" 2 } } */
/* { dg-final { scan-assembler-times "pxor" 4 } } */
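The dg-final directives in these tests count how often a mnemonic appears in the generated assembly.  For example, sse2-v1ti-logic.c expects two pxor instructions because both xor and not use one.  The rough C sketch below approximates that counting with plain substring matching and applies it to the patched output quoted earlier in this message.  The name count_matches is hypothetical.  The real scan-assembler-times matches a Tcl regular expression against the whole .s file, including assembler directives omitted here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: count non-overlapping occurrences of needle in hay,
   approximating scan-assembler-times for a literal (non-regex) pattern.  */
int count_matches (const char *hay, const char *needle)
{
  size_t len = strlen (needle);
  int n = 0;

  if (len == 0)
    return 0;  /* an empty pattern would otherwise loop forever */
  for (const char *p = strstr (hay, needle); p != NULL; p = strstr (p + len, needle))
    n++;
  return n;
}

/* Patched output for the four-function test program, as quoted in this
   message (function bodies only, no assembler directives).  */
const char *patched_asm =
  "and:\n\tpand\t%xmm1, %xmm0\n\tret\n"
  "ior:\n\tpor\t%xmm1, %xmm0\n\tret\n"
  "xor:\n\tpxor\t%xmm1, %xmm0\n\tret\n"
  "not:\n\tpcmpeqd\t%xmm1, %xmm1\n\tpxor\t%xmm1, %xmm0\n\tret\n";
```

Because this style of matching is unanchored, a pattern such as "pand" would also count any "pandn" in the output, so the expected counts depend on which instructions the compiler actually emits.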