From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Roger Sayle" <roger@nextmovesoftware.com>
To:
Cc: "'Uros Bizjak'"
Subject: [x86 PATCH] Add cbranchti4 pattern to i386.md (for -m32 compare_by_pieces).
Date: Tue, 27 Jun 2023 18:22:14 +0100
Message-ID: <013101d9a91b$eb84cb60$c28e6220$@nextmovesoftware.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_0132_01D9A924.4D4BA460"

This is a multipart message in MIME format.

------=_NextPart_000_0132_01D9A924.4D4BA460
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit


This patch fixes some very odd (unanticipated) code generation by
compare_by_pieces with -m32 -mavx, since the recent addition of the
cbranchoi4 pattern.  The issue is that cbranchoi4 is available with
TARGET_AVX, but cbranchti4 is currently conditional on TARGET_64BIT,
which results in the odd behaviour (thanks to OPTAB_WIDEN) that with
-m32 -mavx, compare_by_pieces ends up (inefficiently) widening 128-bit
comparisons to 256 bits before performing PTEST.

This patch fixes this by providing a cbranchti4 pattern that's available
with either TARGET_64BIT or TARGET_SSE4_1.

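The cost of the widening described above can be modelled in portable C. This is only a sketch, not GCC's implementation: the helper names equal16_direct, equal16_widened and memeq32 are hypothetical, and plain 64-bit integer XORs stand in for the vpxor/vptest pair. The widened variant zero-extends both 16-byte pieces into 32-byte temporaries before comparing, which is the extra work OPTAB_WIDEN introduces when only a 256-bit branch pattern is available.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 when two 16-byte pieces are equal.
   Models a 128-bit XOR followed by an "all bits zero" test.  */
static int equal16_direct(const void *a, const void *b)
{
    uint64_t x[2], y[2];
    memcpy(x, a, 16);
    memcpy(y, b, 16);
    return ((x[0] ^ y[0]) | (x[1] ^ y[1])) == 0;
}

/* Hypothetical helper: same result, but both operands are first
   zero-extended to 256 bits, modelling the widened comparison.  */
static int equal16_widened(const void *a, const void *b)
{
    uint64_t x[4] = {0}, y[4] = {0};   /* upper 128 bits stay zero */
    uint64_t diff = 0;
    memcpy(x, a, 16);
    memcpy(y, b, 16);
    for (int i = 0; i < 4; i++)
        diff |= x[i] ^ y[i];
    return diff == 0;
}

/* Hypothetical model of a 32-byte equality test done by pieces:
   two 16-byte comparisons, stopping early at the first mismatch.  */
static int memeq32(const char *a, const char *b,
                   int (*eq16)(const void *, const void *))
{
    return eq16(a, b) && eq16(a + 16, b + 16);
}
```

Both helpers return the same answer for any input; the widened one simply stores and compares twice as many bytes per piece, with half of them known to be zero.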
For the test case below (again from PR 104610):

int foo(char *a)
{
    static const char t[] = "0123456789012345678901234567890";
    return __builtin_memcmp(a, &t[0], sizeof(t)) == 0;
}

GCC with -m32 -O2 -mavx currently produces the bonkers:

foo:    pushl   %ebp
        movl    %esp, %ebp
        andl    $-32, %esp
        subl    $64, %esp
        movl    8(%ebp), %eax
        vmovdqa .LC0, %xmm4
        movl    $0, 48(%esp)
        vmovdqu (%eax), %xmm2
        movl    $0, 52(%esp)
        movl    $0, 56(%esp)
        movl    $0, 60(%esp)
        movl    $0, 16(%esp)
        movl    $0, 20(%esp)
        movl    $0, 24(%esp)
        movl    $0, 28(%esp)
        vmovdqa %xmm2, 32(%esp)
        vmovdqa %xmm4, (%esp)
        vmovdqa (%esp), %ymm5
        vpxor   32(%esp), %ymm5, %ymm0
        vptest  %ymm0, %ymm0
        jne     .L2
        vmovdqu 16(%eax), %xmm7
        movl    $0, 48(%esp)
        movl    $0, 52(%esp)
        vmovdqa %xmm7, 32(%esp)
        vmovdqa .LC1, %xmm7
        movl    $0, 56(%esp)
        movl    $0, 60(%esp)
        movl    $0, 16(%esp)
        movl    $0, 20(%esp)
        movl    $0, 24(%esp)
        movl    $0, 28(%esp)
        vmovdqa %xmm7, (%esp)
        vmovdqa (%esp), %ymm1
        vpxor   32(%esp), %ymm1, %ymm0
        vptest  %ymm0, %ymm0
        je      .L6
.L2:    movl    $1, %eax
        xorl    $1, %eax
        vzeroupper
        leave
        ret
.L6:    xorl    %eax, %eax
        xorl    $1, %eax
        vzeroupper
        leave
        ret

With this patch, we now generate the (slightly) more sensible:

foo:    vmovdqa .LC0, %xmm0
        movl    4(%esp), %eax
        vpxor   (%eax), %xmm0, %xmm0
        vptest  %xmm0, %xmm0
        jne     .L2
        vmovdqa .LC1, %xmm0
        vpxor   16(%eax), %xmm0, %xmm0
        vptest  %xmm0, %xmm0
        je      .L5
.L2:    movl    $1, %eax
        xorl    $1, %eax
        ret
.L5:    xorl    %eax, %eax
        xorl    $1, %eax
        ret


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-06-27  Roger Sayle

gcc/ChangeLog
        * config/i386/i386-expand.cc (ix86_expand_branch): Also use ptest
        for TImode comparisons on 32-bit architectures.
        * config/i386/i386.md (cbranch<mode>4): Change from SDWIM to
        SWIM1248x to exclude/avoid TImode being conditional on -m64.
        (cbranchti4): New define_expand for TImode on both TARGET_64BIT
        and/or with TARGET_SSE4_1.
        * config/i386/predicates.md (ix86_timode_comparison_operator):
        New predicate that depends upon TARGET_64BIT.
        (ix86_timode_comparison_operand): Likewise.

gcc/testsuite/ChangeLog
        * gcc.target/i386/pieces-memcmp-2.c: New test case.


Thanks in advance,
Roger
--

------=_NextPart_000_0132_01D9A924.4D4BA460
Content-Type: text/plain; name="patchti.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="patchti.txt"

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 9a8d244..567248d 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -2365,6 +2365,7 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label)
   /* Handle special case - vector comparsion with boolean result, transform
      it using ptest instruction.  */
   if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
+      || (mode == TImode && !TARGET_64BIT)
       || mode == OImode)
     {
       rtx flag = gen_rtx_REG (CCZmode, FLAGS_REG);
@@ -2372,7 +2373,7 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label)
 
       gcc_assert (code == EQ || code == NE);
 
-      if (mode == OImode)
+      if (GET_MODE_CLASS (mode) != MODE_VECTOR_INT)
         {
           op0 = lowpart_subreg (p_mode, force_reg (mode, op0), mode);
           op1 = lowpart_subreg (p_mode, force_reg (mode, op1), mode);
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index b50d82b..dcf0ba6 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1352,8 +1352,8 @@
 
 (define_expand "cbranch<mode>4"
   [(set (reg:CC FLAGS_REG)
-        (compare:CC (match_operand:SDWIM 1 "nonimmediate_operand")
-                    (match_operand:SDWIM 2 "<general_operand>")))
+        (compare:CC (match_operand:SWIM1248x 1 "nonimmediate_operand")
+                    (match_operand:SWIM1248x 2 "<general_operand>")))
    (set (pc) (if_then_else
                (match_operator 0 "ordered_comparison_operator"
                 [(reg:CC FLAGS_REG) (const_int 0)])
@@ -1368,6 +1368,22 @@
   DONE;
 })
 
+(define_expand "cbranchti4"
+  [(set (reg:CC FLAGS_REG)
+        (compare:CC (match_operand:TI 1 "nonimmediate_operand")
+                    (match_operand:TI 2 "ix86_timode_comparison_operand")))
+   (set (pc) (if_then_else
+               (match_operator 0 "ix86_timode_comparison_operator"
+                [(reg:CC FLAGS_REG) (const_int 0)])
+               (label_ref (match_operand 3))
+               (pc)))]
+  "TARGET_64BIT || TARGET_SSE4_1"
+{
+  ix86_expand_branch (GET_CODE (operands[0]),
+                      operands[1], operands[2], operands[3]);
+  DONE;
+})
+
 (define_expand "cbranchoi4"
   [(set (reg:CC FLAGS_REG)
         (compare:CC (match_operand:OI 1 "nonimmediate_operand")
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index fb07707..2d50cbf 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -1641,6 +1641,19 @@
     (match_operand 0 "comparison_operator")
     (match_operand 0 "ix86_trivial_fp_comparison_operator")))
 
+;; Return true if we can perform this comparison on TImode operands.
+(define_predicate "ix86_timode_comparison_operator"
+  (if_then_else (match_test "TARGET_64BIT")
+    (match_operand 0 "ordered_comparison_operator")
+    (match_operand 0 "bt_comparison_operator")))
+
+;; Return true if this is a valid second operand for a TImode comparison.
+(define_predicate "ix86_timode_comparison_operand"
+  (if_then_else (match_test "TARGET_64BIT")
+    (match_operand 0 "x86_64_general_operand")
+    (match_operand 0 "nonimmediate_operand")))
+
+
 ;; Nearly general operand, but accept any const_double, since we wish
 ;; to be able to drop them into memory rather than have them get pulled
 ;; into registers.
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcmp-2.c b/gcc/testsuite/gcc.target/i386/pieces-memcmp-2.c
new file mode 100644
index 0000000..6f996fa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memcmp-2.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2 -mavx2" } */
+
+int foo(char *a)
+{
+    static const char t[] = "0123456789012345678901234567890";
+    return __builtin_memcmp(a, &t[0], sizeof(t)) == 0;
+}
+
+/* { dg-final { scan-assembler-not "movl\[ \\t]*\\\$0," } } */
+/* { dg-final { scan-assembler-not "vptest\[ \\t]*%ymm" } } */
+/* { dg-final { scan-assembler-times "vptest\[ \\t]*%xmm" 2 } } */
+

------=_NextPart_000_0132_01D9A924.4D4BA460--