From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69]) by sourceware.org (Postfix) with ESMTPS id E09E03858418 for ; Mon, 12 Jun 2023 14:03:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org E09E03858418 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID: Date:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=HwSKaKVjaXwFoOFs6bIkdxxaCsCXbQzm/jqC/2OrpuM=; b=pzWqUTUuUJxCi4SsdXrVyqo1/g DpOLgIr/VP46fL5fUNnGC31WC1w4cKc6ybW6jMgcdifLhsIGGnXxE9+IqfdXEw59QIvD8ppG0KMFh VO+tyRQOywTuVaC+qpWJ1t5F902RKdydRDrPjfeERcH/4hOpMec2nFaV5zQ4TcrWto830acVXEBYG VjS+2wooidiOEAkuOnZmEfYzHNugWkzL1C1fsrwgpvNxOqwgMwPfFiphE3uECuY7Hb9ntqk+UyLRM LMnHLkNYFSlSL8lpYQdoa6u18FnoARAGZ3cvLgENgve/mbJB97+8m8GRvgHoxUigUUVq7bNufv0BC pU6cg+qw==; Received: from [185.62.158.67] (port=56080 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1q8i8b-0000xL-1a; Mon, 12 Jun 2023 10:03:17 -0400 From: "Roger Sayle" To: Cc: "'Uros Bizjak'" , "'Jakub Jelinek'" Subject: [PATCH] New finish_compare_by_pieces target hook (for x86). Date: Mon, 12 Jun 2023 15:03:16 +0100 Message-ID: <001001d99d36$a38111c0$ea833540$@nextmovesoftware.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_NextPart_000_0011_01D99D3F.054579C0" X-Mailer: Microsoft Outlook 16.0 Thread-Index: AdmdNKbFx5NyLJ76Q/OMJEq5x1i3QQ== Content-Language: en-gb X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: This is a multipart message in MIME format. ------=_NextPart_000_0011_01D99D3F.054579C0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit The following simple test case, from PR 104610, shows that memcmp () == 0 can result in some bizarre code sequences on x86. int foo(char *a) { static const char t[] = "0123456789012345678901234567890"; return __builtin_memcmp(a, &t[0], sizeof(t)) == 0; } with -O2 currently contains both: xorl %eax, %eax xorl $1, %eax and also movl $1, %eax xorl $1, %eax Changing the return type of foo to _Bool results in the equally bizarre: xorl %eax, %eax testl %eax, %eax sete %al and also movl $1, %eax testl %eax, %eax sete %al All these sequences set the result to a constant, but this optimization opportunity only occurs very late during compilation, by basic block duplication in the 322r.bbro pass, too late for CSE or peephole2 to do anything about it. The problem is that the idiom expanded by compare_by_pieces for __builtin_memcmp_eq contains basic blocks that can't easily be optimized by if-conversion due to the multiple incoming edges on the fail block. In summary, compare_by_pieces generates code that looks like: if (x[0] != y[0]) goto fail_label; if (x[1] != y[1]) goto fail_label; ... if (x[n] != y[n]) goto fail_label; result = 1; goto end_label; fail_label: result = 0; end_label: In theory, the RTL if-conversion pass could be enhanced to tackle arbitrarily complex if-then-else graphs, but the solution proposed here is to allow suitable targets to perform if-conversion during compare_by_pieces. The x86, for example, can take advantage that all of the above comparisons set and test the zero flag (ZF), which can then be used in combination with sete. Hence compare_by_pieces could instead generate: if (x[0] != y[0]) goto fail_label; if (x[1] != y[1]) goto fail_label; ... if (x[n] != y[n]) goto fail_label; fail_label: sete result which requires one less basic block, and the redundant conditional branch to a label immediately after is cleaned up by GCC's existing RTL optimizations. For the test case above, where -O2 -msse4 previously generated: foo: movdqu (%rdi), %xmm0 pxor .LC0(%rip), %xmm0 ptest %xmm0, %xmm0 je .L5 .L2: movl $1, %eax xorl $1, %eax ret .L5: movdqu 16(%rdi), %xmm0 pxor .LC1(%rip), %xmm0 ptest %xmm0, %xmm0 jne .L2 xorl %eax, %eax xorl $1, %eax ret we now generate: foo: movdqu (%rdi), %xmm0 pxor .LC0(%rip), %xmm0 ptest %xmm0, %xmm0 jne .L2 movdqu 16(%rdi), %xmm0 pxor .LC1(%rip), %xmm0 ptest %xmm0, %xmm0 .L2: sete %al movzbl %al, %eax ret Using a target hook allows the large amount of intelligence already in compare_by_pieces to be re-used by the i386 backend, but this can also help other backends with condition flags where the equality result can be materialized. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2023-06-12 Roger Sayle gcc/ChangeLog * config/i386/i386.cc (ix86_finish_compare_by_pieces): New function to provide a backend specific implementation. (TARGET_FINISH_COMPARE_BY_PIECES): Use the above function. * doc/tm.texi.in (TARGET_FINISH_COMPARE_BY_PIECES): New @hook. * doc/tm.texi: Regenerate. * expr.cc (compare_by_pieces): Call finish_compare_by_pieces in targetm to finalize the RTL expansion. Move the current implementation to a default target hook. * target.def (finish_compare_by_pieces): New target hook to allow compare_by_pieces to be customized by the target. * targhooks.cc (default_finish_compare_by_pieces): Default implementation moved here from expr.cc's compare_by_pieces. * targhooks.h (default_finish_compare_by_pieces): Prototype. gcc/testsuite/ChangeLog * gcc.target/i386/pieces-memcmp-1.c: New test case. Thanks in advance, Roger -- ------=_NextPart_000_0011_01D99D3F.054579C0 Content-Type: text/plain; name="patchne.txt" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="patchne.txt" diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc=0A= index 3a1444d..509c0ee 100644=0A= --- a/gcc/config/i386/i386.cc=0A= +++ b/gcc/config/i386/i386.cc=0A= @@ -16146,6 +16146,20 @@ ix86_fp_compare_code_to_integer (enum rtx_code = code)=0A= }=0A= }=0A= =0A= +/* Override compare_by_pieces' default implementation using the state=0A= + of the CCZmode FLAGS_REG and sete instruction. TARGET is the = integral=0A= + mode result, and FAIL_LABEL is the branch target of mismatched=0A= + comparisons. */=0A= +=0A= +void=0A= +ix86_finish_compare_by_pieces (rtx target, rtx_code_label *fail_label)=0A= +{=0A= + rtx tmp =3D gen_reg_rtx (QImode);=0A= + emit_label (fail_label);=0A= + ix86_expand_setcc (tmp, NE, gen_rtx_REG (CCZmode, FLAGS_REG), = const0_rtx);=0A= + convert_move (target, tmp, 1);=0A= +}=0A= +=0A= /* Zero extend possibly SImode EXP to Pmode register. */=0A= rtx=0A= ix86_zero_extend_to_Pmode (rtx exp)=0A= @@ -25127,6 +25141,8 @@ ix86_run_selftests (void)=0A= =0A= #undef TARGET_OVERLAP_OP_BY_PIECES_P=0A= #define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true=0A= +#undef TARGET_FINISH_COMPARE_BY_PIECES=0A= +#define TARGET_FINISH_COMPARE_BY_PIECES ix86_finish_compare_by_pieces=0A= =0A= #undef TARGET_FLAGS_REGNUM=0A= #define TARGET_FLAGS_REGNUM FLAGS_REG=0A= diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi=0A= index 95ba56e..28d8361 100644=0A= --- a/gcc/doc/tm.texi=0A= +++ b/gcc/doc/tm.texi=0A= @@ -6922,6 +6922,18 @@ particular mode from being used for block = comparisons by returning a=0A= negative number from this hook.=0A= @end deftypefn=0A= =0A= +@deftypefn {Target Hook} void TARGET_FINISH_COMPARE_BY_PIECES (rtx = @var{target}, rtx_code_label *@var{fail_label})=0A= +Allow targets with a zero flag and suitable setcc instruction to provide=0A= +an alternate implementation for @code{compare_by_pieces}. The function=0A= +@code{compare_by_pieces} generates a sequence of equality tests that=0A= +branch to @var{failure_label} on mismatches, and fall through on = success.=0A= +By default, this hook assigns @code{const1_rtx} to @var{target} in the=0A= +current basic block and @code{const0_rtx} to @var{target} in a new=0A= +@var{fail_label} basic block. Targets like x86 can take advantage=0A= +of the property that the condition codes/zero flag are appropriately=0A= +set to avoid introducing a new basic block.=0A= +@end deftypefn=0A= +=0A= @defmac MOVE_MAX_PIECES=0A= A C expression used by @code{move_by_pieces} to determine the largest = unit=0A= a load or store used to copy memory is. Defaults to @code{MOVE_MAX}.=0A= diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in=0A= index 4ac96dc..45711db 100644=0A= --- a/gcc/doc/tm.texi.in=0A= +++ b/gcc/doc/tm.texi.in=0A= @@ -4529,6 +4529,8 @@ If you don't define this, a reasonable default is = used.=0A= =0A= @hook TARGET_COMPARE_BY_PIECES_BRANCH_RATIO=0A= =0A= +@hook TARGET_FINISH_COMPARE_BY_PIECES=0A= +=0A= @defmac MOVE_MAX_PIECES=0A= A C expression used by @code{move_by_pieces} to determine the largest = unit=0A= a load or store used to copy memory is. Defaults to @code{MOVE_MAX}.=0A= diff --git a/gcc/expr.cc b/gcc/expr.cc=0A= index 868fa6e..25fca17 100644=0A= --- a/gcc/expr.cc=0A= +++ b/gcc/expr.cc=0A= @@ -1923,7 +1923,6 @@ compare_by_pieces (rtx arg0, rtx arg1, unsigned = HOST_WIDE_INT len,=0A= by_pieces_constfn a1_cfn, void *a1_cfn_data)=0A= {=0A= rtx_code_label *fail_label =3D gen_label_rtx ();=0A= - rtx_code_label *end_label =3D gen_label_rtx ();=0A= =0A= if (target =3D=3D NULL_RTX=0A= || !REG_P (target) || REGNO (target) < FIRST_PSEUDO_REGISTER)=0A= @@ -1934,12 +1933,8 @@ compare_by_pieces (rtx arg0, rtx arg1, unsigned = HOST_WIDE_INT len,=0A= =0A= data.run ();=0A= =0A= - emit_move_insn (target, const0_rtx);=0A= - emit_jump (end_label);=0A= - emit_barrier ();=0A= - emit_label (fail_label);=0A= - emit_move_insn (target, const1_rtx);=0A= - emit_label (end_label);=0A= + /* Allow the backend to override the default implementation. */=0A= + targetm.finish_compare_by_pieces (target, fail_label);=0A= =0A= return target;=0A= }=0A= diff --git a/gcc/target.def b/gcc/target.def=0A= index 7d68429..0593917 100644=0A= --- a/gcc/target.def=0A= +++ b/gcc/target.def=0A= @@ -3749,6 +3749,20 @@ negative number from this hook.",=0A= default_compare_by_pieces_branch_ratio)=0A= =0A= DEFHOOK=0A= +(finish_compare_by_pieces,=0A= + "Allow targets with a zero flag and suitable setcc instruction to = provide\n\=0A= +an alternate implementation for @code{compare_by_pieces}. The = function\n\=0A= +@code{compare_by_pieces} generates a sequence of equality tests that\n\=0A= +branch to @var{failure_label} on mismatches, and fall through on = success.\n\=0A= +By default, this hook assigns @code{const1_rtx} to @var{target} in = the\n\=0A= +current basic block and @code{const0_rtx} to @var{target} in a new\n\=0A= +@var{fail_label} basic block. Targets like x86 can take advantage\n\=0A= +of the property that the condition codes/zero flag are appropriately\n\=0A= +set to avoid introducing a new basic block.",=0A= + void, (rtx target, rtx_code_label *fail_label),=0A= + default_finish_compare_by_pieces)=0A= +=0A= +DEFHOOK=0A= (slow_unaligned_access,=0A= "This hook returns true if memory accesses described by the\n\=0A= @var{mode} and @var{alignment} parameters have a cost many times = greater\n\=0A= diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc=0A= index e190369..767fe38 100644=0A= --- a/gcc/targhooks.cc=0A= +++ b/gcc/targhooks.cc=0A= @@ -2094,6 +2094,24 @@ default_compare_by_pieces_branch_ratio = (machine_mode)=0A= return 1;=0A= }=0A= =0A= +/* This hook allows the backend to modify/override code generation for=0A= + compare_by_pieces. Targets with a zero flag and a suitable setcc=0A= + function can use them instead of the default "compare ? 0 : 1"=0A= + implementation below. TARGET is the integral mode result and=0A= + FAIL_LABEL is the destination for comparison mismatches. */=0A= +=0A= +void=0A= +default_finish_compare_by_pieces (rtx target, rtx_code_label = *fail_label)=0A= +{=0A= + rtx_code_label *end_label =3D gen_label_rtx ();=0A= + emit_move_insn (target, const0_rtx);=0A= + emit_jump (end_label);=0A= + emit_barrier ();=0A= + emit_label (fail_label);=0A= + emit_move_insn (target, const1_rtx);=0A= + emit_label (end_label);=0A= +}=0A= +=0A= /* Write PATCH_AREA_SIZE NOPs into the asm outfile FILE around a = function=0A= entry. If RECORD_P is true and the target supports named sections,=0A= the location of the NOPs will be recorded in a special object section=0A= diff --git a/gcc/targhooks.h b/gcc/targhooks.h=0A= index 1a0db8d..b99fff6 100644=0A= --- a/gcc/targhooks.h=0A= +++ b/gcc/targhooks.h=0A= @@ -237,6 +237,7 @@ extern bool default_use_by_pieces_infrastructure_p = (unsigned HOST_WIDE_INT,=0A= enum by_pieces_operation,=0A= bool);=0A= extern int default_compare_by_pieces_branch_ratio (machine_mode);=0A= +extern void default_finish_compare_by_pieces (rtx, rtx_code_label *);=0A= =0A= extern void default_print_patchable_function_entry (FILE *,=0A= unsigned HOST_WIDE_INT,=0A= diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcmp-1.c = b/gcc/testsuite/gcc.target/i386/pieces-memcmp-1.c=0A= new file mode 100644=0A= index 0000000..de1d82f=0A= --- /dev/null=0A= +++ b/gcc/testsuite/gcc.target/i386/pieces-memcmp-1.c=0A= @@ -0,0 +1,10 @@=0A= +/* { dg-do compile } */=0A= +/* { dg-options "-O2" } */=0A= +=0A= +int foo(char *a)=0A= +{=0A= + static const char t[] =3D "0123456789012345678901234567890";=0A= + return __builtin_memcmp(a, &t[0], sizeof(t)) =3D=3D 0;=0A= +}=0A= +=0A= +/* { dg-final { scan-assembler-not "xorl\[ \\t]*\\\$1," } } */=0A= ------=_NextPart_000_0011_01D99D3F.054579C0--