From: Uros Bizjak <ubizjak@gmail.com>
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: gcc-patches@gcc.gnu.org, Jakub Jelinek <jakub@redhat.com>
Subject: Re: [PATCH] New finish_compare_by_pieces target hook (for x86).
Date: Mon, 12 Jun 2023 20:46:40 +0200 [thread overview]
Message-ID: <CAFULd4br7umH6iktG8KkD1skE=+Z0cb--3JtFUbuK6fNN7opWg@mail.gmail.com> (raw)
In-Reply-To: <001001d99d36$a38111c0$ea833540$@nextmovesoftware.com>
On Mon, Jun 12, 2023 at 4:03 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> The following simple test case, from PR 104610, shows that memcmp () == 0
> can result in some bizarre code sequences on x86.
>
> int foo(char *a)
> {
> static const char t[] = "0123456789012345678901234567890";
> return __builtin_memcmp(a, &t[0], sizeof(t)) == 0;
> }
>
> with -O2 currently contains both:
> xorl %eax, %eax
> xorl $1, %eax
> and also
> movl $1, %eax
> xorl $1, %eax
>
> Changing the return type of foo to _Bool results in the equally
> bizarre:
> xorl %eax, %eax
> testl %eax, %eax
> sete %al
> and also
> movl $1, %eax
> testl %eax, %eax
> sete %al
>
> All these sequences set the result to a constant, but this optimization
> opportunity only occurs very late during compilation, by basic block
> duplication in the 322r.bbro pass, too late for CSE or peephole2 to
> do anything about it. The problem is that the idiom expanded by
> compare_by_pieces for __builtin_memcmp_eq contains basic blocks that
> can't easily be optimized by if-conversion due to the multiple
> incoming edges on the fail block.
>
> In summary, compare_by_pieces generates code that looks like:
>
> if (x[0] != y[0]) goto fail_label;
> if (x[1] != y[1]) goto fail_label;
> ...
> if (x[n] != y[n]) goto fail_label;
> result = 1;
> goto end_label;
> fail_label:
> result = 0;
> end_label:
>
> In theory, the RTL if-conversion pass could be enhanced to tackle
> arbitrarily complex if-then-else graphs, but the solution proposed
> here is to allow suitable targets to perform if-conversion during
> compare_by_pieces. The x86, for example, can take advantage that
> all of the above comparisons set and test the zero flag (ZF), which
> can then be used in combination with sete. Hence compare_by_pieces
> could instead generate:
>
> if (x[0] != y[0]) goto fail_label;
> if (x[1] != y[1]) goto fail_label;
> ...
> if (x[n] != y[n]) goto fail_label;
> fail_label:
> sete result
>
> which requires one less basic block, and the redundant conditional
> branch to a label immediately after is cleaned up by GCC's existing
> RTL optimizations.
>
> For the test case above, where -O2 -msse4 previously generated:
>
> foo: movdqu (%rdi), %xmm0
> pxor .LC0(%rip), %xmm0
> ptest %xmm0, %xmm0
> je .L5
> .L2: movl $1, %eax
> xorl $1, %eax
> ret
> .L5: movdqu 16(%rdi), %xmm0
> pxor .LC1(%rip), %xmm0
> ptest %xmm0, %xmm0
> jne .L2
> xorl %eax, %eax
> xorl $1, %eax
> ret
>
> we now generate:
>
> foo: movdqu (%rdi), %xmm0
> pxor .LC0(%rip), %xmm0
> ptest %xmm0, %xmm0
> jne .L2
> movdqu 16(%rdi), %xmm0
> pxor .LC1(%rip), %xmm0
> ptest %xmm0, %xmm0
> .L2: sete %al
> movzbl %al, %eax
> ret
>
> Using a target hook allows the large amount of intelligence already in
> compare_by_pieces to be re-used by the i386 backend, but this can also
> help other backends with condition flags where the equality result can
> be materialized.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures. Ok for mainline?
>
>
> 2023-06-12 Roger Sayle <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
> * config/i386/i386.cc (ix86_finish_compare_by_pieces): New
> function to provide a backend specific implementation.
> (TARGET_FINISH_COMPARE_BY_PIECES): Use the above function.
>
> * doc/tm.texi.in (TARGET_FINISH_COMPARE_BY_PIECES): New @hook.
> * doc/tm.texi: Regenerate.
>
> * expr.cc (compare_by_pieces): Call finish_compare_by_pieces in
> targetm to finalize the RTL expansion. Move the current
> implementation to a default target hook.
> * target.def (finish_compare_by_pieces): New target hook to allow
> compare_by_pieces to be customized by the target.
> * targhooks.cc (default_finish_compare_by_pieces): Default
> implementation moved here from expr.cc's compare_by_pieces.
> * targhooks.h (default_finish_compare_by_pieces): Prototype.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/pieces-memcmp-1.c: New test case.
This patch needs middle-end approval first.
Uros.
>
>
> Thanks in advance,
> Roger
> --
>
next prev parent reply other threads:[~2023-06-12 18:46 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-12 14:03 Roger Sayle
2023-06-12 18:46 ` Uros Bizjak [this message]
2023-06-13 11:02 ` Richard Biener
2023-06-25 5:39 ` Roger Sayle
2023-06-26 7:18 ` Richard Biener
2023-06-26 18:24 ` Richard Sandiford
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAFULd4br7umH6iktG8KkD1skE=+Z0cb--3JtFUbuK6fNN7opWg@mail.gmail.com' \
--to=ubizjak@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=jakub@redhat.com \
--cc=roger@nextmovesoftware.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).