RE: [PATCH] New finish_compare_by_pieces target hook (for x86).

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: "Roger Sayle" <roger@nextmovesoftware.com>
To: "'Richard Biener'" <richard.guenther@gmail.com>
Cc: <gcc-patches@gcc.gnu.org>, "'Uros Bizjak'" <ubizjak@gmail.com>,
	"'Jakub Jelinek'" <jakub@redhat.com>
Subject: RE: [PATCH] New finish_compare_by_pieces target hook (for x86).
Date: Sun, 25 Jun 2023 06:39:17 +0100	[thread overview]
Message-ID: <001e01d9a727$6375adc0$2a610940$@nextmovesoftware.com> (raw)
In-Reply-To: <CAFiYyc1Jo8CTT3_BOQ9gq_yfH4brZwkB3oS_QV4QWC568GDbWg@mail.gmail.com>


On Tue, 13 June 2023 12:02, Richard Biener wrote:
> On Mon, Jun 12, 2023 at 4:04 PM Roger Sayle <roger@nextmovesoftware.com>
> wrote:
> > The following simple test case, from PR 104610, shows that memcmp ()
> > == 0 can result in some bizarre code sequences on x86.
> >
> > int foo(char *a)
> > {
> >     static const char t[] = "0123456789012345678901234567890";
> >     return __builtin_memcmp(a, &t[0], sizeof(t)) == 0; }
> >
> > with -O2 currently contains both:
> >         xorl    %eax, %eax
> >         xorl    $1, %eax
> > and also
> >         movl    $1, %eax
> >         xorl    $1, %eax
> >
> > Changing the return type of foo to _Bool results in the equally
> > bizarre:
> >         xorl    %eax, %eax
> >         testl   %eax, %eax
> >         sete    %al
> > and also
> >         movl    $1, %eax
> >         testl   %eax, %eax
> >         sete    %al
> >
> > All these sequences set the result to a constant, but this
> > optimization opportunity only occurs very late during compilation, by
> > basic block duplication in the 322r.bbro pass, too late for CSE or
> > peephole2 to do anything about it.  The problem is that the idiom
> > expanded by compare_by_pieces for __builtin_memcmp_eq contains basic
> > blocks that can't easily be optimized by if-conversion due to the
> > multiple incoming edges on the fail block.
> >
> > In summary, compare_by_pieces generates code that looks like:
> >
> >         if (x[0] != y[0]) goto fail_label;
> >         if (x[1] != y[1]) goto fail_label;
> >         ...
> >         if (x[n] != y[n]) goto fail_label;
> >         result = 1;
> >         goto end_label;
> > fail_label:
> >         result = 0;
> > end_label:
> >
> > In theory, the RTL if-conversion pass could be enhanced to tackle
> > arbitrarily complex if-then-else graphs, but the solution proposed
> > here is to allow suitable targets to perform if-conversion during
> > compare_by_pieces.  The x86, for example, can take advantage that all
> > of the above comparisons set and test the zero flag (ZF), which can
> > then be used in combination with sete.  Hence compare_by_pieces could
> > instead generate:
> >
> >         if (x[0] != y[0]) goto fail_label;
> >         if (x[1] != y[1]) goto fail_label;
> >         ...
> >         if (x[n] != y[n]) goto fail_label;
> > fail_label:
> >         sete result
> >
> > which requires one less basic block, and the redundant conditional
> > branch to a label immediately after is cleaned up by GCC's existing
> > RTL optimizations.
> >
> > For the test case above, where -O2 -msse4 previously generated:
> >
> > foo:    movdqu  (%rdi), %xmm0
> >         pxor    .LC0(%rip), %xmm0
> >         ptest   %xmm0, %xmm0
> >         je      .L5
> > .L2:    movl    $1, %eax
> >         xorl    $1, %eax
> >         ret
> > .L5:    movdqu  16(%rdi), %xmm0
> >         pxor    .LC1(%rip), %xmm0
> >         ptest   %xmm0, %xmm0
> >         jne     .L2
> >         xorl    %eax, %eax
> >         xorl    $1, %eax
> >         ret
> >
> > we now generate:
> >
> > foo:    movdqu  (%rdi), %xmm0
> >         pxor    .LC0(%rip), %xmm0
> >         ptest   %xmm0, %xmm0
> >         jne     .L2
> >         movdqu  16(%rdi), %xmm0
> >         pxor    .LC1(%rip), %xmm0
> >         ptest   %xmm0, %xmm0
> > .L2:    sete    %al
> >         movzbl  %al, %eax
> >         ret
> >
> > Using a target hook allows the large amount of intelligence already in
> > compare_by_pieces to be re-used by the i386 backend, but this can also
> > help other backends with condition flags where the equality result can
> > be materialized.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> 
> What's the guarantee that the zero flag is appropriately set on all edges incoming
> now and forever?

Is there any reason why this target hook can't be removed (in future) should it stop
being useful?  It's completely optional and not required for the correct functioning
of the compiler.

> Does this require target specific knowledge on how do_compare_rtx_and_jump
> is emitting RTL?

Yes.  Each backend can decide how best to implement finish_compare_by_pieces
given its internal knowledge of how do_compare_rtx_and_jump works.  It's not
important to the middle-end how the underlying invariants are guaranteed, just
that they are and the backend produces correct code.  A backend may store flags
on the target label, or maintain state in cfun.  Future changes to the i386 backend
might cause it to revert to the default finish_compare_by_pieces, or provide an
alternate implementation, but at the moment this patch improves the code that
GCC generates.  Very little (in software like GCC) is forever.

> Do you see matching this in ifcvt to be unreasonable?  I'm thinking of "reducing"
> the incoming edges pairwise without actually looking at the ifcvt code.

There's nothing about the proposed patch that prevents or blocks improvements
to ifcvt, which I agree completely could be improved.  But even (in future) if later
RTL passes could clean things up, that's no reason for RTL expansion to initially
generate poor/inefficient code.  I'm not sure that a (hypothetical) ifcvt improvement
would be sufficient reason to revert/remove enhancements to compare_by_pieces.

Is it that there's not enough (bootstrap and) testsuite coverage of compare_by_pieces
to make you feel comfortable with this change?  The proposed implementation should
be "obvious enough" to future generations what the intended behaviour should be.
And the x86 target hook implementation (i.e. the change) is only four lines long, a
fraction of the size of the new documentation and comments.

Thanks in advance.
Roger


> > 2023-06-12  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         * config/i386/i386.cc (ix86_finish_compare_by_pieces): New
> >         function to provide a backend specific implementation.
> >         (TARGET_FINISH_COMPARE_BY_PIECES): Use the above function.
> >
> >         * doc/tm.texi.in (TARGET_FINISH_COMPARE_BY_PIECES): New @hook.
> >         * doc/tm.texi: Regenerate.
> >
> >         * expr.cc (compare_by_pieces): Call finish_compare_by_pieces in
> >         targetm to finalize the RTL expansion.  Move the current
> >         implementation to a default target hook.
> >         * target.def (finish_compare_by_pieces): New target hook to allow
> >         compare_by_pieces to be customized by the target.
> >         * targhooks.cc (default_finish_compare_by_pieces): Default
> >         implementation moved here from expr.cc's compare_by_pieces.
> >         * targhooks.h (default_finish_compare_by_pieces): Prototype.
> >
> > gcc/testsuite/ChangeLog
> >         * gcc.target/i386/pieces-memcmp-1.c: New test case.
> >

next prev parent reply	other threads:[~2023-06-25  5:39 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-12 14:03 Roger Sayle
2023-06-12 18:46 ` Uros Bizjak
2023-06-13 11:02 ` Richard Biener
2023-06-25  5:39   ` Roger Sayle [this message]
2023-06-26  7:18     ` Richard Biener
2023-06-26 18:24       ` Richard Sandiford

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='001e01d9a727$6375adc0$2a610940$@nextmovesoftware.com' \
    --to=roger@nextmovesoftware.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=richard.guenther@gmail.com \
    --cc=ubizjak@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).