From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=rP3v=CN=nextmovesoftware.com=roger@sourceware.org>
Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69])
	by sourceware.org (Postfix) with ESMTPS id 5B5613858D35
	for <gcc-patches@gcc.gnu.org>; Sun, 25 Jun 2023 05:39:20 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 5B5613858D35
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=nextmovesoftware.com; s=default; h=Content-Transfer-Encoding:Content-Type:
	MIME-Version:Message-ID:Date:Subject:In-Reply-To:References:Cc:To:From:Sender
	:Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From:
	Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help:
	List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive;
	bh=eQ4odnln8wp9iPE8EHSERn5U2Z4mfct/J4SOMFzlRWA=; b=Q2i1/MrL5J6ezhkQbxZoP7kX8x
	nwtTY/hcn5eLjCwtO26DqcxRbXs3N7+c8jQKPwS+nxDXeCE8r8dZ0rx+Hv2O8fKE2WvNPwIGI9AOE
	7kCEGtM2C7esuhDC8Ry6FoLD05g8eEB7pnvZHRqvdBBGfOCYp/0/JvYxFY2uNuZvUD2w+HsV9ruyA
	pgOh+kurgetns3WTAQ5hlUM3MBOGc2vjpRqg+cgopgCfJ1OfH6sUdewnNQSK+Uh1r6+8IlG93CYfC
	NtSDyRK0n+B34RhHFjSgR6xpaPwaQcFmjPbhKuqd+JN5OveL3bC1+NZ3w65aR0j7NdgiXXmpz+Grx
	krxK4uWQ==;
Received: from host86-161-68-50.range86-161.btcentralplus.com ([86.161.68.50]:53064 helo=Dell)
	by server.nextmovesoftware.com with esmtpsa  (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
	(Exim 4.96)
	(envelope-from <roger@nextmovesoftware.com>)
	id 1qDIT1-0006TE-1I;
	Sun, 25 Jun 2023 01:39:19 -0400
From: "Roger Sayle" <roger@nextmovesoftware.com>
To: "'Richard Biener'" <richard.guenther@gmail.com>
Cc: <gcc-patches@gcc.gnu.org>,
	"'Uros Bizjak'" <ubizjak@gmail.com>,
	"'Jakub Jelinek'" <jakub@redhat.com>
References: <001001d99d36$a38111c0$ea833540$@nextmovesoftware.com> <CAFiYyc1Jo8CTT3_BOQ9gq_yfH4brZwkB3oS_QV4QWC568GDbWg@mail.gmail.com>
In-Reply-To: <CAFiYyc1Jo8CTT3_BOQ9gq_yfH4brZwkB3oS_QV4QWC568GDbWg@mail.gmail.com>
Subject: RE: [PATCH] New finish_compare_by_pieces target hook (for x86).
Date: Sun, 25 Jun 2023 06:39:17 +0100
Message-ID: <001e01d9a727$6375adc0$2a610940$@nextmovesoftware.com>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="utf-8"
Content-Transfer-Encoding: quoted-printable
X-Mailer: Microsoft Outlook 16.0
Thread-Index: AQHjUiizPUa+8vvkpBMWVKeUxxlQbQHdnoB0r3iw7nA=
Content-Language: en-gb
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com
X-AntiAbuse: Original Domain - gcc.gnu.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - nextmovesoftware.com
X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com
X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com
X-Source: 
X-Source-Args: 
X-Source-Dir: 
X-Spam-Status: No, score=-3.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,KAM_SHORT,RCVD_IN_BARRACUDACENTRAL,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>


On Tue, 13 June 2023 12:02, Richard Biener wrote:
> On Mon, Jun 12, 2023 at 4:04=E2=80=AFPM Roger Sayle =
<roger@nextmovesoftware.com>
> wrote:
> > The following simple test case, from PR 104610, shows that memcmp ()
> > =3D=3D 0 can result in some bizarre code sequences on x86.
> >
> > int foo(char *a)
> > {
> >     static const char t[] =3D "0123456789012345678901234567890";
> >     return __builtin_memcmp(a, &t[0], sizeof(t)) =3D=3D 0; }
> >
> > with -O2 currently contains both:
> >         xorl    %eax, %eax
> >         xorl    $1, %eax
> > and also
> >         movl    $1, %eax
> >         xorl    $1, %eax
> >
> > Changing the return type of foo to _Bool results in the equally
> > bizarre:
> >         xorl    %eax, %eax
> >         testl   %eax, %eax
> >         sete    %al
> > and also
> >         movl    $1, %eax
> >         testl   %eax, %eax
> >         sete    %al
> >
> > All these sequences set the result to a constant, but this
> > optimization opportunity only occurs very late during compilation, =
by
> > basic block duplication in the 322r.bbro pass, too late for CSE or
> > peephole2 to do anything about it.  The problem is that the idiom
> > expanded by compare_by_pieces for __builtin_memcmp_eq contains basic
> > blocks that can't easily be optimized by if-conversion due to the
> > multiple incoming edges on the fail block.
> >
> > In summary, compare_by_pieces generates code that looks like:
> >
> >         if (x[0] !=3D y[0]) goto fail_label;
> >         if (x[1] !=3D y[1]) goto fail_label;
> >         ...
> >         if (x[n] !=3D y[n]) goto fail_label;
> >         result =3D 1;
> >         goto end_label;
> > fail_label:
> >         result =3D 0;
> > end_label:
> >
> > In theory, the RTL if-conversion pass could be enhanced to tackle
> > arbitrarily complex if-then-else graphs, but the solution proposed
> > here is to allow suitable targets to perform if-conversion during
> > compare_by_pieces.  The x86, for example, can take advantage that =
all
> > of the above comparisons set and test the zero flag (ZF), which can
> > then be used in combination with sete.  Hence compare_by_pieces =
could
> > instead generate:
> >
> >         if (x[0] !=3D y[0]) goto fail_label;
> >         if (x[1] !=3D y[1]) goto fail_label;
> >         ...
> >         if (x[n] !=3D y[n]) goto fail_label;
> > fail_label:
> >         sete result
> >
> > which requires one less basic block, and the redundant conditional
> > branch to a label immediately after is cleaned up by GCC's existing
> > RTL optimizations.
> >
> > For the test case above, where -O2 -msse4 previously generated:
> >
> > foo:    movdqu  (%rdi), %xmm0
> >         pxor    .LC0(%rip), %xmm0
> >         ptest   %xmm0, %xmm0
> >         je      .L5
> > .L2:    movl    $1, %eax
> >         xorl    $1, %eax
> >         ret
> > .L5:    movdqu  16(%rdi), %xmm0
> >         pxor    .LC1(%rip), %xmm0
> >         ptest   %xmm0, %xmm0
> >         jne     .L2
> >         xorl    %eax, %eax
> >         xorl    $1, %eax
> >         ret
> >
> > we now generate:
> >
> > foo:    movdqu  (%rdi), %xmm0
> >         pxor    .LC0(%rip), %xmm0
> >         ptest   %xmm0, %xmm0
> >         jne     .L2
> >         movdqu  16(%rdi), %xmm0
> >         pxor    .LC1(%rip), %xmm0
> >         ptest   %xmm0, %xmm0
> > .L2:    sete    %al
> >         movzbl  %al, %eax
> >         ret
> >
> > Using a target hook allows the large amount of intelligence already =
in
> > compare_by_pieces to be re-used by the i386 backend, but this can =
also
> > help other backends with condition flags where the equality result =
can
> > be materialized.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make =
bootstrap
> > and make -k check, both with and without --target_board=3Dunix{-m32}
> > with no new failures.  Ok for mainline?
>=20
> What's the guarantee that the zero flag is appropriately set on all =
edges incoming
> now and forever?

Is there any reason why this target hook can't be removed (in future) =
should it stop
being useful?  It's completely optional and not required for the correct =
functioning
of the compiler.

> Does this require target specific knowledge on how =
do_compare_rtx_and_jump
> is emitting RTL?

Yes.  Each backend can decide how best to implement =
finish_compare_by_pieces
given its internal knowledge of how do_compare_rtx_and_jump works.  It's =
not
important to the middle-end how the underlying invariants are =
guaranteed, just
that they are and the backend produces correct code.  A backend may =
store flags
on the target label, or maintain state in cfun.  Future changes to the =
i386 backend
might cause it to revert to the default finish_compare_by_pieces, or =
provide an
alternate implementation, but at the moment this patch improves the code =
that
GCC generates.  Very little (in software like GCC) is forever.

> Do you see matching this in ifcvt to be unreasonable?  I'm thinking of =
"reducing"
> the incoming edges pairwise without actually looking at the ifcvt =
code.

There's nothing about the proposed patch that prevents or blocks =
improvements
to ifcvt, which I agree completely could be improved.  But even (in =
future) if later
RTL passes could clean things up, that's no reason for RTL expansion to =
initially
generate poor/inefficient code.  I'm not sure that a (hypothetical) =
ifcvt improvement
would be sufficient reason to revert/remove enhancements to =
compare_by_pieces.

Is it that there's not enough (bootstrap and) testsuite coverage of =
compare_by_pieces
to make you feel comfortable with this change?  The proposed =
implementation should
be "obvious enough" to future generations what the intended behaviour =
should be.
And the x86 target hook implementation (i.e. the change) is only four =
lines long, a
fraction of the size of the new documentation and comments.

Thanks in advance.
Roger


> > 2023-06-12  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         * config/i386/i386.cc (ix86_finish_compare_by_pieces): New
> >         function to provide a backend specific implementation.
> >         (TARGET_FINISH_COMPARE_BY_PIECES): Use the above function.
> >
> >         * doc/tm.texi.in (TARGET_FINISH_COMPARE_BY_PIECES): New =
@hook.
> >         * doc/tm.texi: Regenerate.
> >
> >         * expr.cc (compare_by_pieces): Call finish_compare_by_pieces =
in
> >         targetm to finalize the RTL expansion.  Move the current
> >         implementation to a default target hook.
> >         * target.def (finish_compare_by_pieces): New target hook to =
allow
> >         compare_by_pieces to be customized by the target.
> >         * targhooks.cc (default_finish_compare_by_pieces): Default
> >         implementation moved here from expr.cc's compare_by_pieces.
> >         * targhooks.h (default_finish_compare_by_pieces): Prototype.
> >
> > gcc/testsuite/ChangeLog
> >         * gcc.target/i386/pieces-memcmp-1.c: New test case.
> >