From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=HDHF=CO=gmail.com=richard.guenther@sourceware.org>
Received: from mail-lj1-x233.google.com (mail-lj1-x233.google.com [IPv6:2a00:1450:4864:20::233])
	by sourceware.org (Postfix) with ESMTPS id 251443858D20
	for <gcc-patches@gcc.gnu.org>; Mon, 26 Jun 2023 07:21:07 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 251443858D20
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com
Received: by mail-lj1-x233.google.com with SMTP id 38308e7fff4ca-2b6a0d91e80so8476361fa.3
        for <gcc-patches@gcc.gnu.org>; Mon, 26 Jun 2023 00:21:07 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20221208; t=1687764065; x=1690356065;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:from:to:cc:subject:date
         :message-id:reply-to;
        bh=mRygsnlg22xlkjSdzJR3mcrJ5aioQFuCNHfln/aJnmo=;
        b=VYpgpxtpJLcvKOEY3Ys5bBLM0xZHdq5D3ddLTBKq4un3rE0YTZc5ywnSp9NSBpELwI
         ErMvqR15kzn+/E1gSKCiIygJHUw+ecTBGy6xxBGUGuywWtIvTZz9jIcPd5ovwaCvQT9X
         Z6d1mdKEWMeZlt0ojefHU+kFWDfemMhvEFJe1d+q24D3CMr5LtlxnsMqaHESbS5c5UoZ
         92fTIvrnJ356D8NW2KJ079fDC1NrGMSWChqwjDl57TwA2hHX842/EUjQxfE7AR1/vpRJ
         I2MvXMQS6jWMnN/YfgPg+xxNKxqzCr1L42ykcnwwsbMHPqs7OO+OAr/M0H53QsKdD6kl
         M5UA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20221208; t=1687764065; x=1690356065;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=mRygsnlg22xlkjSdzJR3mcrJ5aioQFuCNHfln/aJnmo=;
        b=QHfy4pp8TyWQ+4L2yRbP4YDHKDJbQtG9lPG9REffT0yqcpWAQCJYK9vJWUSHWSWz3i
         Zi8ezmPMyk/enrRvVvHMOQr+hx8c/bxd4Q0OnuRyD97JY1GXQPddYv0nMhfqFPl9rYhs
         YjrZbYYX5Sfde9n/gWTbOz1Nn1s5fc5Kh98pXa041L/ixCBgOUiuypF91Tap0YqjV6ix
         NqpHigEAhTUuC1Dhyv2SpuzjB1Eram3zmEu4Dr2r7prOhl+aV13fH5Ark/6DinticDFx
         Qv67Jtk2oKyqu0j0vBXVZzeVyYO+FU57akeXAmJwTeLdKVd+TYisa3SwpKuT+fxIFi8k
         qoEQ==
X-Gm-Message-State: AC+VfDzLzK8MQUdnAGaeqZcLqDF4veIlMQTHSjvrKRFiBoU1j4IeOrgB
	dbqzOplIXiYL5q9aQPw6lU3ZR61FRy1Uxp2oawU=
X-Google-Smtp-Source: ACHHUZ40nUCaD91CfolaE29Vx3LVO/BZx7sxIYu/Et7l5YdfMcMZBLIRNIjW5jgx4oCChufxSVaiD+DTfja/uegRvv4=
X-Received: by 2002:a2e:8855:0:b0:2b5:fef9:6ad6 with SMTP id
 z21-20020a2e8855000000b002b5fef96ad6mr2474294ljj.25.1687764064979; Mon, 26
 Jun 2023 00:21:04 -0700 (PDT)
MIME-Version: 1.0
References: <001001d99d36$a38111c0$ea833540$@nextmovesoftware.com>
 <CAFiYyc1Jo8CTT3_BOQ9gq_yfH4brZwkB3oS_QV4QWC568GDbWg@mail.gmail.com> <001e01d9a727$6375adc0$2a610940$@nextmovesoftware.com>
In-Reply-To: <001e01d9a727$6375adc0$2a610940$@nextmovesoftware.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Mon, 26 Jun 2023 09:18:00 +0200
Message-ID: <CAFiYyc2Z0e0z-mOo7n0rmaqQ85toSqdn1pFKXh-sE+pGC6UBrQ@mail.gmail.com>
Subject: Re: [PATCH] New finish_compare_by_pieces target hook (for x86).
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: gcc-patches@gcc.gnu.org, Uros Bizjak <ubizjak@gmail.com>, 
	Jakub Jelinek <jakub@redhat.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Sun, Jun 25, 2023 at 7:39=E2=80=AFAM Roger Sayle <roger@nextmovesoftware=
.com> wrote:
>
>
> On Tue, 13 June 2023 12:02, Richard Biener wrote:
> > On Mon, Jun 12, 2023 at 4:04=E2=80=AFPM Roger Sayle <roger@nextmovesoft=
ware.com>
> > wrote:
> > > The following simple test case, from PR 104610, shows that memcmp ()
> > > =3D=3D 0 can result in some bizarre code sequences on x86.
> > >
> > > int foo(char *a)
> > > {
> > >     static const char t[] =3D "0123456789012345678901234567890";
> > >     return __builtin_memcmp(a, &t[0], sizeof(t)) =3D=3D 0; }
> > >
> > > with -O2 currently contains both:
> > >         xorl    %eax, %eax
> > >         xorl    $1, %eax
> > > and also
> > >         movl    $1, %eax
> > >         xorl    $1, %eax
> > >
> > > Changing the return type of foo to _Bool results in the equally
> > > bizarre:
> > >         xorl    %eax, %eax
> > >         testl   %eax, %eax
> > >         sete    %al
> > > and also
> > >         movl    $1, %eax
> > >         testl   %eax, %eax
> > >         sete    %al
> > >
> > > All these sequences set the result to a constant, but this
> > > optimization opportunity only occurs very late during compilation, by
> > > basic block duplication in the 322r.bbro pass, too late for CSE or
> > > peephole2 to do anything about it.  The problem is that the idiom
> > > expanded by compare_by_pieces for __builtin_memcmp_eq contains basic
> > > blocks that can't easily be optimized by if-conversion due to the
> > > multiple incoming edges on the fail block.
> > >
> > > In summary, compare_by_pieces generates code that looks like:
> > >
> > >         if (x[0] !=3D y[0]) goto fail_label;
> > >         if (x[1] !=3D y[1]) goto fail_label;
> > >         ...
> > >         if (x[n] !=3D y[n]) goto fail_label;
> > >         result =3D 1;
> > >         goto end_label;
> > > fail_label:
> > >         result =3D 0;
> > > end_label:
> > >
> > > In theory, the RTL if-conversion pass could be enhanced to tackle
> > > arbitrarily complex if-then-else graphs, but the solution proposed
> > > here is to allow suitable targets to perform if-conversion during
> > > compare_by_pieces.  The x86, for example, can take advantage that all
> > > of the above comparisons set and test the zero flag (ZF), which can
> > > then be used in combination with sete.  Hence compare_by_pieces could
> > > instead generate:
> > >
> > >         if (x[0] !=3D y[0]) goto fail_label;
> > >         if (x[1] !=3D y[1]) goto fail_label;
> > >         ...
> > >         if (x[n] !=3D y[n]) goto fail_label;
> > > fail_label:
> > >         sete result
> > >
> > > which requires one less basic block, and the redundant conditional
> > > branch to a label immediately after is cleaned up by GCC's existing
> > > RTL optimizations.
> > >
> > > For the test case above, where -O2 -msse4 previously generated:
> > >
> > > foo:    movdqu  (%rdi), %xmm0
> > >         pxor    .LC0(%rip), %xmm0
> > >         ptest   %xmm0, %xmm0
> > >         je      .L5
> > > .L2:    movl    $1, %eax
> > >         xorl    $1, %eax
> > >         ret
> > > .L5:    movdqu  16(%rdi), %xmm0
> > >         pxor    .LC1(%rip), %xmm0
> > >         ptest   %xmm0, %xmm0
> > >         jne     .L2
> > >         xorl    %eax, %eax
> > >         xorl    $1, %eax
> > >         ret
> > >
> > > we now generate:
> > >
> > > foo:    movdqu  (%rdi), %xmm0
> > >         pxor    .LC0(%rip), %xmm0
> > >         ptest   %xmm0, %xmm0
> > >         jne     .L2
> > >         movdqu  16(%rdi), %xmm0
> > >         pxor    .LC1(%rip), %xmm0
> > >         ptest   %xmm0, %xmm0
> > > .L2:    sete    %al
> > >         movzbl  %al, %eax
> > >         ret
> > >
> > > Using a target hook allows the large amount of intelligence already i=
n
> > > compare_by_pieces to be re-used by the i386 backend, but this can als=
o
> > > help other backends with condition flags where the equality result ca=
n
> > > be materialized.
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check, both with and without --target_board=3Dunix{-m32}
> > > with no new failures.  Ok for mainline?
> >
> > What's the guarantee that the zero flag is appropriately set on all edg=
es incoming
> > now and forever?
>
> Is there any reason why this target hook can't be removed (in future) sho=
uld it stop
> being useful?  It's completely optional and not required for the correct =
functioning
> of the compiler.
>
> > Does this require target specific knowledge on how do_compare_rtx_and_j=
ump
> > is emitting RTL?
>
> Yes.  Each backend can decide how best to implement finish_compare_by_pie=
ces
> given its internal knowledge of how do_compare_rtx_and_jump works.  It's =
not
> important to the middle-end how the underlying invariants are guaranteed,=
 just
> that they are and the backend produces correct code.  A backend may store=
 flags
> on the target label, or maintain state in cfun.  Future changes to the i3=
86 backend
> might cause it to revert to the default finish_compare_by_pieces, or prov=
ide an
> alternate implementation, but at the moment this patch improves the code =
that
> GCC generates.  Very little (in software like GCC) is forever.
>
> > Do you see matching this in ifcvt to be unreasonable?  I'm thinking of =
"reducing"
> > the incoming edges pairwise without actually looking at the ifcvt code.
>
> There's nothing about the proposed patch that prevents or blocks improvem=
ents
> to ifcvt, which I agree completely could be improved.  But even (in futur=
e) if later
> RTL passes could clean things up, that's no reason for RTL expansion to i=
nitially
> generate poor/inefficient code.  I'm not sure that a (hypothetical) ifcvt=
 improvement
> would be sufficient reason to revert/remove enhancements to compare_by_pi=
eces.
>
> Is it that there's not enough (bootstrap and) testsuite coverage of compa=
re_by_pieces
> to make you feel comfortable with this change?  The proposed implementati=
on should
> be "obvious enough" to future generations what the intended behaviour sho=
uld be.
> And the x86 target hook implementation (i.e. the change) is only four lin=
es long, a
> fraction of the size of the new documentation and comments.

My main concern was that we are communicating "implicit" dependences betwee=
n
the target hook and expand RTL code generation - we don't seem to pass down
enough info for example to have the target verify constraints and
excuse itself if
they do not hold.  They also seem to be poorly documented and the
compare_by_pieces (and all _by_pieces) stuff has been extended to be quite
"configurable" and thus is probably prone to emit vastly different RTL in
some special cases?

That said, I'm not familiar enough with the compare_by_pices logic to say t=
hat
this looks obviously safe, but maybe it is.

Yes - all target hooks have this kind of "implicit" dependence on how
the calling
code works, so I guess the main thing is that it's not obvious how the call=
ing
code works in this case?

Anyway, I'm not standing in the way of somebody approving this change - the=
 code
generation improvements are definitely worth it.

Maybe it's the abstraction boundary that makes me worry - can we make the
target simply indicate whether a do_compare_rtx_and_jump sets a flag
(maybe query the CC mode used?) and have the generic code then emit
the setcc?  The calling code would then know it emitted suitable CC produci=
ng
compare and branch and it would emit the corresponding setcc with the prope=
r
CCmode?

Richard.

> Thanks in advance.
> Roger
>
>
> > > 2023-06-12  Roger Sayle  <roger@nextmovesoftware.com>
> > >
> > > gcc/ChangeLog
> > >         * config/i386/i386.cc (ix86_finish_compare_by_pieces): New
> > >         function to provide a backend specific implementation.
> > >         (TARGET_FINISH_COMPARE_BY_PIECES): Use the above function.
> > >
> > >         * doc/tm.texi.in (TARGET_FINISH_COMPARE_BY_PIECES): New @hook=
.
> > >         * doc/tm.texi: Regenerate.
> > >
> > >         * expr.cc (compare_by_pieces): Call finish_compare_by_pieces =
in
> > >         targetm to finalize the RTL expansion.  Move the current
> > >         implementation to a default target hook.
> > >         * target.def (finish_compare_by_pieces): New target hook to a=
llow
> > >         compare_by_pieces to be customized by the target.
> > >         * targhooks.cc (default_finish_compare_by_pieces): Default
> > >         implementation moved here from expr.cc's compare_by_pieces.
> > >         * targhooks.h (default_finish_compare_by_pieces): Prototype.
> > >
> > > gcc/testsuite/ChangeLog
> > >         * gcc.target/i386/pieces-memcmp-1.c: New test case.
> > >
>
>