From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <001001d99d36$a38111c0$ea833540$@nextmovesoftware.com>
In-Reply-To: <001001d99d36$a38111c0$ea833540$@nextmovesoftware.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Tue, 13 Jun 2023 13:02:08 +0200
Message-ID: <CAFiYyc1Jo8CTT3_BOQ9gq_yfH4brZwkB3oS_QV4QWC568GDbWg@mail.gmail.com>
Subject: Re: [PATCH] New finish_compare_by_pieces target hook (for x86).
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: gcc-patches@gcc.gnu.org, Uros Bizjak <ubizjak@gmail.com>, 
	Jakub Jelinek <jakub@redhat.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
List-Id: <gcc-patches.gcc.gnu.org>

On Mon, Jun 12, 2023 at 4:04 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> The following simple test case, from PR 104610, shows that memcmp () == 0
> can result in some bizarre code sequences on x86.
>
> int foo(char *a)
> {
>     static const char t[] = "0123456789012345678901234567890";
>     return __builtin_memcmp(a, &t[0], sizeof(t)) == 0;
> }
>
> with -O2 currently contains both:
>         xorl    %eax, %eax
>         xorl    $1, %eax
> and also
>         movl    $1, %eax
>         xorl    $1, %eax
>
> Changing the return type of foo to _Bool results in the equally
> bizarre:
>         xorl    %eax, %eax
>         testl   %eax, %eax
>         sete    %al
> and also
>         movl    $1, %eax
>         testl   %eax, %eax
>         sete    %al
>
> All these sequences set the result to a constant, but this optimization
> opportunity only occurs very late during compilation, by basic block
> duplication in the 322r.bbro pass, too late for CSE or peephole2 to
> do anything about it.  The problem is that the idiom expanded by
> compare_by_pieces for __builtin_memcmp_eq contains basic blocks that
> can't easily be optimized by if-conversion due to the multiple
> incoming edges on the fail block.
>
> In summary, compare_by_pieces generates code that looks like:
>
>         if (x[0] != y[0]) goto fail_label;
>         if (x[1] != y[1]) goto fail_label;
>         ...
>         if (x[n] != y[n]) goto fail_label;
>         result = 1;
>         goto end_label;
> fail_label:
>         result = 0;
> end_label:
>
> In theory, the RTL if-conversion pass could be enhanced to tackle
> arbitrarily complex if-then-else graphs, but the solution proposed
> here is to allow suitable targets to perform if-conversion during
> compare_by_pieces.  The x86, for example, can take advantage of the fact
> that all of the above comparisons set and test the zero flag (ZF), which
> can then be used in combination with sete.  Hence compare_by_pieces
> could instead generate:
>
>         if (x[0] != y[0]) goto fail_label;
>         if (x[1] != y[1]) goto fail_label;
>         ...
>         if (x[n] != y[n]) goto fail_label;
> fail_label:
>         sete result
>
> which requires one fewer basic block, and the redundant conditional
> branch to a label immediately after is cleaned up by GCC's existing
> RTL optimizations.
>
> For the test case above, where -O2 -msse4 previously generated:
>
> foo:    movdqu  (%rdi), %xmm0
>         pxor    .LC0(%rip), %xmm0
>         ptest   %xmm0, %xmm0
>         je      .L5
> .L2:    movl    $1, %eax
>         xorl    $1, %eax
>         ret
> .L5:    movdqu  16(%rdi), %xmm0
>         pxor    .LC1(%rip), %xmm0
>         ptest   %xmm0, %xmm0
>         jne     .L2
>         xorl    %eax, %eax
>         xorl    $1, %eax
>         ret
>
> we now generate:
>
> foo:    movdqu  (%rdi), %xmm0
>         pxor    .LC0(%rip), %xmm0
>         ptest   %xmm0, %xmm0
>         jne     .L2
>         movdqu  16(%rdi), %xmm0
>         pxor    .LC1(%rip), %xmm0
>         ptest   %xmm0, %xmm0
> .L2:    sete    %al
>         movzbl  %al, %eax
>         ret
>
> Using a target hook allows the large amount of intelligence already in
> compare_by_pieces to be re-used by the i386 backend, but this can also
> help other backends with condition flags where the equality result can
> be materialized.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?

What guarantees that the zero flag is set appropriately on all incoming
edges, now and in the future?  Does this require target-specific
knowledge of how do_compare_rtx_and_jump emits RTL?

Do you see matching this in ifcvt as unreasonable?  I'm thinking
of "reducing" the incoming edges pairwise, though I say that without
having actually looked at the ifcvt code.
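
To make the "pairwise" idea concrete, here is a rough source-level
sketch in C.  It is only one possible reading of the suggestion, not
what ifcvt does today, and the function names are made up for
illustration.  The first function mirrors the shape compare_by_pieces
expands to (many edges into one fail block); the second shows what
the code could look like once those edges have been folded together
two at a time into a single "differ" condition.

```c
#include <stddef.h>

/* Shape of the current expansion: every mismatch jumps to the same
   fail block, so that block has one incoming edge per piece.
   (Hypothetical name, for illustration only.)  */
int eq_branch_chain(const unsigned char *x, const unsigned char *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (x[i] != y[i])
            goto fail_label;
    return 1;
fail_label:
    return 0;
}

/* Assumed result of merging the fail edges pairwise: each step ORs
   one more "pieces differ" condition into a single flag, so only one
   edge reaches the final test.  (Hypothetical name.)  */
int eq_pairwise_reduced(const unsigned char *x, const unsigned char *y, size_t n)
{
    unsigned differ = 0;
    for (size_t i = 0; i < n; i++)
        differ |= (unsigned)(x[i] != y[i]);
    return differ == 0;
}
```

Both functions return the same value for any two n-byte buffers.  The
reduced form gives up the early exit, so whether it is a win would
depend on the piece count and on the target.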

Thanks,
Richard.

>
> 2023-06-12  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386.cc (ix86_finish_compare_by_pieces): New
>         function to provide a backend specific implementation.
>         (TARGET_FINISH_COMPARE_BY_PIECES): Use the above function.
>
>         * doc/tm.texi.in (TARGET_FINISH_COMPARE_BY_PIECES): New @hook.
>         * doc/tm.texi: Regenerate.
>
>         * expr.cc (compare_by_pieces): Call finish_compare_by_pieces in
>         targetm to finalize the RTL expansion.  Move the current
>         implementation to a default target hook.
>         * target.def (finish_compare_by_pieces): New target hook to allow
>         compare_by_pieces to be customized by the target.
>         * targhooks.cc (default_finish_compare_by_pieces): Default
>         implementation moved here from expr.cc's compare_by_pieces.
>         * targhooks.h (default_finish_compare_by_pieces): Prototype.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/pieces-memcmp-1.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>