From: Uros Bizjak
Date: Mon, 12 Jun 2023 20:46:40 +0200
Subject: Re: [PATCH] New finish_compare_by_pieces target hook (for x86).
To: Roger Sayle
Cc: gcc-patches@gcc.gnu.org, Jakub Jelinek
References: <001001d99d36$a38111c0$ea833540$@nextmovesoftware.com>
In-Reply-To: <001001d99d36$a38111c0$ea833540$@nextmovesoftware.com>

On Mon, Jun 12, 2023 at 4:03 PM Roger Sayle wrote:
>
>
> The following simple test case, from PR 104610, shows that memcmp () == 0
> can result in some bizarre code sequences on x86.
>
> int foo(char *a)
> {
>     static const char t[] = "0123456789012345678901234567890";
>     return __builtin_memcmp(a, &t[0], sizeof(t)) == 0;
> }
>
> with -O2 currently contains both:
>         xorl    %eax, %eax
>         xorl    $1, %eax
> and also
>         movl    $1, %eax
>         xorl    $1, %eax
>
> Changing the return type of foo to _Bool results in the equally
> bizarre:
>         xorl    %eax, %eax
>         testl   %eax, %eax
>         sete    %al
> and also
>         movl    $1, %eax
>         testl   %eax, %eax
>         sete    %al
>
> All these sequences set the result to a constant, but this optimization
> opportunity only occurs very late during compilation, by basic block
> duplication in the 322r.bbro pass, too late for CSE or peephole2 to
> do anything about it. The problem is that the idiom expanded by
> compare_by_pieces for __builtin_memcmp_eq contains basic blocks that
> can't easily be optimized by if-conversion due to the multiple
> incoming edges on the fail block.
>
> In summary, compare_by_pieces generates code that looks like:
>
>         if (x[0] != y[0]) goto fail_label;
>         if (x[1] != y[1]) goto fail_label;
>         ...
>         if (x[n] != y[n]) goto fail_label;
>         result = 1;
>         goto end_label;
> fail_label:
>         result = 0;
> end_label:
>
> In theory, the RTL if-conversion pass could be enhanced to tackle
> arbitrarily complex if-then-else graphs, but the solution proposed
> here is to allow suitable targets to perform if-conversion during
> compare_by_pieces. The x86, for example, can take advantage that
> all of the above comparisons set and test the zero flag (ZF), which
> can then be used in combination with sete. Hence compare_by_pieces
> could instead generate:
>
>         if (x[0] != y[0]) goto fail_label;
>         if (x[1] != y[1]) goto fail_label;
>         ...
>         if (x[n] != y[n]) goto fail_label;
> fail_label:
>         sete result
>
> which requires one less basic block, and the redundant conditional
> branch to a label immediately after is cleaned up by GCC's existing
> RTL optimizations.
>
> For the test case above, where -O2 -msse4 previously generated:
>
> foo:    movdqu  (%rdi), %xmm0
>         pxor    .LC0(%rip), %xmm0
>         ptest   %xmm0, %xmm0
>         je      .L5
> .L2:    movl    $1, %eax
>         xorl    $1, %eax
>         ret
> .L5:    movdqu  16(%rdi), %xmm0
>         pxor    .LC1(%rip), %xmm0
>         ptest   %xmm0, %xmm0
>         jne     .L2
>         xorl    %eax, %eax
>         xorl    $1, %eax
>         ret
>
> we now generate:
>
> foo:    movdqu  (%rdi), %xmm0
>         pxor    .LC0(%rip), %xmm0
>         ptest   %xmm0, %xmm0
>         jne     .L2
>         movdqu  16(%rdi), %xmm0
>         pxor    .LC1(%rip), %xmm0
>         ptest   %xmm0, %xmm0
> .L2:    sete    %al
>         movzbl  %al, %eax
>         ret
>
> Using a target hook allows the large amount of intelligence already in
> compare_by_pieces to be re-used by the i386 backend, but this can also
> help other backends with condition flags where the equality result can
> be materialized.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures. Ok for mainline?
>
>
> 2023-06-12 Roger Sayle
>
> gcc/ChangeLog
>         * config/i386/i386.cc (ix86_finish_compare_by_pieces): New
>         function to provide a backend specific implementation.
>         (TARGET_FINISH_COMPARE_BY_PIECES): Use the above function.
>
>         * doc/tm.texi.in (TARGET_FINISH_COMPARE_BY_PIECES): New @hook.
>         * doc/tm.texi: Regenerate.
>
>         * expr.cc (compare_by_pieces): Call finish_compare_by_pieces in
>         targetm to finalize the RTL expansion. Move the current
>         implementation to a default target hook.
>         * target.def (finish_compare_by_pieces): New target hook to allow
>         compare_by_pieces to be customized by the target.
>         * targhooks.cc (default_finish_compare_by_pieces): Default
>         implementation moved here from expr.cc's compare_by_pieces.
>         * targhooks.h (default_finish_compare_by_pieces): Prototype.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/pieces-memcmp-1.c: New test case.

This patch needs middle-end approval first.

Uros.

>
>
> Thanks in advance,
> Roger
> --
>
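
For reference, the two control-flow shapes described in the quoted patch
description can be modelled in plain C. This is only a sketch under stated
assumptions: the names eq_two_exits, eq_single_exit and last_equal are made
up for illustration and do not appear in the patch, the comparison is done
byte by byte rather than in wider pieces, and last_equal merely stands in
for the x86 zero flag that sete would read.

```c
#include <stddef.h>

/* Shape 1: the idiom the quoted mail attributes to compare_by_pieces.
   Every mismatch jumps to fail_label, so that block has many incoming
   edges, and the result is set by two separate constant stores.  */
static int eq_two_exits(const unsigned char *x, const unsigned char *y,
                        size_t n)
{
    int result;
    for (size_t i = 0; i < n; i++)
        if (x[i] != y[i])
            goto fail_label;
    result = 1;
    goto end_label;
fail_label:
    result = 0;
end_label:
    return result;
}

/* Shape 2: the proposed idiom.  Success falls through into fail_label,
   and the result is derived from the outcome of the last comparison.
   last_equal is a hypothetical stand-in for ZF.  */
static int eq_single_exit(const unsigned char *x, const unsigned char *y,
                          size_t n)
{
    int last_equal = 1;              /* an empty comparison counts as equal */
    for (size_t i = 0; i < n; i++) {
        last_equal = (x[i] == y[i]);
        if (!last_equal)
            goto fail_label;
    }
fail_label:
    return last_equal;               /* plays the role of "sete result" */
}
```

Both functions return 1 when the first n bytes are equal and 0 otherwise;
the second simply has one join point instead of two constant assignments.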