From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 2D32E3858D20 for ; Mon, 26 Jun 2023 18:24:28 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 2D32E3858D20 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id DA6212F4; Mon, 26 Jun 2023 11:25:11 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 064013F64C; Mon, 26 Jun 2023 11:24:26 -0700 (PDT) From: Richard Sandiford To: Richard Biener via Gcc-patches Mail-Followup-To: Richard Biener via Gcc-patches ,Roger Sayle , Richard Biener , Uros Bizjak , Jakub Jelinek , richard.sandiford@arm.com Cc: Roger Sayle , Richard Biener , Uros Bizjak , Jakub Jelinek Subject: Re: [PATCH] New finish_compare_by_pieces target hook (for x86). References: <001001d99d36$a38111c0$ea833540$@nextmovesoftware.com> <001e01d9a727$6375adc0$2a610940$@nextmovesoftware.com> Date: Mon, 26 Jun 2023 19:24:25 +0100 In-Reply-To: (Richard Biener via Gcc-patches's message of "Mon, 26 Jun 2023 09:18:00 +0200") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-21.4 required=5.0 tests=BAYES_00,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Richard Biener via Gcc-patches writes: > On Sun, Jun 25, 2023 at 7:39=E2=80=AFAM Roger Sayle wrote: >> >> >> On Tue, 13 June 2023 12:02, Richard Biener wrote: >> > On Mon, Jun 12, 2023 at 4:04=E2=80=AFPM Roger Sayle >> > wrote: >> > > The following simple test case, from PR 104610, shows that memcmp () >> > > =3D=3D 0 can result in some bizarre code sequences on x86. >> > > >> > > int foo(char *a) >> > > { >> > > static const char t[] =3D "0123456789012345678901234567890"; >> > > return __builtin_memcmp(a, &t[0], sizeof(t)) =3D=3D 0; } >> > > >> > > with -O2 currently contains both: >> > > xorl %eax, %eax >> > > xorl $1, %eax >> > > and also >> > > movl $1, %eax >> > > xorl $1, %eax >> > > >> > > Changing the return type of foo to _Bool results in the equally >> > > bizarre: >> > > xorl %eax, %eax >> > > testl %eax, %eax >> > > sete %al >> > > and also >> > > movl $1, %eax >> > > testl %eax, %eax >> > > sete %al >> > > >> > > All these sequences set the result to a constant, but this >> > > optimization opportunity only occurs very late during compilation, by >> > > basic block duplication in the 322r.bbro pass, too late for CSE or >> > > peephole2 to do anything about it. The problem is that the idiom >> > > expanded by compare_by_pieces for __builtin_memcmp_eq contains basic >> > > blocks that can't easily be optimized by if-conversion due to the >> > > multiple incoming edges on the fail block. >> > > >> > > In summary, compare_by_pieces generates code that looks like: >> > > >> > > if (x[0] !=3D y[0]) goto fail_label; >> > > if (x[1] !=3D y[1]) goto fail_label; >> > > ... >> > > if (x[n] !=3D y[n]) goto fail_label; >> > > result =3D 1; >> > > goto end_label; >> > > fail_label: >> > > result =3D 0; >> > > end_label: >> > > >> > > In theory, the RTL if-conversion pass could be enhanced to tackle >> > > arbitrarily complex if-then-else graphs, but the solution proposed >> > > here is to allow suitable targets to perform if-conversion during >> > > compare_by_pieces. The x86, for example, can take advantage that all >> > > of the above comparisons set and test the zero flag (ZF), which can >> > > then be used in combination with sete. Hence compare_by_pieces could >> > > instead generate: >> > > >> > > if (x[0] !=3D y[0]) goto fail_label; >> > > if (x[1] !=3D y[1]) goto fail_label; >> > > ... >> > > if (x[n] !=3D y[n]) goto fail_label; >> > > fail_label: >> > > sete result >> > > >> > > which requires one less basic block, and the redundant conditional >> > > branch to a label immediately after is cleaned up by GCC's existing >> > > RTL optimizations. >> > > >> > > For the test case above, where -O2 -msse4 previously generated: >> > > >> > > foo: movdqu (%rdi), %xmm0 >> > > pxor .LC0(%rip), %xmm0 >> > > ptest %xmm0, %xmm0 >> > > je .L5 >> > > .L2: movl $1, %eax >> > > xorl $1, %eax >> > > ret >> > > .L5: movdqu 16(%rdi), %xmm0 >> > > pxor .LC1(%rip), %xmm0 >> > > ptest %xmm0, %xmm0 >> > > jne .L2 >> > > xorl %eax, %eax >> > > xorl $1, %eax >> > > ret >> > > >> > > we now generate: >> > > >> > > foo: movdqu (%rdi), %xmm0 >> > > pxor .LC0(%rip), %xmm0 >> > > ptest %xmm0, %xmm0 >> > > jne .L2 >> > > movdqu 16(%rdi), %xmm0 >> > > pxor .LC1(%rip), %xmm0 >> > > ptest %xmm0, %xmm0 >> > > .L2: sete %al >> > > movzbl %al, %eax >> > > ret >> > > >> > > Using a target hook allows the large amount of intelligence already = in >> > > compare_by_pieces to be re-used by the i386 backend, but this can al= so >> > > help other backends with condition flags where the equality result c= an >> > > be materialized. >> > > >> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap >> > > and make -k check, both with and without --target_board=3Dunix{-m32} >> > > with no new failures. Ok for mainline? >> > >> > What's the guarantee that the zero flag is appropriately set on all ed= ges incoming >> > now and forever? >> >> Is there any reason why this target hook can't be removed (in future) sh= ould it stop >> being useful? It's completely optional and not required for the correct= functioning >> of the compiler. >> >> > Does this require target specific knowledge on how do_compare_rtx_and_= jump >> > is emitting RTL? >> >> Yes. Each backend can decide how best to implement finish_compare_by_pi= eces >> given its internal knowledge of how do_compare_rtx_and_jump works. It's= not >> important to the middle-end how the underlying invariants are guaranteed= , just >> that they are and the backend produces correct code. A backend may stor= e flags >> on the target label, or maintain state in cfun. Future changes to the i= 386 backend >> might cause it to revert to the default finish_compare_by_pieces, or pro= vide an >> alternate implementation, but at the moment this patch improves the code= that >> GCC generates. Very little (in software like GCC) is forever. >> >> > Do you see matching this in ifcvt to be unreasonable? I'm thinking of= "reducing" >> > the incoming edges pairwise without actually looking at the ifcvt code. >> >> There's nothing about the proposed patch that prevents or blocks improve= ments >> to ifcvt, which I agree completely could be improved. But even (in futu= re) if later >> RTL passes could clean things up, that's no reason for RTL expansion to = initially >> generate poor/inefficient code. I'm not sure that a (hypothetical) ifcv= t improvement >> would be sufficient reason to revert/remove enhancements to compare_by_p= ieces. >> >> Is it that there's not enough (bootstrap and) testsuite coverage of comp= are_by_pieces >> to make you feel comfortable with this change? The proposed implementat= ion should >> be "obvious enough" to future generations what the intended behaviour sh= ould be. >> And the x86 target hook implementation (i.e. the change) is only four li= nes long, a >> fraction of the size of the new documentation and comments. > > My main concern was that we are communicating "implicit" dependences betw= een > the target hook and expand RTL code generation - we don't seem to pass do= wn > enough info for example to have the target verify constraints and > excuse itself if > they do not hold. They also seem to be poorly documented and the > compare_by_pieces (and all _by_pieces) stuff has been extended to be quite > "configurable" and thus is probably prone to emit vastly different RTL in > some special cases? FWIW, I share these concerns, especially about the target hook not being able to verify its assumptions. Maybe it wouldn't matter too much if the worst outcome was a missed optimisation, but I don't think the mechanism is reliable enough for us to rely on it for correctness. Richard