From: "Roger Sayle" <roger@nextmovesoftware.com>
To: "'Uros Bizjak'" <ubizjak@gmail.com>
Cc: <gcc-patches@gcc.gnu.org>
Subject: RE: [x86 PING] Peephole pand;pxor into pandn
Date: Mon, 23 May 2022 11:48:55 +0100 [thread overview]
Message-ID: <016d01d86e92$b42a20d0$1c7e6270$@nextmovesoftware.com> (raw)
In-Reply-To: <CAFULd4YWtpQG7vzqAzm4J=O+tR=JGnOg8+k-nvsU21nPJVoA3Q@mail.gmail.com>
Hi Uros,
Hopefully, if I explain even more of the context, you'll better understand why
this harmless (and at worse seemingly redundant) peephole2 is actually critical
for addressing significant regressions in the compiler without introducing new
testsuite failures. I wouldn't ask (again), if I didn't feel it's important.
Basically, I'm trying to unblock Hongtao's patch (for PR target/104610)
which in your own review, explained is better handled by/during STV:
https://gcc.gnu.org/pipermail/gcc-patches/2022-May/594070.html
Unfortunately, that patch of mine to STV (that I want to ping next) that solves
the P2 code quality regression PR target/70321, is itself blocked by another
review of yours:
https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593200.html
where this fix (alone) leads to a regression of the test case pr65105-5.c.
This pending regression has nothing to do with TARGET_BMI's andn, but
the idiom "if ((x & y) != y)" on ia32, where x and y are DImode, and stv/reload
has decided to place these values in SSE registers.
After combine we have an *anddi3_doubleword and *cmpdi3_doubleword:
(insn 22 21 23 4 (parallel [
(set (reg:DI 97)
(and:DI (reg/v:DI 92 [ p2 ])
(reg:DI 88 [ _25 ])))
(clobber (reg:CC 17 flags))
]) "pr65105-5.c":20:18 530 {*anddi3_doubleword}
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))
(insn 23 22 24 4 (set (reg:CCZ 17 flags)
(compare:CCZ (reg/v:DI 92 [ p2 ])
(reg:DI 97))) "pr65105-5.c":20:8 29 {*cmpdi_doubleword}
(expr_list:REG_DEAD (reg:DI 97)
(nil)))
After STV we have:
(insn 22 21 45 4 (set (subreg:V2DI (reg:DI 97) 0)
(and:V2DI (subreg:V2DI (reg/v:DI 92 [ p2 ]) 0)
(subreg:V2DI (reg:DI 88 [ _25 ]) 0))) "pr65105-5.c":20:18 6640 {*andv2di3}
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))
(insn 45 22 46 4 (set (reg:V2DI 103)
(xor:V2DI (subreg:V2DI (reg/v:DI 92 [ p2 ]) 0)
(subreg:V2DI (reg:DI 97) 0))) "pr65105-5.c":20:8 -1
(nil))
(insn 46 45 23 4 (set (reg:V2DI 103)
(vec_select:V2DI (vec_concat:V4DI (reg:V2DI 103)
(reg:V2DI 103))
(parallel [
(const_int 0 [0])
(const_int 2 [0x2])
]))) "pr65105-5.c":20:8 -1
(nil))
(insn 23 46 24 4 (set (reg:CC 17 flags)
(unspec:CC [
(reg:V2DI 103) repeated x2
] UNSPEC_PTEST)) "pr65105-5.c":20:8 7425 {sse4_1_ptestv2di}
(expr_list:REG_DEAD (reg:DI 97)
(nil)))
where the XOR has been introduce to implement the equality,
as P == Q is effectively implemented as (P ^ Q) == 0. At this point,
the only remaining pass that can optimize the pand followed by
the pxor is peephole2.
The requirement to optimize this is from gcc.target/i386/pr65105-5.c
where the desired implementation is explicitly looking for pandn+ptest:
/* { dg-do compile { target ia32 } } */
/* { dg-options "-O2 -march=core-avx2 -mno-stackrealign" } */
/* { dg-final { scan-assembler "pandn" } } */
/* { dg-final { scan-assembler "pxor" } } */
/* { dg-final { scan-assembler "ptest" } } */
Confusingly, I've even more patches in the queue/backlog for this part
of the compiler (it's an air traffic control problem, fallout from stage 4).
And of course, very many thanks for the various andn related patches
that have already been approved/committed to the backend, to avoid
potential regressions related to code size (-Os and -Oz). It's a long road
with many steps.
Might you reconsider? Pretty please?
Roger
--
> -----Original Message-----
> From: Uros Bizjak <ubizjak@gmail.com>
> Sent: 23 May 2022 10:11
> To: Roger Sayle <roger@nextmovesoftware.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [x86 PING] Peephole pand;pxor into pandn
>
> On Mon, May 23, 2022 at 10:59 AM Roger Sayle
> <roger@nextmovesoftware.com> wrote:
> >
> >
> > Hi Uros,
> >
> > Thanks for the speedy review. The point of this patch is that (with
> > pending changes to STV) the pand;pxor sequence isn't created until
> > after combine, and hence doesn't/won't get caught by any of the
> > current pre-reload/combine splitters.
>
> IMO this happens due to inconsistencies between integer and vector set, where
> integer andn is absent without BMI. However, we don't re-run the combine after
> reload, and I don't think it is worth to reimplement it via peephole2 patterns.
> Please note that AVX allows much more combinations that are not catched by
> your patch, and considering that combine already does the transformation, I
> don't see a compelling reason for this very specialized peephole2.
>
> Let's keep the patch shelved until a testcase shows the benefits of the patch.
>
> Uros.
>
> >
> >
> > > -----Original Message-----
> > > From: Uros Bizjak <ubizjak@gmail.com>
> > > Sent: 23 May 2022 09:51
> > > To: Roger Sayle <roger@nextmovesoftware.com>
> > > Cc: gcc-patches@gcc.gnu.org
> > > Subject: Re: [x86 PING] Peephole pand;pxor into pandn
> > >
> > > On Mon, May 23, 2022 at 10:44 AM Roger Sayle
> > > <roger@nextmovesoftware.com> wrote:
> > > >
> > > >
> > > > This is a ping of a patch from April (a dependency of another stage1 patch):
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593123.html
> > > >
> > > > This patch has been refreshed/retested against gcc 13 trunk on
> > > > x86_64-pc-linux-gnu with make bootstrap and make -k check, both
> > > > with and without --target_board=unix{-m32}, with no new failures.
> > > > Ok for mainline?
> > >
> > > I think this should be handled in a pre-reload splitter (or perhaps
> > > combine splitter). We have so many variants of SSE/AVX logic
> > > instructions that the transform after reload barely makes sense
> > > (please see the number of regno checks in the proposed patch).
> > >
> > > Uros.
> > >
> > > > 2022-05-23 Roger Sayle <roger@nextmovesoftware.com>
> > > >
> > > > gcc/ChangeLog
> > > > * config/i386/sse.md (peephole2): Convert suitable pand followed
> > > > by pxor into pandn, i.e. (X&Y)^X into X & ~Y.
> > > >
> > > > Many thanks in advance,
> > > > Roger
> > > > --
> > > >
> >
next prev parent reply other threads:[~2022-05-23 10:48 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-23 8:44 Roger Sayle
2022-05-23 8:50 ` Uros Bizjak
2022-05-23 8:59 ` Roger Sayle
2022-05-23 9:11 ` Uros Bizjak
2022-05-23 10:48 ` Roger Sayle [this message]
2022-05-23 11:28 ` Uros Bizjak
2022-05-23 12:16 ` Uros Bizjak
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='016d01d86e92$b42a20d0$1c7e6270$@nextmovesoftware.com' \
--to=roger@nextmovesoftware.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=ubizjak@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).