public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: "Roger Sayle" <roger@nextmovesoftware.com>
To: "'Jakub Jelinek'" <jakub@redhat.com>,
	"'Jeff Law'" <jeffreyalaw@gmail.com>,
	"'Segher Boessenkool'" <segher@kernel.crashing.org>
Cc: "'GCC Patches'" <gcc-patches@gcc.gnu.org>
Subject: RE: [PATCH] Fix RTL simplifications of FFS, POPCOUNT and PARITY.
Date: Sun, 26 Feb 2023 13:10:41 -0000	[thread overview]
Message-ID: <010401d949e3$bca11f20$35e35d60$@nextmovesoftware.com> (raw)
In-Reply-To: <Y/TloNvzI0S8Du4n@tucnak>

[-- Attachment #1: Type: text/plain, Size: 9566 bytes --]


To assist Jakub's request for progress on this, I've re-tested this patch
(which
still applies cleanly) on x86_64-pc-linux-gnu, again without any
regressions.
This is a middle-end piece of my preferred solution to PR106594, which is a
P1 regression affecting aarch64.

This patch  teaches simplify-rtx.cc to err on the side of caution, by never
creating (new) FFS, POPCOUNT or PARITY rtx with mismatched modes,
matching the documentation.

I've a related (optional) backend patch for nvptx that was posted here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609571.html
but this isn't strictly required, and is effectively a clean-up.

The RISC-V vector popcount instructions mentioned by Jakub are a similar
representation problem.   I'd suggest that the preferred way to represent
these in RTL is as a vector-to-vector popcount, followed by a sum reduction
[RISC-V already has a vector sum reduction insn/UNSPEC].  This would, in
theory, allow constant folding of this instruction for known operands, but
unfortunately, (I believe) GCC doesn't (yet) have an RTX for representing
(sum) reductions (from vectors to scalars)?.

This patch has been retested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2023-02-26  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        * simplify-rtx.cc (simplify_unary_operation_1) <case FFS>:
        Avoid generating FFS with mismatched operand and result modes,
        by using an explicit SIGN_EXTEND/ZERO_EXTEND.
        <case POPCOUNT>: Likewise, for POPCOUNT of ZERO_EXTEND.
        <case PARITY>: Likewise, for PARITY of {ZERO,SIGN}_EXTEND.

Thanks in advance,
Roger

> -----Original Message-----
> From: Jakub Jelinek <jakub@redhat.com>
> Sent: 21 February 2023 15:39
> To: Jeff Law <jeffreyalaw@gmail.com>; 'Segher Boessenkool'
> <segher@kernel.crashing.org>; Roger Sayle <roger@nextmovesoftware.com>
> Cc: 'GCC Patches' <gcc-patches@gcc.gnu.org>
> Subject: Re: [PATCH] Fix RTL simplifications of FFS, POPCOUNT and PARITY.
> 
> On Sun, Jan 01, 2023 at 03:55:26PM -0000, Roger Sayle wrote:
> > In 2011, the rtl.texi documentation was updated to reflect that the
> > modes of the RTX unary operations FFS, POPCOUNT and PARITY must match
> > those of their operands.  Unfortunately, some of the transformations
> > in simplify-rtx.cc predate this tightening of RTL semantics, and have
> > not (until now) been updated/fixed.  i.e. The POPCOUNT and PARITY
> > optimizations were "correct" when I originally added them back in 2007.
> >
> > Segher requested that I split this piece out from a fix for PR 106594
> > in
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601501.html
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32},
> > with no new failures.  Ok for mainline?
> >
> >
> > 2023-01-01  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> > 	* gcc/simplify-rtx.cc (simplify_unary_operation_1) <case FFS>:
> 
> BTW, gcc/ prefix shouldn't be in the ChangeLog entry.
> 
> > 	Avoid generating FFS with mismatched operand and result modes, by
> > 	using an explicit SIGN_EXTEND/ZERO_EXTEND instead.
> > 	<case POPCOUNT>: Likewise, for POPCOUNT of ZERO_EXTEND.
> > 	<case PARITY>: Likewise, for PARITY of {ZERO,SIGN}_EXTEND.
> 
> Can we make progress on this?
> I've looked with
> make mddump
> grep '(\(ffs\|popcount\|parity\):' tmp-mddump.md | grep -v
> '(\(ffs\|popcount\|parity\):\([A-Z0-9x]*\) (match_operand:\2'
> at various targets.  This prints nothing at all on aarch64, arm, gcn,
powerpc,
> rl78, visium, on x86 it prints
>             (popcount:SI (zero_extend:SI (match_operand:HI 1
> ("nonimmediate_operand") ("")))))
>                     (popcount:SI (match_dup 1)))
>                     (popcount:DI (match_dup 1)))
>                     (and:DI (subreg:DI (popcount:SI (match_dup 1)) 0)
>                     (zero_extend:DI (popcount:SI (match_dup 1))))
>             (popcount:SI (zero_extend:SI (match_operand:HI 1
> ("nonimmediate_operand") ("")))))
>                     (popcount:SI (match_dup 0))) where the (popcount:SI
(zero_extend:SI
> cases are *popcounthi2_1 define_insn_and_split but we have *popcounthi2_2
> which has (zero_extend:SI (popcount:HI and so is ok, and I've manually
checked
> the match_dup cases and the match_operand corresponding to match_dup has
> the right mode, on mips it prints
>             (popcount:SI (truncate:SI (match_operand:DI 1
("register_operand")
> ("d"))))) which is also ok, on sparc
>             (truncate:SI (popcount:DI (match_dup 2)))) in popcountsi2 also
looks ok,
> on s390x
>                     (popcount:DI (match_dup 2)))
>                     (popcount:DI (match_dup 2))) where I've also manually
verified
> popcountsi2 and popcounthi2 to be ok.
> And then riscv, see below.
> Now, for the above arches, everything thus matches the rtl.texi
documentation
> and popcount/parity/ffs match or create only matching modes between the
RTL
> and its operand and Roger's patch seems to be IMHO the way to go, because
> what simplify-rtx.cc does now for those cases the patch touches is that it
will
> always result in something that won't match because it is invalid.  With
the patch
> there is at least some chance it might match something.
> 
> Now, riscv is the only target for which I have tmp-mddump.md around and
> which doesn't have matching modes in the machine description:
>             (popcount:SI (unspec:VNx1BI [
>             (popcount:SI (unspec:VNx2BI [
>             (popcount:SI (unspec:VNx4BI [
>             (popcount:SI (unspec:VNx8BI [
>             (popcount:SI (unspec:VNx16BI [
>             (popcount:SI (unspec:VNx32BI [
>             (popcount:SI (unspec:VNx64BI [
>             (popcount:DI (unspec:VNx1BI [
>             (popcount:DI (unspec:VNx2BI [
>             (popcount:DI (unspec:VNx4BI [
>             (popcount:DI (unspec:VNx8BI [
>             (popcount:DI (unspec:VNx16BI [
>             (popcount:DI (unspec:VNx32BI [
>             (popcount:DI (unspec:VNx64BI [
>             (plus:SI (ffs:SI (unspec:VNx1BI [
>             (plus:SI (ffs:SI (unspec:VNx2BI [
>             (plus:SI (ffs:SI (unspec:VNx4BI [
>             (plus:SI (ffs:SI (unspec:VNx8BI [
>             (plus:SI (ffs:SI (unspec:VNx16BI [
>             (plus:SI (ffs:SI (unspec:VNx32BI [
>             (plus:SI (ffs:SI (unspec:VNx64BI [
>             (plus:DI (ffs:DI (unspec:VNx1BI [
>             (plus:DI (ffs:DI (unspec:VNx2BI [
>             (plus:DI (ffs:DI (unspec:VNx4BI [
>             (plus:DI (ffs:DI (unspec:VNx8BI [
>             (plus:DI (ffs:DI (unspec:VNx16BI [
>             (plus:DI (ffs:DI (unspec:VNx32BI [
>             (plus:DI (ffs:DI (unspec:VNx64BI [ While invalid against
current
> documentation and to me it isn't clear what they actually do, sum up
popcounts
> of all vector elements and return index of first non-zero element + ffs in
it or
> something similar, I think Roger's patch shouldn't change anything for
these
> either because those patterns use unspecs with vector modes, so there
won't be
> zero_extend/sign_extend from thos to integral mode that the patch could
swap
> around.
> 
> > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index
> > fc0d6c3..698ca6e 100644
> > --- a/gcc/simplify-rtx.cc
> > +++ b/gcc/simplify-rtx.cc
> > @@ -1404,22 +1404,32 @@ simplify_context::simplify_unary_operation_1
> (rtx_code code, machine_mode mode,
> >        break;
> >
> >      case FFS:
> > -      /* (ffs (*_extend <X>)) = (ffs <X>) */
> > +      /* (ffs (*_extend <X>)) = (*_extend (ffs <X>)).  */
> >        if (GET_CODE (op) == SIGN_EXTEND
> >  	  || GET_CODE (op) == ZERO_EXTEND)
> > -	return simplify_gen_unary (FFS, mode, XEXP (op, 0),
> > -				   GET_MODE (XEXP (op, 0)));
> > +	{
> > +	  temp = simplify_gen_unary (FFS, GET_MODE (XEXP (op, 0)),
> > +				     XEXP (op, 0), GET_MODE (XEXP (op, 0)));
> > +	  return simplify_gen_unary (GET_CODE (op), mode, temp,
> > +				     GET_MODE (temp));
> > +	}
> >        break;
> >
> >      case POPCOUNT:
> >        switch (GET_CODE (op))
> >  	{
> >  	case BSWAP:
> > -	case ZERO_EXTEND:
> > -	  /* (popcount (zero_extend <X>)) = (popcount <X>) */
> > +	  /* (popcount (bswap <X>)) = (popcount <X>).  */
> >  	  return simplify_gen_unary (POPCOUNT, mode, XEXP (op, 0),
> >  				     GET_MODE (XEXP (op, 0)));
> >
> > +	case ZERO_EXTEND:
> > +	  /* (popcount (zero_extend <X>)) = (zero_extend (popcount <X>)).
*/
> > +	  temp = simplify_gen_unary (POPCOUNT, GET_MODE (XEXP (op, 0)),
> > +				     XEXP (op, 0), GET_MODE (XEXP (op, 0)));
> > +	  return simplify_gen_unary (ZERO_EXTEND, mode, temp,
> > +				     GET_MODE (temp));
> > +
> >  	case ROTATE:
> >  	case ROTATERT:
> >  	  /* Rotations don't affect popcount.  */ @@ -1438,11 +1448,16 @@
> > simplify_context::simplify_unary_operation_1 (rtx_code code,
machine_mode
> mode,
> >  	{
> >  	case NOT:
> >  	case BSWAP:
> > -	case ZERO_EXTEND:
> > -	case SIGN_EXTEND:
> >  	  return simplify_gen_unary (PARITY, mode, XEXP (op, 0),
> >  				     GET_MODE (XEXP (op, 0)));
> >
> > +	case ZERO_EXTEND:
> > +	case SIGN_EXTEND:
> > +	  temp = simplify_gen_unary (PARITY, GET_MODE (XEXP (op, 0)),
> > +				     XEXP (op, 0), GET_MODE (XEXP (op, 0)));
> > +	  return simplify_gen_unary (GET_CODE (op), mode, temp,
> > +				     GET_MODE (temp));
> > +
> >  	case ROTATE:
> >  	case ROTATERT:
> >  	  /* Rotations don't affect parity.  */
> 
> 
> 	Jakub


[-- Attachment #2: patchaa4c.txt --]
[-- Type: text/plain, Size: 2076 bytes --]

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 3955929..2c82256 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -1404,22 +1404,32 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
       break;
 
     case FFS:
-      /* (ffs (*_extend <X>)) = (ffs <X>) */
+      /* (ffs (*_extend <X>)) = (*_extend (ffs <X>)).  */
       if (GET_CODE (op) == SIGN_EXTEND
 	  || GET_CODE (op) == ZERO_EXTEND)
-	return simplify_gen_unary (FFS, mode, XEXP (op, 0),
-				   GET_MODE (XEXP (op, 0)));
+	{
+	  temp = simplify_gen_unary (FFS, GET_MODE (XEXP (op, 0)),
+				     XEXP (op, 0), GET_MODE (XEXP (op, 0)));
+	  return simplify_gen_unary (GET_CODE (op), mode, temp,
+				     GET_MODE (temp));
+	}
       break;
 
     case POPCOUNT:
       switch (GET_CODE (op))
 	{
 	case BSWAP:
-	case ZERO_EXTEND:
-	  /* (popcount (zero_extend <X>)) = (popcount <X>) */
+	  /* (popcount (bswap <X>)) = (popcount <X>).  */
 	  return simplify_gen_unary (POPCOUNT, mode, XEXP (op, 0),
 				     GET_MODE (XEXP (op, 0)));
 
+	case ZERO_EXTEND:
+	  /* (popcount (zero_extend <X>)) = (zero_extend (popcount <X>)).  */
+	  temp = simplify_gen_unary (POPCOUNT, GET_MODE (XEXP (op, 0)),
+				     XEXP (op, 0), GET_MODE (XEXP (op, 0)));
+	  return simplify_gen_unary (ZERO_EXTEND, mode, temp,
+				     GET_MODE (temp));
+
 	case ROTATE:
 	case ROTATERT:
 	  /* Rotations don't affect popcount.  */
@@ -1438,11 +1448,16 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
 	{
 	case NOT:
 	case BSWAP:
-	case ZERO_EXTEND:
-	case SIGN_EXTEND:
 	  return simplify_gen_unary (PARITY, mode, XEXP (op, 0),
 				     GET_MODE (XEXP (op, 0)));
 
+	case ZERO_EXTEND:
+	case SIGN_EXTEND:
+	  temp = simplify_gen_unary (PARITY, GET_MODE (XEXP (op, 0)),
+				     XEXP (op, 0), GET_MODE (XEXP (op, 0)));
+	  return simplify_gen_unary (GET_CODE (op), mode, temp,
+				     GET_MODE (temp));
+
 	case ROTATE:
 	case ROTATERT:
 	  /* Rotations don't affect parity.  */

  reply	other threads:[~2023-02-26 13:10 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-01 15:55 Roger Sayle
2023-01-01 21:03 ` Segher Boessenkool
2023-01-01 22:59   ` roger
2023-01-02 15:45 ` Jeff Law
2023-01-02 15:59   ` Jakub Jelinek
2023-01-02 16:20     ` Jeff Law
2023-01-02 17:22       ` Jakub Jelinek
2023-01-02 17:38         ` Jeff Law
2023-01-03 11:33       ` Segher Boessenkool
2023-01-02 17:30   ` roger
2023-01-02 17:47     ` Jeff Law
2023-02-21 15:39 ` Jakub Jelinek
2023-02-26 13:10   ` Roger Sayle [this message]
2023-02-27 16:47     ` Segher Boessenkool

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='010401d949e3$bca11f20$35e35d60$@nextmovesoftware.com' \
    --to=roger@nextmovesoftware.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=jeffreyalaw@gmail.com \
    --cc=segher@kernel.crashing.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).