public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: "Roger Sayle" <roger@nextmovesoftware.com>
To: "'Segher Boessenkool'" <segher@kernel.crashing.org>
Cc: <gcc-patches@gcc.gnu.org>
Subject: RE: [PATCH] Some additional zero-extension related optimizations in simplify-rtx.
Date: Fri, 29 Jul 2022 07:57:51 +0100	[thread overview]
Message-ID: <041201d8a318$8850d110$98f27330$@nextmovesoftware.com> (raw)
In-Reply-To: <20220727202336.GE25951@gate.crashing.org>


Hi Segher,
 
> On Wed, Jul 27, 2022 at 02:42:25PM +0100, Roger Sayle wrote:
> > This patch implements some additional zero-extension and
> > sign-extension related optimizations in simplify-rtx.cc.  The original
> > motivation comes from PR rtl-optimization/71775, where in comment #2
> Andrew Pinski sees:
> >
> > Failed to match this instruction:
> > (set (reg:DI 88 [ _1 ])
> >     (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))
> >
> > On many platforms the result of DImode CTZ is constrained to be a
> > small unsigned integer (between 0 and 64), hence the truncation to
> > 32-bits (using a SUBREG) and the following sign extension back to
> > 64-bits are effectively a no-op, so the above should ideally (often)
> > be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))".
> 
> And you can also do that if ctz is undefined for a zero argument!

Forgive my perhaps poor use of terminology.  The case of ctz 0 on
x64_64 isn't "undefined behaviour" (UB) in the C/C++ sense that
would allow us to do anything, but implementation defined (which
Intel calls "undefined" in their documentation).  Hence, we don't
know which DI value is placed in the result register.  In this case,
truncating to SI mode, then sign extending the result is not a no-op,
as the top bits will/must now all be the same [though admittedly to an
unknown undefined signbit].  Hence the above optimization would 
be invalid, as it doesn't guarantee the result would be sign-extended.

> > To implement this, and some closely related transformations, we build
> > upon the existing val_signbit_known_clear_p predicate.  In the first
> > chunk, nonzero_bits knows that FFS and ABS can't leave the sign-bit
> > bit set,
> 
> Is that guaranteed in all cases?  Also at -O0, also for args bigger than
> 64 bits?

val_signbit_known_clear_p should work for any size/precision arg.
I'm not sure if the results are affected by -O0, but even if they are, this
will
not affect correctness only whether these optimizations are performed,
which is precisely what -O0 controls.
 
> > +      /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
> > +      if (GET_CODE (op) == SUBREG
> > +	  && subreg_lowpart_p (op)
> > +	  && GET_MODE (SUBREG_REG (op)) == mode
> > +	  && is_a <scalar_int_mode> (mode, &int_mode)
> > +	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
> > +	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
> > +	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION
> (int_mode)
> > +	  && (nonzero_bits (SUBREG_REG (op), mode)
> > +	      & ~(GET_MODE_MASK (op_mode)>>1)) == 0)
> 
> (spaces around >> please)

Doh! Good catch, thanks.

> Please use val_signbit_known_{set,clear}_p?

Alas, it's not just the SI mode's signbit that we care about, but all of the
bits above it in the DImode operand/result.  These all need to be zero,
for the operand to already be zero-extended/sign_extended.

> > +	return SUBREG_REG (op);
> 
> Also, this is not correct for C[LT]Z_DEFINED_VALUE_AT_ZERO non-zero if the
> value it returns in its second arg does not survive sign extending
unmodified (if it
> is 0xffffffff for an extend from SI to DI for example).

Fortunately, C[LT]Z_DEFINED_VALUE_AT_ZERO being defined to return a negative
result, such as -1 is already handled (accounted for) in nonzero_bits.  The
relevant
code in rtlanal.cc's nonzero_bits1 is:

    case CTZ:
      /* If CTZ has a known value at zero, then the nonzero bits are
         that value, plus the number of bits in the mode minus one.  */
      if (CTZ_DEFINED_VALUE_AT_ZERO (mode, nonzero))
        nonzero
          |= (HOST_WIDE_INT_1U << (floor_log2 (mode_width))) - 1;
      else
        nonzero = -1;
      break;

Hence, any bits set by the constant returned by the target's
DEFINED_VALUE_AT_ZERO will be set in the result of nonzero_bits.
So if this is negative, say -1, then val_signbit_known_clear_p (or the
more complex tests above) will return false.

I'm currently bootstrapping and regression testing the whitespace 
change/correction suggested above.

Thanks,
Roger
--



  reply	other threads:[~2022-07-29  6:57 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-27 13:42 Roger Sayle
2022-07-27 20:23 ` Segher Boessenkool
2022-07-29  6:57   ` Roger Sayle [this message]
2022-08-02 20:07     ` Segher Boessenkool
2022-07-29  7:28   ` Roger Sayle
2022-08-02  9:38 ` Richard Sandiford
2022-08-02 11:55   ` [PATCH take #2] " Roger Sayle
2022-08-02 13:39     ` Richard Sandiford
2022-08-02 16:46       ` Richard Sandiford

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='041201d8a318$8850d110$98f27330$@nextmovesoftware.com' \
    --to=roger@nextmovesoftware.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=segher@kernel.crashing.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).