public inbox for gcc-patches@gcc.gnu.org
From: Jeff Law <jlaw@ventanamicro.com>
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	Jivan Hakobyan <jivanhakobyan9@gmail.com>,
	richard.sandiford@arm.com
Subject: Re: [RFA] New pass for sign/zero extension elimination
Date: Wed, 22 Nov 2023 10:59:04 -0700	[thread overview]
Message-ID: <929d7ab8-7ca2-4cd7-a68b-1a24862c7077@ventanamicro.com> (raw)
In-Reply-To: <mpt8r6smgba.fsf@arm.com>



On 11/20/23 11:26, Richard Sandiford wrote:

>> +
>> +/* If we know the destination of CODE only uses some low bits
>> +   (say just the QI bits of an SI operation), then return true
>> +   if we can propagate the need for just the subset of bits
>> +   from the destination to the sources.  */
>> +
>> +static bool
>> +safe_for_live_propagation (rtx_code code)
>> +{
>> +  /* First handle rtx classes which as a whole are known to
>> +     be either safe or unsafe.  */
>> +  switch (GET_RTX_CLASS (code))
>> +    {
>> +      case RTX_OBJ:
>> +	return true;
>> +
>> +      case RTX_COMPARE:
>> +      case RTX_COMM_COMPARE:
>> +      case RTX_TERNARY:
> 
> I suppose operands 1 and 2 of an IF_THEN_ELSE would be safe.
Yes.  The only downside is we'd need to special case IF_THEN_ELSE 
because it doesn't apply to operand 0.  Right now we're pretty 
conservative with anything other than binary codes.  Comment added about 
the possibility of handling I-T-E as well.
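
To make the operand 0 restriction concrete, here's an illustrative bit 
of RTL (made up, not from a real dump):

   (set (reg:SI d)
        (if_then_else:SI (ne (reg:SI x) (const_int 0))
                         (reg:SI a)
                         (reg:SI b)))

Even if only the low byte of d ends up live, the comparison still needs 
every bit of x; only a and b could inherit the narrowed liveness.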



> 
> This made me wonder: is this safe for !TRULY_NOOP_TRUNCATION?  But I
> suppose it is.  What !TRULY_NOOP_TRUNCATION models is that the target
> mode has a canonical form that must be maintained, and wouldn't be by
> a plain subreg.  So TRULY_NOOP_TRUNCATION is more of an issue for
> consumers of the liveness information, rather than computing the
> liveness information itself.
Really interesting question.  I think ext-dce is safe.  As you note this 
is more a consumer side question and on the consumer side we don't muck 
with TRUNCATE at all.




>> +    case SS_TRUNCATE:
>> +    case US_TRUNCATE:
>> +    case PLUS:
>> +    case MULT:
>> +    case SS_MULT:
>> +    case US_MULT:
>> +    case SMUL_HIGHPART:
>> +    case UMUL_HIGHPART:
>> +    case AND:
>> +    case IOR:
>> +    case XOR:
>> +    case SS_PLUS:
>> +    case US_PLUS:
> 
> I don't think it's safe to propagate through saturating ops.
> They don't have the property that (x op y)%z == (x%z op y%z)%z
Yea, you're probably right.  Removed.
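
A quick standalone check of that identity, just as an illustration (an 
8-bit unsigned saturating add standing in for US_PLUS, with only the 
low 4 bits of the result live, i.e. z == 16):

   #include <cstdio>
   #include <cstdint>

   /* US_PLUS in QImode, modelled on the host.  */
   static uint8_t
   us_plus (uint8_t a, uint8_t b)
   {
     unsigned sum = a + b;
     return sum > 255 ? 255 : sum;
   }

   int
   main ()
   {
     uint8_t x = 200, y = 100;
     /* (x op y) % 16 == 15, but ((x % 16) op (y % 16)) % 16 == 12,
        so propagating just the low bits through US_PLUS is wrong.  */
     printf ("%d vs %d\n", us_plus (x, y) % 16,
             us_plus (x % 16, y % 16) % 16);
     return 0;
   }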



>> +
>> +	  /* We don't support vector destinations or destinations
>> +	     wider than DImode.   It is safe to continue this loop.
>> +	     At worst, it will leave things live which could have
>> +	     been made dead.  */
>> +	  if (VECTOR_MODE_P (GET_MODE (x)) || GET_MODE (x) > E_DImode)
>> +	    continue;
> 
> The E_DImode comparison hard-codes an assumption about the order of
> the mode enum.  How about using something like:
Guilty as charged :-)  Not surprised you called that out.



> 
> 	  scalar_int_mode outer_mode;
> 	  if (!is_a<scalar_int_mode> (GET_MODE (x), &outer_mode)
> 	      || GET_MODE_BITSIZE (outer_mode) > 64)
> 	    continue;
Wouldn't we also want to verify that the size is constant, or is it the 
case that all the variable cases are vector (and would we want to 
actually depend on that)?

> 
> The other continues use iter.skip_subrtxes (); when continuing.
> I don't think it matters for correctness whether we do that or not,
> since SETs and CLOBBERs shouldn't be nested.  But skipping should
> be faster.
My thought on not skipping the sub-rtxs in this case was to make sure we 
processed things like memory addresses which could have embedded side 
effects.  It probably doesn't matter in practice though.

> 
> Maybe it would be worth splitting the SET/CLOBBER code out into
> a subfunction, to make the loop iteration easier to handle?
Yea, it could use another round of such work.  In the original, set and 
use handling were one big function, which drove me nuts.



>> +	      /* Transfer all the LIVENOW bits for X into LIVE_TMP.  */
>> +	      HOST_WIDE_INT rn = REGNO (SUBREG_REG (x));
>> +	      for (HOST_WIDE_INT i = 4 * rn; i < 4 * rn + 4; i++)
>> +		if (bitmap_bit_p (livenow, i))
>> +		  bitmap_set_bit (live_tmp, i);
>> +
>> +	      /* The mode of the SUBREG tells us how many bits we can
>> +		 clear.  */
>> +	      machine_mode mode = GET_MODE (x);
>> +	      HOST_WIDE_INT size = GET_MODE_SIZE (mode).to_constant ();
>> +	      bitmap_clear_range (livenow, 4 * rn, size);
> 
> Is clearing SIZE bytes correct?  Feels like it should be clearing
> something like log2 (size) + 1 instead.
Yea, I think you're right.  Fixed.
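
For reference, the arithmetic I'm assuming here (with the four chunks 
per pseudo tracking bits 0-7, 8-15, 16-31 and 32-63):

   /* A SIZE-byte destination covers the first floor_log2 (SIZE) + 1
      chunks:
        SIZE == 1  ->  1 chunk   (bits 0-7)
        SIZE == 2  ->  2 chunks  (bits 0-15)
        SIZE == 4  ->  3 chunks  (bits 0-31)
        SIZE == 8  ->  4 chunks  (bits 0-63)  */
   bitmap_clear_range (livenow, 4 * rn, floor_log2 (size) + 1);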



>> +	      bit = SUBREG_BYTE (x).to_constant () * BITS_PER_UNIT;
>> +	      if (WORDS_BIG_ENDIAN)
>> +		bit = (GET_MODE_BITSIZE (GET_MODE (SUBREG_REG (x))).to_constant ()
>> +		       - GET_MODE_BITSIZE (GET_MODE (x)).to_constant () - bit);
>> +
>> +	      /* Catch big endian correctness issues rather than triggering
>> +		 undefined behavior.  */
>> +	      gcc_assert (bit < sizeof (HOST_WIDE_INT) * 8);
> 
> This could probably use subreg_lsb, to avoid the inline endianness adjustment.
That's the routine I was looking for!  The original totally mucked up 
the endianness adjustment and I kept thinking we must have an existing 
routine to do this for us but didn't find it immediately, so I just 
banged out a trivial endianness adjustment.
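
So, assuming we keep the existing requirement that the SUBREG_BYTE is 
constant, both hand-rolled adjustments should collapse to roughly:

   bit = subreg_lsb (x).to_constant ();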

> 
>> +
>> +	      mask = GET_MODE_MASK (GET_MODE (SUBREG_REG (x))) << bit;
>> +	      if (!mask)
>> +		mask = -0x100000000ULL;
> 
> Not sure I follow this.  What does the -0x100000000ULL constant indicate?
> Also, isn't it the mask of the outer register that is shifted, rather
> than the mask of the inner mode?  E.g. if we have:
Inherited.  I should have marked it like the other one as needing 
investigation.  Probably the fastest way is to just rip it out for a 
test to see what breaks.

>> +	  /* Some ports generate (clobber (const_int)).  */
>> +	  else if (CONST_INT_P (x))
> 
> Ugh.  What on earth do they do that for?  I thought that (with const0_rtx)
> was reserved as combine's "something went wrong" representation.
I stumbled over it on avr I think.  I didn't dig into why other than to 
note it occurred in an insn pattern as opposed to rtl templates for 
generation from define_expands, splitters, etc.




>> +/* INSN has a sign/zero extended source inside SET that we will
>> +   try to turn into a SUBREG.  */
>> +static void
>> +ext_dce_try_optimize_insn (rtx_insn *insn, rtx set, bitmap changed_pseudos)
>> +{
>> +  rtx src = SET_SRC (set);
>> +  rtx inner = XEXP (src, 0);
>> +
>> +  /* Avoid (subreg (mem)) and other constructs which may are valid RTL, but
> 
> s/are// or s/may are/may be/
Fixed.  "may be".


>> +
>> +/* Process uses in INSN.  Set appropriate bits in LIVENOW for any chunks of
>> +   pseudos that become live, potentially filtering using bits from LIVE_TMP.
>> +
>> +   If MODIFIED is true, then optimize sign/zero extensions to SUBREGs when
> 
> MODIFY
Fixed.


>> +
>> +	  /* ?!? How much of this should mirror SET handling, potentially
>> +	     being shared?   */
>> +	  if (SUBREG_BYTE (dst).is_constant () && SUBREG_P (dst))
>> +	    {
>> +	      bit = SUBREG_BYTE (dst).to_constant () * BITS_PER_UNIT;
>> +	      if (WORDS_BIG_ENDIAN)
>> +		bit = (GET_MODE_BITSIZE (GET_MODE (SUBREG_REG (dst))).to_constant ()
>> +		       - GET_MODE_BITSIZE (GET_MODE (dst)).to_constant () - bit);
> 
> Same subreg_lsb thing here.
Yea, I'll fix them all.  There's 2 or 3 IIRC.


>> +
>> +	      /* ?!? What is the point of this adjustment to DST_MASK?  */
>> +	      if (code == PLUS || code == MINUS
>> +		  || code == MULT || code == ASHIFT)
>> +		dst_mask
>> +		  = dst_mask ? ((2ULL << floor_log2 (dst_mask)) - 1) : 0;
> 
> Yeah, sympathise with the ?!? here :)
Inherited.  Like the other bit of magic I think I'll do a test with them 
pulled out to see if I can make something undesirable trigger.
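
For reference, what the expression computes is at least easy to see: it 
rounds the mask up so it is contiguous from bit 0 through the highest 
live bit, e.g.

   dst_mask == 0x0000ff00
     -> (2ULL << floor_log2 (0xff00)) - 1 == 0x0000ffff

That would make sense if the intent is that for carry-propagating codes 
(PLUS, MINUS, MULT, ASHIFT) any live output bit makes all lower input 
bits live, but that's a guess until the experiment above is done.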

> 
>> +	      /* We will handle the other operand of a binary operator
>> +		 at the bottom of the loop by resetting Y.  */
>> +	      if (BINARY_P (src))
>> +		y = XEXP (src, 0);
> 
> What about UNARY_P, given that NOT is included in the codes above?
We'll break that inner for(;;) then iterate into the subobject, marking 
the relevant bits live.  FWIW, the control flow of this code continues 
to be my biggest concern from a maintenance standpoint.  Figuring it out 
was a major pain and I've tried to document what is and what is not 
safe.  But it's still painful to walk through.

I pondered if note_uses/note_stores would be better, but concluded we'd 
just end up with a ton of state objects to carry around and reasoning 
about that would be just as hard.


> 
>> +	      else
>> +		y = src;
>> +
>> +	      /* We're inside a SET and want to process the source operands
>> +		 making things live.  Breaking from this loop will cause
>> +		 the iterator to work on sub-rtxs, so it is safe to break
>> +		 if we see something we don't know how to handle.  */
>> +	      for (;;)
>> +		{
>> +		  /* Strip an outer STRICT_LOW_PART or paradoxical subreg.
>> +		     That has the effect of making the whole referenced
>> +		     register live.  We might be able to avoid that for
>> +		     STRICT_LOW_PART at some point.  */
>> +		  if (GET_CODE (x) == STRICT_LOW_PART
>> +		      || paradoxical_subreg_p (x))
>> +		    x = XEXP (x, 0);
> 
> Isn't "x" still the set at this point?  If it was intended to be "y",
> then I thought STRICT_LOW_PART was for destinations only.
I'll have to go back through the history.  Something definitely does not 
look right here.

> 

>> +		  else if (!CONSTANT_P (y))
>> +		    break;
>> +		  /* We might have (ashift (const_int 1) (reg...)) */
>> +		  else if (CONSTANT_P (y)
>> +			   && binop_implies_op2_fully_live (GET_CODE (src)))
> 
> The CONSTANT_P (y) condition is redundant given the above.
> But couldn't the binop_implies_op2_fully_live be shared by both paths?
> Maybe it could be tested...
Yea, that was a fun little twisty maze.  The CONSTANT_P is definitely 
redundant.  I had a bunch of fixes related to shifts in the code 
initially; Jivan and I slowly cleaned up and refactored over time, but 
I think there's still some work to do here.
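
For what it's worth, here's roughly what I'd expect that helper to boil 
down to (a sketch; the exact set of codes may differ from what's in the 
patch):

   static bool
   binop_implies_op2_fully_live (rtx_code code)
   {
     /* For shifts and rotates operand 2 is a bit count; narrowing the
        result says nothing about how many bits of the count are
        needed, so keep it fully live.  */
     switch (code)
       {
       case ASHIFT:
       case LSHIFTRT:
       case ASHIFTRT:
       case ROTATE:
       case ROTATERT:
         return true;
       default:
         return false;
       }
   }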



>> +
>> +  do
>> +    {
>> +      qin = qout = worklist;
>> +
>> +      /* Put every block on the worklist.  */
>> +      int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
>> +      int n = inverted_rev_post_order_compute (cfun, rpo);
> 
> Does this need to be recalculated each iteration, or could it be done
> outside the loop?
I think it can/should move outside the loop.  Done.


> 
> Was also wondering if this could use a DF_BACKWARD df_simple_dataflow
> (with its precomputed order), rather than doing the worklist management
> manually.  If df_simple_dataflow isn't efficient enough then we should
> try to fix that...
I'll have to take a look.  I actually never really had to look at this 
code as all the fixes/cleanups were in the sets/uses processing side. 
The liveness iteration and such "just worked".

Thanks a ton for all the feedback.  I've got several of the fixes done 
and I'll throw them into testing today and get a V2 out as a few people 
are playing with this.  V2 won't have fixes for everything you've 
pointed out, so it probably won't be worth the time to review the V2 
knowing a V3 is inevitable.

jeff

Thread overview: 35+ messages
2023-11-20  0:47 Jeff Law
2023-11-20  1:22 ` Oleg Endo
2023-11-20  2:51   ` Jeff Law
2023-11-20  2:57     ` Oleg Endo
2023-11-20  2:23 ` Xi Ruoyao
2023-11-20  2:46   ` Jeff Law
2023-11-20  2:52   ` Jeff Law
2023-11-20  3:32     ` Xi Ruoyao
2023-11-20  3:48       ` Jeff Law
2023-11-20 18:26 ` Richard Sandiford
2023-11-22 17:59   ` Jeff Law [this message]
2023-11-27 20:15     ` Richard Sandiford
2023-11-20 18:56 ` Dimitar Dimitrov
2023-11-22 22:23   ` Jeff Law
2023-11-26 16:42     ` rep.dot.nop
2023-11-27 16:14       ` Jeff Law
2023-11-27 11:30 ` Andrew Stubbs
2023-11-27 16:16   ` Jeff Law
2023-12-01  1:08 ` Hans-Peter Nilsson
2023-12-01 15:09   ` Jeff Law
2023-12-01 16:17     ` Hans-Peter Nilsson
2023-11-27 17:36 Joern Rennecke
2023-11-27 17:57 ` Joern Rennecke
2023-11-27 20:03   ` Richard Sandiford
2023-11-27 20:18     ` Jeff Law
2023-11-28 13:36       ` Joern Rennecke
2023-11-28 14:09         ` Joern Rennecke
2023-11-30 17:33         ` Jeff Law
2023-11-28 13:13     ` Joern Rennecke
2023-11-28  5:50 ` Jeff Law
2023-11-27 18:19 Joern Rennecke
2023-11-28  5:51 ` Jeff Law
2023-11-29 17:37 Joern Rennecke
2023-11-29 19:13 ` Jivan Hakobyan
2023-11-30 15:37 ` Jeff Law
