public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] rtlanal: Fix nonzero_bits for non-load paradoxical subregs (PR85925)
@ 2018-06-04 12:58 Segher Boessenkool
  2018-06-04 20:57 ` Eric Botcazou
  0 siblings, 1 reply; 6+ messages in thread
From: Segher Boessenkool @ 2018-06-04 12:58 UTC (permalink / raw)
  To: gcc-patches; +Cc: Segher Boessenkool, Richard Sandiford

Hi!

In the PR we have insns:

Trying 23 -> 24:
   23: r123:SI=zero_extend(r122:HI)
      REG_DEAD r122:HI
   24: [r115:SI]=r123:SI
      REG_DEAD r123:SI

which should be combined to

(set (mem:SI (reg/f:SI 115 [ pretmp_19 ]) [1 *pretmp_19+0 S4 A32])
    (and:SI (subreg:SI (reg:HI 122) 0)
        (const_int 32767 [0x7fff])))

But nonzero_bits of reg:HI 122 is 0x7fff, and nonzero_bits1 thinks it
then also has that same nonzero_bits for the subreg.  This is not
correct: the bit outside of HImode are undefined.  load_extend_op
applies to loads from memory only, not anything else.  Which means the
whole AND is optimised away.

Richard, what do you think?

Tested on the testcase for an arm compiler, and bootstrapped and
regression checked on powerpc64-linux {-m32,-m64} (not that that means
much, WORD_REGISTER_OPERATIONS isn't set there).


Segher


2018-06-04  Segher Boessenkool  <segher@kernel.crashing.org>

	PR rtl-optimization/85925
	* rtlanal.c (nonzero_bits1): For any paradoxical subreg of
	something other than memory the high bits are undefined.

---
 gcc/rtlanal.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index ac3662d..6f171de 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -4742,17 +4742,17 @@ nonzero_bits1 (const_rtx x, scalar_int_mode mode, const_rtx known_x,
 	  nonzero &= cached_nonzero_bits (SUBREG_REG (x), mode,
 					  known_x, known_mode, known_ret);
 
-          /* On many CISC machines, accessing an object in a wider mode
+	  /* On many CISC machines, accessing an object in a wider mode
 	     causes the high-order bits to become undefined.  So they are
 	     not known to be zero.  */
 	  rtx_code extend_op;
 	  if ((!WORD_REGISTER_OPERATIONS
 	       /* If this is a typical RISC machine, we only have to worry
 		  about the way loads are extended.  */
+	       || !MEM_P (SUBREG_REG (x))
 	       || ((extend_op = load_extend_op (inner_mode)) == SIGN_EXTEND
 		   ? val_signbit_known_set_p (inner_mode, nonzero)
-		   : extend_op != ZERO_EXTEND)
-	       || (!MEM_P (SUBREG_REG (x)) && !REG_P (SUBREG_REG (x))))
+		   : extend_op != ZERO_EXTEND))
 	      && xmode_width > inner_width)
 	    nonzero
 	      |= (GET_MODE_MASK (GET_MODE (x)) & ~GET_MODE_MASK (inner_mode));
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] rtlanal: Fix nonzero_bits for non-load paradoxical subregs (PR85925)
  2018-06-04 12:58 [PATCH] rtlanal: Fix nonzero_bits for non-load paradoxical subregs (PR85925) Segher Boessenkool
@ 2018-06-04 20:57 ` Eric Botcazou
  2018-06-04 22:12   ` Richard Sandiford
  2018-09-17 22:03   ` Segher Boessenkool
  0 siblings, 2 replies; 6+ messages in thread
From: Eric Botcazou @ 2018-06-04 20:57 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, Richard Sandiford

> In the PR we have insns:
> 
> Trying 23 -> 24:
>    23: r123:SI=zero_extend(r122:HI)
>       REG_DEAD r122:HI
>    24: [r115:SI]=r123:SI
>       REG_DEAD r123:SI
> 
> which should be combined to
> 
> (set (mem:SI (reg/f:SI 115 [ pretmp_19 ]) [1 *pretmp_19+0 S4 A32])
>     (and:SI (subreg:SI (reg:HI 122) 0)
>         (const_int 32767 [0x7fff])))
> 
> But nonzero_bits of reg:HI 122 is 0x7fff, and nonzero_bits1 thinks it
> then also has that same nonzero_bits for the subreg.  This is not
> correct: the bit outside of HImode are undefined.  load_extend_op
> applies to loads from memory only, not anything else.  Which means the
> whole AND is optimised away.

No, this is done on purpose for WORD_REGISTER_OPERATIONS targets and your 
patch will pessimize them.  I'm going to have a look at the PR then.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] rtlanal: Fix nonzero_bits for non-load paradoxical subregs (PR85925)
  2018-06-04 20:57 ` Eric Botcazou
@ 2018-06-04 22:12   ` Richard Sandiford
  2018-06-04 23:31     ` Eric Botcazou
  2018-09-17 22:03   ` Segher Boessenkool
  1 sibling, 1 reply; 6+ messages in thread
From: Richard Sandiford @ 2018-06-04 22:12 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: Segher Boessenkool, gcc-patches

Eric Botcazou <ebotcazou@adacore.com> writes:
>> In the PR we have insns:
>> 
>> Trying 23 -> 24:
>>    23: r123:SI=zero_extend(r122:HI)
>>       REG_DEAD r122:HI
>>    24: [r115:SI]=r123:SI
>>       REG_DEAD r123:SI
>> 
>> which should be combined to
>> 
>> (set (mem:SI (reg/f:SI 115 [ pretmp_19 ]) [1 *pretmp_19+0 S4 A32])
>>     (and:SI (subreg:SI (reg:HI 122) 0)
>>         (const_int 32767 [0x7fff])))
>> 
>> But nonzero_bits of reg:HI 122 is 0x7fff, and nonzero_bits1 thinks it
>> then also has that same nonzero_bits for the subreg.  This is not
>> correct: the bit outside of HImode are undefined.  load_extend_op
>> applies to loads from memory only, not anything else.  Which means the
>> whole AND is optimised away.
>
> No, this is done on purpose for WORD_REGISTER_OPERATIONS targets and your 
> patch will pessimize them.  I'm going to have a look at the PR then.

I can see why WORD_REGISTER_OPERATIONS allows some REG cases,
but why does LOAD_EXTEND_OP have an effect on them?  The doc says:

@defmac WORD_REGISTER_OPERATIONS
Define this macro to 1 if operations between registers with integral mode
smaller than a word are always performed on the entire register.  To be
more explicit, if you start with a pair of @code{word_mode} registers with
known values and you do a subword, for example @code{QImode}, addition on
the low part of the registers, then the compiler may consider that the
result has a known value in @code{word_mode} too if the macro is defined
to 1.  Most RISC machines have this property and most CISC machines do not.
@end defmac

which doesn't make it sound like LOAD_EXTEND_OP has a direct effect
on the (cached) lhs of an arithmetic operation.

Ah well.  I guess I'm just glad that AArch64 doesn't define this :-)

Richard

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] rtlanal: Fix nonzero_bits for non-load paradoxical subregs (PR85925)
  2018-06-04 22:12   ` Richard Sandiford
@ 2018-06-04 23:31     ` Eric Botcazou
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Botcazou @ 2018-06-04 23:31 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, Segher Boessenkool

> I can see why WORD_REGISTER_OPERATIONS allows some REG cases,
> but why does LOAD_EXTEND_OP have an effect on them?

LOAD_EXTEND_OP has an effect on all paradoxical SUBREGs because of spilling.  
This was even originally decoupled from WORD_REGISTER_OPERATIONS in reload, 
see this comment from find_reloads:

			 On machines that extend byte operations and we have a
			 SUBREG where both the inner and outer modes are no wider
			 than a word and the inner mode is narrower, is integral,
			 and gets extended when loaded from memory, combine.c has
			 made assumptions about the behavior of the machine in such
			 register access.  If the data is, in fact, in memory we
			 must always load using the size assumed to be in the
			 register and let the insn do the different-sized
			 accesses.

			 This is doubly true if WORD_REGISTER_OPERATIONS.  In
			 this case eliminate_regs has left non-paradoxical
			 subregs for push_reload to see.  Make sure it does
			 by forcing the reload.

> Ah well.  I guess I'm just glad that AArch64 doesn't define this :-)

Nothing even remotely approaching the complexity of the SVE stuff... ;-)

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] rtlanal: Fix nonzero_bits for non-load paradoxical subregs (PR85925)
  2018-06-04 20:57 ` Eric Botcazou
  2018-06-04 22:12   ` Richard Sandiford
@ 2018-09-17 22:03   ` Segher Boessenkool
  2018-09-18  8:28     ` Eric Botcazou
  1 sibling, 1 reply; 6+ messages in thread
From: Segher Boessenkool @ 2018-09-17 22:03 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches, Richard Sandiford

On Mon, Jun 04, 2018 at 10:57:05PM +0200, Eric Botcazou wrote:
> > In the PR we have insns:
> > 
> > Trying 23 -> 24:
> >    23: r123:SI=zero_extend(r122:HI)
> >       REG_DEAD r122:HI
> >    24: [r115:SI]=r123:SI
> >       REG_DEAD r123:SI
> > 
> > which should be combined to
> > 
> > (set (mem:SI (reg/f:SI 115 [ pretmp_19 ]) [1 *pretmp_19+0 S4 A32])
> >     (and:SI (subreg:SI (reg:HI 122) 0)
> >         (const_int 32767 [0x7fff])))
> > 
> > But nonzero_bits of reg:HI 122 is 0x7fff, and nonzero_bits1 thinks it
> > then also has that same nonzero_bits for the subreg.  This is not
> > correct: the bit outside of HImode are undefined.  load_extend_op
> > applies to loads from memory only, not anything else.  Which means the
> > whole AND is optimised away.
> 
> No, this is done on purpose for WORD_REGISTER_OPERATIONS targets and your 
> patch will pessimize them.  I'm going to have a look at the PR then.

Hi Eric,

Do you have any progress with this?


Segher

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] rtlanal: Fix nonzero_bits for non-load paradoxical subregs (PR85925)
  2018-09-17 22:03   ` Segher Boessenkool
@ 2018-09-18  8:28     ` Eric Botcazou
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Botcazou @ 2018-09-18  8:28 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, Richard Sandiford

> Do you have any progress with this?

Once I'm done with the recent regressions, I'll get back to this older one.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-09-18  8:17 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-04 12:58 [PATCH] rtlanal: Fix nonzero_bits for non-load paradoxical subregs (PR85925) Segher Boessenkool
2018-06-04 20:57 ` Eric Botcazou
2018-06-04 22:12   ` Richard Sandiford
2018-06-04 23:31     ` Eric Botcazou
2018-09-17 22:03   ` Segher Boessenkool
2018-09-18  8:28     ` Eric Botcazou

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).