[RFC] Further LRA subreg handling issues

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* [RFC] Further LRA subreg handling issues
@ 2017-01-16 15:48 Matthew Fortune
  2017-01-19 23:31 ` Vladimir Makarov
  0 siblings, 1 reply; 11+ messages in thread
From: Matthew Fortune @ 2017-01-16 15:48 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Richard Sandiford,
	Jeff Law <law@redhat.com> (law@redhat.com),
	Eric Botcazou (ebotcazou@adacore.com),
	gcc

Hi Vladimir,

I'm working on PR target/78660 which is looking like a latent LRA bug.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78660

I believe the problem is in the same area as a bug was fixed in 2015:

https://gcc.gnu.org/ml/gcc-patches/2015-01/msg02165.html

Eric pointed out that the new issue relates to something reload
specifically dealt with in reload1.c:eliminate_regs_1:

	  if (MEM_P (new_rtx)
	      && ((x_size < new_size
		   /* On RISC machines, combine can create rtl of the form
		      (set (subreg:m1 (reg:m2 R) 0) ...)
		      where m1 < m2, and expects something interesting to
		      happen to the entire word.  Moreover, it will use the
		      (reg:m2 R) later, expecting all bits to be preserved.
		      So if the number of words is the same, preserve the
		      subreg so that push_reload can see it.  */
		   && !(WORD_REGISTER_OPERATIONS
			&& (x_size - 1) / UNITS_PER_WORD
			   == (new_size -1 ) / UNITS_PER_WORD))
		  || x_size == new_size)
	      )
	    return adjust_address_nv (new_rtx, GET_MODE (x), SUBREG_BYTE (x));
	  else
	    return gen_rtx_SUBREG (GET_MODE (x), new_rtx, SUBREG_BYTE (x));

However the code in lra-constraints.c:curr_insn_transform does not appear
to make any attempt to handle a special case for WORD_REGISTER_OPERATIONS.
I tried the following patch to account for this, which 'works' but I'm not
at all sure what the conditions should be (the comment from reload will
need adapting and including as well):

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 260591a..ac8d116 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -4086,7 +4086,9 @@ curr_insn_transform (bool check_only_p)
 			  && (goal_alt[i] == NO_REGS
 			      || (simplify_subreg_regno
 				  (ira_class_hard_regs[goal_alt[i]][0],
-				   GET_MODE (reg), byte, mode) >= 0)))))
+				   GET_MODE (reg), byte, mode) >= 0)))
+		      || (GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (reg))
+			  && WORD_REGISTER_OPERATIONS)))
 		{
 		  if (type == OP_OUT)
 		    type = OP_INOUT;

I think at the very least the issue Richard pointed out in the previous
fix must be dealt with as the new testcase triggers exactly what he
described I believe

Richard Sandiford wrote:
> So IMO the patch is too broad.  I think it should only use INOUT reloads
> for !strict_low if the inner mode is wider than a word and the outer mode
> is strictly narrower than the inner mode.  That's on top of Vlad's
> comment about only converting OP_OUTs, of course.

And here is my attempt at dealing with that:

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index ac8d116..8a0f40f 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -4090,7 +4090,17 @@ curr_insn_transform (bool check_only_p)
 		      || (GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (reg))
 			  && WORD_REGISTER_OPERATIONS)))
 		{
-		  if (type == OP_OUT)
+		  /* An OP_INOUT is required when reloading a subreg of a
+		     mode wider than a word to ensure that data beyond the
+		     word being reloaded is preserved.  Also automatically
+		     ensure that strict_low_part reloads are made into
+		     OP_INOUT which should already be true from the backend
+		     constraints.  */
+		  if (type == OP_OUT
+		      && (curr_static_id->operand[i].strict_low
+			  || (GET_MODE_SIZE (GET_MODE (reg)) > UNITS_PER_WORD
+			      && GET_MODE_SIZE (mode)
+				 < GET_MODE_SIZE (GET_MODE (reg)))))
 		    type = OP_INOUT;
 		  loc = &SUBREG_REG (*loc);
 		  mode = GET_MODE (*loc);

Any thoughts on whether this is along the right track would be appreciated.

Thanks,
Matthew

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] Further LRA subreg handling issues
  2017-01-16 15:48 [RFC] Further LRA subreg handling issues Matthew Fortune
@ 2017-01-19 23:31 ` Vladimir Makarov
  2017-01-19 23:41   ` Matthew Fortune
  0 siblings, 1 reply; 11+ messages in thread
From: Vladimir Makarov @ 2017-01-19 23:31 UTC (permalink / raw)
  To: Matthew Fortune
  Cc: Richard Sandiford,
	Jeff Law <law@redhat.com> (law@redhat.com),
	Eric Botcazou (ebotcazou@adacore.com),
	gcc



On 01/16/2017 10:47 AM, Matthew Fortune wrote:
> Hi Vladimir,
>
> I'm working on PR target/78660 which is looking like a latent LRA bug.
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78660
>
> I believe the problem is in the same area as a bug was fixed in 2015:
>
> https://gcc.gnu.org/ml/gcc-patches/2015-01/msg02165.html
>
> Eric pointed out that the new issue relates to something reload
> specifically dealt with in reload1.c:eliminate_regs_1:
>
> 	  if (MEM_P (new_rtx)
> 	      && ((x_size < new_size
> 		   /* On RISC machines, combine can create rtl of the form
> 		      (set (subreg:m1 (reg:m2 R) 0) ...)
> 		      where m1 < m2, and expects something interesting to
> 		      happen to the entire word.  Moreover, it will use the
> 		      (reg:m2 R) later, expecting all bits to be preserved.
> 		      So if the number of words is the same, preserve the
> 		      subreg so that push_reload can see it.  */
> 		   && !(WORD_REGISTER_OPERATIONS
> 			&& (x_size - 1) / UNITS_PER_WORD
> 			   == (new_size -1 ) / UNITS_PER_WORD))
> 		  || x_size == new_size)
> 	      )
> 	    return adjust_address_nv (new_rtx, GET_MODE (x), SUBREG_BYTE (x));
> 	  else
> 	    return gen_rtx_SUBREG (GET_MODE (x), new_rtx, SUBREG_BYTE (x));
>
> However the code in lra-constraints.c:curr_insn_transform does not appear
> to make any attempt to handle a special case for WORD_REGISTER_OPERATIONS.
> I tried the following patch to account for this, which 'works' but I'm not
> at all sure what the conditions should be (the comment from reload will
> need adapting and including as well):
>
> diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
> index 260591a..ac8d116 100644
> --- a/gcc/lra-constraints.c
> +++ b/gcc/lra-constraints.c
> @@ -4086,7 +4086,9 @@ curr_insn_transform (bool check_only_p)
>   			  && (goal_alt[i] == NO_REGS
>   			      || (simplify_subreg_regno
>   				  (ira_class_hard_regs[goal_alt[i]][0],
> -				   GET_MODE (reg), byte, mode) >= 0)))))
> +				   GET_MODE (reg), byte, mode) >= 0)))
> +		      || (GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (reg))
> +			  && WORD_REGISTER_OPERATIONS)))
>   		{
>   		  if (type == OP_OUT)
>   		    type = OP_INOUT;
>
> I think at the very least the issue Richard pointed out in the previous
> fix must be dealt with as the new testcase triggers exactly what he
> described I believe
>
> Richard Sandiford wrote:
>> So IMO the patch is too broad.  I think it should only use INOUT reloads
>> for !strict_low if the inner mode is wider than a word and the outer mode
>> is strictly narrower than the inner mode.  That's on top of Vlad's
>> comment about only converting OP_OUTs, of course.
> And here is my attempt at dealing with that:
>
> diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
> index ac8d116..8a0f40f 100644
> --- a/gcc/lra-constraints.c
> +++ b/gcc/lra-constraints.c
> @@ -4090,7 +4090,17 @@ curr_insn_transform (bool check_only_p)
>   		      || (GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (reg))
>   			  && WORD_REGISTER_OPERATIONS)))
>   		{
> -		  if (type == OP_OUT)
> +		  /* An OP_INOUT is required when reloading a subreg of a
> +		     mode wider than a word to ensure that data beyond the
> +		     word being reloaded is preserved.  Also automatically
> +		     ensure that strict_low_part reloads are made into
> +		     OP_INOUT which should already be true from the backend
> +		     constraints.  */
> +		  if (type == OP_OUT
> +		      && (curr_static_id->operand[i].strict_low
> +			  || (GET_MODE_SIZE (GET_MODE (reg)) > UNITS_PER_WORD
> +			      && GET_MODE_SIZE (mode)
> +				 < GET_MODE_SIZE (GET_MODE (reg)))))
>   		    type = OP_INOUT;
>   		  loc = &SUBREG_REG (*loc);
>   		  mode = GET_MODE (*loc);
>
> Any thoughts on whether this is along the right track would be appreciated.
>
>
Mathew, sorry for the delay with the answer.  I needed some time to 
think about it.  Dealing with subregs in lra/reload is a complicated and 
sensitive area.

The patch looks ok for me.  You can commit it if of course there are no 
regressions.  I hope the patch will behave well but please, be prepared 
to work more on it if there are complications. Sometimes such changes on 
LRA need a few iterations.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [RFC] Further LRA subreg handling issues
  2017-01-19 23:31 ` Vladimir Makarov
@ 2017-01-19 23:41   ` Matthew Fortune
  2017-01-20  7:12     ` Eric Botcazou
  2017-01-20 22:44     ` Eric Botcazou
  0 siblings, 2 replies; 11+ messages in thread
From: Matthew Fortune @ 2017-01-19 23:41 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Richard Sandiford,
	Jeff Law <law@redhat.com> (law@redhat.com),
	Eric Botcazou (ebotcazou@adacore.com),
	gcc

Vladimir Makarov <vmakarov@redhat.com> writes:
> On 01/16/2017 10:47 AM, Matthew Fortune wrote:
> > Hi Vladimir,
> >
> > I'm working on PR target/78660 which is looking like a latent LRA bug.
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78660
> >
> > I believe the problem is in the same area as a bug was fixed in 2015:
> >
> > https://gcc.gnu.org/ml/gcc-patches/2015-01/msg02165.html
> >
> > Eric pointed out that the new issue relates to something reload
> > specifically dealt with in reload1.c:eliminate_regs_1:
> >
> > 	  if (MEM_P (new_rtx)
> > 	      && ((x_size < new_size
> > 		   /* On RISC machines, combine can create rtl of the form
> > 		      (set (subreg:m1 (reg:m2 R) 0) ...)
> > 		      where m1 < m2, and expects something interesting to
> > 		      happen to the entire word.  Moreover, it will use the
> > 		      (reg:m2 R) later, expecting all bits to be preserved.
> > 		      So if the number of words is the same, preserve the
> > 		      subreg so that push_reload can see it.  */
> > 		   && !(WORD_REGISTER_OPERATIONS
> > 			&& (x_size - 1) / UNITS_PER_WORD
> > 			   == (new_size -1 ) / UNITS_PER_WORD))
> > 		  || x_size == new_size)
> > 	      )
> > 	    return adjust_address_nv (new_rtx, GET_MODE (x), SUBREG_BYTE (x));
> > 	  else
> > 	    return gen_rtx_SUBREG (GET_MODE (x), new_rtx, SUBREG_BYTE (x));
> >
> > However the code in lra-constraints.c:curr_insn_transform does not appear
> > to make any attempt to handle a special case for WORD_REGISTER_OPERATIONS.
> > I tried the following patch to account for this, which 'works' but I'm not
> > at all sure what the conditions should be (the comment from reload will
> > need adapting and including as well):
> >
> > diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
> > index 260591a..ac8d116 100644
> > --- a/gcc/lra-constraints.c
> > +++ b/gcc/lra-constraints.c
> > @@ -4086,7 +4086,9 @@ curr_insn_transform (bool check_only_p)
> >   			  && (goal_alt[i] == NO_REGS
> >   			      || (simplify_subreg_regno
> >   				  (ira_class_hard_regs[goal_alt[i]][0],
> > -				   GET_MODE (reg), byte, mode) >= 0)))))
> > +				   GET_MODE (reg), byte, mode) >= 0)))
> > +		      || (GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (reg))
> > +			  && WORD_REGISTER_OPERATIONS)))
> >   		{
> >   		  if (type == OP_OUT)
> >   		    type = OP_INOUT;
> >
> > I think at the very least the issue Richard pointed out in the previous
> > fix must be dealt with as the new testcase triggers exactly what he
> > described I believe
> >
> > Richard Sandiford wrote:
> >> So IMO the patch is too broad.  I think it should only use INOUT reloads
> >> for !strict_low if the inner mode is wider than a word and the outer mode
> >> is strictly narrower than the inner mode.  That's on top of Vlad's
> >> comment about only converting OP_OUTs, of course.
> > And here is my attempt at dealing with that:
> >
> > diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
> > index ac8d116..8a0f40f 100644
> > --- a/gcc/lra-constraints.c
> > +++ b/gcc/lra-constraints.c
> > @@ -4090,7 +4090,17 @@ curr_insn_transform (bool check_only_p)
> >   		      || (GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (reg))
> >   			  && WORD_REGISTER_OPERATIONS)))
> >   		{
> > -		  if (type == OP_OUT)
> > +		  /* An OP_INOUT is required when reloading a subreg of a
> > +		     mode wider than a word to ensure that data beyond the
> > +		     word being reloaded is preserved.  Also automatically
> > +		     ensure that strict_low_part reloads are made into
> > +		     OP_INOUT which should already be true from the backend
> > +		     constraints.  */
> > +		  if (type == OP_OUT
> > +		      && (curr_static_id->operand[i].strict_low
> > +			  || (GET_MODE_SIZE (GET_MODE (reg)) > UNITS_PER_WORD
> > +			      && GET_MODE_SIZE (mode)
> > +				 < GET_MODE_SIZE (GET_MODE (reg)))))
> >   		    type = OP_INOUT;
> >   		  loc = &SUBREG_REG (*loc);
> >   		  mode = GET_MODE (*loc);
> >
> > Any thoughts on whether this is along the right track would be appreciated.
> >
> >
> Mathew, sorry for the delay with the answer.  I needed some time to
> think about it.  Dealing with subregs in lra/reload is a complicated and
> sensitive area.
> 
> The patch looks ok for me.  You can commit it if of course there are no
> regressions.  I hope the patch will behave well but please, be prepared
> to work more on it if there are complications. Sometimes such changes on
> LRA need a few iterations.

Thanks for taking a look. I will do wider testing before commit and I'm certainly
braced for further complications. I unfortunately need to try and address another
LRA issue hitting MIPS as well so I have a lot of testing to do. 

I'll run testing for at least x86_64, MIPS and another WORD_REGISTER_OPERATIONS
target and try to get this committed in the next couple of days so it can
get into everyone's testing well before release.

Thanks,
Matthew

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] Further LRA subreg handling issues
  2017-01-19 23:41   ` Matthew Fortune
@ 2017-01-20  7:12     ` Eric Botcazou
  2017-01-20 22:44     ` Eric Botcazou
  1 sibling, 0 replies; 11+ messages in thread
From: Eric Botcazou @ 2017-01-20  7:12 UTC (permalink / raw)
  To: Matthew Fortune
  Cc: Vladimir Makarov, Richard Sandiford,
	Jeff Law <law@redhat.com> (law@redhat.com),
	gcc

> I'll run testing for at least x86_64, MIPS and another
> WORD_REGISTER_OPERATIONS target and try to get this committed in the next
> couple of days so it can get into everyone's testing well before release.

I'm going to give it a try on SPARC.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] Further LRA subreg handling issues
  2017-01-19 23:41   ` Matthew Fortune
  2017-01-20  7:12     ` Eric Botcazou
@ 2017-01-20 22:44     ` Eric Botcazou
  2017-01-25  9:45       ` Matthew Fortune
  2017-01-26 13:00       ` Matthew Fortune
  1 sibling, 2 replies; 11+ messages in thread
From: Eric Botcazou @ 2017-01-20 22:44 UTC (permalink / raw)
  To: Matthew Fortune
  Cc: gcc, Vladimir Makarov, Richard Sandiford,
	Jeff Law <law@redhat.com> (law@redhat.com)

> I'll run testing for at least x86_64, MIPS and another
> WORD_REGISTER_OPERATIONS target and try to get this committed in the next
> couple of days so it can get into everyone's testing well before release.

No issues found on SPARC.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [RFC] Further LRA subreg handling issues
  2017-01-20 22:44     ` Eric Botcazou
@ 2017-01-25  9:45       ` Matthew Fortune
  2017-01-26 13:00       ` Matthew Fortune
  1 sibling, 0 replies; 11+ messages in thread
From: Matthew Fortune @ 2017-01-25  9:45 UTC (permalink / raw)
  To: Eric Botcazou
  Cc: gcc, Vladimir Makarov, Richard Sandiford,
	Jeff Law <law@redhat.com> (law@redhat.com),
	Matthias Klose (doko@debian.org)

Eric Botcazou <ebotcazou@adacore.com> writes:
> > I'll run testing for at least x86_64, MIPS and another
> > WORD_REGISTER_OPERATIONS target and try to get this committed in the
> > next couple of days so it can get into everyone's testing well before
> release.
> 
> No issues found on SPARC.

Thanks Eric.

I'm still bootstrap testing this on MIPS. It's taking longer as although
the fix allows the bootstrap to continue there are now stage2/stage3
differences. The bug appears to be essentially the same issue as I just
fixed but in reverse i.e. a paradoxical sub-reg loading too much data
from its stack slot. Extracts from dumps are below; first of all the
input rtl to the combine pass:

(insn 248 246 249 38 (set (reg:QI 282)
        (subreg:QI (reg:SI 300) 0)) "/home/mfortune/gcc/gcc/predict.c":2904 362 {*movqi_internal}
     (nil))
(insn 249 248 250 38 (set (reg:DI 284)
        (zero_extend:DI (reg:QI 282))) "/home/mfortune/gcc/gcc/predict.c":2904 216 {*zero_extendqidi2}
     (expr_list:REG_DEAD (reg:QI 282)
        (nil)))
(insn 250 249 251 38 (set (reg:DI 6 $6)
        (reg:DI 284)) "/home/mfortune/gcc/gcc/predict.c":2904 310 {*movdi_64bit}
     (expr_list:REG_DEAD (reg:DI 284)
        (nil)))

Trying 248 -> 249:
Successfully matched this instruction:
(set (reg:DI 284)
    (subreg:DI (reg:SI 300) 0))
allowing combination of insns 248 and 249
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 248.
modifying insn i3   249: r284:DI=r300:SI#0
deferring rescan insn with uid = 249.

Trying 249 -> 250:
Successfully matched this instruction:
(set (reg:DI 6 $6)
    (subreg:DI (reg:SI 300) 0))
allowing combination of insns 249 and 250
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 249.
modifying insn i3   250: $6:DI=r300:SI#0
deferring rescan insn with uid = 250.

Which results in the following coming out of combine:

(note 248 246 249 38 NOTE_INSN_DELETED)
(note 249 248 250 38 NOTE_INSN_DELETED)
(insn 250 249 251 38 (set (reg:DI 6 $6)
        (subreg:DI (reg:SI 300) 0)) "/home/mfortune/gcc/gcc/predict.c":2904 310 {*movdi_64bit}
     (nil))

Pseudo 300 is assigned to memory and then LRA produces a simple DImode load from
the assigned stack slot. The only instruction to set pseudo 300 is:

(insn 247 212 389 3 (set (reg:SI 300)
        (ne:SI (subreg/s/u:SI (reg/v:DI 231 [ taken ]) 0)
            (const_int 0 [0]))) "/home/mfortune/gcc/gcc/predict.c":2904 504 {*sne_zero_sisi}
     (nil))

Which leads to an SImode store to the stack slot:

(insn 247 392 393 3 (set (reg:SI 4 $4 [300])
        (ne:SI (reg:SI 20 $20 [orig:231 taken ] [231])
            (const_int 0 [0]))) "/home/mfortune/gcc/gcc/predict.c":2904 504 {*sne_zero_sisi}
     (nil))
(insn 393 247 389 3 (set (mem/c:SI (plus:DI (reg/f:DI 29 $sp)
                (const_int 16 [0x10])) [403 %sfp+16 S4 A64])
        (reg:SI 4 $4 [300])) "/home/mfortune/gcc/gcc/predict.c":2904 312 {*movsi_internal}
     (nil))
...

(note 248 246 249 40 NOTE_INSN_DELETED)
(note 249 248 256 40 NOTE_INSN_DELETED)
(note 256 249 250 40 NOTE_INSN_DELETED)
(insn 250 256 251 40 (set (reg:DI 6 $6)
        (mem/c:DI (plus:DI (reg/f:DI 29 $sp)
                (const_int 16 [0x10])) [403 %sfp+16 S8 A64])) "/home/mfortune/gcc/gcc/predict.c":2904 310 {*movdi_64bit}
     (nil))

My assumption is that LRA is again expected to deal with this case and for insn
250 should be recognising that it must load 32-bits and rely on implicit
LOAD_EXTEND_OP behaviour producing an acceptable 64-bit value. In this case it
does not matter whether it is sign or zero extension and my assumption is that
this construct would never appear if a specific sign or zero extension was
required.

I haven't got to looking at where the issue is this time but it seems different as
this is a subreg in a simple move instruction where we already support the load/
store directly so no new reload instruction is required. I don't know if this
implies that simple move patterns should reject subregs but that doesn't sound
right either.

Resolving this fixes at least one bug and potentially all bugs in the MIPS bootstrap
as  manually modified the generated assembly code to use LW instead of LD for insn
250 and one of the buggy stage 3 objects is fixed.

I'll keep thinking, any advice in the meantime is appreciated.

Thanks,
Matthew

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [RFC] Further LRA subreg handling issues
  2017-01-20 22:44     ` Eric Botcazou
  2017-01-25  9:45       ` Matthew Fortune
@ 2017-01-26 13:00       ` Matthew Fortune
  2017-01-26 15:42         ` Eric Botcazou
  2017-01-26 16:22         ` David Malcolm
  1 sibling, 2 replies; 11+ messages in thread
From: Matthew Fortune @ 2017-01-26 13:00 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: gcc, Richard Sandiford,
	Jeff Law <law@redhat.com> (law@redhat.com),
	Eric Botcazou

Matthew Fortune <matthew.fortune@imgtec.com> writes:
...
> Pseudo 300 is assigned to memory and then LRA produces a simple DImode
> load from the assigned stack slot. The only instruction to set pseudo
> 300 is:
> 
> (insn 247 212 389 3 (set (reg:SI 300)
>         (ne:SI (subreg/s/u:SI (reg/v:DI 231 [ taken ]) 0)
>             (const_int 0 [0]))) "/home/mfortune/gcc/gcc/predict.c":2904
> 504 {*sne_zero_sisi}
>      (nil))
> 
> Which leads to an SImode store to the stack slot:
> 
> (insn 247 392 393 3 (set (reg:SI 4 $4 [300])
>         (ne:SI (reg:SI 20 $20 [orig:231 taken ] [231])
>             (const_int 0 [0]))) "/home/mfortune/gcc/gcc/predict.c":2904
> 504 {*sne_zero_sisi}
>      (nil))
> (insn 393 247 389 3 (set (mem/c:SI (plus:DI (reg/f:DI 29 $sp)
>                 (const_int 16 [0x10])) [403 %sfp+16 S4 A64])
>         (reg:SI 4 $4 [300])) "/home/mfortune/gcc/gcc/predict.c":2904 312
> {*movsi_internal}
>      (nil))
> ...
> 
> (note 248 246 249 40 NOTE_INSN_DELETED)
> (note 249 248 256 40 NOTE_INSN_DELETED)
> (note 256 249 250 40 NOTE_INSN_DELETED)
> (insn 250 256 251 40 (set (reg:DI 6 $6)
>         (mem/c:DI (plus:DI (reg/f:DI 29 $sp)
>                 (const_int 16 [0x10])) [403 %sfp+16 S8 A64]))
> "/home/mfortune/gcc/gcc/predict.c":2904 310 {*movdi_64bit}
>      (nil))
> 
> My assumption is that LRA is again expected to deal with this case and
> for insn
> 250 should be recognising that it must load 32-bits and rely on implicit
> LOAD_EXTEND_OP behaviour producing an acceptable 64-bit value. In this
> case it does not matter whether it is sign or zero extension and my
> assumption is that this construct would never appear if a specific sign
> or zero extension was required.
> 
> I haven't got to looking at where the issue is this time but it seems
> different as this is a subreg in a simple move instruction where we
> already support the load/ store directly so no new reload instruction is
> required. I don't know if this implies that simple move patterns should
> reject subregs but that doesn't sound right either.
> 
> Resolving this fixes at least one bug and potentially all bugs in the
> MIPS bootstrap as  manually modified the generated assembly code to use
> LW instead of LD for insn
> 250 and one of the buggy stage 3 objects is fixed.
> 
> I'll keep thinking, any advice in the meantime is appreciated.

All I have been able to determine on this is that there is potentially
different behaviour for paradoxical subregs in LRA vs reload.  There is
this comment in reload.c:push_reload:

    If we have (SUBREG:M1 (MEM:M2 ...) ...) (or an inner REG that is still
     a pseudo and hence will become a MEM) with M1 wider than M2 and the
     register is a pseudo, also reload the inside expression.

To me this makes perfect sense as I believe the RTL is only saying that
there is an M2-mode object to access or at least only the M2-mode sized
bits are valid. There are comments to say there will always be sufficient
memory assigned for spill slots as they are sized to fit the largest
paradoxical subreg, I just don't know why that is useful/important.

However in lra-constraints.c:simplify_operand_subreg it quite happily
performs a reload using the outer mode in this case and only drops down to
the inner mode if the outer mode reload would be slower than the inner.

Presumably this is safe for non WORD_REGISTER_OPERATIONS targets as the
junk upper bits in registers will be ignored; On WORD_REGISTER_OPERATIONS
targets then the narrower-than-word mode load will take care of any
'magic' needed to set the upper bits to a safe value in register.

So my thinking is that at least WORD_REGISTER_OPERATIONS targets should
always reload the inner mode for the case mentioned above much like the same
is required for normal subregs. Does that seem reasonable? Have I
misunderstood the paradoxical subreg case entirely?

I've only done superficial testing of a change to this code so far but my
testcase starts working at least which is a start.

Thanks,
Matthew

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] Further LRA subreg handling issues
  2017-01-26 13:00       ` Matthew Fortune
@ 2017-01-26 15:42         ` Eric Botcazou
  2017-01-26 17:19           ` Matthew Fortune
  2017-01-26 16:22         ` David Malcolm
  1 sibling, 1 reply; 11+ messages in thread
From: Eric Botcazou @ 2017-01-26 15:42 UTC (permalink / raw)
  To: Matthew Fortune
  Cc: gcc, Vladimir Makarov, Richard Sandiford,
	Jeff Law <law@redhat.com> (law@redhat.com)

> However in lra-constraints.c:simplify_operand_subreg it quite happily
> performs a reload using the outer mode in this case and only drops down to
> the inner mode if the outer mode reload would be slower than the inner.
> 
> Presumably this is safe for non WORD_REGISTER_OPERATIONS targets as the
> junk upper bits in registers will be ignored; On WORD_REGISTER_OPERATIONS
> targets then the narrower-than-word mode load will take care of any
> 'magic' needed to set the upper bits to a safe value in register.

Yes, I was leaning to the same conclusion before reading your second message.

> So my thinking is that at least WORD_REGISTER_OPERATIONS targets should
> always reload the inner mode for the case mentioned above much like the same
> is required for normal subregs. Does that seem reasonable? Have I
> misunderstood the paradoxical subreg case entirely?

No, this is correct, see find_reloads:

	      /* We must force a reload of paradoxical SUBREGs
		 of a MEM because the alignment of the inner value
		 may not be enough to do the outer reference.  On
		 big-endian machines, it may also reference outside
		 the object.

		 On machines that extend byte operations and we have a
		 SUBREG where both the inner and outer modes are no wider
		 than a word and the inner mode is narrower, is integral,
		 and gets extended when loaded from memory, combine.c has
		 made assumptions about the behavior of the machine in such
		 register access.  If the data is, in fact, in memory we
		 must always load using the size assumed to be in the
		 register and let the insn do the different-sized
		 accesses.

		 This is doubly true if WORD_REGISTER_OPERATIONS.  In
		 this case eliminate_regs has left non-paradoxical
		 subregs for push_reload to see.  Make sure it does
		 by forcing the reload.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] Further LRA subreg handling issues
  2017-01-26 13:00       ` Matthew Fortune
  2017-01-26 15:42         ` Eric Botcazou
@ 2017-01-26 16:22         ` David Malcolm
  1 sibling, 0 replies; 11+ messages in thread
From: David Malcolm @ 2017-01-26 16:22 UTC (permalink / raw)
  To: Matthew Fortune, Vladimir Makarov
  Cc: gcc, Richard Sandiford,
	Jeff Law <law@redhat.com> (law@redhat.com),
	Eric Botcazou

On Thu, 2017-01-26 at 13:00 +0000, Matthew Fortune wrote:
> Matthew Fortune <matthew.fortune@imgtec.com> writes:
> ...
> > Pseudo 300 is assigned to memory and then LRA produces a simple
> > DImode
> > load from the assigned stack slot. The only instruction to set
> > pseudo
> > 300 is:
> > 
> > (insn 247 212 389 3 (set (reg:SI 300)
> >         (ne:SI (subreg/s/u:SI (reg/v:DI 231 [ taken ]) 0)
> >             (const_int 0 [0])))
> > "/home/mfortune/gcc/gcc/predict.c":2904
> > 504 {*sne_zero_sisi}
> >      (nil))
> > 
> > Which leads to an SImode store to the stack slot:
> > 
> > (insn 247 392 393 3 (set (reg:SI 4 $4 [300])
> >         (ne:SI (reg:SI 20 $20 [orig:231 taken ] [231])
> >             (const_int 0 [0])))
> > "/home/mfortune/gcc/gcc/predict.c":2904
> > 504 {*sne_zero_sisi}
> >      (nil))
> > (insn 393 247 389 3 (set (mem/c:SI (plus:DI (reg/f:DI 29 $sp)
> >                 (const_int 16 [0x10])) [403 %sfp+16 S4 A64])
> >         (reg:SI 4 $4 [300]))
> > "/home/mfortune/gcc/gcc/predict.c":2904 312
> > {*movsi_internal}
> >      (nil))
> > ...
> > 
> > (note 248 246 249 40 NOTE_INSN_DELETED)
> > (note 249 248 256 40 NOTE_INSN_DELETED)
> > (note 256 249 250 40 NOTE_INSN_DELETED)
> > (insn 250 256 251 40 (set (reg:DI 6 $6)
> >         (mem/c:DI (plus:DI (reg/f:DI 29 $sp)
> >                 (const_int 16 [0x10])) [403 %sfp+16 S8 A64]))
> > "/home/mfortune/gcc/gcc/predict.c":2904 310 {*movdi_64bit}
> >      (nil))
> > 
> > My assumption is that LRA is again expected to deal with this case
> > and
> > for insn
> > 250 should be recognising that it must load 32-bits and rely on
> > implicit
> > LOAD_EXTEND_OP behaviour producing an acceptable 64-bit value. In
> > this
> > case it does not matter whether it is sign or zero extension and my
> > assumption is that this construct would never appear if a specific
> > sign
> > or zero extension was required.
> > 
> > I haven't got to looking at where the issue is this time but it
> > seems
> > different as this is a subreg in a simple move instruction where we
> > already support the load/ store directly so no new reload
> > instruction is
> > required. I don't know if this implies that simple move patterns
> > should
> > reject subregs but that doesn't sound right either.
> > 
> > Resolving this fixes at least one bug and potentially all bugs in
> > the
> > MIPS bootstrap as  manually modified the generated assembly code to
> > use
> > LW instead of LD for insn
> > 250 and one of the buggy stage 3 objects is fixed.
> > 
> > I'll keep thinking, any advice in the meantime is appreciated.
> 
> All I have been able to determine on this is that there is
> potentially
> different behaviour for paradoxical subregs in LRA vs reload.  There
> is
> this comment in reload.c:push_reload:
> 
>     If we have (SUBREG:M1 (MEM:M2 ...) ...) (or an inner REG that is
> still
>      a pseudo and hence will become a MEM) with M1 wider than M2 and
> the
>      register is a pseudo, also reload the inside expression.
> 
> To me this makes perfect sense as I believe the RTL is only saying
> that
> there is an M2-mode object to access or at least only the M2-mode
> sized
> bits are valid. There are comments to say there will always be
> sufficient
> memory assigned for spill slots as they are sized to fit the largest
> paradoxical subreg, I just don't know why that is useful/important.
> 
> However in lra-constraints.c:simplify_operand_subreg it quite happily
> performs a reload using the outer mode in this case and only drops
> down to
> the inner mode if the outer mode reload would be slower than the
> inner.
> 
> Presumably this is safe for non WORD_REGISTER_OPERATIONS targets as
> the
> junk upper bits in registers will be ignored; On
> WORD_REGISTER_OPERATIONS
> targets then the narrower-than-word mode load will take care of any
> 'magic' needed to set the upper bits to a safe value in register.
> 
> So my thinking is that at least WORD_REGISTER_OPERATIONS targets
> should
> always reload the inner mode for the case mentioned above much like
> the same
> is required for normal subregs. Does that seem reasonable? Have I
> misunderstood the paradoxical subreg case entirely?
> 
> I've only done superficial testing of a change to this code so far
> but my
> testcase starts working at least which is a start.

FWIW, the RTL "frontend" [1] is now in trunk (as of r244878), so it
should now possible to write small fragments of RTL as testcases in
DejaGnu.

I don't know if it's helpful for this bug though.

In case it is, I started some documentation for it here:
  https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02065.html


Dave

[1] I put "frontend" in quotes as it's actually an extension to the C
frontend, rather than a true frontend.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [RFC] Further LRA subreg handling issues
  2017-01-26 15:42         ` Eric Botcazou
@ 2017-01-26 17:19           ` Matthew Fortune
  2017-01-26 22:33             ` Eric Botcazou
  0 siblings, 1 reply; 11+ messages in thread
From: Matthew Fortune @ 2017-01-26 17:19 UTC (permalink / raw)
  To: Eric Botcazou
  Cc: gcc, Vladimir Makarov, Richard Sandiford,
	Jeff Law <law@redhat.com> (law@redhat.com)

Eric Botcazou <ebotcazou@adacore.com> writes:
> > However in lra-constraints.c:simplify_operand_subreg it quite happily
> > performs a reload using the outer mode in this case and only drops
> > down to the inner mode if the outer mode reload would be slower than
> the inner.
> >
> > Presumably this is safe for non WORD_REGISTER_OPERATIONS targets as
> > the junk upper bits in registers will be ignored; On
> > WORD_REGISTER_OPERATIONS targets then the narrower-than-word mode load
> > will take care of any 'magic' needed to set the upper bits to a safe
> value in register.
> 
> Yes, I was leaning to the same conclusion before reading your second
> message.
> 
> > So my thinking is that at least WORD_REGISTER_OPERATIONS targets
> > should always reload the inner mode for the case mentioned above much
> > like the same is required for normal subregs. Does that seem
> > reasonable? Have I misunderstood the paradoxical subreg case entirely?
> 
> No, this is correct, see find_reloads:
> 
> 	      /* We must force a reload of paradoxical SUBREGs
> 		 of a MEM because the alignment of the inner value
> 		 may not be enough to do the outer reference.  On
> 		 big-endian machines, it may also reference outside
> 		 the object.
> 
> 		 On machines that extend byte operations and we have a
> 		 SUBREG where both the inner and outer modes are no wider
> 		 than a word and the inner mode is narrower, is integral,
> 		 and gets extended when loaded from memory, combine.c has
> 		 made assumptions about the behavior of the machine in such
> 		 register access.  If the data is, in fact, in memory we
> 		 must always load using the size assumed to be in the
> 		 register and let the insn do the different-sized
> 		 accesses.

This part suggests to me that LRA should never be reloading the
paradoxical subreg meaning the whole SLOW_UNALIGNED_ACCESS checking code in
simplify_operand_subreg could be removed unconditionally.  But I get the
feeling the big valid_address_p check (below) will still prevent some
paradoxical subregs from being reloaded via their inner mode.  I haven't
quite understood exactly what the check is trying to achieve yet though:

      if (!addr_was_valid
          || valid_address_p (GET_MODE (subst), XEXP (subst, 0),
                              MEM_ADDR_SPACE (subst))
          || ((get_constraint_type (lookup_constraint
                                    (curr_static_id->operand[nop].constraint))
               != CT_SPECIAL_MEMORY)
              /* We still can reload address and if the address is
                 valid, we can remove subreg without reloading its
                 inner memory.  */
              && valid_address_p (GET_MODE (subst),
                                  regno_reg_rtx
                                  [ira_class_hard_regs
                                   [base_reg_class (GET_MODE (subst),
                                                    MEM_ADDR_SPACE (subst),
                                                    ADDRESS, SCRATCH)][0]],
                                  MEM_ADDR_SPACE (subst))))
        {

> 		 This is doubly true if WORD_REGISTER_OPERATIONS.  In
> 		 this case eliminate_regs has left non-paradoxical
> 		 subregs for push_reload to see.  Make sure it does
> 		 by forcing the reload.

This statement covers the fix I already proposed but perhaps
simplify_operand_subreg can also hit this issue if a 'normal' subreg appears
in an instruction where registers and memory are supported (like move
instructions). In this case the constraints are satisfied and the fix I
proposed would never get run but simplify_operand_subreg would.

Eric: I see you recently had to modify the code I'm talking about in the post
below. Out of interest... was this another issue brought to light by the
improvements to zero extension elimination?

https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01202.html

Matthew

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] Further LRA subreg handling issues
  2017-01-26 17:19           ` Matthew Fortune
@ 2017-01-26 22:33             ` Eric Botcazou
  0 siblings, 0 replies; 11+ messages in thread
From: Eric Botcazou @ 2017-01-26 22:33 UTC (permalink / raw)
  To: Matthew Fortune
  Cc: gcc, Vladimir Makarov, Richard Sandiford,
	Jeff Law <law@redhat.com> (law@redhat.com)

> This part suggests to me that LRA should never be reloading the
> paradoxical subreg meaning the whole SLOW_UNALIGNED_ACCESS checking code in
> simplify_operand_subreg could be removed unconditionally.

Why?  For a little-endian target which is neither strict-alignment nor 
WORD_REGISTER_OPERATIONS, typically x86, you can reload the outer reg.
Problems arise only for big-endian or strict-alignment or W_R_O, as explained 
by the find_reloads code.

IOW simplify_operand_subreg should mimic the handling of paradoxical SUBREGs 
done by find_reloads, with specific checks for WORD_REGISTER_OPERATIONS, 
BYTES_BIG_ENDIAN, etc.

> Eric: I see you recently had to modify the code I'm talking about in the
> post below. Out of interest... was this another issue brought to light by
> the improvements to zero extension elimination?

Nope, there were a couple of other, unrelated bugs in the code.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-01-26 22:33 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-16 15:48 [RFC] Further LRA subreg handling issues Matthew Fortune
2017-01-19 23:31 ` Vladimir Makarov
2017-01-19 23:41   ` Matthew Fortune
2017-01-20  7:12     ` Eric Botcazou
2017-01-20 22:44     ` Eric Botcazou
2017-01-25  9:45       ` Matthew Fortune
2017-01-26 13:00       ` Matthew Fortune
2017-01-26 15:42         ` Eric Botcazou
2017-01-26 17:19           ` Matthew Fortune
2017-01-26 22:33             ` Eric Botcazou
2017-01-26 16:22         ` David Malcolm

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).