[PATCH] Skip re-computing the mips frame info after reload completed

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] Skip re-computing the mips frame info after reload completed
@ 2016-01-23 13:25 Bernd Edlinger
  2016-01-24 21:57 ` Matthew Fortune
  0 siblings, 1 reply; 19+ messages in thread
From: Bernd Edlinger @ 2016-01-23 13:25 UTC (permalink / raw)
  To: Matthew Fortune; +Cc: gcc-patches, Nick Clifton

[-- Attachment #1: Type: text/plain, Size: 694 bytes --]

Hi,

this patch skips the redundant re-computing of the frame info after reload completed.

I looked at all available targets initial_elimination_offset functions:

All of them currently use either a trivial function of crtl->outgoing_args_size, get_frame_size ()
and df_regs_ever_live_p (x), that can be expected to be evaluated quickly and to be constant,
or they use a cached value, if they are called when reload_completed is true.

I believe this patch will both be a performance optimization and guarantee that the frame info
can not unexpectedly change when it should not.

I have successfully built a cross-compiler with this patch.
Is it OK for trunk?

Thanks
Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch-mips.diff --]
[-- Type: text/x-patch; name="patch-mips.diff", Size: 697 bytes --]

2016-01-23  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* config/mips/mips.c (mips_compute_frame_info): Skip re-computing
	the frame info after reload completed.

Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	(revision 231954)
+++ gcc/config/mips/mips.c	(working copy)
@@ -10321,6 +10321,10 @@ mips_compute_frame_info (void)
   HOST_WIDE_INT offset, size;
   unsigned int regno, i;

+  /* Skip re-computing the frame info after reload completed.  */
+  if (reload_completed)
+    return;
+
   /* Set this function's interrupt properties.  */
   if (mips_interrupt_type_p (TREE_TYPE (current_function_decl)))
     {

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH] Skip re-computing the mips frame info after reload completed
  2016-01-23 13:25 [PATCH] Skip re-computing the mips frame info after reload completed Bernd Edlinger
@ 2016-01-24 21:57 ` Matthew Fortune
  2016-01-25  7:44   ` AW: " Bernd Edlinger
  0 siblings, 1 reply; 19+ messages in thread
From: Matthew Fortune @ 2016-01-24 21:57 UTC (permalink / raw)
  To: Bernd Edlinger; +Cc: gcc-patches, Nick Clifton

Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
> this patch skips the redundant re-computing of the frame info after reload completed.
> 
> I looked at all available targets initial_elimination_offset functions:
> 
> All of them currently use either a trivial function of crtl->outgoing_args_size,
> get_frame_size ()
> and df_regs_ever_live_p (x), that can be expected to be evaluated quickly and to be
> constant,
> or they use a cached value, if they are called when reload_completed is true.
> 

Hi Bernd,

For the sake of consistency with other targets then this looks fine. I'd
both prefer the MIPS backend to be as efficient as other targets and not
crash when something like initial elimination offset gets used post
reload. Whether it is right or wrong that we end up with these calls
post-reload is a different matter but I don't understand the detail
enough to comment.

> I believe this patch will both be a performance optimization and
> guarantee that the frame info can not unexpectedly change when it
> should not.

I am a bit dubious about this comment as simply not re-computing the
frame info is not quite the same as it being consistent with
register usage etc. However, we have no check for this right now
post-reload so there is no requirement to improve the situation.

> I have successfully built a cross-compiler with this patch.
> Is it OK for trunk?

Has the patch been tested beyond just building GCC? I can do a
test run for you if you don't have things set up to do one yourself.

Thanks,
Matthew

^ permalink raw reply	[flat|nested] 19+ messages in thread

* AW: [PATCH] Skip re-computing the mips frame info after reload completed
  2016-01-24 21:57 ` Matthew Fortune
@ 2016-01-25  7:44   ` Bernd Edlinger
  2016-01-25 19:57     ` Matthew Fortune
  0 siblings, 1 reply; 19+ messages in thread
From: Bernd Edlinger @ 2016-01-25  7:44 UTC (permalink / raw)
  To: Matthew Fortune; +Cc: gcc-patches, Nick Clifton


Matthew Fortune <Matthew.Fortune@imgtec.com> writes:
> Has the patch been tested beyond just building GCC? I can do a
> test run for you if you don't have things set up to do one yourself.

I built a cross-gcc with all languages and a cross-glibc, but I have
not set up an emulation environment, so if you could give it a test
that would be highly welcome.


Thanks
Bernd.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH] Skip re-computing the mips frame info after reload completed
  2016-01-25  7:44   ` AW: " Bernd Edlinger
@ 2016-01-25 19:57     ` Matthew Fortune
  2016-01-26 21:18       ` Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed) Richard Sandiford
  0 siblings, 1 reply; 19+ messages in thread
From: Matthew Fortune @ 2016-01-25 19:57 UTC (permalink / raw)
  To: Bernd Edlinger; +Cc: gcc-patches, Nick Clifton

Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
> Matthew Fortune <Matthew.Fortune@imgtec.com> writes:
> > Has the patch been tested beyond just building GCC? I can do a
> > test run for you if you don't have things set up to do one yourself.
> 
> I built a cross-gcc with all languages and a cross-glibc, but I have
> not set up an emulation environment, so if you could give it a test
> that would be highly welcome.

mipsel-linux-gnu test results are the same before and after this patch.

Please go ahead and commit.

Thanks,
Matthew

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)
  2016-01-25 19:57     ` Matthew Fortune
@ 2016-01-26 21:18       ` Richard Sandiford
  2016-01-27 21:09         ` Bernd Edlinger
  2016-01-27 23:36         ` Eric Botcazou
  0 siblings, 2 replies; 19+ messages in thread
From: Richard Sandiford @ 2016-01-26 21:18 UTC (permalink / raw)
  To: Matthew Fortune; +Cc: Bernd Edlinger, gcc-patches, Nick Clifton, ebotcazou

[cc-ing Eric as RTL maintainer]

Matthew Fortune <Matthew.Fortune@imgtec.com> writes:
> Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
>> Matthew Fortune <Matthew.Fortune@imgtec.com> writes:
>> > Has the patch been tested beyond just building GCC? I can do a
>> > test run for you if you don't have things set up to do one yourself.
>> 
>> I built a cross-gcc with all languages and a cross-glibc, but I have
>> not set up an emulation environment, so if you could give it a test
>> that would be highly welcome.
>
> mipsel-linux-gnu test results are the same before and after this patch.
>
> Please go ahead and commit.

I still object to this.  And it feels like the patch was posted
as though it was a new one in order to avoid answering the objections
that were raised when it was last posted:

  https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02218.html

IMO the problem is that rtx_addr_can_trap_p_1 duplicates a large
bit of LRA/reload logic:

------------------------------------------------------------------------
/* Compute an approximation for the offset between the register
   FROM and TO for the current function, as it was at the start
   of the routine.  */

static HOST_WIDE_INT
get_initial_register_offset (int from, int to)
{
#ifdef ELIMINABLE_REGS
  static const struct elim_table_t
  {
    const int from;
    const int to;
  } table[] = ELIMINABLE_REGS;
  HOST_WIDE_INT offset1, offset2;
  unsigned int i, j;

  if (to == from)
    return 0;

  /* It is not safe to call INITIAL_ELIMINATION_OFFSET
     before the reload pass.  We need to give at least
     an estimation for the resulting frame size.  */
  if (! reload_completed)
    {
      offset1 = crtl->outgoing_args_size + get_frame_size ();
#if !STACK_GROWS_DOWNWARD
      offset1 = - offset1;
#endif
      if (to == STACK_POINTER_REGNUM)
	return offset1;
      else if (from == STACK_POINTER_REGNUM)
	return - offset1;
      else
	return 0;
     }

  for (i = 0; i < ARRAY_SIZE (table); i++)
      if (table[i].from == from)
	{
	  if (table[i].to == to)
	    {
	      INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to,
					  offset1);
	      return offset1;
	    }
	  for (j = 0; j < ARRAY_SIZE (table); j++)
	    {
	      if (table[j].to == to
		  && table[j].from == table[i].to)
		{
		  INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to,
					      offset1);
		  INITIAL_ELIMINATION_OFFSET (table[j].from, table[j].to,
					      offset2);
		  return offset1 + offset2;
		}
	      if (table[j].from == to
		  && table[j].to == table[i].to)
		{
		  INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to,
					      offset1);
		  INITIAL_ELIMINATION_OFFSET (table[j].from, table[j].to,
					      offset2);
		  return offset1 - offset2;
		}
	    }
	}
      else if (table[i].to == from)
	{
	  if (table[i].from == to)
	    {
	      INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to,
					  offset1);
	      return - offset1;
	    }
	  for (j = 0; j < ARRAY_SIZE (table); j++)
	    {
	      if (table[j].to == to
		  && table[j].from == table[i].from)
		{
		  INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to,
					      offset1);
		  INITIAL_ELIMINATION_OFFSET (table[j].from, table[j].to,
					      offset2);
		  return - offset1 + offset2;
		}
	      if (table[j].from == to
		  && table[j].to == table[i].from)
		{
		  INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to,
					      offset1);
		  INITIAL_ELIMINATION_OFFSET (table[j].from, table[j].to,
					      offset2);
		  return - offset1 - offset2;
		}
	    }
	}

  /* If the requested register combination was not found,
     try a different more simple combination.  */
  if (from == ARG_POINTER_REGNUM)
    return get_initial_register_offset (HARD_FRAME_POINTER_REGNUM, to);
  else if (to == ARG_POINTER_REGNUM)
    return get_initial_register_offset (from, HARD_FRAME_POINTER_REGNUM);
  else if (from == HARD_FRAME_POINTER_REGNUM)
    return get_initial_register_offset (FRAME_POINTER_REGNUM, to);
  else if (to == HARD_FRAME_POINTER_REGNUM)
    return get_initial_register_offset (from, FRAME_POINTER_REGNUM);
  else
    return 0;

#else
  HOST_WIDE_INT offset;

  if (to == from)
    return 0;

  if (reload_completed)
    {
      INITIAL_FRAME_POINTER_OFFSET (offset);
    }
  else
    {
      offset = crtl->outgoing_args_size + get_frame_size ();
#if !STACK_GROWS_DOWNWARD
      offset = - offset;
#endif
    }

  if (to == STACK_POINTER_REGNUM)
    return offset;
  else if (from == STACK_POINTER_REGNUM)
    return - offset;
  else
    return 0;

#endif
}
------------------------------------------------------------------------

Under the current interface macros like INITIAL_ELIMINATION_OFFSET
are expected to trigger the calculation of the target's frame layout
(since you need that information to answer the question).
To me it seems wrong that we're attempting to call that sort of
macro in a query routine like rtx_addr_can_trap_p_1.

The frame layout is fixed for each LRA/reload cycle.  The layout
for the last of those cycles carries forward through the rest
of compilation.

IMO we should cache the information we need at the start of each
LRA/reload cycle.  This will be more robust, because there will
be no accidental changes to global state either during or after
LRA/reload.  It will also be more efficient because
rtx_addr_can_trap_p_1 can read the cached variables rather
than calling back into the target.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)
  2016-01-26 21:18       ` Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed) Richard Sandiford
@ 2016-01-27 21:09         ` Bernd Edlinger
  2016-01-28 22:17           ` Richard Sandiford
  2016-01-27 23:36         ` Eric Botcazou
  1 sibling, 1 reply; 19+ messages in thread
From: Bernd Edlinger @ 2016-01-27 21:09 UTC (permalink / raw)
  To: Matthew Fortune, gcc-patches, Nick Clifton, ebotcazou, rdsandiford

On 26.01.2016 22:18, Richard Sandiford wrote:
> [cc-ing Eric as RTL maintainer]
>
> Matthew Fortune <Matthew.Fortune@imgtec.com> writes:
>> Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
>>> Matthew Fortune <Matthew.Fortune@imgtec.com> writes:
>>>> Has the patch been tested beyond just building GCC? I can do a
>>>> test run for you if you don't have things set up to do one yourself.
>>>
>>> I built a cross-gcc with all languages and a cross-glibc, but I have
>>> not set up an emulation environment, so if you could give it a test
>>> that would be highly welcome.
>>
>> mipsel-linux-gnu test results are the same before and after this patch.
>>
>> Please go ahead and commit.
>
> I still object to this.  And it feels like the patch was posted
> as though it was a new one in order to avoid answering the objections
> that were raised when it was last posted:
>
>    https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02218.html
>

Richard, I am really sorry when you feel now like I did not take your
objections seriously.  Let me first explain what happened from my point
of view:

When I posted this response to your objections here:

  https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00235.html

I was waiting for your response, but nothing happened, so I kind of
forgot about the issue.  In the meantime Ubuntu and Debian began to
roll out GCC-6 and they got stuck at the same issue, but instead of
waiting for pr69012 to be eventually resolved they created a duplicate
pr69129, and then last Thursday Nick applied my initial patch without
my intervention, probably because of the pressure they put on us.

That changed significantly how I looked at the issue after that point,
as there was no actual regression anymore, the revised patch still made
sense, but for another reason.  When you look at the 20+ targets in the
gcc tree you'll see that almost all of them have a frame-layout
computation function and all except mips have a shortcut
"if (reload_completed) return;" in that function.  And OTOH mips has
one of the most complicated frame layout functions of all targets.

For all of these reasons I posted a new patch which tries to resolve
differences between mips and other targets inital_elimination_offset
functions.

I still think that it is not OK for the mips target to do the frame
layout already in mips_frame_pointer_required because the frame layout
will change several times, until reload is completed, and that function
is only called right in the beginning.

And I think that it is not really well-designed to have a frame layout
function F(x,y)->z unless you assume F to be a trivial O(1) function
that has no significant computation overhead.  When you assume F to be
an expensive O(N) or higher complexity you would design the interface
like compute_frame_layout_now() and
get_cached_initial_elimination_offset(x,y)->z.

Also note, that with the current design, of initial_elimination_offset
the frame layout is already computed several times, because reload has
to call the function with different parameters, and there is no way how
to know when the cached value can be used and when not.

I do also think that having to cache the initial elimination offset
values in the back-end that are already cached in the target does not
improve anything.  Then I would prefer to have a set of new target
callbacks that makes it easy to access the cached target values.

If you want to propose a patch for that I will not object, but I doubt
it will be suitable for Stage 4.

Thanks
Bernd.

> IMO the problem is that rtx_addr_can_trap_p_1 duplicates a large
> bit of LRA/reload logic:
>
> ------------------------------------------------------------------------
> /* Compute an approximation for the offset between the register
>     FROM and TO for the current function, as it was at the start
>     of the routine.  */
>
> static HOST_WIDE_INT
> get_initial_register_offset (int from, int to)
> {
> #ifdef ELIMINABLE_REGS
>    static const struct elim_table_t
>    {
>      const int from;
>      const int to;
>    } table[] = ELIMINABLE_REGS;
>    HOST_WIDE_INT offset1, offset2;
>    unsigned int i, j;
>
>    if (to == from)
>      return 0;
>
>    /* It is not safe to call INITIAL_ELIMINATION_OFFSET
>       before the reload pass.  We need to give at least
>       an estimation for the resulting frame size.  */
>    if (! reload_completed)
>      {
>        offset1 = crtl->outgoing_args_size + get_frame_size ();
> #if !STACK_GROWS_DOWNWARD
>        offset1 = - offset1;
> #endif
>        if (to == STACK_POINTER_REGNUM)
> 	return offset1;
>        else if (from == STACK_POINTER_REGNUM)
> 	return - offset1;
>        else
> 	return 0;
>       }
>
>    for (i = 0; i < ARRAY_SIZE (table); i++)
>        if (table[i].from == from)
> 	{
> 	  if (table[i].to == to)
> 	    {
> 	      INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to,
> 					  offset1);
> 	      return offset1;
> 	    }
> 	  for (j = 0; j < ARRAY_SIZE (table); j++)
> 	    {
> 	      if (table[j].to == to
> 		  && table[j].from == table[i].to)
> 		{
> 		  INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to,
> 					      offset1);
> 		  INITIAL_ELIMINATION_OFFSET (table[j].from, table[j].to,
> 					      offset2);
> 		  return offset1 + offset2;
> 		}
> 	      if (table[j].from == to
> 		  && table[j].to == table[i].to)
> 		{
> 		  INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to,
> 					      offset1);
> 		  INITIAL_ELIMINATION_OFFSET (table[j].from, table[j].to,
> 					      offset2);
> 		  return offset1 - offset2;
> 		}
> 	    }
> 	}
>        else if (table[i].to == from)
> 	{
> 	  if (table[i].from == to)
> 	    {
> 	      INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to,
> 					  offset1);
> 	      return - offset1;
> 	    }
> 	  for (j = 0; j < ARRAY_SIZE (table); j++)
> 	    {
> 	      if (table[j].to == to
> 		  && table[j].from == table[i].from)
> 		{
> 		  INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to,
> 					      offset1);
> 		  INITIAL_ELIMINATION_OFFSET (table[j].from, table[j].to,
> 					      offset2);
> 		  return - offset1 + offset2;
> 		}
> 	      if (table[j].from == to
> 		  && table[j].to == table[i].from)
> 		{
> 		  INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to,
> 					      offset1);
> 		  INITIAL_ELIMINATION_OFFSET (table[j].from, table[j].to,
> 					      offset2);
> 		  return - offset1 - offset2;
> 		}
> 	    }
> 	}
>
>    /* If the requested register combination was not found,
>       try a different more simple combination.  */
>    if (from == ARG_POINTER_REGNUM)
>      return get_initial_register_offset (HARD_FRAME_POINTER_REGNUM, to);
>    else if (to == ARG_POINTER_REGNUM)
>      return get_initial_register_offset (from, HARD_FRAME_POINTER_REGNUM);
>    else if (from == HARD_FRAME_POINTER_REGNUM)
>      return get_initial_register_offset (FRAME_POINTER_REGNUM, to);
>    else if (to == HARD_FRAME_POINTER_REGNUM)
>      return get_initial_register_offset (from, FRAME_POINTER_REGNUM);
>    else
>      return 0;
>
> #else
>    HOST_WIDE_INT offset;
>
>    if (to == from)
>      return 0;
>
>    if (reload_completed)
>      {
>        INITIAL_FRAME_POINTER_OFFSET (offset);
>      }
>    else
>      {
>        offset = crtl->outgoing_args_size + get_frame_size ();
> #if !STACK_GROWS_DOWNWARD
>        offset = - offset;
> #endif
>      }
>
>    if (to == STACK_POINTER_REGNUM)
>      return offset;
>    else if (from == STACK_POINTER_REGNUM)
>      return - offset;
>    else
>      return 0;
>
> #endif
> }
> ------------------------------------------------------------------------
>
> Under the current interface macros like INITIAL_ELIMINATION_OFFSET
> are expected to trigger the calculation of the target's frame layout
> (since you need that information to answer the question).
> To me it seems wrong that we're attempting to call that sort of
> macro in a query routine like rtx_addr_can_trap_p_1.
>
> The frame layout is fixed for each LRA/reload cycle.  The layout
> for the last of those cycles carries forward through the rest
> of compilation.
>
> IMO we should cache the information we need at the start of each
> LRA/reload cycle.  This will be more robust, because there will
> be no accidental changes to global state either during or after
> LRA/reload.  It will also be more efficient because
> rtx_addr_can_trap_p_1 can read the cached variables rather
> than calling back into the target.
>
> Thanks,
> Richard
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)
  2016-01-26 21:18       ` Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed) Richard Sandiford
  2016-01-27 21:09         ` Bernd Edlinger
@ 2016-01-27 23:36         ` Eric Botcazou
  2016-01-29  1:09           ` Bernd Schmidt
  1 sibling, 1 reply; 19+ messages in thread
From: Eric Botcazou @ 2016-01-27 23:36 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: gcc-patches, Matthew Fortune, Bernd Edlinger, Nick Clifton

> [cc-ing Eric as RTL maintainer]

Sorry for the delay, the message apparently bounced]

> IMO the problem is that rtx_addr_can_trap_p_1 duplicates a large
> bit of LRA/reload logic:
> 
> [...]
> 
> Under the current interface macros like INITIAL_ELIMINATION_OFFSET
> are expected to trigger the calculation of the target's frame layout
> (since you need that information to answer the question).
> To me it seems wrong that we're attempting to call that sort of
> macro in a query routine like rtx_addr_can_trap_p_1.

I'm a little uncomfortable stepping in here because, while I essentially share 
your objections and was opposed to the patch (I was almost sure that it would 
introduce maintainance issues for no practical benefit), I didn't imagine that 
such a failure mode would have been possible (computing an approximation of 
the frame layout after reload being problematic) so I didn't really object to 
being overruled after seeing Bernd's patch...

> IMO we should cache the information we need@the start of each
> LRA/reload cycle.  This will be more robust, because there will
> be no accidental changes to global state either during or after
> LRA/reload.  It will also be more efficient because
> rtx_addr_can_trap_p_1 can read the cached variables rather
> than calling back into the target.

That would be a better design for sure and would eliminate the maintainance 
issues I was originally afraid of.

My recommendation would be to back out Bernd's patch for GCC 6.0 (again, it 
doesn't fix any regression and, more importantly, any bug in real software, 
but only corner case bugs in dumb computer-generated testcases) but with the 
commitment to address the underlying issue for GCC 7.0 and backport the fix to 
GCC 6.x unless really impracticable.  That being said, it's ultimately Jakub 
and Richard's call.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)
  2016-01-27 21:09         ` Bernd Edlinger
@ 2016-01-28 22:17           ` Richard Sandiford
  2016-01-29 17:54             ` Bernd Edlinger
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Sandiford @ 2016-01-28 22:17 UTC (permalink / raw)
  To: Bernd Edlinger; +Cc: Matthew Fortune, gcc-patches, Nick Clifton, ebotcazou

Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
> On 26.01.2016 22:18, Richard Sandiford wrote:
>> [cc-ing Eric as RTL maintainer]
>>
>> Matthew Fortune <Matthew.Fortune@imgtec.com> writes:
>>> Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
>>>> Matthew Fortune <Matthew.Fortune@imgtec.com> writes:
>>>>> Has the patch been tested beyond just building GCC? I can do a
>>>>> test run for you if you don't have things set up to do one yourself.
>>>>
>>>> I built a cross-gcc with all languages and a cross-glibc, but I have
>>>> not set up an emulation environment, so if you could give it a test
>>>> that would be highly welcome.
>>>
>>> mipsel-linux-gnu test results are the same before and after this patch.
>>>
>>> Please go ahead and commit.
>>
>> I still object to this.  And it feels like the patch was posted
>> as though it was a new one in order to avoid answering the objections
>> that were raised when it was last posted:
>>
>>    https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02218.html
>>
>
> Richard, I am really sorry when you feel now like I did not take your
> objections seriously.  Let me first explain what happened from my point
> of view:
>
> When I posted this response to your objections here:
>
>   https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00235.html
>
> I was waiting for your response, but nothing happened, so I kind of
> forgot about the issue.  In the meantime Ubuntu and Debian began to
> roll out GCC-6 and they got stuck at the same issue, but instead of
> waiting for pr69012 to be eventually resolved they created a duplicate
> pr69129, and then last Thursday Nick applied my initial patch without
> my intervention, probably because of the pressure they put on us.

Ah, I'd missed that, sorry.  It's not obvious from the email thread
that the patch was actually approved.  I just see a message from Matthias
saying that it worked for him and then a message from Nick saying that
he'd applied it.

Ah well.  I guess at some point I have to get over the fact that I'm no
longer the MIPS maintainer :-)

> That changed significantly how I looked at the issue after that point,
> as there was no actual regression anymore, the revised patch still made
> sense, but for another reason.  When you look at the 20+ targets in the
> gcc tree you'll see that almost all of them have a frame-layout
> computation function and all except mips have a shortcut
> "if (reload_completed) return;" in that function.  And OTOH mips has
> one of the most complicated frame layout functions of all targets.
>
> For all of these reasons I posted a new patch which tries to resolve
> differences between mips and other targets inital_elimination_offset
> functions.

OK.  But the point still stands that the patch is only useful because
we're now calling mips_compute_frame_info in cases where we wouldn't
previously, because of the rtx_addr_can_trap_p changes.

> I still think that it is not OK for the mips target to do the frame
> layout already in mips_frame_pointer_required because the frame layout
> will change several times, until reload is completed, and that function
> is only called right in the beginning.

I don't think it's any better or worse than doing the frame layout in
INITIAL_ELIMINATION_OFFSET (which is common practice and pretty much
required).  They're both part of the initial setup phase --
targetm.frame_layout_required determines frame_pointer_needed, which is
a vital input to the code that decides which eliminations to make.

And this is an example of us doing the kind of caching that I was
suggesting.  Code that wants to know whether the frame pointer is needed
for the current function should use frame_pointer_needed.  Only the code
that sets up frame_pointer_needed should call frame_pointer_required
directly. 

> And I think that it is not really well-designed to have a frame layout
> function F(x,y)->z unless you assume F to be a trivial O(1) function
> that has no significant computation overhead.  When you assume F to be
> an expensive O(N) or higher complexity you would design the interface
> like compute_frame_layout_now() and
> get_cached_initial_elimination_offset(x,y)->z.
>
> Also note, that with the current design, of initial_elimination_offset
> the frame layout is already computed several times, because reload has
> to call the function with different parameters, and there is no way how
> to know when the cached value can be used and when not.

Agreed -- the current interface is pretty poor.  But that's even more
reason not to add uses of it after LRA/reload.  Caching would be both
simpler and more efficient.

> I do also think that having to cache the initial elimination offset
> values in the back-end that are already cached in the target does not
> improve anything.  Then I would prefer to have a set of new target
> callbacks that makes it easy to access the cached target values.

I don't agree.  Asking each backend to cache a particular value in
its frame_info structure and then providing a hook to query that
cached value means adding code to every target.  And calling an
indirect function is going to much less efficient than accessing
a structure field.

It seems better to have one piece of code (or maybe two, until
reload goes away) that caches the information that the target-independent
code needs.

I think that applies to other frame-related information too.  At the
moment there's no simple way for the postreload passes to tell which
registers are available for use and which have to be left untouched,
so we get complicated tests like:

  for (i = nregs - 1; i >= 0; --i)
    if (TEST_HARD_REG_BIT (this_unavailable, new_reg + i)
	|| fixed_regs[new_reg + i]
	|| global_regs[new_reg + i]
	/* Can't use regs which aren't saved by the prologue.  */
	|| (! df_regs_ever_live_p (new_reg + i)
	    && ! call_used_regs[new_reg + i])
#ifdef LEAF_REGISTERS
	/* We can't use a non-leaf register if we're in a
	   leaf function.  */
	|| (crtl->is_leaf
	    && !LEAF_REGISTERS[new_reg + i])
#endif
	|| ! HARD_REGNO_RENAME_OK (reg + i, new_reg + i))
      return false;

This is an example of relying (to some extent) on querying the target
code every time we want to know something, but surely it would be
better to have LRA/reload create a single "usable registers" HARD_REG_SET.
(I have an old patch series for that but never found time to clean it up
and submit it.)

> If you want to propose a patch for that I will not object, but I doubt
> it will be suitable for Stage 4.

Fair enough.  Guess I just have to learn to live with it.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)
  2016-01-27 23:36         ` Eric Botcazou
@ 2016-01-29  1:09           ` Bernd Schmidt
  2016-01-29 15:18             ` Bernd Edlinger
  2016-01-29 15:41             ` Jakub Jelinek
  0 siblings, 2 replies; 19+ messages in thread
From: Bernd Schmidt @ 2016-01-29  1:09 UTC (permalink / raw)
  To: Eric Botcazou, Richard Sandiford
  Cc: gcc-patches, Matthew Fortune, Bernd Edlinger, Nick Clifton

On 01/28/2016 12:36 AM, Eric Botcazou wrote:
>> [cc-ing Eric as RTL maintainer]
>
> Sorry for the delay, the message apparently bounced]
>
>> IMO the problem is that rtx_addr_can_trap_p_1 duplicates a large
>> bit of LRA/reload logic:
>>
>> [...]
>>
>> Under the current interface macros like INITIAL_ELIMINATION_OFFSET
>> are expected to trigger the calculation of the target's frame layout
>> (since you need that information to answer the question).
>> To me it seems wrong that we're attempting to call that sort of
>> macro in a query routine like rtx_addr_can_trap_p_1.
>
> I'm a little uncomfortable stepping in here because, while I essentially share
> your objections and was opposed to the patch (I was almost sure that it would
> introduce maintainance issues for no practical benefit), I didn't imagine that
> such a failure mode would have been possible (computing an approximation of
> the frame layout after reload being problematic) so I didn't really object to
> being overruled after seeing Bernd's patch...
>
>> IMO we should cache the information we need@the start of each
>> LRA/reload cycle.  This will be more robust, because there will
>> be no accidental changes to global state either during or after
>> LRA/reload.  It will also be more efficient because
>> rtx_addr_can_trap_p_1 can read the cached variables rather
>> than calling back into the target.
>
> That would be a better design for sure and would eliminate the maintainance
> issues I was originally afraid of.
>
> My recommendation would be to back out Bernd's patch for GCC 6.0 (again, it
> doesn't fix any regression and, more importantly, any bug in real software,
> but only corner case bugs in dumb computer-generated testcases) but with the
> commitment to address the underlying issue for GCC 7.0 and backport the fix to
> GCC 6.x unless really impracticable.  That being said, it's ultimately Jakub
> and Richard's call.

I'm on the fence; I do think the original problem is an issue we should 
fix, but I'm also not terribly happy with the implementation we have 
right now. Besides the issues already mentioned, doesn't it kind of 
assume these offsets are constant (which they definitely aren't, 
consider arg pushes for example)?

I think a better approach might be to just mark accesses at known 
locations in the frame, or arg pushes, as MEM_NOTRAP_P, and consider 
accesses with non-constant or calculated offsets as potentially trapping.

Bernd

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)
  2016-01-29  1:09           ` Bernd Schmidt
@ 2016-01-29 15:18             ` Bernd Edlinger
  2016-01-29 15:41             ` Jakub Jelinek
  1 sibling, 0 replies; 19+ messages in thread
From: Bernd Edlinger @ 2016-01-29 15:18 UTC (permalink / raw)
  To: Bernd Schmidt, Eric Botcazou, Richard Sandiford
  Cc: gcc-patches, Matthew Fortune, Nick Clifton

On 29.01.2016 02:09, Bernd Schmidt wrote:
> On 01/28/2016 12:36 AM, Eric Botcazou wrote:
>>> [cc-ing Eric as RTL maintainer]
>>
>> Sorry for the delay, the message apparently bounced]
>>
>>> IMO the problem is that rtx_addr_can_trap_p_1 duplicates a large
>>> bit of LRA/reload logic:
>>>
>>> [...]
>>>
>>> Under the current interface macros like INITIAL_ELIMINATION_OFFSET
>>> are expected to trigger the calculation of the target's frame layout
>>> (since you need that information to answer the question).
>>> To me it seems wrong that we're attempting to call that sort of
>>> macro in a query routine like rtx_addr_can_trap_p_1.
>>
>> I'm a little uncomfortable stepping in here because, while I
>> essentially share
>> your objections and was opposed to the patch (I was almost sure that
>> it would
>> introduce maintainance issues for no practical benefit), I didn't
>> imagine that
>> such a failure mode would have been possible (computing an
>> approximation of
>> the frame layout after reload being problematic) so I didn't really
>> object to
>> being overruled after seeing Bernd's patch...
>>
>>> IMO we should cache the information we need@the start of each
>>> LRA/reload cycle.  This will be more robust, because there will
>>> be no accidental changes to global state either during or after
>>> LRA/reload.  It will also be more efficient because
>>> rtx_addr_can_trap_p_1 can read the cached variables rather
>>> than calling back into the target.
>>
>> That would be a better design for sure and would eliminate the
>> maintainance
>> issues I was originally afraid of.
>>
>> My recommendation would be to back out Bernd's patch for GCC 6.0
>> (again, it
>> doesn't fix any regression and, more importantly, any bug in real
>> software,
>> but only corner case bugs in dumb computer-generated testcases) but
>> with the
>> commitment to address the underlying issue for GCC 7.0 and backport
>> the fix to
>> GCC 6.x unless really impracticable.  That being said, it's ultimately
>> Jakub
>> and Richard's call.
>
> I'm on the fence; I do think the original problem is an issue we should
> fix, but I'm also not terribly happy with the implementation we have
> right now. Besides the issues already mentioned, doesn't it kind of
> assume these offsets are constant (which they definitely aren't,
> consider arg pushes for example)?
>

Yes that is right.  I saw it as a thing that could possibly happen more
often than we know, because it is difficult to spot a wrong code by
ordinary tests.  Even the reproducer from pr61047 does not crash when
it runs in the gcc testsuite, I had to tweak the example first to use
a larger offset to compensate the large number of environment values
that are passed from the test suite in each test execution.

Yes, rtx_addr_can_trap_p_1 does not know how many bytes are pushed on
the stack and I saw no easy way how to get to the REG_NOTES for
instance, because they are attached to the INSN and rtx_addr_can_trap_p
has only access to a MEM rtx.  It is no exact science, but the error is
on the safe side.

Nevertheless, with the data from the target hook the approximation is
good enough, to not pessimize any single bit of code that is generated
in stage2 vs. stage3, which would not have been the case without some
help from the target.

> I think a better approach might be to just mark accesses at known
> locations in the frame, or arg pushes, as MEM_NOTRAP_P, and consider
> accesses with non-constant or calculated offsets as potentially trapping.
>

Yes I think also that might be a next step.  It would probably be good
to somehow fixate the result of rtx_addr_can_trap_p immediately before
reload when the RTX's are still FRAMEP+x and ARGP+x, and annotate that
somehow to the reloaded RTX's, that way it would finally be superfluous
to call the target hook at all, because the actual addresses should not
change during or after reload.  It will probably have to be a spare bit
on the RTX that is currently unused, because MEM_NOTRAP_P is already
used for something different.

It will however not be simple to find a valid piece of C code where the
current implementation with all of its limitations generates different
code compared to an implementation that has access to the exact offsets.
As I said, I tried already, but could not find an example of a missed
optimization due to my patch.

Thanks
Bernd.

>
> Bernd

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)
  2016-01-29  1:09           ` Bernd Schmidt
  2016-01-29 15:18             ` Bernd Edlinger
@ 2016-01-29 15:41             ` Jakub Jelinek
  2016-01-29 15:47               ` Bernd Schmidt
  1 sibling, 1 reply; 19+ messages in thread
From: Jakub Jelinek @ 2016-01-29 15:41 UTC (permalink / raw)
  To: Bernd Schmidt
  Cc: Eric Botcazou, Richard Sandiford, gcc-patches, Matthew Fortune,
	Bernd Edlinger, Nick Clifton

On Fri, Jan 29, 2016 at 02:09:25AM +0100, Bernd Schmidt wrote:
> I'm on the fence; I do think the original problem is an issue we should fix,
> but I'm also not terribly happy with the implementation we have right now.

The fact that it has been only reported from generated testcases only means
we are lucky nobody encountered it yet in real-world programs.
Plus, we need to be thankful to people working on those generators that keep
reporting GCC bugs, they significantly improve the compiler.

> Besides the issues already mentioned, doesn't it kind of assume these
> offsets are constant (which they definitely aren't, consider arg pushes for
> example)?

Sure, for some registers the offsets aren't constant.  In some cases we e.g.
have REG_ARGS_SIZE notes, but in other cases don't and don't have anything
else that would help us understand the difference between current sp value
and one at the end of the prologue or so.

> I think a better approach might be to just mark accesses at known locations
> in the frame, or arg pushes, as MEM_NOTRAP_P, and consider accesses with
> non-constant or calculated offsets as potentially trapping.

I don't see how that would work generally.  Sure, if there is e.g. a constant
offset array access, it could be checked easily, but if there is variable offset
array access, that is at some point later on changed into a constant offset
access, you'd need to be conservative, unless you can prove it is in range.
Or perhaps we could also use some other flag (or turn it into __builtin_trap
or __builtin_unreachable or whatever) to mark MEMs that are always invalid
if executed.

	Jakub

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)
  2016-01-29 15:41             ` Jakub Jelinek
@ 2016-01-29 15:47               ` Bernd Schmidt
  2016-01-29 19:42                 ` Bernd Edlinger
  0 siblings, 1 reply; 19+ messages in thread
From: Bernd Schmidt @ 2016-01-29 15:47 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Eric Botcazou, Richard Sandiford, gcc-patches, Matthew Fortune,
	Bernd Edlinger, Nick Clifton

On 01/29/2016 04:41 PM, Jakub Jelinek wrote:
> On Fri, Jan 29, 2016 at 02:09:25AM +0100, Bernd Schmidt wrote:

>> I think a better approach might be to just mark accesses at known locations
>> in the frame, or arg pushes, as MEM_NOTRAP_P, and consider accesses with
>> non-constant or calculated offsets as potentially trapping.
>
> I don't see how that would work generally.  Sure, if there is e.g. a constant
> offset array access, it could be checked easily, but if there is variable offset
> array access, that is at some point later on changed into a constant offset
> access, you'd need to be conservative, unless you can prove it is in range.

Yes. What is the problem with that? If we have (plus sfp const_int) at 
any point before reload, we can check whether that offset is inside 
frame_size. If it isn't or if the offset isn't known, it could trap.


Bernd

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)
  2016-01-28 22:17           ` Richard Sandiford
@ 2016-01-29 17:54             ` Bernd Edlinger
  0 siblings, 0 replies; 19+ messages in thread
From: Bernd Edlinger @ 2016-01-29 17:54 UTC (permalink / raw)
  To: Matthew Fortune, gcc-patches, Nick Clifton, ebotcazou, rdsandiford

On 28.01.2016 23:17, Richard Sandiford wrote:
> Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
>> On 26.01.2016 22:18, Richard Sandiford wrote:
>>> [cc-ing Eric as RTL maintainer]
>>>
>>> Matthew Fortune <Matthew.Fortune@imgtec.com> writes:
>>>> Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
>>>>> Matthew Fortune <Matthew.Fortune@imgtec.com> writes:
>>>>>> Has the patch been tested beyond just building GCC? I can do a
>>>>>> test run for you if you don't have things set up to do one yourself.
>>>>>
>>>>> I built a cross-gcc with all languages and a cross-glibc, but I have
>>>>> not set up an emulation environment, so if you could give it a test
>>>>> that would be highly welcome.
>>>>
>>>> mipsel-linux-gnu test results are the same before and after this patch.
>>>>
>>>> Please go ahead and commit.
>>>
>>> I still object to this.  And it feels like the patch was posted
>>> as though it was a new one in order to avoid answering the objections
>>> that were raised when it was last posted:
>>>
>>>     https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02218.html
>>>
>>
>> Richard, I am really sorry when you feel now like I did not take your
>> objections seriously.  Let me first explain what happened from my point
>> of view:
>>
>> When I posted this response to your objections here:
>>
>>    https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00235.html
>>
>> I was waiting for your response, but nothing happened, so I kind of
>> forgot about the issue.  In the meantime Ubuntu and Debian began to
>> roll out GCC-6 and they got stuck at the same issue, but instead of
>> waiting for pr69012 to be eventually resolved they created a duplicate
>> pr69129, and then last Thursday Nick applied my initial patch without
>> my intervention, probably because of the pressure they put on us.
>
> Ah, I'd missed that, sorry.  It's not obvious from the email thread
> that the patch was actually approved.  I just see a message from Matthias
> saying that it worked for him and then a message from Nick saying that
> he'd applied it.
>
> Ah well.  I guess at some point I have to get over the fact that I'm no
> longer the MIPS maintainer :-)
>
>> That changed significantly how I looked at the issue after that point,
>> as there was no actual regression anymore, the revised patch still made
>> sense, but for another reason.  When you look at the 20+ targets in the
>> gcc tree you'll see that almost all of them have a frame-layout
>> computation function and all except mips have a shortcut
>> "if (reload_completed) return;" in that function.  And OTOH mips has
>> one of the most complicated frame layout functions of all targets.
>>
>> For all of these reasons I posted a new patch which tries to resolve
>> differences between mips and other targets inital_elimination_offset
>> functions.
>
> OK.  But the point still stands that the patch is only useful because
> we're now calling mips_compute_frame_info in cases where we wouldn't
> previously, because of the rtx_addr_can_trap_p changes.
>

Yes of course, but it cannot hurt to have all targets behave
identical in such a central point.  Even if at some point also the
implementation in rtx_addr_can_trap_p will eventually be improved in
one way or the other.

>> I still think that it is not OK for the mips target to do the frame
>> layout already in mips_frame_pointer_required because the frame layout
>> will change several times, until reload is completed, and that function
>> is only called right in the beginning.
>
> I don't think it's any better or worse than doing the frame layout in
> INITIAL_ELIMINATION_OFFSET (which is common practice and pretty much
> required).  They're both part of the initial setup phase --
> targetm.frame_layout_required determines frame_pointer_needed, which is
> a vital input to the code that decides which eliminations to make.
>

Hmm... That can also be a difference between LRA and traditional reload.
Reload calls targetm.frame_pointer_required in update_eliminables, and
can apparently handle 0=>1 and 1=>0 transitions here.

While LRA calls frame_pointer_required only once in
ira_setup_eliminable_regset and can only handle 0=>1 transitions of
frame_pointer_needed in setup_can_eliminate, but not based on
targetm.frame_pointer_required but
targetm.can_eliminate(FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM).

At least it I read the source correctly, I could be wrong of course.

> And this is an example of us doing the kind of caching that I was
> suggesting.  Code that wants to know whether the frame pointer is needed
> for the current function should use frame_pointer_needed.  Only the code
> that sets up frame_pointer_needed should call frame_pointer_required
> directly.
>

But frame_pointer_needed adds at least some value to the raw
targetm.frame_pointer_required, while a cached result of
initial_elimination_offset would not, just conserve the current
interface exactly as it is.

>> And I think that it is not really well-designed to have a frame layout
>> function F(x,y)->z unless you assume F to be a trivial O(1) function
>> that has no significant computation overhead.  When you assume F to be
>> an expensive O(N) or higher complexity you would design the interface
>> like compute_frame_layout_now() and
>> get_cached_initial_elimination_offset(x,y)->z.
>>
>> Also note, that with the current design, of initial_elimination_offset
>> the frame layout is already computed several times, because reload has
>> to call the function with different parameters, and there is no way how
>> to know when the cached value can be used and when not.
>
> Agreed -- the current interface is pretty poor.  But that's even more
> reason not to add uses of it after LRA/reload.  Caching would be both
> simpler and more efficient.
>
>> I do also think that having to cache the initial elimination offset
>> values in the back-end that are already cached in the target does not
>> improve anything.  Then I would prefer to have a set of new target
>> callbacks that makes it easy to access the cached target values.
>
> I don't agree.  Asking each backend to cache a particular value in
> its frame_info structure and then providing a hook to query that
> cached value means adding code to every target.  And calling an
> indirect function is going to much less efficient than accessing
> a structure field.
>
> It seems better to have one piece of code (or maybe two, until
> reload goes away) that caches the information that the target-independent
> code needs.
>

It is for sure a matter of personal taste, nothing more.

And yes, it is do-able, and not even really complicated, but it would
not improve anything, just conserve the interface as it is (or as it was
before I spoiled it :).

I would rather like to improve something at this point.  For instance
I think it would be good to split the initial_elimination_offset
function in an optional compute_initial_frame_layout function, doing
only the O(N) computation and leave the actual O(1) querying of the
result in the initial_elimination_offset function.  If the target does
not implement the compute_initial_frame_layout the default just does
nothing, and the initial_elimination_offset function does not have
to change, but if the target implements the optional
compute_initial_frame_layout, it can rely on that function to be called
immediately before the initial_elimination_offset, which will then be
only an O(1) function.  That would at least improve a tiny bit over the
current state of the initial_elimination_offset function.


Tanks
Bernd.

> I think that applies to other frame-related information too.  At the
> moment there's no simple way for the postreload passes to tell which
> registers are available for use and which have to be left untouched,
> so we get complicated tests like:
>
>    for (i = nregs - 1; i >= 0; --i)
>      if (TEST_HARD_REG_BIT (this_unavailable, new_reg + i)
> 	|| fixed_regs[new_reg + i]
> 	|| global_regs[new_reg + i]
> 	/* Can't use regs which aren't saved by the prologue.  */
> 	|| (! df_regs_ever_live_p (new_reg + i)
> 	    && ! call_used_regs[new_reg + i])
> #ifdef LEAF_REGISTERS
> 	/* We can't use a non-leaf register if we're in a
> 	   leaf function.  */
> 	|| (crtl->is_leaf
> 	    && !LEAF_REGISTERS[new_reg + i])
> #endif
> 	|| ! HARD_REGNO_RENAME_OK (reg + i, new_reg + i))
>        return false;
>
> This is an example of relying (to some extent) on querying the target
> code every time we want to know something, but surely it would be
> better to have LRA/reload create a single "usable registers" HARD_REG_SET.
> (I have an old patch series for that but never found time to clean it up
> and submit it.)
>
>> If you want to propose a patch for that I will not object, but I doubt
>> it will be suitable for Stage 4.
>
> Fair enough.  Guess I just have to learn to live with it.
>
> Thanks,
> Richard
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)
  2016-01-29 15:47               ` Bernd Schmidt
@ 2016-01-29 19:42                 ` Bernd Edlinger
  2016-02-01 12:18                   ` Bernd Schmidt
  0 siblings, 1 reply; 19+ messages in thread
From: Bernd Edlinger @ 2016-01-29 19:42 UTC (permalink / raw)
  To: Bernd Schmidt, Jakub Jelinek
  Cc: Eric Botcazou, Richard Sandiford, gcc-patches, Matthew Fortune,
	Nick Clifton

On 29.01.2016 16:47 Bernd Schmidt wrote:
> On 01/29/2016 04:41 PM, Jakub Jelinek wrote:
>> On Fri, Jan 29, 2016 at 02:09:25AM +0100, Bernd Schmidt wrote:
>
>>> I think a better approach might be to just mark accesses at known
>>> locations
>>> in the frame, or arg pushes, as MEM_NOTRAP_P, and consider accesses with
>>> non-constant or calculated offsets as potentially trapping.
>>
>> I don't see how that would work generally.  Sure, if there is e.g. a
>> constant
>> offset array access, it could be checked easily, but if there is
>> variable offset
>> array access, that is at some point later on changed into a constant
>> offset
>> access, you'd need to be conservative, unless you can prove it is in
>> range.
>
> Yes. What is the problem with that? If we have (plus sfp const_int) at
> any point before reload, we can check whether that offset is inside
> frame_size. If it isn't or if the offset isn't known, it could trap.
>
>

Usually we have "if (x==1234) { read MEM[FP+x]; }", so wo don't know,
and then after reload: "if (x==1234) { read MEM[SP+x+sp_fp_offset]; }"
but wait, in the if statement we know, that x==1234, so everything
turns in one magic constant, and we have a totally new constant offset
from the SP register "if (x==1234) { read MEM[SP+1234+sp_fp_offset]; }".
Now if rtx_addr_can_trap_p(MEM[SP+1234+sp_fp_offset]) says it cannot
trap we think we do not need the if at all => BANG.


Bernd.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)
  2016-01-29 19:42                 ` Bernd Edlinger
@ 2016-02-01 12:18                   ` Bernd Schmidt
  2016-02-01 12:49                     ` Richard Biener
  2016-02-01 13:19                     ` AW: " Bernd Edlinger
  0 siblings, 2 replies; 19+ messages in thread
From: Bernd Schmidt @ 2016-02-01 12:18 UTC (permalink / raw)
  To: Bernd Edlinger, Jakub Jelinek
  Cc: Eric Botcazou, Richard Sandiford, gcc-patches, Matthew Fortune,
	Nick Clifton

On 01/29/2016 08:42 PM, Bernd Edlinger wrote:
> On 29.01.2016 16:47 Bernd Schmidt wrote:
>>
>> Yes. What is the problem with that? If we have (plus sfp const_int) at
>> any point before reload, we can check whether that offset is inside
>> frame_size. If it isn't or if the offset isn't known, it could trap.
>>
>>
>
> Usually we have "if (x==1234) { read MEM[FP+x]; }", so wo don't know,
> and then after reload: "if (x==1234) { read MEM[SP+x+sp_fp_offset]; }"
> but wait, in the if statement we know, that x==1234, so everything
> turns in one magic constant, and we have a totally new constant offset
> from the SP register "if (x==1234) { read MEM[SP+1234+sp_fp_offset]; }".
> Now if rtx_addr_can_trap_p(MEM[SP+1234+sp_fp_offset]) says it cannot
> trap we think we do not need the if at all => BANG.

What are you trying to say here? As far as I can tell this isn't a 
problem with my proposed solution (set MEM_NOTRAP_P for valid SFP+x 
offsets).


Bernd

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)
  2016-02-01 12:18                   ` Bernd Schmidt
@ 2016-02-01 12:49                     ` Richard Biener
  2016-02-01 12:51                       ` Bernd Schmidt
  2016-02-01 13:19                     ` AW: " Bernd Edlinger
  1 sibling, 1 reply; 19+ messages in thread
From: Richard Biener @ 2016-02-01 12:49 UTC (permalink / raw)
  To: Bernd Schmidt
  Cc: Bernd Edlinger, Jakub Jelinek, Eric Botcazou, Richard Sandiford,
	gcc-patches, Matthew Fortune, Nick Clifton

On Mon, Feb 1, 2016 at 1:18 PM, Bernd Schmidt <bschmidt@redhat.com> wrote:
> On 01/29/2016 08:42 PM, Bernd Edlinger wrote:
>>
>> On 29.01.2016 16:47 Bernd Schmidt wrote:
>>>
>>>
>>> Yes. What is the problem with that? If we have (plus sfp const_int) at
>>> any point before reload, we can check whether that offset is inside
>>> frame_size. If it isn't or if the offset isn't known, it could trap.
>>>
>>>
>>
>> Usually we have "if (x==1234) { read MEM[FP+x]; }", so wo don't know,
>> and then after reload: "if (x==1234) { read MEM[SP+x+sp_fp_offset]; }"
>> but wait, in the if statement we know, that x==1234, so everything
>> turns in one magic constant, and we have a totally new constant offset
>> from the SP register "if (x==1234) { read MEM[SP+1234+sp_fp_offset]; }".
>> Now if rtx_addr_can_trap_p(MEM[SP+1234+sp_fp_offset]) says it cannot
>> trap we think we do not need the if at all => BANG.
>
>
> What are you trying to say here? As far as I can tell this isn't a problem
> with my proposed solution (set MEM_NOTRAP_P for valid SFP+x offsets).

What prevents motion of those across a stack adjustment (thus a place
they are _not_ valid?)

Richard.

>
> Bernd

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)
  2016-02-01 12:49                     ` Richard Biener
@ 2016-02-01 12:51                       ` Bernd Schmidt
  2016-02-01 13:10                         ` Richard Biener
  0 siblings, 1 reply; 19+ messages in thread
From: Bernd Schmidt @ 2016-02-01 12:51 UTC (permalink / raw)
  To: Richard Biener
  Cc: Bernd Edlinger, Jakub Jelinek, Eric Botcazou, Richard Sandiford,
	gcc-patches, Matthew Fortune, Nick Clifton

On 02/01/2016 01:49 PM, Richard Biener wrote:

> What prevents motion of those across a stack adjustment (thus a place
> they are _not_ valid?)

If the address is SP-based, dependencies on the address register. If 
you're thinking prologue stack adjustments, ports where this could be an 
issue emit suitable barrier insns.


Bernd

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)
  2016-02-01 12:51                       ` Bernd Schmidt
@ 2016-02-01 13:10                         ` Richard Biener
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Biener @ 2016-02-01 13:10 UTC (permalink / raw)
  To: Bernd Schmidt
  Cc: Bernd Edlinger, Jakub Jelinek, Eric Botcazou, Richard Sandiford,
	gcc-patches, Matthew Fortune, Nick Clifton

On Mon, Feb 1, 2016 at 1:51 PM, Bernd Schmidt <bschmidt@redhat.com> wrote:
> On 02/01/2016 01:49 PM, Richard Biener wrote:
>
>> What prevents motion of those across a stack adjustment (thus a place
>> they are _not_ valid?)
>
>
> If the address is SP-based, dependencies on the address register. If you're
> thinking prologue stack adjustments, ports where this could be an issue emit
> suitable barrier insns.

Ok.  Just I remember we can't mark non-trapping memory references in general
as they might be non-trapping only conditional, aka if (p != NULL) *p = 1;

I suppose stack accesses are special enough where issues like that can't pop up?
I can think of

foo (int i)
{
 char *p = alloca(10);
 if (i < 10)
   p[i] = 1;
}

or stuff like that.  But I guess your proposal is to handle the simple
non-variable-indexed
cases and leave the rest as always conservatively trapping.

Richard.

>
> Bernd
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* AW: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)
  2016-02-01 12:18                   ` Bernd Schmidt
  2016-02-01 12:49                     ` Richard Biener
@ 2016-02-01 13:19                     ` Bernd Edlinger
  1 sibling, 0 replies; 19+ messages in thread
From: Bernd Edlinger @ 2016-02-01 13:19 UTC (permalink / raw)
  To: Bernd Schmidt, Jakub Jelinek
  Cc: Eric Botcazou, Richard Sandiford, gcc-patches, Matthew Fortune,
	Nick Clifton, Richard Biener

Hi,

On 02/01/2016 13:18, Bernd Schmidt wrote:
> On 01/29/2016 08:42 PM, Bernd Edlinger wrote:
> > Usually we have "if (x==1234) { read MEM[FP+x]; }", so wo don't know,
> > and then after reload: "if (x==1234) { read MEM[SP+x+sp_fp_offset]; }"
> > but wait, in the if statement we know, that x==1234, so everything
> > turns in one magic constant, and we have a totally new constant offset
> > from the SP register "if (x==1234) { read MEM[SP+1234+sp_fp_offset]; }".
> > Now if rtx_addr_can_trap_p(MEM[SP+1234+sp_fp_offset]) says it cannot
> > trap we think we do not need the if at all => BANG.
> 
> What are you trying to say here? As far as I can tell this isn't a
> problem with my proposed solution (set MEM_NOTRAP_P for valid SFP+x
> offsets).

First, I think there are cases when x is still a pseudo before reload, and after
reload, some constant propagation manages to replace x with a constant value.
Then it will not help to know what rtx_addr_can_trap_p was before reload.

Second, I do also think that it may be helpful to annotate the RTX during
reload.  MEM_NOTRAP_P is already used, so maybe something new
like a spare bit would be necessary.

But maybe a single bit will not be enough, what I would need, would be to
somehow annotate the stack_pointer_rtx, that it has a known constant offset
to the initial stack pointer value.  The problem with that is, that stack_pointer_rtx
is a shared pointer, and it is therefore not possible to modify it, and attach
some context to it.

Bernd.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2016-02-01 13:19 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-23 13:25 [PATCH] Skip re-computing the mips frame info after reload completed Bernd Edlinger
2016-01-24 21:57 ` Matthew Fortune
2016-01-25  7:44   ` AW: " Bernd Edlinger
2016-01-25 19:57     ` Matthew Fortune
2016-01-26 21:18       ` Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed) Richard Sandiford
2016-01-27 21:09         ` Bernd Edlinger
2016-01-28 22:17           ` Richard Sandiford
2016-01-29 17:54             ` Bernd Edlinger
2016-01-27 23:36         ` Eric Botcazou
2016-01-29  1:09           ` Bernd Schmidt
2016-01-29 15:18             ` Bernd Edlinger
2016-01-29 15:41             ` Jakub Jelinek
2016-01-29 15:47               ` Bernd Schmidt
2016-01-29 19:42                 ` Bernd Edlinger
2016-02-01 12:18                   ` Bernd Schmidt
2016-02-01 12:49                     ` Richard Biener
2016-02-01 12:51                       ` Bernd Schmidt
2016-02-01 13:10                         ` Richard Biener
2016-02-01 13:19                     ` AW: " Bernd Edlinger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).