public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Follow-up to PR51471
@ 2014-11-11 22:54 Eric Botcazou
  2014-11-12 20:26 ` Jeff Law
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Botcazou @ 2014-11-11 22:54 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 745 bytes --]

In https://gcc.gnu.org/ml/gcc-patches/2012-01/msg00363.html, Tom reported an 
ICE in the DWARF CFI pass because a frame-related insn was speculated by the 
reorg pass and, therefore, disabled the optimization.  It turns out that the 
same code (when condition == const_true_rtx) can also duplicate frame-related 
insns, leading to the same ICE in the DWARF CFI pass; we have run into that 
building newlib for a port we are about to contribute.

Hence the attached patch, which disables the optimization in this case too and 
adds a comment for both.  Tested on SPARC/Solaris, applied on the mainline.


2014-11-11  Eric Botcazou  <ebotcazou@adacore.com>

	* reorg.c (fill_slots_from_thread): Do not copy frame-related insns.


-- 
Eric Botcazou

[-- Attachment #2: p.diff --]
[-- Type: text/x-patch, Size: 812 bytes --]

Index: reorg.c
===================================================================
--- reorg.c	(revision 217259)
+++ reorg.c	(working copy)
@@ -2501,9 +2501,11 @@ fill_slots_from_thread (rtx_insn *insn,
 
 	  /* There are two ways we can win:  If TRIAL doesn't set anything
 	     needed at the opposite thread and can't trap, or if it can
-	     go into an annulled delay slot.  */
+	     go into an annulled delay slot.  But we want neither to copy
+	     nor to speculate frame-related insns.  */
 	  if (!must_annul
-	      && (condition == const_true_rtx
+	      && ((condition == const_true_rtx
+		   && (own_thread || !RTX_FRAME_RELATED_P (trial)))
 	          || (! insn_sets_resource_p (trial, &opposite_needed, true)
 		      && ! may_trap_or_fault_p (pat)
 		      && ! RTX_FRAME_RELATED_P (trial))))

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Follow-up to PR51471
  2014-11-11 22:54 Follow-up to PR51471 Eric Botcazou
@ 2014-11-12 20:26 ` Jeff Law
  2014-11-14 10:46   ` Eric Botcazou
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff Law @ 2014-11-12 20:26 UTC (permalink / raw)
  To: Eric Botcazou, gcc-patches

On 11/11/14 15:28, Eric Botcazou wrote:
> In https://gcc.gnu.org/ml/gcc-patches/2012-01/msg00363.html, Tom reported an
> ICE in the DWARF CFI pass because a frame-related insn was speculated by the
> reorg pass and, therefore, disabled the optimization.  It turns out that the
> same code (when condition == const_true_rtx) can also duplicate frame-related
> insns, leading to the same ICE in the DWARF CFI pass; we have run into that
> building newlib for a port we are about to contribute.
>
> Hence the attached patch, which disables the optimization in this case too and
> adds a comment for both.  Tested on SPARC/Solaris, applied on the mainline.
>
>
> 2014-11-11  Eric Botcazou  <ebotcazou@adacore.com>
>
> 	* reorg.c (fill_slots_from_thread): Do not copy frame-related insns.
I wonder how many other problems of this nature are lurking in reorg.c. 
  For example steal_delay_list_from_{target,fallthrough} or the code 
which searches for arithmetic at the branch target, and puts the 
opposite insn in a delay slot.

In fact, I really wonder if we should be allowing anything frame related 
outside fill_simple_delay_slots.

jeff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Follow-up to PR51471
  2014-11-12 20:26 ` Jeff Law
@ 2014-11-14 10:46   ` Eric Botcazou
  2014-11-14 17:26     ` Jeff Law
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Botcazou @ 2014-11-14 10:46 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

> I wonder how many other problems of this nature are lurking in reorg.c.
> For example steal_delay_list_from_{target,fallthrough} or the code
> which searches for arithmetic at the branch target, and puts the
> opposite insn in a delay slot.

Right, and the latter has already been dealt with by Richard:

2012-02-11  Richard Sandiford  <rdsandiford@googlemail.com>

	PR rtl-optimization/52175
	* reorg.c (fill_slots_from_thread): Don't apply add/sub optimization
	to frame-related instructions.

> In fact, I really wonder if we should be allowing anything frame related
> outside fill_simple_delay_slots.

That could well be the end result after a few more years of tweaking. :-)

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Follow-up to PR51471
  2014-11-14 10:46   ` Eric Botcazou
@ 2014-11-14 17:26     ` Jeff Law
  2014-11-15 10:57       ` Eric Botcazou
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff Law @ 2014-11-14 17:26 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches

On 11/14/14 03:44, Eric Botcazou wrote:
>> I wonder how many other problems of this nature are lurking in reorg.c.
>> For example steal_delay_list_from_{target,fallthrough} or the code
>> which searches for arithmetic at the branch target, and puts the
>> opposite insn in a delay slot.
>
> Right, and the latter has already been dealt with by Richard:
>
> 2012-02-11  Richard Sandiford  <rdsandiford@googlemail.com>
>
> 	PR rtl-optimization/52175
> 	* reorg.c (fill_slots_from_thread): Don't apply add/sub optimization
> 	to frame-related instructions.
Ahhh, good.

>
>> In fact, I really wonder if we should be allowing anything frame related
>> outside fill_simple_delay_slots.
>
> That could well be the end result after a few more years of tweaking. :-)
IIRC, fill_eager and its related friends are all speculative in some way 
and aren't those precisely the ones that are causing us problems.   Also 
note we have backends working around this stuff in fairly blunt ways:

;; For conditional branches. Frame related instructions are not allowed
;; because they confuse the unwind support.
(define_attr "in_branch_delay" "false,true"
   (if_then_else (and (eq_attr "type" 
"!uncond_branch,branch,cbranch,fbranch,call,sibcall,dyncall,multi,milli,sh_func_adrs,parallel_branch,trap")
                      (eq_attr "length" "4")
                      (not (match_test "RTX_FRAME_RELATED_P (insn)")))
                 (const_string "true")
                 (const_string "false")))

;; Disallow instructions which use the FPU since they will tie up the FPU
;; even if the instruction is nullified.
(define_attr "in_nullified_branch_delay" "false,true"
   (if_then_else (and (eq_attr "type" 
"!uncond_branch,branch,cbranch,fbranch,call,sibcall,dyncall,multi,milli,sh_func_adrs,fpcc,fpalu,fpmulsgl,fpmuldbl,fpdivsgl,fpdivdbl,fpsqrtsgl,fpsqrtdbl,parallel_branch,trap")
                      (eq_attr "length" "4")
                      (not (match_test "RTX_FRAME_RELATED_P (insn)")))
                 (const_string "true")
                 (const_string "false")))

;; For calls and millicode calls.
(define_attr "in_call_delay" "false,true"
   (if_then_else (and (eq_attr "type" 
"!uncond_branch,branch,cbranch,fbranch,call,sibcall,dyncall,multi,milli,sh_func_adrs,parallel_branch,trap")
                      (eq_attr "length" "4")
                      (not (match_test "RTX_FRAME_RELATED_P (insn)")))
                 (const_string "true")
                 (const_string "false")))


Given architectural difficulties of delay slots on modern processors, 
would it be that painful to just not allow filling slots with frame 
insns and let dbr try to find something else or drop in a nop?  I 
wouldn't be all that surprised if there wasn't a measurable performance 
difference on something like a modern Sparc.


Jeff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Follow-up to PR51471
  2014-11-14 17:26     ` Jeff Law
@ 2014-11-15 10:57       ` Eric Botcazou
  2014-11-15 23:14         ` Matthew Fortune
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Botcazou @ 2014-11-15 10:57 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

> IIRC, fill_eager and its related friends are all speculative in some way
> and aren't those precisely the ones that are causing us problems.   Also
> note we have backends working around this stuff in fairly blunt ways:

I'd say that the PA back-end went a bit too far here, especially if it marks 
some insns of the epilogue as frame-related.  dwarf2cfi.c has special code to 
handle delay slots (SEQUENCEs) so it's not an all-or-nothing game.

> Given architectural difficulties of delay slots on modern processors,
> would it be that painful to just not allow filling slots with frame
> insns and let dbr try to find something else or drop in a nop?  I
> wouldn't be all that surprised if there wasn't a measurable performance
> difference on something like a modern Sparc.

Yes, modern SPARCs have (short) branches without delay slots.  But the other 
big contender is MIPS here and the story might be different for it.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: Follow-up to PR51471
  2014-11-15 10:57       ` Eric Botcazou
@ 2014-11-15 23:14         ` Matthew Fortune
  2014-11-17  7:18           ` Jeff Law
  0 siblings, 1 reply; 7+ messages in thread
From: Matthew Fortune @ 2014-11-15 23:14 UTC (permalink / raw)
  To: Eric Botcazou, Jeff Law; +Cc: gcc-patches

Eric Botcazou <ebotcazou@adacore.com> writes:
> > IIRC, fill_eager and its related friends are all speculative in some
> way
> > and aren't those precisely the ones that are causing us problems.
> Also
> > note we have backends working around this stuff in fairly blunt ways:
> 
> I'd say that the PA back-end went a bit too far here, especially if it
> marks some insns of the epilogue as frame-related.  dwarf2cfi.c has
> special code to handle delay slots (SEQUENCEs) so it's not an all-or-
> nothing game.
> 
> > Given architectural difficulties of delay slots on modern processors,
> > would it be that painful to just not allow filling slots with frame
> > insns and let dbr try to find something else or drop in a nop?  I
> > wouldn't be all that surprised if there wasn't a measurable
> > performance difference on something like a modern Sparc.
> 
> Yes, modern SPARCs have (short) branches without delay slots.  But the
> other big contender is MIPS here and the story might be different for
> it.

MIPSr6 introduces 'compact' branches which do not have delay slots.

So the issues of filling delay slots will be less important from R6
onwards. However, delay slots remain important for now.

I haven't thought about the problem much but instinctively I'd be surprised
if a blanket restriction on frame-related instructions would lead to lots
of NOPs in delay slots.

Matthew

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Follow-up to PR51471
  2014-11-15 23:14         ` Matthew Fortune
@ 2014-11-17  7:18           ` Jeff Law
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff Law @ 2014-11-17  7:18 UTC (permalink / raw)
  To: Matthew Fortune, Eric Botcazou; +Cc: gcc-patches

On 11/15/14 14:37, Matthew Fortune wrote:
> Eric Botcazou <ebotcazou@adacore.com> writes:
>>> IIRC, fill_eager and its related friends are all speculative in some
>> way
>>> and aren't those precisely the ones that are causing us problems.
>> Also
>>> note we have backends working around this stuff in fairly blunt ways:
>>
>> I'd say that the PA back-end went a bit too far here, especially if it
>> marks some insns of the epilogue as frame-related.  dwarf2cfi.c has
>> special code to handle delay slots (SEQUENCEs) so it's not an all-or-
>> nothing game.
>>
>>> Given architectural difficulties of delay slots on modern processors,
>>> would it be that painful to just not allow filling slots with frame
>>> insns and let dbr try to find something else or drop in a nop?  I
>>> wouldn't be all that surprised if there wasn't a measurable
>>> performance difference on something like a modern Sparc.
>>
>> Yes, modern SPARCs have (short) branches without delay slots.  But the
>> other big contender is MIPS here and the story might be different for
>> it.
>
> MIPSr6 introduces 'compact' branches which do not have delay slots.
>
> So the issues of filling delay slots will be less important from R6
> onwards. However, delay slots remain important for now.
>
> I haven't thought about the problem much but instinctively I'd be surprised
> if a blanket restriction on frame-related instructions would lead to lots
> of NOPs in delay slots.
Possibly.  I'd be surprised if frame-related stuff is used that often 
for filling slots...  Combine that with the decrease in importance for 
filling delay slots when the exist, I wouldn't be terribly surprised if 
nobody could actually measure the change if we were to make it.

The PA port may have gone too far, but it's certainly conservatively 
correct and on every PA processor that "matters" (for a very liberal 
definition of matters), I doubt the difference is measurable due to the 
depth of the reorder buffers and the fact that a nop can retire anytime 
that's convenient.

Jeff

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-11-17  6:32 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-11 22:54 Follow-up to PR51471 Eric Botcazou
2014-11-12 20:26 ` Jeff Law
2014-11-14 10:46   ` Eric Botcazou
2014-11-14 17:26     ` Jeff Law
2014-11-15 10:57       ` Eric Botcazou
2014-11-15 23:14         ` Matthew Fortune
2014-11-17  7:18           ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).