reload question

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* reload question
@ 2011-08-11 16:49 Hari Sandanagobalane
  0 siblings, 0 replies; 32+ messages in thread
From: Hari Sandanagobalane @ 2011-08-11 16:49 UTC (permalink / raw)
  To: gcc

Hello all,
I was making some modifications to picochip port and ran into a problem 
with cse within reload and I think it is a bug. Can someone familiar 
with reload let me know if it is indeed a bug. The c testcase that 
caused the problem was 
gcc-4.6.0/gcc/testsuite/./gcc.c-torture/execute/991216-2.c.

After IRA, the relevant section of code was.

(insn 127 32 27 2 (set (reg:HI 7 R7)
         (const_int 17767 [0x4567])) test.c:14 15 {movhi}
      (nil))

(insn 27 127 128 2 (set (mem:HI (reg/f:SI 39 SP) [0 S2 A32])
         (reg:HI 7 R7)) test.c:14 15 {movhi}
      (nil))

(insn 128 27 129 2 (set (reg:SI 6 R6)
         (symbol_ref:SI ("test") [flags 0x41]  <function_decl 
0x7f9926963800 test>)) test.c:14 18 {movsi_immediate}
      (nil))

(insn 129 128 35 2 (set (reg:SI 36 A0)
         (reg:SI 6 R6)) test.c:14 16 {movsi_nonconst}
      (nil))

(call_insn 35 129 41 2 (parallel [
             (call (mem:QI (reg:SI 36 A0) [0 S1 A8])
                 (const_int 8 [0x8]))
             (clobber (reg:SI 38 LR))
         ]) test.c:14 22 {call_using_register_32bit}
      (nil)
     (expr_list:REG_DEP_TRUE (use (reg:HI 5 R5))
         (expr_list:REG_DEP_TRUE (use (reg:HI 4 R4))
             (expr_list:REG_DEP_TRUE (use (reg:HI 2 R2))
                 (expr_list:REG_DEP_TRUE (use (reg:HI 1 R1))
                     (expr_list:REG_DEP_TRUE (use (reg:HI 0 R0))
                         (nil)))))))

(insn 41 35 40 2 (set (reg:HI 0 R0)
         (const_int 4 [0x4])) test.c:15 15 {movhi}
      (nil))

(insn 40 41 39 2 (set (reg:HI 5 R5)
         (const_int -30293 [0xffffffffffff89ab])) test.c:15 15 {movhi}
      (nil))

(insn 39 40 42 2 (set (reg:HI 4 R4)
         (const_int -12817 [0xffffffffffffcdef])) test.c:15 15 {movhi}
      (nil))

(insn 42 39 43 2 (set (reg:HI 1 R1)
         (const_int 2 [0x2])) test.c:15 15 {movhi}
      (nil))

(insn 43 42 130 2 (set (reg:HI 2 R2)
         (const_int 3 [0x3])) test.c:15 15 {movhi}
      (nil))

(insn 130 43 36 2 (set (reg:HI 3 R3)
         (const_int 85 [0x55])) test.c:15 15 {movhi}
      (nil))

(insn 36 130 44 2 (set (mem:HI (plus:SI (reg/f:SI 39 SP)
                 (const_int 4 [0x4])) [0 S2 A16])
         (reg:HI 3 R3)) test.c:15 15 {movhi}
      (nil))

(insn 44 36 131 2 (set (reg:HI 3 R3)
         (reg:HI 0 R0)) test.c:15 15 {movhi}
      (expr_list:REG_EQUAL (const_int 4 [0x4])
         (nil)))

(insn 131 44 37 2 (set (reg:HI 6 R6)
         (const_int 291 [0x123])) test.c:15 15 {movhi}
      (nil))

(insn 37 131 132 2 (set (mem:HI (plus:SI (reg/f:SI 39 SP)
                 (const_int 2 [0x2])) [0 S2 A32])
         (reg:HI 6 R6)) test.c:15 15 {movhi}
      (nil))

(insn 132 37 38 2 (set (reg:HI 7 R7)
         (const_int 17767 [0x4567])) test.c:15 15 {movhi}
      (nil))

(insn 38 132 133 2 (set (mem:HI (reg/f:SI 39 SP) [0 S2 A32])
         (reg:HI 7 R7)) test.c:15 15 {movhi}
      (nil))

(insn 133 38 134 2 (set (reg:SI 6 R6)
         (symbol_ref:SI ("test") [flags 0x41]  <function_decl 
0x7f9926963800 test>)) test.c:15 18 {movsi_immediate}
      (nil))


All of the above code is within the same basic block. Note that R7 is 
being set in the first instruction in this segment (insn 127) to the 
value 17767. It is also being set to the same value in instruction 132. 
But, the interesting thing to note here is that in insn 128 (SI R6) is 
being set to a symbol. SI R6 is R[7:6] combined.

But, after reload, this section of code becomes.

(insn 127 32 27 2 (set (reg:HI 7 R7)
         (const_int 17767 [0x4567])) test.c:14 15 {movhi}
      (nil))

(insn 27 127 128 2 (set (mem:HI (reg/f:SI 39 SP) [0 S2 A32])
         (reg:HI 7 R7)) test.c:14 15 {movhi}
      (nil))

(insn 128 27 129 2 (set (reg:SI 6 R6)
         (symbol_ref:SI ("test") [flags 0x41]  <function_decl 
0x7f9926963800 test>)) test.c:14 18 {movsi_immediate}
      (nil))

(insn 129 128 35 2 (set (reg:SI 36 A0)
         (reg:SI 6 R6)) test.c:14 16 {movsi_nonconst}
      (nil))

(call_insn 35 129 41 2 (parallel [
             (call (mem:QI (reg:SI 36 A0) [0 S1 A8])
                 (const_int 8 [0x8]))
             (clobber (reg:SI 38 LR))
         ]) test.c:14 22 {call_using_register_32bit}
      (nil)
     (expr_list:REG_DEP_TRUE (use (reg:HI 5 R5))
         (expr_list:REG_DEP_TRUE (use (reg:HI 4 R4))
             (expr_list:REG_DEP_TRUE (use (reg:HI 2 R2))
                 (expr_list:REG_DEP_TRUE (use (reg:HI 1 R1))
                     (expr_list:REG_DEP_TRUE (use (reg:HI 0 R0))
                         (nil)))))))

(insn 41 35 40 2 (set (reg:HI 0 R0)
         (const_int 4 [0x4])) test.c:15 15 {movhi}
      (nil))

(insn 40 41 39 2 (set (reg:HI 5 R5)
         (const_int -30293 [0xffffffffffff89ab])) test.c:15 15 {movhi}
      (nil))

(insn 39 40 42 2 (set (reg:HI 4 R4)
         (const_int -12817 [0xffffffffffffcdef])) test.c:15 15 {movhi}
      (nil))

(insn 42 39 43 2 (set (reg:HI 1 R1)
         (const_int 2 [0x2])) test.c:15 15 {movhi}
      (nil))

(insn 43 42 130 2 (set (reg:HI 2 R2)
         (const_int 3 [0x3])) test.c:15 15 {movhi}
      (nil))

(insn 130 43 36 2 (set (reg:HI 3 R3)
         (const_int 85 [0x55])) test.c:15 15 {movhi}
      (nil))

(insn 36 130 44 2 (set (mem:HI (plus:SI (reg/f:SI 39 SP)
                 (const_int 4 [0x4])) [0 S2 A16])
         (reg:HI 3 R3)) test.c:15 15 {movhi}
      (nil))

(insn 44 36 131 2 (set (reg:HI 3 R3)
         (reg:HI 0 R0)) test.c:15 15 {movhi}
      (expr_list:REG_EQUAL (const_int 4 [0x4])
         (nil)))

(insn 131 44 37 2 (set (reg:HI 6 R6)
         (const_int 291 [0x123])) test.c:15 15 {movhi}
      (nil))

(insn 37 131 132 2 (set (mem:HI (plus:SI (reg/f:SI 39 SP)
                 (const_int 2 [0x2])) [0 S2 A32])
         (reg:HI 6 R6)) test.c:15 15 {movhi}
      (nil))

(insn 132 37 38 2 (set (reg:HI 7 R7)
         (reg:HI 7 R7)) test.c:15 15 {movhi}
      (nil))

(insn 38 132 133 2 (set (mem:HI (reg/f:SI 39 SP) [0 S2 A32])
         (reg:HI 7 R7)) test.c:15 15 {movhi}
      (nil))

(insn 133 38 134 2 (set (reg:SI 6 R6)
         (symbol_ref:SI ("test") [flags 0x41]  <function_decl 
0x7f9926963800 test>)) test.c:15 18 {movsi_immediate}
      (nil))

So, the CSE run in postreload seems to assume that R7 still contains 
constant 17767, whereas it actually has been overwritten as part of (SI R6).

I looked at the code in postreload.c file to find the cause. In function 
reload_cse_move2add, there is a call to note_stores ~line 2030 which 
seems to handle multi-reg modes well. It looks at hard_regno_nregs and 
resets the information about each register in that set. But this 
function is not always called. There are early exit (continue) 
statements within the reload_cse_move2add which do not handle multi-reg 
modes at all.

The attached patch fixes the problem for me. I think similar code is 
probably also needed in move2add_use_add2_insn. Is that correct?

All my experiments were conducted on 4.6.0 release (but 4.6.1 was no 
different)

Many thanks for your help.

Regards
Hari


Patch:
Index: postreload.c
===================================================================
--- postreload.c        (revision 171932)
+++ postreload.c        (working copy)
@@ -1746,6 +1746,8 @@
    rtx src = SET_SRC (pat);
    int regno = REGNO (reg);
    int min_regno = 0;
+  enum machine_mode mode = GET_MODE (reg);
+  unsigned int nregs = nregs = hard_regno_nregs[regno][mode];
    bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn));
    int i;
    bool changed = false;
@@ -1807,11 +1809,21 @@
        if (validate_change (insn, &SET_SRC (pat), tem, 0))
         changed = true;
      }
+
    reg_set_luid[regno] = move2add_luid;
    reg_base_reg[regno] = -1;
    reg_mode[regno] = GET_MODE (reg);
    reg_symbol_ref[regno] = sym;
    reg_offset[regno] = INTVAL (off);
+  if (nregs != 1)
+  {
+    unsigned int endregno = regno + nregs;
+    for (i = regno; i < endregno; i++)
+      {
+       /* Reset the information about this register.  */
+       reg_set_luid[i] = 0;
+      }
+  }
    return changed;
  }


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: reload question
  2010-06-23 17:41 Alex Turjan
  2010-06-23 18:30 ` Jeff Law
@ 2010-06-24  3:44 ` Joern Rennecke
  1 sibling, 0 replies; 32+ messages in thread
From: Joern Rennecke @ 2010-06-24  3:44 UTC (permalink / raw)
  To: gcc

Quoting Alex Turjan <aturjan@yahoo.com>:

> When Im compiling a loop with high register pressure the register   
> allocator does not allocate a register for the loop counter (i.e.,   
> operand 0) as it has a long life range. Thus operand 0 has to be   
> reloaded but then I get a failure in the reload:
...
> Can anybody give me a hint?

Look at the arc[compact] machine description arc-4_4-20090909-branch -
it works quite nicely there.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: reload question
  2010-06-23 21:29   ` Alex Turjan
@ 2010-06-23 21:43     ` Jeff Law
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff Law @ 2010-06-23 21:43 UTC (permalink / raw)
  To: Alex Turjan; +Cc: gcc

On 06/23/10 12:29, Alex Turjan wrote:
>    
>> insns which branch are not allowed to have output
>> reloads.  You must
>> support any kind of register as well as memory operands in
>> your insn for
>> the loop counter.
>>      
> Thanks for answer but what do you suggest to do, as my architecture done not support HW loops with memory operands?
>    
Emulate it using normal instructions.  You can find examples in various 
other ports.  These situations are rare, but for proper operation, you 
need to support them.

Jeff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: reload question
  2010-06-23 18:30 ` Jeff Law
@ 2010-06-23 21:29   ` Alex Turjan
  2010-06-23 21:43     ` Jeff Law
  0 siblings, 1 reply; 32+ messages in thread
From: Alex Turjan @ 2010-06-23 21:29 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc


> insns which branch are not allowed to have output
> reloads.  You must 
> support any kind of register as well as memory operands in
> your insn for 
> the loop counter.

Thanks for answer but what do you suggest to do, as my architecture done not support HW loops with memory operands?

Alex



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: reload question
  2010-06-23 17:41 Alex Turjan
@ 2010-06-23 18:30 ` Jeff Law
  2010-06-23 21:29   ` Alex Turjan
  2010-06-24  3:44 ` Joern Rennecke
  1 sibling, 1 reply; 32+ messages in thread
From: Jeff Law @ 2010-06-23 18:30 UTC (permalink / raw)
  To: Alex Turjan; +Cc: gcc

On 06/23/10 11:22, Alex Turjan wrote:
> Hi,
> My port supports hardware loops generated by the following (do_end) pattern:
>
> (set (pc) (if_then_else (ne (match_operand:HI 0 "general_register_operand" "d")
>                             (const_int 0))
>                         (label_ref (match_operand 1 "" ""))
>                         (pc)))
>       (set  (match_operand:HI 2 "general_register_operand" "=0")
>           (plus:HI (match_operand:HI 3 "general_register_operand" "0")
>                            (const_int -1)))
>      (clobber (match_operand:BI 4 "predicate_register_operand" "=j"))
>
>
> When Im compiling a loop with high register pressure the register allocator does not allocate a register for the loop counter (i.e., operand 0) as it has a long life range. Thus operand 0 has to be reloaded but then I get a failure in the reload:
>
> error: unable to generate reloads for:
> (jump_insn 211 182 188 11 lib5.c:51 (parallel [
>              (set (pc)
>                  (if_then_else (ne (reg:HI 292 [ N ])
>                          (const_int 0 [0x0]))
>                      (label_ref 183)
>                      (pc)))
>              (set (reg:HI 292 [ N ])
>                  (plus:HI (reg:HI 292 [ N ])
>                      (const_int -1 [0xffffffffffffffff])))
>              (clobber (reg:BI 33 p1 [293]))
>          ]) 506 {do_endhi} (expr_list:REG_UNUSED (reg:BI 33 p1 [293])
>          (expr_list:REG_BR_PROB (const_int 9100 [0x238c])
>              (nil)))
>   ->  183)
> lib5.c:105:1: internal compiler error: in find_reloads, at reload.c:3821
>
> Can anybody give me a hint?
> I am aware of the following msg:
> http://gcc.gnu.org/ml/gcc/2001-09/msg00942.html
> but still dont know how to make reload do the work.
>    
insns which branch are not allowed to have output reloads.  You must 
support any kind of register as well as memory operands in your insn for 
the loop counter.

Jeff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* reload question
@ 2010-06-23 17:41 Alex Turjan
  2010-06-23 18:30 ` Jeff Law
  2010-06-24  3:44 ` Joern Rennecke
  0 siblings, 2 replies; 32+ messages in thread
From: Alex Turjan @ 2010-06-23 17:41 UTC (permalink / raw)
  To: gcc

Hi,
My port supports hardware loops generated by the following (do_end) pattern: 

(set (pc) (if_then_else (ne (match_operand:HI 0 "general_register_operand" "d")
                           (const_int 0))
                       (label_ref (match_operand 1 "" ""))
                       (pc)))
     (set  (match_operand:HI 2 "general_register_operand" "=0")
         (plus:HI (match_operand:HI 3 "general_register_operand" "0")
                          (const_int -1)))
    (clobber (match_operand:BI 4 "predicate_register_operand" "=j"))


When Im compiling a loop with high register pressure the register allocator does not allocate a register for the loop counter (i.e., operand 0) as it has a long life range. Thus operand 0 has to be reloaded but then I get a failure in the reload:

error: unable to generate reloads for:
(jump_insn 211 182 188 11 lib5.c:51 (parallel [
            (set (pc)
                (if_then_else (ne (reg:HI 292 [ N ])
                        (const_int 0 [0x0]))
                    (label_ref 183)
                    (pc)))
            (set (reg:HI 292 [ N ])
                (plus:HI (reg:HI 292 [ N ])
                    (const_int -1 [0xffffffffffffffff])))
            (clobber (reg:BI 33 p1 [293]))
        ]) 506 {do_endhi} (expr_list:REG_UNUSED (reg:BI 33 p1 [293])
        (expr_list:REG_BR_PROB (const_int 9100 [0x238c])
            (nil)))
 -> 183)
lib5.c:105:1: internal compiler error: in find_reloads, at reload.c:3821

Can anybody give me a hint?
I am aware of the following msg: 
http://gcc.gnu.org/ml/gcc/2001-09/msg00942.html
but still dont know how to make reload do the work.

thanks,
Alex


      

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: reload question
  2007-08-13 14:55   ` Pat Haugen
@ 2007-08-13 17:42     ` Ian Lance Taylor
  0 siblings, 0 replies; 32+ messages in thread
From: Ian Lance Taylor @ 2007-08-13 17:42 UTC (permalink / raw)
  To: Pat Haugen; +Cc: gcc, Ulrich Weigand

Pat Haugen <pthaugen@us.ibm.com> writes:

> I was thinking of reordering the if tests such that we check if op0 is
> already ok_for_base or op1 is ok_for_index before we check the inverse
> conditions (which result in swapping the classes of the operands). I would
> also change the default else leg such that it keeps op0 as the base reg and
> op1 as an index reg.

That patch would probably be fine.  Ideally you would do some
performance testing on a couple of primary platforms.

Ian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: reload question
  2007-08-11  0:18 ` Ian Lance Taylor
@ 2007-08-13 14:55   ` Pat Haugen
  2007-08-13 17:42     ` Ian Lance Taylor
  0 siblings, 1 reply; 32+ messages in thread
From: Pat Haugen @ 2007-08-13 14:55 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc, Ulrich Weigand

[-- Attachment #1: Type: text/plain, Size: 514 bytes --]

Ian Lance Taylor <iant@google.com> wrote on 08/10/2007 07:17:21 PM:
>
> I'm not entirely clear: how do you propose changing the code?
>

I was thinking of reordering the if tests such that we check if op0 is
already ok_for_base or op1 is ok_for_index before we check the inverse
conditions (which result in swapping the classes of the operands). I would
also change the default else leg such that it keeps op0 as the base reg and
op1 as an index reg.


Example diff (not tested)

(See attached file: reload.diff)


[-- Attachment #2: reload.diff --]
[-- Type: application/octet-stream, Size: 2931 bytes --]

Index: gcc/reload.c
===================================================================
*** gcc/reload.c	(revision 126856)
--- gcc/reload.c	(working copy)
*************** find_reloads_address_1 (enum machine_mod
*** 5461,5476 ****
  
  	else if (code0 == REG && code1 == REG)
  	  {
! 	    if (REGNO_OK_FOR_INDEX_P (REGNO (op0))
! 		&& regno_ok_for_base_p (REGNO (op1), mode, PLUS, REG))
  	      return 0;
! 	    else if (REGNO_OK_FOR_INDEX_P (REGNO (op1))
! 		     && regno_ok_for_base_p (REGNO (op0), mode, PLUS, REG))
  	      return 0;
- 	    else if (regno_ok_for_base_p (REGNO (op1), mode, PLUS, REG))
- 	      find_reloads_address_1 (mode, orig_op0, 1, PLUS, SCRATCH,
- 				      &XEXP (x, 0), opnum, type, ind_levels,
- 				      insn);
  	    else if (regno_ok_for_base_p (REGNO (op0), mode, PLUS, REG))
  	      find_reloads_address_1 (mode, orig_op1, 1, PLUS, SCRATCH,
  				      &XEXP (x, 1), opnum, type, ind_levels,
--- 5461,5472 ----
  
  	else if (code0 == REG && code1 == REG)
  	  {
! 	    if (REGNO_OK_FOR_INDEX_P (REGNO (op1))
! 		&& regno_ok_for_base_p (REGNO (op0), mode, PLUS, REG))
  	      return 0;
! 	    else if (REGNO_OK_FOR_INDEX_P (REGNO (op0))
! 		     && regno_ok_for_base_p (REGNO (op1), mode, PLUS, REG))
  	      return 0;
  	    else if (regno_ok_for_base_p (REGNO (op0), mode, PLUS, REG))
  	      find_reloads_address_1 (mode, orig_op1, 1, PLUS, SCRATCH,
  				      &XEXP (x, 1), opnum, type, ind_levels,
*************** find_reloads_address_1 (enum machine_mod
*** 5479,5494 ****
  	      find_reloads_address_1 (mode, orig_op0, 0, PLUS, REG,
  				      &XEXP (x, 0), opnum, type, ind_levels,
  				      insn);
  	    else if (REGNO_OK_FOR_INDEX_P (REGNO (op0)))
  	      find_reloads_address_1 (mode, orig_op1, 0, PLUS, REG,
  				      &XEXP (x, 1), opnum, type, ind_levels,
  				      insn);
  	    else
  	      {
! 		find_reloads_address_1 (mode, orig_op0, 1, PLUS, SCRATCH,
  					&XEXP (x, 0), opnum, type, ind_levels,
  					insn);
! 		find_reloads_address_1 (mode, orig_op1, 0, PLUS, REG,
  					&XEXP (x, 1), opnum, type, ind_levels,
  					insn);
  	      }
--- 5475,5494 ----
  	      find_reloads_address_1 (mode, orig_op0, 0, PLUS, REG,
  				      &XEXP (x, 0), opnum, type, ind_levels,
  				      insn);
+ 	    else if (regno_ok_for_base_p (REGNO (op1), mode, PLUS, REG))
+ 	      find_reloads_address_1 (mode, orig_op0, 1, PLUS, SCRATCH,
+ 				      &XEXP (x, 0), opnum, type, ind_levels,
+ 				      insn);
  	    else if (REGNO_OK_FOR_INDEX_P (REGNO (op0)))
  	      find_reloads_address_1 (mode, orig_op1, 0, PLUS, REG,
  				      &XEXP (x, 1), opnum, type, ind_levels,
  				      insn);
  	    else
  	      {
! 		find_reloads_address_1 (mode, orig_op0, 0, PLUS, REG,
  					&XEXP (x, 0), opnum, type, ind_levels,
  					insn);
! 		find_reloads_address_1 (mode, orig_op1, 1, PLUS, SCRATCH,
  					&XEXP (x, 1), opnum, type, ind_levels,
  					insn);
  	      }

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: reload question
  2007-08-10 20:02 Pat Haugen
  2007-08-11  0:18 ` Ian Lance Taylor
@ 2007-08-11 11:45 ` Paolo Bonzini
  1 sibling, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2007-08-11 11:45 UTC (permalink / raw)
  To: Pat Haugen; +Cc: gcc, Ulrich Weigand

Pat Haugen wrote:
> Is this being done on purpose (going on assumption that operands are
> commutative), such as to allow more opportunities for a successful
> allocation with reduced spill?

I think so, but reordering the "else if"s should improve the results.

> I've also seen the same situation come up during register renaming
> (regrename.c), but not too surprising since the code there says it's based
> off find_reloads_address_1() and is coded similarly.

... and likewise here.

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: reload question
  2007-08-10 20:02 Pat Haugen
@ 2007-08-11  0:18 ` Ian Lance Taylor
  2007-08-13 14:55   ` Pat Haugen
  2007-08-11 11:45 ` Paolo Bonzini
  1 sibling, 1 reply; 32+ messages in thread
From: Ian Lance Taylor @ 2007-08-11  0:18 UTC (permalink / raw)
  To: Pat Haugen; +Cc: gcc, Ulrich Weigand

Pat Haugen <pthaugen@us.ibm.com> writes:

> I'm looking into a few cases where we're still getting the base/index
> operand ordering wrong on PowerPC for an indexed load/store instruction,
> even after the PTR_PLUS merge and fix for PR28690.  One of the cases I
> observed was caused by reload picking r0 to use for the base reg opnd as a
> result of spilling.  Since r0 is not a valid register for the base reg
> position, we end up switching the order of the operands before emitting the
> instruction which then causes the performance hit on Power6.  r0 is not a
> valid BASE_REG_CLASS register, only INDEX_REG_CLASS, but the following
> section of code from reload.c:find_reloads_address_1() dealing with
> PLUS(REG REG) may try assigning the base reg opnd to the INDEX_REG class in
> a couple situations.  This then allows r0 to be picked for the base reg
> opnd.  Is this being done on purpose (going on assumption that operands are
> commutative), such as to allow more opportunities for a successful
> allocation with reduced spill?  If it's not wise for me to modify this
> code, possibly due to effect on other architectures, what are some other
> options (maybe introduce a new HONOR_BASE_INDEX_ORDER target macro)?

I'm not entirely clear: how do you propose changing the code?

Ian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* reload question
@ 2007-08-10 20:02 Pat Haugen
  2007-08-11  0:18 ` Ian Lance Taylor
  2007-08-11 11:45 ` Paolo Bonzini
  0 siblings, 2 replies; 32+ messages in thread
From: Pat Haugen @ 2007-08-10 20:02 UTC (permalink / raw)
  To: gcc; +Cc: Ulrich Weigand

I'm looking into a few cases where we're still getting the base/index
operand ordering wrong on PowerPC for an indexed load/store instruction,
even after the PTR_PLUS merge and fix for PR28690.  One of the cases I
observed was caused by reload picking r0 to use for the base reg opnd as a
result of spilling.  Since r0 is not a valid register for the base reg
position, we end up switching the order of the operands before emitting the
instruction which then causes the performance hit on Power6.  r0 is not a
valid BASE_REG_CLASS register, only INDEX_REG_CLASS, but the following
section of code from reload.c:find_reloads_address_1() dealing with
PLUS(REG REG) may try assigning the base reg opnd to the INDEX_REG class in
a couple situations.  This then allows r0 to be picked for the base reg
opnd.  Is this being done on purpose (going on assumption that operands are
commutative), such as to allow more opportunities for a successful
allocation with reduced spill?  If it's not wise for me to modify this
code, possibly due to effect on other architectures, what are some other
options (maybe introduce a new HONOR_BASE_INDEX_ORDER target macro)?

        else if (code0 == REG && code1 == REG)
          {
            if (REGNO_OK_FOR_INDEX_P (REGNO (op0))
                && regno_ok_for_base_p (REGNO (op1), mode, PLUS, REG))
              return 0;
            else if (REGNO_OK_FOR_INDEX_P (REGNO (op1))
                     && regno_ok_for_base_p (REGNO (op0), mode, PLUS, REG))
              return 0;
            else if (regno_ok_for_base_p (REGNO (op1), mode, PLUS, REG))
>>>>              find_reloads_address_1 (mode, orig_op0, 1, PLUS, SCRATCH,
                                      &XEXP (x, 0), opnum, type,
ind_levels,
                                      insn);
            else if (regno_ok_for_base_p (REGNO (op0), mode, PLUS, REG))
              find_reloads_address_1 (mode, orig_op1, 1, PLUS, SCRATCH,
                                      &XEXP (x, 1), opnum, type,
ind_levels,
                                      insn);
            else if (REGNO_OK_FOR_INDEX_P (REGNO (op1)))
              find_reloads_address_1 (mode, orig_op0, 0, PLUS, REG,
                                      &XEXP (x, 0), opnum, type,
ind_levels,
                                      insn);
            else if (REGNO_OK_FOR_INDEX_P (REGNO (op0)))
              find_reloads_address_1 (mode, orig_op1, 0, PLUS, REG,
                                      &XEXP (x, 1), opnum, type,
ind_levels,
                                      insn);
            else
              {
>>>>                find_reloads_address_1 (mode, orig_op0, 1, PLUS,
SCRATCH,
                                        &XEXP (x, 0), opnum, type,
ind_levels,
                                        insn);
                find_reloads_address_1 (mode, orig_op1, 0, PLUS, REG,
                                        &XEXP (x, 1), opnum, type,
ind_levels,
                                        insn);
              }
          }

I've also seen the same situation come up during register renaming
(regrename.c), but not too surprising since the code there says it's based
off find_reloads_address_1() and is coded similarly.

-Pat

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: reload question
  2005-03-25 15:43         ` Ian Lance Taylor
@ 2005-06-02  5:56           ` Miles Bader
  0 siblings, 0 replies; 32+ messages in thread
From: Miles Bader @ 2005-06-02  5:56 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: tm_gccmail, Bernd Schmidt, gcc

Ian Lance Taylor <ian@airs.com> writes:
> I agree that gcc is not well designed to cope with an accumulator
> architecture.  Reload can't cope.

I've had a fair amount of success with the approach I initially posted
(perturbing the emission order of reloads based on dependencies between
the operand they are associated with).  It's still pretty fragile in the
sense that even a tiny mistake in something like the constraints will
cause reload to barf, but I guess that's to be expected.  The code
produced seems pretty reasonable too (with almost no peepholes at all);
for instance it will allocate a variable to the accumulator in the odd
case where that's possible.

[I don't know if this is the sort of thing that's viable for merging
into standard gcc, but maybe a simpler approach based on a backend hook
or something would be.]

I gotta admit though, by now I thoroughly hate the gcc reload code...

-Miles
-- 
((lambda (x) (list x x)) (lambda (x) (list x x)))

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: reload question
  2005-03-25 14:27       ` tm_gccmail
  2005-03-25 15:43         ` Ian Lance Taylor
@ 2005-03-25 16:30         ` Alan Lehotsky
  1 sibling, 0 replies; 32+ messages in thread
From: Alan Lehotsky @ 2005-03-25 16:30 UTC (permalink / raw)
  To: tm_gccmail; +Cc: Miles Bader, Ian Lance Taylor, Bernd Schmidt, gcc

Look at the IP2K port.  It's an 8-bit chip with a 16 bit accumulator 
and VERY limited
registers and addressing.  When I did this port originally, I mostlyh 
hid the accumulator
from the register allocator.  But I did implement extended precision 
arithmetic as a pattern that
optimized use of the accumulator.

We got pretty good code generated.  There's a pretty complete TCP/IP 
stack implemented for this chip (it's basically an internet toaster 
controller) and this is a machine with (IIRC) only 16k words of 
instruction and 4K BYTES of
data (and that's banked memory at that!).

On Mar 25, 2005, at 08:13, <tm_gccmail@kloo.net> wrote:

> On 22 Mar 2005, Ian Lance Taylor wrote:
>
>> Miles Bader <miles@lsi.nec.co.jp> writes:
>>
>>> I've defined SECONDARY_*_RELOAD_CLASS (and PREFERRED_* to try to help
>>> things along), and am now running into more understandable reload
>>> problems:  "unable to find a register to spill in class"  :-/
>>>
>>> The problem, as I understand, is that reload doesn't deal with 
>>> conflicts
>>> between secondary and primary reloads -- which are common with my 
>>> arch
>>> because it's an accumulator architecture.
>>>
>>> For instance, slightly modifying my previous example:
>>>
>>>    Say I've got a mov instruction that only works via an accumulator 
>>> A,
>>>    and a two-operand add instruction.  "r" regclass includes regs 
>>> A,X,Y,
>>>    and "a" regclass only includes reg A.
>>>
>>>    mov has constraints like:     0 = "g,a"   1 = "a,gi"
>>>    and add3 has constraints:     0 = "a"     1 = "0"    2 = "ri" 
>>> (say)
>>>
>>> So if before reload you've got an instruction like:
>>>
>>>    add temp, [sp + 4], [sp + 6]
>>>
>>> and v2 and v3 are in memory, it will have to have generate something 
>>> like:
>>>
>>>    mov A, [sp + 4]    ; primary reload 1 in X, with secondary reload 
>>> 0 A
>>>    mov X, A           ;   ""
>>>    mov A, [sp + 6]    ; primary reload 2 in A, with no secondary 
>>> reload
>>>    add A, X
>>>    mov temp, A
>>>
>>> There's really only _one_ register that can be used for many 
>>> reloads, A.
>>
>> I don't think there is any way that reload can cope with this
>> directly.  reload would have to get a lot smarter about ordering the
>> reloads.
>>
>> Since you need the accumulator for so much, one approach you should
>> consider is not exposing the accumulator register until after reload.
>> You could do this by writing pretty much every insn as a
>> define_insn_and_split, with reload_completed as the split condition.
>> Then you split into code that uses the accumulator.  Your add
>> instruction permits you to add any two general registers, and you
>> split into moving one into the accumulator, doing the add, and moving
>> the result whereever it should go.  If you then split all the insns
>> before the postreload pass, perhaps the generated code won't even be
>> too horrible.
>>
>> Ian
>>
>
> This approach by itself has obvious problems.
>
> It will generate a lot of redundant moves to/from the accumulator 
> because
> the accumulator is exposed much too late.
>
> Consider the 3AC code:
>
> add i,j,k
> add k,l,m
>
> it will be broken down into:
>
> mov i,a
> add j,a
> mov a,k
> mov k,a
> add l,a
> mov a,m
>
> where the third and fourth instructions are basically redundant.
>
> I did a lot of processor architecture research about three years ago, 
> and
> I came to some interesting conclusions about accumulator architectures.
>
> Basically, with naive code generation, you will generate 3x as many
> instructions for an accumulator machine than for a 3AC machine.
>
> If you have a SWAP instruction so you can swap the accumulator with the
> index registers, then you can lower the instruction count penalty to 
> about
> 2x that of a 3AC machine. If you think about this for a while, the 
> reason
> will become readily apparent.
>
> In order to reach this 2x figure, it requires a good understanding of 
> how
> the data flows through the accumulator in an accumulator arch.
>
> Toshi
>
>
>
>
>
Alan Lehotsky - apl@carbondesignsystems.com
Carbon Design Systems, Inc

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: reload question
  2005-03-25 14:27       ` tm_gccmail
@ 2005-03-25 15:43         ` Ian Lance Taylor
  2005-06-02  5:56           ` Miles Bader
  2005-03-25 16:30         ` Alan Lehotsky
  1 sibling, 1 reply; 32+ messages in thread
From: Ian Lance Taylor @ 2005-03-25 15:43 UTC (permalink / raw)
  To: tm_gccmail; +Cc: Miles Bader, Bernd Schmidt, gcc

<tm_gccmail@kloo.net> writes:

> It will generate a lot of redundant moves to/from the accumulator because
> the accumulator is exposed much too late.
> 
> Consider the 3AC code:
> 
> add i,j,k
> add k,l,m
> 
> it will be broken down into:
> 
> mov i,a
> add j,a
> mov a,k
> mov k,a
> add l,a
> mov a,m
> 
> where the third and fourth instructions are basically redundant.

In the gcc context, the fourth instruction can be removed by the
reload CSE pass in postreload.c.  Still, obviously the generated code
is going to be bad.

I agree that gcc is not well designed to cope with an accumulator
architecture.  Reload can't cope.

Ian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: reload question
  2005-03-22 18:56     ` Ian Lance Taylor
@ 2005-03-25 14:27       ` tm_gccmail
  2005-03-25 15:43         ` Ian Lance Taylor
  2005-03-25 16:30         ` Alan Lehotsky
  0 siblings, 2 replies; 32+ messages in thread
From: tm_gccmail @ 2005-03-25 14:27 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Miles Bader, Bernd Schmidt, gcc

On 22 Mar 2005, Ian Lance Taylor wrote:

> Miles Bader <miles@lsi.nec.co.jp> writes:
> 
> > I've defined SECONDARY_*_RELOAD_CLASS (and PREFERRED_* to try to help
> > things along), and am now running into more understandable reload
> > problems:  "unable to find a register to spill in class"  :-/
> > 
> > The problem, as I understand, is that reload doesn't deal with conflicts
> > between secondary and primary reloads -- which are common with my arch
> > because it's an accumulator architecture.
> > 
> > For instance, slightly modifying my previous example:
> > 
> >    Say I've got a mov instruction that only works via an accumulator A,
> >    and a two-operand add instruction.  "r" regclass includes regs A,X,Y,
> >    and "a" regclass only includes reg A.
> > 
> >    mov has constraints like:     0 = "g,a"   1 = "a,gi"
> >    and add3 has constraints:     0 = "a"     1 = "0"    2 = "ri" (say)
> > 
> > So if before reload you've got an instruction like:
> > 
> >    add temp, [sp + 4], [sp + 6]
> > 
> > and v2 and v3 are in memory, it will have to have generate something like:
> > 
> >    mov A, [sp + 4]    ; primary reload 1 in X, with secondary reload 0 A
> >    mov X, A           ;   ""
> >    mov A, [sp + 6]    ; primary reload 2 in A, with no secondary reload
> >    add A, X
> >    mov temp, A
> > 
> > There's really only _one_ register that can be used for many reloads, A.
> 
> I don't think there is any way that reload can cope with this
> directly.  reload would have to get a lot smarter about ordering the
> reloads.
> 
> Since you need the accumulator for so much, one approach you should
> consider is not exposing the accumulator register until after reload.
> You could do this by writing pretty much every insn as a
> define_insn_and_split, with reload_completed as the split condition.
> Then you split into code that uses the accumulator.  Your add
> instruction permits you to add any two general registers, and you
> split into moving one into the accumulator, doing the add, and moving
> the result whereever it should go.  If you then split all the insns
> before the postreload pass, perhaps the generated code won't even be
> too horrible.
> 
> Ian
> 

This approach by itself has obvious problems.

It will generate a lot of redundant moves to/from the accumulator because
the accumulator is exposed much too late.

Consider the 3AC code:

add i,j,k
add k,l,m

it will be broken down into:

mov i,a
add j,a
mov a,k
mov k,a
add l,a
mov a,m

where the third and fourth instructions are basically redundant.

I did a lot of processor architecture research about three years ago, and
I came to some interesting conclusions about accumulator architectures.

Basically, with naive code generation, you will generate 3x as many
instructions for an accumulator machine than for a 3AC machine.

If you have a SWAP instruction so you can swap the accumulator with the
index registers, then you can lower the instruction count penalty to about
2x that of a 3AC machine. If you think about this for a while, the reason
will become readily apparent.

In order to reach this 2x figure, it requires a good understanding of how
the data flows through the accumulator in an accumulator arch.

Toshi




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: reload question
  2005-03-18 10:35   ` Miles Bader
  2005-03-18 14:04     ` Miles Bader
@ 2005-03-22 18:56     ` Ian Lance Taylor
  2005-03-25 14:27       ` tm_gccmail
  1 sibling, 1 reply; 32+ messages in thread
From: Ian Lance Taylor @ 2005-03-22 18:56 UTC (permalink / raw)
  To: Miles Bader; +Cc: Bernd Schmidt, gcc

Miles Bader <miles@lsi.nec.co.jp> writes:

> I've defined SECONDARY_*_RELOAD_CLASS (and PREFERRED_* to try to help
> things along), and am now running into more understandable reload
> problems:  "unable to find a register to spill in class"  :-/
> 
> The problem, as I understand, is that reload doesn't deal with conflicts
> between secondary and primary reloads -- which are common with my arch
> because it's an accumulator architecture.
> 
> For instance, slightly modifying my previous example:
> 
>    Say I've got a mov instruction that only works via an accumulator A,
>    and a two-operand add instruction.  "r" regclass includes regs A,X,Y,
>    and "a" regclass only includes reg A.
> 
>    mov has constraints like:     0 = "g,a"   1 = "a,gi"
>    and add3 has constraints:     0 = "a"     1 = "0"    2 = "ri" (say)
> 
> So if before reload you've got an instruction like:
> 
>    add temp, [sp + 4], [sp + 6]
> 
> and v2 and v3 are in memory, it will have to have generate something like:
> 
>    mov A, [sp + 4]    ; primary reload 1 in X, with secondary reload 0 A
>    mov X, A           ;   ""
>    mov A, [sp + 6]    ; primary reload 2 in A, with no secondary reload
>    add A, X
>    mov temp, A
> 
> There's really only _one_ register that can be used for many reloads, A.

I don't think there is any way that reload can cope with this
directly.  reload would have to get a lot smarter about ordering the
reloads.

Since you need the accumulator for so much, one approach you should
consider is not exposing the accumulator register until after reload.
You could do this by writing pretty much every insn as a
define_insn_and_split, with reload_completed as the split condition.
Then you split into code that uses the accumulator.  Your add
instruction permits you to add any two general registers, and you
split into moving one into the accumulator, doing the add, and moving
the result whereever it should go.  If you then split all the insns
before the postreload pass, perhaps the generated code won't even be
too horrible.

Ian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: reload question
  2005-03-18 10:35   ` Miles Bader
@ 2005-03-18 14:04     ` Miles Bader
  2005-03-22 18:56     ` Ian Lance Taylor
  1 sibling, 0 replies; 32+ messages in thread
From: Miles Bader @ 2005-03-18 14:04 UTC (permalink / raw)
  To: gcc

BTW, if anybody replies, could you keep me in the CC: header?

I do read this list, but it won't be convenient in the next few days.

Thanks,

-Miles
-- 
.Numeric stability is probably not all that important when you're guessing.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: reload question
  2005-03-16 14:58 ` Bernd Schmidt
@ 2005-03-18 10:35   ` Miles Bader
  2005-03-18 14:04     ` Miles Bader
  2005-03-22 18:56     ` Ian Lance Taylor
  0 siblings, 2 replies; 32+ messages in thread
From: Miles Bader @ 2005-03-18 10:35 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc

Bernd Schmidt <bernds_cb1@t-online.de> writes:
> Reload insns aren't themselves reloaded.  You should look at the 
> SECONDARY_*_RELOAD_CLASS; they'll probably let you do what you want.

Ah, thank you!

I've defined SECONDARY_*_RELOAD_CLASS (and PREFERRED_* to try to help
things along), and am now running into more understandable reload
problems:  "unable to find a register to spill in class"  :-/

The problem, as I understand, is that reload doesn't deal with conflicts
between secondary and primary reloads -- which are common with my arch
because it's an accumulator architecture.

For instance, slightly modifying my previous example:

   Say I've got a mov instruction that only works via an accumulator A,
   and a two-operand add instruction.  "r" regclass includes regs A,X,Y,
   and "a" regclass only includes reg A.

   mov has constraints like:     0 = "g,a"   1 = "a,gi"
   and add3 has constraints:     0 = "a"     1 = "0"    2 = "ri" (say)

So if before reload you've got an instruction like:

   add temp, [sp + 4], [sp + 6]

and v2 and v3 are in memory, it will have to have generate something like:

   mov A, [sp + 4]    ; primary reload 1 in X, with secondary reload 0 A
   mov X, A           ;   ""
   mov A, [sp + 6]    ; primary reload 2 in A, with no secondary reload
   add A, X
   mov temp, A

There's really only _one_ register that can be used for many reloads, A.

The problem is that reload doesn't seem to be able to produce this kind of
output:  if it chooses A as a primary reload (common, as most insns use A as
a first operand), reload will think it conflicts with secondary reloads that
also use A (when it really needn't, as the secondary reloads only use A
"temporarily").  This is particularly bad with RELOAD_OTHER reloads, as

I kludged around this to some degree by changing `reload_conflicts'
(reload1.c) to always think secondary reloads _don't_ conflict [see patch1].

As that will fail in the case where a primary reload is loaded before a
secondary reload using the same register, I _also_ modified
`emit_reload_insns' to sort the order in which operand reloads are output so
that an operand who's secondary reload interferes with another operand's
primary reload is always loaded first.

However I think this is not guaranteed to always work -- certainly merely
disregarding conflicts with secondary reloads will fail for architectures
which are slightly less anemic, say with _two_ accumulators... :_)

Does anybody have a hint for a way to solve this problem?
Reload is very confusing...

Thanks,

-Miles


===== patch1 =====

--- gcc-3.4.3/gcc/reload1.c	2004-05-02 21:37:17.000000000 +0900
+++ gcc-3.4.3-supk0-20050317/gcc/reload1.c	2005-03-17 19:49:35.935534000 +0900
@@ -1680,7 +1688,7 @@           find_reg (struct insn_chain *chain, int order)
     {
       int other = reload_order[k];
 
-      if (rld[other].regno >= 0 && reloads_conflict (other, rnum))
+      if (rld[other].regno >= 0 && reloads_conflict (other, rnum, 0))
 	for (j = 0; j < rld[other].nregs; j++)
 	  SET_HARD_REG_BIT (used_by_other_reload, rld[other].regno + j);
     }
@@ -4601,18 +4609,25 @@
 }
 \f
 /* Return 1 if the reloads denoted by R1 and R2 cannot share a register.
-   Return 0 otherwise.
+   Return 0 otherwise.  If SECONDARIES_CAN_CONFLICT is zero, secondary
+   reloads are considered never to conflict; otherwise they are treated
+   normally.
 
    This function uses the same algorithm as reload_reg_free_p above.  */
 
 int
-reloads_conflict (int r1, int r2)
+reloads_conflict (int r1, int r2, int secondaries_can_conflict)
 {
   enum reload_type r1_type = rld[r1].when_needed;
   enum reload_type r2_type = rld[r2].when_needed;
   int r1_opnum = rld[r1].opnum;
   int r2_opnum = rld[r2].opnum;
 
+  /* Secondary reloads need not conflict with anything.  */
+  if (!secondaries_can_conflict
+      && (rld[r1].secondary_p || rld[r2].secondary_p))
+    return 0;
+
   /* RELOAD_OTHER conflicts with everything.  */
   if (r2_type == RELOAD_OTHER)
     return 1;



===== patch2 =====

--- gcc-3.4.3/gcc/reload1.c	2004-05-02 21:37:17.000000000 +0900
+++ gcc-3.4.3-supk0-20050317/gcc/reload1.c	2005-03-17 19:49:35.935534000 +0900
@@ -6951,6 +6966,51 @@          emit_reload_insns (struct insn_chain *chain)
       do_output_reload (chain, rld + j, j);
     }
 
+#ifdef SECONDARY_INPUT_RELOAD_CLASS
+  for (j = 0; j < reload_n_operands; j++)
+    opnum_emit_pos[j] = emit_pos_opnum[j] = j;
+
+  /*  Order the operands to avoid conflicts between the primary reload of
+      one operand and a secondary reload in another operand (which we
+      ignored before).  XXX this only works for input reloads!!  */
+  for (j = 0; j < n_reloads; j++)
+    if (rld[j].secondary_p)
+      /* This is a secondary reload; see if it should go before something
+	 else.  */
+      {
+	int i;
+	int this_opnum = rld[j].opnum;
+	int cur_emit_pos = opnum_emit_pos[this_opnum];
+	int new_emit_pos = cur_emit_pos;
+
+	for (i = j; i >= 0; i--)
+	  if (opnum_emit_pos[rld[i].opnum] < new_emit_pos
+	      && !rld[i].secondary_p
+	      && reloads_conflict (j, i, 1))
+	    new_emit_pos = opnum_emit_pos[rld[i].opnum];
+
+	if (new_emit_pos < cur_emit_pos)
+	  {
+	    for (i = cur_emit_pos; i > new_emit_pos; i--)
+	      emit_pos_opnum[i] = emit_pos_opnum[i - 1];
+	    emit_pos_opnum[new_emit_pos] = this_opnum;
+
+	    for (i = 0; i < reload_n_operands; i++)
+	      opnum_emit_pos[emit_pos_opnum[i]] = i;
+	  }
+      }
+
+  fprintf (stderr, "========================================\n");
+  fprintf (stderr, "emit_pos_opnum =");
+  for (j = 0; j < reload_n_operands; j++)
+    fprintf (stderr, " %d", emit_pos_opnum[j]);
+  putc ('\n', stderr);
+  fprintf (stderr, "opnum_emit_pos =");
+  for (j = 0; j < reload_n_operands; j++)
+    fprintf (stderr, " %d", opnum_emit_pos[j]);
+  putc ('\n', stderr);
+#endif
+
   /* Now write all the insns we made for reloads in the order expected by
      the allocation functions.  Prior to the insn being reloaded, we write
      the following reloads:
@@ -6975,25 +7035,27 @@
      reloads for the operand.  The RELOAD_OTHER output reloads are
      output in descending order by reload number.  */
 
-  emit_insn_before (other_input_address_reload_insns, insn);
-  emit_insn_before (other_input_reload_insns, insn);
-
   for (j = 0; j < reload_n_operands; j++)
     {
-      emit_insn_before (inpaddr_address_reload_insns[j], insn);
-      emit_insn_before (input_address_reload_insns[j], insn);
-      emit_insn_before (input_reload_insns[j], insn);
+      int emit = emit_pos_opnum[j];
+      emit_insn_before (inpaddr_address_reload_insns[emit], insn);
+      emit_insn_before (input_address_reload_insns[emit], insn);
+      emit_insn_before (input_reload_insns[emit], insn);
     }
 
+  emit_insn_before (other_input_address_reload_insns, insn);
+  emit_insn_before (other_input_reload_insns, insn);
+
   emit_insn_before (other_operand_reload_insns, insn);
   emit_insn_before (operand_reload_insns, insn);
 
   for (j = 0; j < reload_n_operands; j++)
     {
-      rtx x = emit_insn_after (outaddr_address_reload_insns[j], insn);
-      x = emit_insn_after (output_address_reload_insns[j], x);
-      x = emit_insn_after (output_reload_insns[j], x);
-      emit_insn_after (other_output_reload_insns[j], x);
+      int emit = emit_pos_opnum[j];
+      rtx x = emit_insn_after (outaddr_address_reload_insns[emit], insn);
+      x = emit_insn_after (output_address_reload_insns[emit], x);
+      x = emit_insn_after (output_reload_insns[emit], x);
+      emit_insn_after (other_output_reload_insns[emit], x);
     }
 
   /* For all the spill regs newly reloaded in this instruction,


-- 
"An atheist doesn't have to be someone who thinks he has a proof that there
can't be a god.  He only has to be someone who believes that the evidence
on the God question is at a similar level to the evidence on the werewolf
question."  [John McCarthy]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: reload question
  2005-03-16 10:08 Miles Bader
@ 2005-03-16 14:58 ` Bernd Schmidt
  2005-03-18 10:35   ` Miles Bader
  0 siblings, 1 reply; 32+ messages in thread
From: Bernd Schmidt @ 2005-03-16 14:58 UTC (permalink / raw)
  To: Miles Bader; +Cc: gcc

Miles Bader wrote:
> Say I've got a mov instruction that only works via an accumulator A, and
> a two-operand add instruction.  "r" regclass includes regs A,X,Y, and
> "a" regclass only includes reg A.
> 
> So mov has constraints like:  0 = "g,a"   1 = "a,gi"
> and add3 has constraints:     0 = "r"     1 = "0"    2 = "i" (say)
> 
> Then if there's an insn before reload like:
> 
>    add3  X, Y, 1
> 
> Reload will notice this, and generate a reload to put Y into X, so
> you'll end up with:
> 
>    mov   X, Y
>    add3  X, X, 1

> Now, what I _expected_ to happen is that reload would then be repeated
> to handle the newly added mov insn, which would see that the mov insn's
> constraint don't match, and would then generate an additional reload to
> yield:
> 
>    mov   A, Y
>    mov   X, A
>    add3  X, X, 1
> 
> but in fact this final step doesn't seem to be happening -- it just
> finishes reload without changing the bogus "mov X, Y" insn and goes on
> to die with an "insn does not satisfy its constraints:" error in final.

Reload insns aren't themselves reloaded.  You should look at the 
SECONDARY_*_RELOAD_CLASS; they'll probably let you do what you want.


Bernd

^ permalink raw reply	[flat|nested] 32+ messages in thread

* reload question
@ 2005-03-16 10:08 Miles Bader
  2005-03-16 14:58 ` Bernd Schmidt
  0 siblings, 1 reply; 32+ messages in thread
From: Miles Bader @ 2005-03-16 10:08 UTC (permalink / raw)
  To: gcc

Hi,

I'm writing a new gcc port, and having problems making reload work.

Say I've got a mov instruction that only works via an accumulator A, and
a two-operand add instruction.  "r" regclass includes regs A,X,Y, and
"a" regclass only includes reg A.

So mov has constraints like:  0 = "g,a"   1 = "a,gi"
and add3 has constraints:     0 = "r"     1 = "0"    2 = "i" (say)

Then if there's an insn before reload like:

   add3  X, Y, 1

Reload will notice this, and generate a reload to put Y into X, so
you'll end up with:

   mov   X, Y
   add3  X, X, 1

That seems to be happening correctly.

Now, what I _expected_ to happen is that reload would then be repeated
to handle the newly added mov insn, which would see that the mov insn's
constraint don't match, and would then generate an additional reload to
yield:

   mov   A, Y
   mov   X, A
   add3  X, X, 1

but in fact this final step doesn't seem to be happening -- it just
finishes reload without changing the bogus "mov X, Y" insn and goes on
to die with an "insn does not satisfy its constraints:" error in final.

I've been trying to track through reload seeing what it's doing, but my
mind is being twisted into little knots, and I'd really like to at least
know if what I think should be happening is accurate.

Any hints?

Thanks,

-Miles
-- 
.Numeric stability is probably not all that important when you're guessing.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Reload question
  2003-03-19 12:51 Reload question Joern Rennecke
@ 2003-03-19 19:39 ` Eric Botcazou
  0 siblings, 0 replies; 32+ messages in thread
From: Eric Botcazou @ 2003-03-19 19:39 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: gcc

> This address looks suspicious to start with.  Where does it
> come from?

It comes from the combination during the reload pass of

(subreg:SI (reg:DF 770) 4))

and 

(set (reg:DF 770)
        (mem/s:DF (plus:SI (reg/f:SI 767)
                (reg:SI 769)) [16 dense.eden+0 S8 A64]))

> The only kind-of legitimate way I can think of
> is if you want to rematerialize reg+reg, and this has
> suceeded because both registers are function invariants.

I have to say that I don't fully understand this sentence.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Reload question
@ 2003-03-19 12:51 Joern Rennecke
  2003-03-19 19:39 ` Eric Botcazou
  0 siblings, 1 reply; 32+ messages in thread
From: Joern Rennecke @ 2003-03-19 12:51 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc

> Sparc has a non-offsettable REG+REG addressing mode. Now the compiler is
> unable to reload the following address:

> (mem/s:SI (plus:SI (plus:SI (reg/f:SI 22 %l6 [767])
>             (reg:SI 21 %l5 [769]))
>         (const_int 4 [0x4])) [16 dense.eden+0 S8 A64])

This address looks suspicious to start with.  Where does it
come from?  The only kind-of legitimate way I can think of
is if you want to rematerialize reg+reg, and this has
suceeded because both registers are function invariants.
Which doesn't really sound plausible.

-- 
--------------------------
SuperH (UK) Ltd.
2410 Aztec West / Almondsbury / BRISTOL / BS32 4QX
T:+44 1454 465658

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Reload question
@ 2003-03-18  0:25 Eric Botcazou
  0 siblings, 0 replies; 32+ messages in thread
From: Eric Botcazou @ 2003-03-18  0:25 UTC (permalink / raw)
  To: gcc

Hi,

This is related to high-priority PR target/7784 on Sparc.

Sparc has a non-offsettable REG+REG addressing mode. Now the compiler is 
unable to reload the following address:

(mem/s:SI (plus:SI (plus:SI (reg/f:SI 22 %l6 [767])
            (reg:SI 21 %l5 [769]))
        (const_int 4 [0x4])) [16 dense.eden+0 S8 A64])

find_reloads_address() doesn't know how to handle the form (and the Sparc 
specific LEGITIMIZE_RELOAD_ADDRESS doesn't help) so it propagates the 
problem to find_reloads_address_1().

Now find_reloads_address_1() has these lines in its header:

/* Note that we take shortcuts assuming that no multi-reg machine mode
   occurs as part of an address.
   Also, this is not fully machine-customizable; it works for machines
   such as VAXen and 68000's and 32000's, but other possible machines
   could have addressing modes that this does not handle right.  */

Hence the question: should I teach find_reloads_address_1() how to reload the 
Sparc address form or tweak LEGITIMIZE_RELOAD_ADDRESS instead (or is the 
question totally irrelevant to the problem) ?

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re:  Reload question
  2002-07-26 15:01   ` Bernd Schmidt
@ 2002-07-26 16:35     ` Mark Mitchell
  0 siblings, 0 replies; 32+ messages in thread
From: Mark Mitchell @ 2002-07-26 16:35 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Richard Kenner, gcc



--On Friday, July 26, 2002 12:18:10 PM +0100 Bernd Schmidt 
<bernds@redhat.com> wrote:

> On Thu, 25 Jul 2002, Mark Mitchell wrote:
>
>> >     In reload, after doing register allocation, we do, in "reload()":
>> >
>> >     That mutates the instruction in place, with the possible result
>> >     that it is no longer recognizable.
>> >
>> >     Who is supposed to be responsible for making the instruction
>> >     recognizable again?
>> >
>> > Reload itself.  The insn won't match its constraints.
>>
>> But this happens *at the end* of reload.  Search for:
>>
>>    PUT_CODE (reg, MEM)
>>
>> in reload1.c.
>>
>> So, where in reload is the instruction supposed to be checked to see
>> that it is safe to make this transmogrification?
>
> find_reloads.  If the modification isn't safe, we'll make a reload for
> the operand and thus ensure that when we get to the code that modifies
> pseudos into MEMs, the transformation is correct.

Thanks; that helped.  I think I understand now what's wrong.

Thanks!

--
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re:  Reload question
  2002-07-26 13:35 ` Mark Mitchell
@ 2002-07-26 15:01   ` Bernd Schmidt
  2002-07-26 16:35     ` Mark Mitchell
  0 siblings, 1 reply; 32+ messages in thread
From: Bernd Schmidt @ 2002-07-26 15:01 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Richard Kenner, gcc

On Thu, 25 Jul 2002, Mark Mitchell wrote:

> >     In reload, after doing register allocation, we do, in "reload()":
> >
> >     That mutates the instruction in place, with the possible result that
> >     it is no longer recognizable.
> >
> >     Who is supposed to be responsible for making the instruction
> >     recognizable again?
> >
> > Reload itself.  The insn won't match its constraints.
> 
> But this happens *at the end* of reload.  Search for:
> 
>    PUT_CODE (reg, MEM)
> 
> in reload1.c.
> 
> So, where in reload is the instruction supposed to be checked to see
> that it is safe to make this transmogrification?

find_reloads.  If the modification isn't safe, we'll make a reload for
the operand and thus ensure that when we get to the code that modifies
pseudos into MEMs, the transformation is correct.


Bernd

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re:  Reload question
  2002-07-25 13:57 Richard Kenner
@ 2002-07-26 13:35 ` Mark Mitchell
  2002-07-26 15:01   ` Bernd Schmidt
  0 siblings, 1 reply; 32+ messages in thread
From: Mark Mitchell @ 2002-07-26 13:35 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc



--On Thursday, July 25, 2002 02:21:23 PM +0000 Richard Kenner 
<kenner@vlsi1.ultra.nyu.edu> wrote:

>     In reload, after doing register allocation, we do, in "reload()":
>
>     That mutates the instruction in place, with the possible result that
>     it is no longer recognizable.
>
>     Who is supposed to be responsible for making the instruction
>     recognizable again?
>
> Reload itself.  The insn won't match its constraints.

But this happens *at the end* of reload.  Search for:

   PUT_CODE (reg, MEM)

in reload1.c.

So, where in reload is the instruction supposed to be checked to see
that it is safe to make this transmogrification?

-- 
Mark Mitchell                mark@codesourcery.com
CodeSourcery, LLC            http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Reload question
  2002-07-25 13:47 Mark Mitchell
@ 2002-07-25 16:19 ` Bernd Schmidt
  0 siblings, 0 replies; 32+ messages in thread
From: Bernd Schmidt @ 2002-07-25 16:19 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: gcc

On Thu, 25 Jul 2002, Mark Mitchell wrote:

> 
> In reload, after doing register allocation, we do, in "reload()":
> 
>   if (reg_renumber[i] < 0)
>     {
>       rtx reg = regno_reg_rtx[i];
> 
>       PUT_CODE (reg, MEM);
>       XEXP (reg, 0) = addr;
>       REG_USERVAR_P (reg) = 0;
>       ...
>     }
> 
> That mutates the instruction in place, with the possible result that
> it is no longer recognizable.
> 
> Who is supposed to be responsible for making the instruction
> recognizable again?

If reload works properly, the insn will be recognizable after this.  If
it isn't the bug is someplace earlier.


Bernd

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re:  Reload question
@ 2002-07-25 13:57 Richard Kenner
  2002-07-26 13:35 ` Mark Mitchell
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Kenner @ 2002-07-25 13:57 UTC (permalink / raw)
  To: mark; +Cc: gcc

    In reload, after doing register allocation, we do, in "reload()":

    That mutates the instruction in place, with the possible result that
    it is no longer recognizable.

    Who is supposed to be responsible for making the instruction
    recognizable again?

Reload itself.  The insn won't match its constraints.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Reload question
@ 2002-07-25 13:47 Mark Mitchell
  2002-07-25 16:19 ` Bernd Schmidt
  0 siblings, 1 reply; 32+ messages in thread
From: Mark Mitchell @ 2002-07-25 13:47 UTC (permalink / raw)
  To: gcc


In reload, after doing register allocation, we do, in "reload()":

  if (reg_renumber[i] < 0)
    {
      rtx reg = regno_reg_rtx[i];

      PUT_CODE (reg, MEM);
      XEXP (reg, 0) = addr;
      REG_USERVAR_P (reg) = 0;
      ...
    }

That mutates the instruction in place, with the possible result that
it is no longer recognizable.

Who is supposed to be responsible for making the instruction
recognizable again?

Thanks,

-- 
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* reload question
@ 2002-01-27  4:34 Chris Lattner
  0 siblings, 0 replies; 32+ messages in thread
From: Chris Lattner @ 2002-01-27  4:34 UTC (permalink / raw)
  To: gcc

I'm fixing some bugs in a test backend I wrote for GCC, and am hitting a
few snags in the reload phase.  Specifically, GCC isn't able to reload:

(insn 29 28 31 (set (reg:SI 8 %o0 [113])
        (zero_extend:SI (subreg:QI (reg/v:SI 108) 0))) 19 {zero_extendqisi2} (nil)
    (expr_list:REG_EQUIV (mem/f:SI (reg/f:PDI 14 %sp) 0)
        (expr_list:REG_DEAD (reg/v:SI 108)
            (nil))))

and a small set of other cases, involving (zero|sign) extending subregs of
registers.

Can anyone give me a hint as to what could cause this?  I have the
following instruction pattern:

(define_insn "zero_extendqisi2"
  [(set (match_operand:SI 0 "register_operand" "=r")
	(zero_extend:SI (match_operand:QI 1 "register_operand" "")))]
  ""
  "zero_extendqisi2 %1 -> %0")

Which is obviously matched by the labeler, and combined by the combiner...
but reload isn't able to split the combined instruction into something
like this:

(set (reg:QI %reg:QI 1000)
        (subreg:QI (reg/v:SI 108)))

(set (reg:SI 8 %o0 [113])
        (zero_extend:SI (reg:QI 1000)))

Is there something that I'm missing that enables such transformations to
take place?  Is there someplace in the manual that I should reread to
understand this phase of GCC?  Do subreg's ever get turned into
truncate's?

Thanks for the help,

-Chris

http://www.nondot.org/sabre/os/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Reload question
  1997-09-29  5:34 Reload question Christian Iseli
@ 1997-09-29  8:41 ` Jeffrey A Law
  0 siblings, 0 replies; 32+ messages in thread
From: Jeffrey A Law @ 1997-09-29  8:41 UTC (permalink / raw)
  To: Christian Iseli; +Cc: egcs

  In message < 199709291233.OAA08589@lslsun17.epfl.ch >you write:
  >    Note that we take shortcuts assuming that no multi-reg machine mode
  >    occurs as part of an address.
  > 
  > YIKES!  Does it really mean that an address *must* occupy a single
  > hard register, i.e., that my 8-bit machine must pretend that it
  > has 16-bit registers ???
That's what the comment seems to imply (assuming that addresses are
16bits on your machine and basic register sizes are 8 bits).

Comments are sometimes out of date or misleading, but in this case
I suspect it's correct -- I can't think of a machine supported by
gcc where an address spans more than a single register.

  > If so, how hard would that be to "remove" those shortcuts?
No clue.

jeff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Reload question
@ 1997-09-29  5:34 Christian Iseli
  1997-09-29  8:41 ` Jeffrey A Law
  0 siblings, 1 reply; 32+ messages in thread
From: Christian Iseli @ 1997-09-29  5:34 UTC (permalink / raw)
  To: egcs

Hi folks,

Today I delved into the source code of the reload pass...

At the start of find_reload_address (and some other related routines) there
is a small comment saying:

   Note that we take shortcuts assuming that no multi-reg machine mode
   occurs as part of an address.

YIKES!  Does it really mean that an address *must* occupy a single hard register,
i.e., that my 8-bit machine must pretend that it has 16-bit registers ???

If so, how hard would that be to "remove" those shortcuts?

					Christian

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2011-08-11 16:49 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-11 16:49 reload question Hari Sandanagobalane
  -- strict thread matches above, loose matches on Subject: below --
2010-06-23 17:41 Alex Turjan
2010-06-23 18:30 ` Jeff Law
2010-06-23 21:29   ` Alex Turjan
2010-06-23 21:43     ` Jeff Law
2010-06-24  3:44 ` Joern Rennecke
2007-08-10 20:02 Pat Haugen
2007-08-11  0:18 ` Ian Lance Taylor
2007-08-13 14:55   ` Pat Haugen
2007-08-13 17:42     ` Ian Lance Taylor
2007-08-11 11:45 ` Paolo Bonzini
2005-03-16 10:08 Miles Bader
2005-03-16 14:58 ` Bernd Schmidt
2005-03-18 10:35   ` Miles Bader
2005-03-18 14:04     ` Miles Bader
2005-03-22 18:56     ` Ian Lance Taylor
2005-03-25 14:27       ` tm_gccmail
2005-03-25 15:43         ` Ian Lance Taylor
2005-06-02  5:56           ` Miles Bader
2005-03-25 16:30         ` Alan Lehotsky
2003-03-19 12:51 Reload question Joern Rennecke
2003-03-19 19:39 ` Eric Botcazou
2003-03-18  0:25 Eric Botcazou
2002-07-25 13:57 Richard Kenner
2002-07-26 13:35 ` Mark Mitchell
2002-07-26 15:01   ` Bernd Schmidt
2002-07-26 16:35     ` Mark Mitchell
2002-07-25 13:47 Mark Mitchell
2002-07-25 16:19 ` Bernd Schmidt
2002-01-27  4:34 reload question Chris Lattner
1997-09-29  5:34 Reload question Christian Iseli
1997-09-29  8:41 ` Jeffrey A Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).