public inbox for gcc-help@gcc.gnu.org
 help / color / mirror / Atom feed
* [CR16] Improper register allocation
@ 2011-11-20  0:48 Jayant R. Sonar
  2011-11-20  5:32 ` Ian Lance Taylor
  0 siblings, 1 reply; 5+ messages in thread
From: Jayant R. Sonar @ 2011-11-20  0:48 UTC (permalink / raw)
  To: gcc-help

I am working on CR16 port.

Found an improper code is being generated for following part of the test 
code:
============================================================================
IeData     = (void *) RAlloc(DataLen);
LAHType* p = (LAHType*)IeData;

ReadFH (DpEntry, &FeData, &p->Table, &p->Masked, &p->NumAH, &p->Len, TData);
============================================================================

Arguments are pushed on to the stack from right to left direction.
Possible working assembly that could be generated for this code:
Case 1:
================================================================
       bal (ra), _RAlloc@c
       stord	(r1,r0), 32(sp)
       addd	$-2, (sp)
.LCFI547:
       movd	$18, (r7,r6)
       addd	(sp), (r7,r6)
       push	$2,r6
.LCFI548:
       addd	$3, (r1,r0)
       push	$2,r0

OR lesser optimized code could be:
Case 2:
============================================================
       bal (ra), _RAlloc@c
       stord	(r1,r0), 32(sp)
       addd	$-2, (sp)
.LCFI547:
       movd	$18, (r7,r6)
       addd	(sp), (r7,r6)
       push	$2,r6
.LCFI548:
       loadd	32(sp), (r1,r0)
       addd	$3, (r1,r0)
       push	$2,r0

However, I am getting following code:
============================================================
       bal (ra), _RAlloc@c
       stord	(r1,r0), 20(sp)
       addd	$-2, (sp)
.LCFI547:
       movd	$18, (r2,r1)
       addd	(sp), (r2,r1)
       push	$2,r1	  
.LCFI548:
       addd	$3, (r1,r0)	<<============== PROBLEM
       push	$2,r0

'R1' being the part of first push already has some value in it. So before 
next push, either (r1,r0) should be loaded with new address value as shown 
in Case 2. Otherwise, as in Case 1, different register pairs should be used 
for these two push operations.

CR16 port did not have REG_ALLOC_ORDER defined.
I am trying to control register allocation order using this. However, it 
looks like this sequence has to be decided very carefully as slightest of 
the alternation in sequence are causing some or other application failures.

Is there any other way, I could address this problem?

Thanks and Regards,
Jayant Sonar
[KPIT Cummins, Pune]


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [CR16] Improper register allocation
  2011-11-20  0:48 [CR16] Improper register allocation Jayant R. Sonar
@ 2011-11-20  5:32 ` Ian Lance Taylor
  2011-11-20 15:53   ` Jayant R. Sonar
  0 siblings, 1 reply; 5+ messages in thread
From: Ian Lance Taylor @ 2011-11-20  5:32 UTC (permalink / raw)
  To: Jayant R. Sonar; +Cc: gcc-help

"Jayant R. Sonar" <Jayant.Sonar@kpitcummins.com> writes:

> I am working on CR16 port.
>
> Found an improper code is being generated for following part of the test 
> code:
> ============================================================================
> IeData     = (void *) RAlloc(DataLen);
> LAHType* p = (LAHType*)IeData;
>
> ReadFH (DpEntry, &FeData, &p->Table, &p->Masked, &p->NumAH, &p->Len, TData);
> ============================================================================
>
> Arguments are pushed on to the stack from right to left direction.
> Possible working assembly that could be generated for this code:
> Case 1:
> ================================================================
>        bal (ra), _RAlloc@c
>        stord	(r1,r0), 32(sp)
>        addd	$-2, (sp)
> .LCFI547:
>        movd	$18, (r7,r6)
>        addd	(sp), (r7,r6)
>        push	$2,r6
> .LCFI548:
>        addd	$3, (r1,r0)
>        push	$2,r0
>
> OR lesser optimized code could be:
> Case 2:
> ============================================================
>        bal (ra), _RAlloc@c
>        stord	(r1,r0), 32(sp)
>        addd	$-2, (sp)
> .LCFI547:
>        movd	$18, (r7,r6)
>        addd	(sp), (r7,r6)
>        push	$2,r6
> .LCFI548:
>        loadd	32(sp), (r1,r0)
>        addd	$3, (r1,r0)
>        push	$2,r0
>
> However, I am getting following code:
> ============================================================
>        bal (ra), _RAlloc@c
>        stord	(r1,r0), 20(sp)
>        addd	$-2, (sp)
> .LCFI547:
>        movd	$18, (r2,r1)
>        addd	(sp), (r2,r1)
>        push	$2,r1	  
> .LCFI548:
>        addd	$3, (r1,r0)	<<============== PROBLEM
>        push	$2,r0
>
> 'R1' being the part of first push already has some value in it. So before 
> next push, either (r1,r0) should be loaded with new address value as shown 
> in Case 2. Otherwise, as in Case 1, different register pairs should be used 
> for these two push operations.
>
> CR16 port did not have REG_ALLOC_ORDER defined.
> I am trying to control register allocation order using this. However, it 
> looks like this sequence has to be decided very carefully as slightest of 
> the alternation in sequence are causing some or other application failures.
>
> Is there any other way, I could address this problem?


I don't know this syntax.  Are you saying that the instruction
        push	$2,r1
modifies r1?  Or are you saying that that the instruction
        addd	$3, (r1,r0)
may modify r1, and therefore must be executed before the push
instruction?

Either way, REG_ALLOC_ORDER is not an issue here.  You need to look at
the RTL dumps.  See whether the original call sequence in the first RTL
dump is valid, using pseudo-registers.  Then find out where it becomes
invalid.

One thing to consider is whether the push instruction is particularly
efficient on your processor.  If it is not--if it is comparable to a mov
instruction to an offset from sp--then consider defining
ACCUMULATE_OUTGOING_ARGS rather than PUSH_ARGS.  Most processors get
more efficient code from ACCUMULATE_OUTGOING_ARGS.

Ian

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [CR16] Improper register allocation
  2011-11-20  5:32 ` Ian Lance Taylor
@ 2011-11-20 15:53   ` Jayant R. Sonar
  2011-11-21  5:03     ` Ian Lance Taylor
  0 siblings, 1 reply; 5+ messages in thread
From: Jayant R. Sonar @ 2011-11-20 15:53 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc-help

[-- Attachment #1: Type: text/plain, Size: 6143 bytes --]

Hello Ian,

Thank you for the valuable inputs.

>> Or are you saying that that the instruction
>>         addd	$3, (r1,r0)
>> may modify r1, and therefore must be executed before the push
>> instruction?

No, the sequence of the instructions is correct here.
What is expected here is (r1,r0) should be loaded with an address value 
before being added by 3.

I looked into the RTL dumps and found the original call sequence, where 
(r1,r0) is loaded with appropriate address value before adding 3, is valid 
up to IRA phase. The reload phase is optimizing out the important load 
instruction (movsi_long).

Dump before 'postreload' i.e. at 'ira':
===========================================================================
(insn 1613 2944 2945 65 Convert.c:707 
(set (mem/f/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32])
        (reg:SI 1 r1)) 61 {pushsi_internal} (nil))

(insn 2945 1613 1614 65 Convert.c:707 
(set (reg/f:SI 0 r0 [1056])
        (reg/f:SI 12 r12 [orig:525 D.18812 ] [525])) 67 {*movsi_long} (nil))

(insn 1614 2945 1615 65 Convert.c:707 
(set (reg/f:SI 0 r0 [1056])
        (plus:SI (reg/f:SI 0 r0 [1056])
            (const_int 3 [0x3]))) 1 {addsi3} (nil))


Dump of 'postreload':
===========================================================================
(insn 1613 2944 1614 65 Convert.c:707 
(set (mem/f/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32])
        (reg:SI 1 r1)) 61 {pushsi_internal} (nil))

(insn 1614 1613 1615 65 Convert.c:707 
(set (reg/f:SI 0 r0 [1056])
        (plus:SI (reg/f:SI 0 r0 [1056])
            (const_int 3 [0x3]))) 1 {addsi3} (nil))


The instruction 2945 is deleted during reload phase.
This instruction was loading an address into register pair (r1,r0). And next 
instruction 1614, adds an offset value 3 into this address. However, as 2945 
is deleted, (r1,r0) does not receive the expected address value before 3 is 
added into it by 1614.

Just for the reference, giving here again the inappropriate assembly code 
generated:

        addd    (sp), (r2,r1)
        push    $2,r1   <<<<<<<<<<<<<<<<<==================================
.LCFI548:                    Insn loading addr in (r1,r0) should be here
        addd    $3, (r1,r0)  <<<<<<<<<<<<<<<<<<<<==========================

The push command in CR16 is capable of pushing eight adjacent registers onto 
the stack. The count value in the instruction specifies the number of words 
to save.

So ACCUMULATE_OUTGOING_ARGS may not be efficient when multiple pushes are to
 be considered. 

I can not give the complete RTL dump files here. However 2 RTL dumps having 
part of the concerned code are attached with this mail. 

Thanks and Regards,
Jayant Sonar
[KPIT Cummins, Pune]


-----Original Message-----
From: Ian Lance Taylor [mailto:iant@google.com] 
Sent: Saturday, November 19, 2011 10:35 PM
To: Jayant R. Sonar
Cc: gcc-help@gcc.gnu.org
Subject: Re: [CR16] Improper register allocation

"Jayant R. Sonar" <Jayant.Sonar@kpitcummins.com> writes:

> I am working on CR16 port.
>
> Found an improper code is being generated for following part of the test 
> code:
> ============================================================================
> IeData     = (void *) RAlloc(DataLen);
> LAHType* p = (LAHType*)IeData;
>
> ReadFH (DpEntry, &FeData, &p->Table, &p->Masked, &p->NumAH, &p->Len, TData);
> ============================================================================
>
> Arguments are pushed on to the stack from right to left direction.
> Possible working assembly that could be generated for this code:
> Case 1:
> ================================================================
>        bal (ra), _RAlloc@c
>        stord	(r1,r0), 32(sp)
>        addd	$-2, (sp)
> .LCFI547:
>        movd	$18, (r7,r6)
>        addd	(sp), (r7,r6)
>        push	$2,r6
> .LCFI548:
>        addd	$3, (r1,r0)
>        push	$2,r0
>
> OR lesser optimized code could be:
> Case 2:
> ============================================================
>        bal (ra), _RAlloc@c
>        stord	(r1,r0), 32(sp)
>        addd	$-2, (sp)
> .LCFI547:
>        movd	$18, (r7,r6)
>        addd	(sp), (r7,r6)
>        push	$2,r6
> .LCFI548:
>        loadd	32(sp), (r1,r0)
>        addd	$3, (r1,r0)
>        push	$2,r0
>
> However, I am getting following code:
> ============================================================
>        bal (ra), _RAlloc@c
>        stord	(r1,r0), 20(sp)
>        addd	$-2, (sp)
> .LCFI547:
>        movd	$18, (r2,r1)
>        addd	(sp), (r2,r1)
>        push	$2,r1	  
> .LCFI548:
>        addd	$3, (r1,r0)	<<============== PROBLEM
>        push	$2,r0
>
> 'R1' being the part of first push already has some value in it. So before 
> next push, either (r1,r0) should be loaded with new address value as shown 
> in Case 2. Otherwise, as in Case 1, different register pairs should be used 
> for these two push operations.
>
> CR16 port did not have REG_ALLOC_ORDER defined.
> I am trying to control register allocation order using this. However, it 
> looks like this sequence has to be decided very carefully as slightest of 
> the alternation in sequence are causing some or other application failures.
>
> Is there any other way, I could address this problem?


I don't know this syntax.  Are you saying that the instruction
        push	$2,r1
modifies r1?  Or are you saying that that the instruction
        addd	$3, (r1,r0)
may modify r1, and therefore must be executed before the push
instruction?

Either way, REG_ALLOC_ORDER is not an issue here.  You need to look at
the RTL dumps.  See whether the original call sequence in the first RTL
dump is valid, using pseudo-registers.  Then find out where it becomes
invalid.

One thing to consider is whether the push instruction is particularly
efficient on your processor.  If it is not--if it is comparable to a mov
instruction to an offset from sp--then consider defining
ACCUMULATE_OUTGOING_ARGS rather than PUSH_ARGS.  Most processors get
more efficient code from ACCUMULATE_OUTGOING_ARGS.

Ian


[-- Attachment #2: Convert.c.188r.ira.txt --]
[-- Type: text/plain, Size: 5271 bytes --]

(call_insn 1604 1602 1605 65 RcHeap.h:351 (parallel [
            (set (reg:SI 0 r0)
                (call (mem:QI (symbol_ref:SI ("RAlloc") [flags 0x41]  <function_decl 0xb7dfc980 RAlloc>) [0 S1 A8])
                    (const_int 0 [0x0])))
            (clobber (reg:SI 14 ra))
        ]) 96 {cr16_call_value_insn_branch} (nil)
    (expr_list:REG_DEP_TRUE (use (reg:HI 2 r2))
        (expr_list:REG_DEP_TRUE (use (reg:HI 3 r3))
            (expr_list:REG_DEP_TRUE (use (reg:SI 4 r4))
                (nil)))))

(insn 1605 1604 1607 65 RcHeap.h:351 (set (reg/f:SI 12 r12 [orig:525 D.18812 ] [525])
        (reg:SI 0 r0)) 67 {*movsi_long} (nil))

(insn 1607 1605 1609 65 Convert.c:704 (set (mem/f/c/i:SI (plus:SI (reg/f:SI 15 sp)
                (const_int 20 [0x14])) [7 IeData+0 S4 A32])
        (reg/f:SI 12 r12 [orig:525 D.18812 ] [525])) 67 {*movsi_long} (nil))

(debug_insn 1609 1607 1610 65 Convert.c:706 (var_location:SI p (reg/f:SI 12 r12 [orig:525 D.18812 ] [525])) -1 (nil))

(insn 1610 1609 2943 65 Convert.c:707 (set (reg/f:SI 15 sp)
        (plus:SI (reg/f:SI 15 sp)
            (const_int -2 [0xfffffffe]))) 1 {addsi3} (nil))

(insn 2943 1610 2944 65 Convert.c:707 (set (reg:SI 1 r1)
        (const_int 18 [0x12])) 67 {*movsi_long} (nil))

(insn 2944 2943 1613 65 Convert.c:707 (set (reg:SI 1 r1)
        (plus:SI (reg:SI 1 r1)
            (reg/f:SI 15 sp))) 1 {addsi3} (expr_list:REG_EQUIV (plus:SI (reg/f:SI 15 sp)
            (const_int 18 [0x12]))
        (nil)))

(insn 1613 2944 2945 65 Convert.c:707 (set (mem/f/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32])
        (reg:SI 1 r1)) 61 {pushsi_internal} (nil))

(insn 2945 1613 1614 65 Convert.c:707 (set (reg/f:SI 0 r0 [1056])
        (reg/f:SI 12 r12 [orig:525 D.18812 ] [525])) 67 {*movsi_long} (nil))

(insn 1614 2945 1615 65 Convert.c:707 (set (reg/f:SI 0 r0 [1056])
        (plus:SI (reg/f:SI 0 r0 [1056])
            (const_int 3 [0x3]))) 1 {addsi3} (nil))

(insn 1615 1614 2946 65 Convert.c:707 (set (mem/f/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32])
        (reg/f:SI 0 r0 [1056])) 61 {pushsi_internal} (nil))

(insn 2946 1615 1616 65 Convert.c:707 (set (reg/f:SI 0 r0 [1057])
        (reg/f:SI 12 r12 [orig:525 D.18812 ] [525])) 67 {*movsi_long} (nil))

(insn 1616 2946 1617 65 Convert.c:707 (set (reg/f:SI 0 r0 [1057])
        (plus:SI (reg/f:SI 0 r0 [1057])
            (const_int 2 [0x2]))) 1 {addsi3} (nil))

(insn 1617 1616 2947 65 Convert.c:707 (set (mem/f/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32])
        (reg/f:SI 0 r0 [1057])) 61 {pushsi_internal} (nil))

(insn 2947 1617 1618 65 Convert.c:707 (set (reg/f:SI 0 r0 [1058])
        (reg/f:SI 12 r12 [orig:525 D.18812 ] [525])) 67 {*movsi_long} (nil))

(insn 1618 2947 1619 65 Convert.c:707 (set (reg/f:SI 0 r0 [1058])
        (plus:SI (reg/f:SI 0 r0 [1058])
            (const_int 1 [0x1]))) 1 {addsi3} (nil))

(insn 1619 1618 1620 65 Convert.c:707 (set (mem/f/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32])
        (reg/f:SI 0 r0 [1058])) 61 {pushsi_internal} (nil))

(insn 1620 1619 1621 65 Convert.c:707 (set (reg/f:SI 15 sp)
        (plus:SI (reg/f:SI 15 sp)
            (const_int -10 [0xfffffff6]))) 1 {addsi3} (nil))

(insn 1621 1620 1624 65 Convert.c:707 (set (reg/f:SI 2 r2 [1059])
        (reg/f:SI 15 sp)) 67 {*movsi_long} (nil))

(insn 1624 1621 1625 65 Convert.c:707 (set (reg:SI 0 r0 [1062])
        (const_int 9 [0x9])) 67 {*movsi_long} (expr_list:REG_EQUIV (const_int 9 [0x9])
        (nil)))

(insn 1625 1624 1626 65 Convert.c:707 (set (mem/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32])
        (reg:SI 0 r0 [1062])) 61 {pushsi_internal} (expr_list:REG_EQUAL (const_int 9 [0x9])
        (nil)))

(insn 1626 1625 1628 65 Convert.c:707 (set (reg:SI 4 r4)
        (reg/f:SI 8 r8 [1368])) 67 {*movsi_long} (expr_list:REG_EQUAL (plus:SI (reg/f:SI 13 r13)
            (const_int 28 [0x1c]))
        (nil)))

(call_insn 1628 1626 1629 65 Convert.c:707 (parallel [
            (set (reg:SI 0 r0)
                (call (mem:QI (symbol_ref:SI ("memcpy") [flags 0x41]  <function_decl 0xb75a7600 memcpy>) [0 S1 A8])
                    (const_int 4 [0x4])))
            (clobber (reg:SI 14 ra))
        ]) 96 {cr16_call_value_insn_branch} (expr_list:REG_EH_REGION (const_int 0 [0x0])
        (nil))
    (expr_list:REG_DEP_TRUE (use (reg:SI 2 r2))
        (expr_list:REG_DEP_TRUE (use (reg:SI 4 r4))
            (nil))))

(insn 1629 1628 1632 65 Convert.c:707 (set (reg/f:SI 15 sp)
        (plus:SI (reg/f:SI 15 sp)
            (const_int 4 [0x4]))) 1 {addsi3} (nil))

(insn 1632 1629 1633 65 Convert.c:707 (set (reg:SI 4 r4)
        (reg/f:SI 12 r12 [orig:525 D.18812 ] [525])) 67 {*movsi_long} (nil))

(insn 1633 1632 1634 65 Convert.c:707 (set (reg:SI 2 r2)
        (reg/f:SI 10 r10 [1371])) 67 {*movsi_long} (expr_list:REG_EQUAL (plus:SI (reg/f:SI 13 r13)
            (const_int 2 [0x2]))
        (nil)))

(call_insn 1634 1633 2948 65 Convert.c:707 (parallel [
            (call (mem:QI (symbol_ref:SI ("ReadFH") [flags 0x41]  <function_decl 0xb7852000 ReadFH>) [0 S1 A8])
                (const_int 28 [0x1c]))
            (clobber (reg:SI 14 ra))
        ]) 93 {cr16_call_insn_branch} (nil)
    (expr_list:REG_DEP_TRUE (use (reg:SI 2 r2))
        (expr_list:REG_DEP_TRUE (use (reg:SI 4 r4))
            (nil))))


[-- Attachment #3: Convert.c.189r.postreload.txt --]
[-- Type: text/plain, Size: 4899 bytes --]

(call_insn 1604 1602 1605 65 RcHeap.h:351 (parallel [
            (set (reg:SI 0 r0)
                (call (mem:QI (symbol_ref:SI ("RAlloc") [flags 0x41]  <function_decl 0xb7dfc980 RAlloc>) [0 S1 A8])
                    (const_int 0 [0x0])))
            (clobber (reg:SI 14 ra))
        ]) 96 {cr16_call_value_insn_branch} (nil)
    (expr_list:REG_DEP_TRUE (use (reg:HI 2 r2))
        (expr_list:REG_DEP_TRUE (use (reg:HI 3 r3))
            (expr_list:REG_DEP_TRUE (use (reg:SI 4 r4))
                (nil)))))

(insn 1605 1604 1607 65 RcHeap.h:351 (set (reg/f:SI 12 r12 [orig:525 D.18812 ] [525])
        (reg:SI 0 r0)) 67 {*movsi_long} (nil))

(insn 1607 1605 1609 65 Convert.c:704 (set (mem/f/c/i:SI (plus:SI (reg/f:SI 15 sp)
                (const_int 20 [0x14])) [7 IeData+0 S4 A32])
        (reg/f:SI 12 r12 [orig:525 D.18812 ] [525])) 67 {*movsi_long} (nil))

(debug_insn 1609 1607 1610 65 Convert.c:706 (var_location:SI p (reg/f:SI 12 r12 [orig:525 D.18812 ] [525])) -1 (nil))

(insn 1610 1609 2943 65 Convert.c:707 (set (reg/f:SI 15 sp)
        (plus:SI (reg/f:SI 15 sp)
            (const_int -2 [0xfffffffe]))) 1 {addsi3} (nil))

(insn 2943 1610 2944 65 Convert.c:707 (set (reg:SI 1 r1)
        (const_int 18 [0x12])) 67 {*movsi_long} (nil))

(insn 2944 2943 1613 65 Convert.c:707 (set (reg:SI 1 r1)
        (plus:SI (reg:SI 1 r1)
            (reg/f:SI 15 sp))) 1 {addsi3} (expr_list:REG_EQUIV (plus:SI (reg/f:SI 15 sp)
            (const_int 18 [0x12]))
        (nil)))

(insn 1613 2944 1614 65 Convert.c:707 (set (mem/f/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32])
        (reg:SI 1 r1)) 61 {pushsi_internal} (nil))

(insn 1614 1613 1615 65 Convert.c:707 (set (reg/f:SI 0 r0 [1056])
        (plus:SI (reg/f:SI 0 r0 [1056])
            (const_int 3 [0x3]))) 1 {addsi3} (nil))

(insn 1615 1614 1616 65 Convert.c:707 (set (mem/f/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32])
        (reg/f:SI 0 r0 [1056])) 61 {pushsi_internal} (nil))

(insn 1616 1615 1617 65 Convert.c:707 (set (reg/f:SI 0 r0 [1057])
        (plus:SI (reg/f:SI 0 r0 [1057])
            (const_int -1 [0xffffffff]))) 1 {addsi3} (nil))

(insn 1617 1616 1618 65 Convert.c:707 (set (mem/f/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32])
        (reg/f:SI 0 r0 [1057])) 61 {pushsi_internal} (nil))

(insn 1618 1617 1619 65 Convert.c:707 (set (reg/f:SI 0 r0 [1058])
        (plus:SI (reg/f:SI 0 r0 [1058])
            (const_int -1 [0xffffffff]))) 1 {addsi3} (nil))

(insn 1619 1618 1620 65 Convert.c:707 (set (mem/f/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32])
        (reg/f:SI 0 r0 [1058])) 61 {pushsi_internal} (nil))

(insn 1620 1619 1621 65 Convert.c:707 (set (reg/f:SI 15 sp)
        (plus:SI (reg/f:SI 15 sp)
            (const_int -10 [0xfffffff6]))) 1 {addsi3} (nil))

(insn 1621 1620 1624 65 Convert.c:707 (set (reg/f:SI 2 r2 [1059])
        (reg/f:SI 15 sp)) 67 {*movsi_long} (nil))

(insn 1624 1621 1625 65 Convert.c:707 (set (reg:SI 0 r0 [1062])
        (const_int 9 [0x9])) 67 {*movsi_long} (expr_list:REG_EQUIV (const_int 9 [0x9])
        (nil)))

(insn 1625 1624 1626 65 Convert.c:707 (set (mem/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32])
        (reg:SI 0 r0 [1062])) 61 {pushsi_internal} (expr_list:REG_EQUAL (const_int 9 [0x9])
        (nil)))

(insn 1626 1625 1628 65 Convert.c:707 (set (reg:SI 4 r4)
        (reg/f:SI 8 r8 [1368])) 67 {*movsi_long} (expr_list:REG_EQUAL (plus:SI (reg/f:SI 13 r13)
            (const_int 28 [0x1c]))
        (nil)))

(call_insn 1628 1626 1629 65 Convert.c:707 (parallel [
            (set (reg:SI 0 r0)
                (call (mem:QI (symbol_ref:SI ("memcpy") [flags 0x41]  <function_decl 0xb75a7600 memcpy>) [0 S1 A8])
                    (const_int 4 [0x4])))
            (clobber (reg:SI 14 ra))
        ]) 96 {cr16_call_value_insn_branch} (expr_list:REG_EH_REGION (const_int 0 [0x0])
        (nil))
    (expr_list:REG_DEP_TRUE (use (reg:SI 2 r2))
        (expr_list:REG_DEP_TRUE (use (reg:SI 4 r4))
            (nil))))

(insn 1629 1628 1632 65 Convert.c:707 (set (reg/f:SI 15 sp)
        (plus:SI (reg/f:SI 15 sp)
            (const_int 4 [0x4]))) 1 {addsi3} (nil))

(insn 1632 1629 1633 65 Convert.c:707 (set (reg:SI 4 r4)
        (reg/f:SI 12 r12 [orig:525 D.18812 ] [525])) 67 {*movsi_long} (nil))

(insn 1633 1632 1634 65 Convert.c:707 (set (reg:SI 2 r2)
        (reg/f:SI 10 r10 [1371])) 67 {*movsi_long} (expr_list:REG_EQUAL (plus:SI (reg/f:SI 13 r13)
            (const_int 2 [0x2]))
        (nil)))

(call_insn 1634 1633 2948 65 Convert.c:707 (parallel [
            (call (mem:QI (symbol_ref:SI ("LasReadFieldAttachedHandsets") [flags 0x41]  <function_decl 0xb7852000 LasReadFieldAttachedHandsets>) [0 S1 A8])
                (const_int 28 [0x1c]))
            (clobber (reg:SI 14 ra))
        ]) 93 {cr16_call_insn_branch} (nil)
    (expr_list:REG_DEP_TRUE (use (reg:SI 2 r2))
        (expr_list:REG_DEP_TRUE (use (reg:SI 4 r4))
            (nil))))


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [CR16] Improper register allocation
  2011-11-20 15:53   ` Jayant R. Sonar
@ 2011-11-21  5:03     ` Ian Lance Taylor
  2011-11-22  8:02       ` Jayant R. Sonar
  0 siblings, 1 reply; 5+ messages in thread
From: Ian Lance Taylor @ 2011-11-21  5:03 UTC (permalink / raw)
  To: Jayant R. Sonar; +Cc: gcc-help

"Jayant R. Sonar" <Jayant.Sonar@kpitcummins.com> writes:

> I looked into the RTL dumps and found the original call sequence, where 
> (r1,r0) is loaded with appropriate address value before adding 3, is valid 
> up to IRA phase. The reload phase is optimizing out the important load 
> instruction (movsi_long).
>
> Dump before 'postreload' i.e. at 'ira':
> ===========================================================================
> (insn 1613 2944 2945 65 Convert.c:707 
> (set (mem/f/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32])
>         (reg:SI 1 r1)) 61 {pushsi_internal} (nil))
>
> (insn 2945 1613 1614 65 Convert.c:707 
> (set (reg/f:SI 0 r0 [1056])
>         (reg/f:SI 12 r12 [orig:525 D.18812 ] [525])) 67 {*movsi_long} (nil))
>
> (insn 1614 2945 1615 65 Convert.c:707 
> (set (reg/f:SI 0 r0 [1056])
>         (plus:SI (reg/f:SI 0 r0 [1056])
>             (const_int 3 [0x3]))) 1 {addsi3} (nil))
>
>
> Dump of 'postreload':
> ===========================================================================
> (insn 1613 2944 1614 65 Convert.c:707 
> (set (mem/f/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32])
>         (reg:SI 1 r1)) 61 {pushsi_internal} (nil))
>
> (insn 1614 1613 1615 65 Convert.c:707 
> (set (reg/f:SI 0 r0 [1056])
>         (plus:SI (reg/f:SI 0 r0 [1056])
>             (const_int 3 [0x3]))) 1 {addsi3} (nil))
>
>
> The instruction 2945 is deleted during reload phase.

Looking at the extra RTL you attached, I see this in the ira dump:

(insn 1605 1604 1607 65 RcHeap.h:351 (set (reg/f:SI 12 r12 [orig:525 D.18812 ] [525])
        (reg:SI 0 r0)) 67 {*movsi_long} (nil))

Between this RTL insn and insn 2945, there are no labels, and there is
no modification of either r0 or r12.  Therefore, the value in r12 is the
same as the value in r0.  Therefore, insn 2945, which sets r0 to r12,
appears unnecessary.

> Just for the reference, giving here again the inappropriate assembly code 
> generated:
>
>         addd    (sp), (r2,r1)
>         push    $2,r1   <<<<<<<<<<<<<<<<<==================================
> .LCFI548:                    Insn loading addr in (r1,r0) should be here
>         addd    $3, (r1,r0)  <<<<<<<<<<<<<<<<<<<<==========================

Looking at this, it looks like the addition modifies both r1 and r0.
This implies that your registers hold HImode values, so the modification
to r1 changes the SImode in r0.

If that is why your instruction sequence is wrong, then are you certain
that the value of HARD_REGNO_NREGS is correct?

Ian

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [CR16] Improper register allocation
  2011-11-21  5:03     ` Ian Lance Taylor
@ 2011-11-22  8:02       ` Jayant R. Sonar
  0 siblings, 0 replies; 5+ messages in thread
From: Jayant R. Sonar @ 2011-11-22  8:02 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc-help

Hello Ian,

>> Looking at this, it looks like the addition modifies both r1 and r0.
>> This implies that your registers hold HImode values, so the modification
>> to r1 changes the SImode in r0.

I am sorry I should have mentioned register sizes earlier itself.
CR16 has R0-R11 as 16bit registers and R12-R15 as 32 bit registers. For SI 
modes, registers pairs of 16 bit registers are used. 
I am working on GCC-4.5.1.

In the application's IRA dump, we can see between insns 1605 and 2945, two 
register pairs in SI mode, (r1,r0) and (r2,r1) are being used concurrently 
for two different purposes. A pair is referred as a single SI mode 
register with lower register number i.e. First pair is referred with just R0 
and second pair with just R1. As two register numbers appear different to 
compiler, it doesn't feel they are overlapping in any way, which is not true 
for SI mode. As a result, these 2 pairs get one overlapping register 'r1'.

>> If that is why your instruction sequence is wrong, then are you certain
>> that the value of HARD_REGNO_NREGS is correct?

I checked 'HARD_REGNO_NREGS' macro. It is working fine.

While reading some previous archives, I found that few other architectures 
had faced register allocation problem with IRA phase. Therefore I tried to 
tweak the IRA_COVER_CLASSES macro in my port. Initially it was defined as:
#define IRA_COVER_CLASSES        \
{                                \
   GENERAL_REGS, LIM_REG_CLASSES \
}

When I split the GENERAL_REGS in following way, it solved my problem.
#define IRA_COVER_CLASSES                 \
{                                         \
   SHORT_REGS, LONG_REGS, LIM_REG_CLASSES \
}

Now compiler is generating following assembly code:
	addd	$-2, (sp)
.LCFI547:
	movd	$18, (r2,r1)	
	addd	(sp), (r2,r1)	
	push	$2,r1	
.LCFI548:
	movd	(r8,r7), (r1,r0)
	addd	$3, (r1,r0)

As can be seen here, though same overlapping register pairs are being used 
here, between the first push and next addd instruction, compiler generates a 
load instruction to load the (r1,r0) contents again.

Thanks and Regards,
Jayant Sonar
[KPIT Cummins, Pune]


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2011-11-21 14:28 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-11-20  0:48 [CR16] Improper register allocation Jayant R. Sonar
2011-11-20  5:32 ` Ian Lance Taylor
2011-11-20 15:53   ` Jayant R. Sonar
2011-11-21  5:03     ` Ian Lance Taylor
2011-11-22  8:02       ` Jayant R. Sonar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).