public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* LRA for avr: help with FP and elimination
@ 2023-07-13  9:27 SenthilKumar.Selvaraj
  2023-07-14 13:29 ` Vladimir Makarov
  0 siblings, 1 reply; 8+ messages in thread
From: SenthilKumar.Selvaraj @ 2023-07-13  9:27 UTC (permalink / raw)
  To: gcc

Hi,

  I've been spending some (spare) time checking what it would take to
  make LRA work for the avr target.

  Right after I removed the TARGET_LRA_P hook disabling LRA, building
  libgcc failed with a weird ICE. On the avr, the stack pointer (SP)
  is not used to access stack slots - TARGET_CAN_ELIMINATE returns false
  if frame_pointer_needed, and TARGET_FRAME_POINTER_REQUIRED returns true
  if get_frame_size() > 0.

  With LRA, however, reload generates

(insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
                (const_int 1 [0x1])) [2 %sfp+1 S1 A8])
        (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 {movqi_insn_split}
     (nil))

  and the backend code errors out when it finds SP is being used as a
  pointer register.

  Digging through the RTL dumps, I found the following. For the
  following insn sequence in *.ira

(insn 189 128 159 7 (set (reg:HI 58 [ b ])
        (const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
     (nil))
(insn 159 189 160 7 (set (subreg:QI (reg:HI 58 [ b ]) 0)
        (reg:QI 86 [ a ])) "case.c":7:7 86 {movqi_insn_split}
     (nil))
(insn 160 159 32 7 (set (subreg:QI (reg:HI 58 [ b ]) 1)
        (reg:QI 87 [ a+1 ])) "case.c":7:7 86 {movqi_insn_split}
     (nil))

  1. For r58, IRA picks R28:R29, which is the frame pointer for avr.

      Popping a13(r58,l0)  --         assign reg 28

  2. LRA sees the subreg in insn 159 and generates a reload reg
  (r125).  simplify_subreg_regno (lra-constraints.cc:1810) however
  bails (returns -1) if the reg involved is FRAME_POINTER_REGNUM and
  reload isn't completed yet. LRA therefore decides rclass for the
  pseudo reg is NO_REGS.

<snip>
Creating newreg=125 from oldreg=58, assigning class NO_REGS to subreg reg r125
  159: r125:HI#0=r86:QI

  4. As rclass is NO_REGS, LRA picks an insn alternative that involves memory.
  That is my understanding, please correct me if I'm wrong.
<snip>
            0 Small class reload: reject+=3
            0 Non input pseudo reload: reject++
            Cycle danger: overall += LRA_MAX_REJECT
          alt=0,overall=610,losers=1,rld_nregs=1
            0 Small class reload: reject+=3
            0 Non input pseudo reload: reject++
            alt=1: Bad operand -- refuse
            0 Non pseudo reload: reject++
          alt=2,overall=1,losers=0,rld_nregs=0
	 Choosing alt 2 in insn 159:  (0) Qm  (1) rY00 {movqi_insn_split}

  5. LRA creates stack slots, and then uses the FP register to access
  the slots. This is despite r58 already being assigned R28:R29. 

  6. TARGET_FRAME_POINTER_REQUIRED is never called, and therefore
     frame_pointer_needed is not set, despite the creation of stack
     slots. TARGET_CAN_ELIMINATE therefore okays elimination of FP to SP,
     and this eventually causes the ICE when the avr backend sees SP being
     used as a pointer register.

  This is the relevant sequence after reload
<snip>
(insn 189 128 239 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
        (const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
     (nil))
(insn 239 189 159 7 (set (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
                (const_int 1 [0x1])) [2 %sfp+1 S2 A8])
        (reg:HI 28 r28 [orig:58 b ] [58])) "case.c":7:7 101 {*movhi_split}
     (nil))
(insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
                (const_int 1 [0x1])) [2 %sfp+1 S1 A8])
        (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 {movqi_insn_split}
     (nil))
(insn 240 159 241 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
        (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
                (const_int 1 [0x1])) [2 %sfp+1 S2 A8])) "case.c":7:7 101 {*movhi_split}
     (nil))
(insn 241 240 160 7 (set (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
                (const_int 1 [0x1])) [2 %sfp+1 S2 A8])
        (reg:HI 28 r28 [orig:58 b ] [58])) "case.c":7:7 101 {*movhi_split}
     (nil))
(insn 160 241 242 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
                (const_int 2 [0x2])) [2 %sfp+2 S1 A8])
        (reg:QI 18 r18 [orig:87 a+1 ] [87])) "case.c":7:7 86 {movqi_insn_split}
     (nil))
(insn 242 160 33 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
        (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
                (const_int 1 [0x1])) [2 %sfp+1 S2 A8])) "case.c":7:7 101 {*movhi_split}
     (nil))

  For choices other than FP, simplify_subreg_regno returns the correct part
  of the wider HImode reg, so rclass is not NO_REGS, and things workout fine.

  I checked what classic reload does in the same situation - it picks a
  different register (R25) instead of spilling to a stack slot.

<snip>
(insn 189 128 159 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
        (const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
     (nil))
(insn 159 189 226 7 (set (reg:QI 25 r25)
        (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 {movqi_insn_split}
     (nil))
(insn 226 159 160 7 (set (reg:QI 28 r28)
        (reg:QI 25 r25)) "case.c":7:7 86 {movqi_insn_split}
     (nil))
(insn 160 226 227 7 (set (reg:QI 25 r25)
        (reg:QI 18 r18 [orig:87 a+1 ] [87])) "case.c":7:7 86 {movqi_insn_split}
     (nil))
(insn 227 160 33 7 (set (reg:QI 29 r29)
        (reg:QI 25 r25)) "case.c":7:7 86 {movqi_insn_split}
     (nil))


  My questions:

  1. Is there something obvious the avr backend is doing wrong that is
  causing this?

  2. Shouldn't LRA ask the backend for frame_pointer_required_p and
  update frame_pointer_needed if it creates stack slots?

  3. Even if (2) works, I see that lra-eliminates.cc:update_reg_eliminate
  asserts that if the backend said elimination to SP is
ok first up, it
  cannot reject that elimination later (line 1165). If the only reason
  FP is required is because LRA created stack
slots, what should the backend do?
  
  4. When simplify_subreg_regno bails for FP, lra-constraints.cc:1815
  sets rclass = NO_REGS and forces a spill to memory. The comment says
  it is to prevent infinite looping, but for this case, doesn't it
  make sense to look for other regs?
  
  5. I can work around the problem by disabling elimination from FP to SP
  when lra_in_progress, but I think it pevents IRA/LRA from using
  R28:R29 even when FP is not required at all?

  6. Basic question, but does FP to SP elimination mean any operation
  possible with FP should be doable with SP as well?


Regards
Senthil

  

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: LRA for avr: help with FP and elimination
  2023-07-13  9:27 LRA for avr: help with FP and elimination SenthilKumar.Selvaraj
@ 2023-07-14 13:29 ` Vladimir Makarov
  2023-07-17  7:17   ` SenthilKumar.Selvaraj
  2023-07-27 11:50   ` Maciej W. Rozycki
  0 siblings, 2 replies; 8+ messages in thread
From: Vladimir Makarov @ 2023-07-14 13:29 UTC (permalink / raw)
  To: SenthilKumar.Selvaraj, gcc


On 7/13/23 05:27, SenthilKumar.Selvaraj--- via Gcc wrote:
> Hi,
>
>    I've been spending some (spare) time checking what it would take to
>    make LRA work for the avr target.
>
>    Right after I removed the TARGET_LRA_P hook disabling LRA, building
>    libgcc failed with a weird ICE.

>   On the avr, the stack pointer (SP)
>    is not used to access stack slots
It is very uncommon target then.
>   - TARGET_CAN_ELIMINATE returns false
>    if frame_pointer_needed, and TARGET_FRAME_POINTER_REQUIRED returns true
>    if get_frame_size() > 0.
>
>    With LRA, however, reload generates
>
> (insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
>                  (const_int 1 [0x1])) [2 %sfp+1 S1 A8])
>          (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 {movqi_insn_split}
>       (nil))
>
>    and the backend code errors out when it finds SP is being used as a
>    pointer register.
>
>    Digging through the RTL dumps, I found the following. For the
>    following insn sequence in *.ira
>
> (insn 189 128 159 7 (set (reg:HI 58 [ b ])
>          (const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
>       (nil))
> (insn 159 189 160 7 (set (subreg:QI (reg:HI 58 [ b ]) 0)
>          (reg:QI 86 [ a ])) "case.c":7:7 86 {movqi_insn_split}
>       (nil))
> (insn 160 159 32 7 (set (subreg:QI (reg:HI 58 [ b ]) 1)
>          (reg:QI 87 [ a+1 ])) "case.c":7:7 86 {movqi_insn_split}
>       (nil))
>
>    1. For r58, IRA picks R28:R29, which is the frame pointer for avr.
>
>        Popping a13(r58,l0)  --         assign reg 28
>
>    2. LRA sees the subreg in insn 159 and generates a reload reg
>    (r125).  simplify_subreg_regno (lra-constraints.cc:1810) however
>    bails (returns -1) if the reg involved is FRAME_POINTER_REGNUM and
>    reload isn't completed yet. LRA therefore decides rclass for the
>    pseudo reg is NO_REGS.
>
> <snip>
> Creating newreg=125 from oldreg=58, assigning class NO_REGS to subreg reg r125
>    159: r125:HI#0=r86:QI
>
>    4. As rclass is NO_REGS, LRA picks an insn alternative that involves memory.
>    That is my understanding, please correct me if I'm wrong.
> <snip>
>              0 Small class reload: reject+=3
>              0 Non input pseudo reload: reject++
>              Cycle danger: overall += LRA_MAX_REJECT
>            alt=0,overall=610,losers=1,rld_nregs=1
>              0 Small class reload: reject+=3
>              0 Non input pseudo reload: reject++
>              alt=1: Bad operand -- refuse
>              0 Non pseudo reload: reject++
>            alt=2,overall=1,losers=0,rld_nregs=0
> 	 Choosing alt 2 in insn 159:  (0) Qm  (1) rY00 {movqi_insn_split}
>
>    5. LRA creates stack slots, and then uses the FP register to access
>    the slots. This is despite r58 already being assigned R28:R29.
>
>    6. TARGET_FRAME_POINTER_REQUIRED is never called, and therefore
>       frame_pointer_needed is not set, despite the creation of stack
>       slots. TARGET_CAN_ELIMINATE therefore okays elimination of FP to SP,
>       and this eventually causes the ICE when the avr backend sees SP being
>       used as a pointer register.
>
>    This is the relevant sequence after reload
> <snip>
> (insn 189 128 239 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
>          (const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
>       (nil))
> (insn 239 189 159 7 (set (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
>                  (const_int 1 [0x1])) [2 %sfp+1 S2 A8])
>          (reg:HI 28 r28 [orig:58 b ] [58])) "case.c":7:7 101 {*movhi_split}
>       (nil))
> (insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
>                  (const_int 1 [0x1])) [2 %sfp+1 S1 A8])
>          (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 {movqi_insn_split}
>       (nil))
> (insn 240 159 241 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
>          (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
>                  (const_int 1 [0x1])) [2 %sfp+1 S2 A8])) "case.c":7:7 101 {*movhi_split}
>       (nil))
> (insn 241 240 160 7 (set (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
>                  (const_int 1 [0x1])) [2 %sfp+1 S2 A8])
>          (reg:HI 28 r28 [orig:58 b ] [58])) "case.c":7:7 101 {*movhi_split}
>       (nil))
> (insn 160 241 242 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
>                  (const_int 2 [0x2])) [2 %sfp+2 S1 A8])
>          (reg:QI 18 r18 [orig:87 a+1 ] [87])) "case.c":7:7 86 {movqi_insn_split}
>       (nil))
> (insn 242 160 33 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
>          (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
>                  (const_int 1 [0x1])) [2 %sfp+1 S2 A8])) "case.c":7:7 101 {*movhi_split}
>       (nil))
>
>    For choices other than FP, simplify_subreg_regno returns the correct part
>    of the wider HImode reg, so rclass is not NO_REGS, and things workout fine.
>
>    I checked what classic reload does in the same situation - it picks a
>    different register (R25) instead of spilling to a stack slot.
>
> <snip>
> (insn 189 128 159 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
>          (const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
>       (nil))
> (insn 159 189 226 7 (set (reg:QI 25 r25)
>          (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 {movqi_insn_split}
>       (nil))
> (insn 226 159 160 7 (set (reg:QI 28 r28)
>          (reg:QI 25 r25)) "case.c":7:7 86 {movqi_insn_split}
>       (nil))
> (insn 160 226 227 7 (set (reg:QI 25 r25)
>          (reg:QI 18 r18 [orig:87 a+1 ] [87])) "case.c":7:7 86 {movqi_insn_split}
>       (nil))
> (insn 227 160 33 7 (set (reg:QI 29 r29)
>          (reg:QI 25 r25)) "case.c":7:7 86 {movqi_insn_split}
>       (nil))
>
>
>    My questions:
>
>    1. Is there something obvious the avr backend is doing wrong that is
>    causing this?
No.  In my opinion if it worked with reload, it should work with LRA 
because LRA is a substitute for reload.
>    2. Shouldn't LRA ask the backend for frame_pointer_required_p and
>    update frame_pointer_needed if it creates stack slots?
I think so.
>    3. Even if (2) works, I see that lra-eliminates.cc:update_reg_eliminate
>    asserts that if the backend said elimination to SP is
> ok first up, it
>    cannot reject that elimination later (line 1165). If the only reason
>    FP is required is because LRA created stack
> slots, what should the backend do?
>    
I think it can be relaxed for avr on which sp can not be used access 
stack memory
>    4. When simplify_subreg_regno bails for FP, lra-constraints.cc:1815
>    sets rclass = NO_REGS and forces a spill to memory. The comment says
>    it is to prevent infinite looping, but for this case, doesn't it
>    make sense to look for other regs?
>    
I guess it can be relaxed for avr and frame-pointer too as avr does not 
use sp for accessing memory
>    5. I can work around the problem by disabling elimination from FP to SP
>    when lra_in_progress, but I think it pevents IRA/LRA from using
>    R28:R29 even when FP is not required at all?
Yes, it is better to avoid disabling elimination
>    6. Basic question, but does FP to SP elimination mean any operation
>    possible with FP should be doable with SP as well?

Hard to say. There are a lot of undocumented RA assumptions.

If you send me the preprocessed test, I could start to work on it to fix 
the problems.  I think it is hard to fix them right for a person having 
a little experience with LRA.


>    


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: LRA for avr: help with FP and elimination
  2023-07-14 13:29 ` Vladimir Makarov
@ 2023-07-17  7:17   ` SenthilKumar.Selvaraj
  2023-07-18 15:04     ` Vladimir Makarov
  2023-07-27 11:50   ` Maciej W. Rozycki
  1 sibling, 1 reply; 8+ messages in thread
From: SenthilKumar.Selvaraj @ 2023-07-17  7:17 UTC (permalink / raw)
  To: gcc, vmakarov

On Fri, 2023-07-14 at 09:29 -0400, Vladimir Makarov wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> On 7/13/23 05:27, SenthilKumar.Selvaraj--- via Gcc wrote:
> > Hi,
> > 
> >    I've been spending some (spare) time checking what it would take to
> >    make LRA work for the avr target.
> > 
> >    Right after I removed the TARGET_LRA_P hook disabling LRA, building
> >    libgcc failed with a weird ICE.
> >   On the avr, the stack pointer (SP)
> >    is not used to access stack slots
> It is very uncommon target then.
> >   - TARGET_CAN_ELIMINATE returns false
> >    if frame_pointer_needed, and TARGET_FRAME_POINTER_REQUIRED returns true
> >    if get_frame_size() > 0.
> > 
> >    With LRA, however, reload generates
> > 
> > (insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
> >                  (const_int 1 [0x1])) [2 %sfp+1 S1 A8])
> >          (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 {movqi_insn_split}
> >       (nil))
> > 
> >    and the backend code errors out when it finds SP is being used as a
> >    pointer register.
> > 
> >    Digging through the RTL dumps, I found the following. For the
> >    following insn sequence in *.ira
> > 
> > (insn 189 128 159 7 (set (reg:HI 58 [ b ])
> >          (const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
> >       (nil))
> > (insn 159 189 160 7 (set (subreg:QI (reg:HI 58 [ b ]) 0)
> >          (reg:QI 86 [ a ])) "case.c":7:7 86 {movqi_insn_split}
> >       (nil))
> > (insn 160 159 32 7 (set (subreg:QI (reg:HI 58 [ b ]) 1)
> >          (reg:QI 87 [ a+1 ])) "case.c":7:7 86 {movqi_insn_split}
> >       (nil))
> > 
> >    1. For r58, IRA picks R28:R29, which is the frame pointer for avr.
> > 
> >        Popping a13(r58,l0)  --         assign reg 28
> > 
> >    2. LRA sees the subreg in insn 159 and generates a reload reg
> >    (r125).  simplify_subreg_regno (lra-constraints.cc:1810) however
> >    bails (returns -1) if the reg involved is FRAME_POINTER_REGNUM and
> >    reload isn't completed yet. LRA therefore decides rclass for the
> >    pseudo reg is NO_REGS.
> > 
> > <snip>
> > Creating newreg=125 from oldreg=58, assigning class NO_REGS to subreg reg r125
> >    159: r125:HI#0=r86:QI
> > 
> >    4. As rclass is NO_REGS, LRA picks an insn alternative that involves memory.
> >    That is my understanding, please correct me if I'm wrong.
> > <snip>
> >              0 Small class reload: reject+=3
> >              0 Non input pseudo reload: reject++
> >              Cycle danger: overall += LRA_MAX_REJECT
> >            alt=0,overall=610,losers=1,rld_nregs=1
> >              0 Small class reload: reject+=3
> >              0 Non input pseudo reload: reject++
> >              alt=1: Bad operand -- refuse
> >              0 Non pseudo reload: reject++
> >            alt=2,overall=1,losers=0,rld_nregs=0
> >        Choosing alt 2 in insn 159:  (0) Qm  (1) rY00 {movqi_insn_split}
> > 
> >    5. LRA creates stack slots, and then uses the FP register to access
> >    the slots. This is despite r58 already being assigned R28:R29.
> > 
> >    6. TARGET_FRAME_POINTER_REQUIRED is never called, and therefore
> >       frame_pointer_needed is not set, despite the creation of stack
> >       slots. TARGET_CAN_ELIMINATE therefore okays elimination of FP to SP,
> >       and this eventually causes the ICE when the avr backend sees SP being
> >       used as a pointer register.
> > 
> >    This is the relevant sequence after reload
> > <snip>
> > (insn 189 128 239 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
> >          (const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
> >       (nil))
> > (insn 239 189 159 7 (set (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
> >                  (const_int 1 [0x1])) [2 %sfp+1 S2 A8])
> >          (reg:HI 28 r28 [orig:58 b ] [58])) "case.c":7:7 101 {*movhi_split}
> >       (nil))
> > (insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
> >                  (const_int 1 [0x1])) [2 %sfp+1 S1 A8])
> >          (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 {movqi_insn_split}
> >       (nil))
> > (insn 240 159 241 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
> >          (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
> >                  (const_int 1 [0x1])) [2 %sfp+1 S2 A8])) "case.c":7:7 101 {*movhi_split}
> >       (nil))
> > (insn 241 240 160 7 (set (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
> >                  (const_int 1 [0x1])) [2 %sfp+1 S2 A8])
> >          (reg:HI 28 r28 [orig:58 b ] [58])) "case.c":7:7 101 {*movhi_split}
> >       (nil))
> > (insn 160 241 242 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
> >                  (const_int 2 [0x2])) [2 %sfp+2 S1 A8])
> >          (reg:QI 18 r18 [orig:87 a+1 ] [87])) "case.c":7:7 86 {movqi_insn_split}
> >       (nil))
> > (insn 242 160 33 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
> >          (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
> >                  (const_int 1 [0x1])) [2 %sfp+1 S2 A8])) "case.c":7:7 101 {*movhi_split}
> >       (nil))
> > 
> >    For choices other than FP, simplify_subreg_regno returns the correct part
> >    of the wider HImode reg, so rclass is not NO_REGS, and things workout fine.
> > 
> >    I checked what classic reload does in the same situation - it picks a
> >    different register (R25) instead of spilling to a stack slot.
> > 
> > <snip>
> > (insn 189 128 159 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
> >          (const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
> >       (nil))
> > (insn 159 189 226 7 (set (reg:QI 25 r25)
> >          (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 {movqi_insn_split}
> >       (nil))
> > (insn 226 159 160 7 (set (reg:QI 28 r28)
> >          (reg:QI 25 r25)) "case.c":7:7 86 {movqi_insn_split}
> >       (nil))
> > (insn 160 226 227 7 (set (reg:QI 25 r25)
> >          (reg:QI 18 r18 [orig:87 a+1 ] [87])) "case.c":7:7 86 {movqi_insn_split}
> >       (nil))
> > (insn 227 160 33 7 (set (reg:QI 29 r29)
> >          (reg:QI 25 r25)) "case.c":7:7 86 {movqi_insn_split}
> >       (nil))
> > 
> > 
> >    My questions:
> > 
> >    1. Is there something obvious the avr backend is doing wrong that is
> >    causing this?
> No.  In my opinion if it worked with reload, it should work with LRA
> because LRA is a substitute for reload.
> >    2. Shouldn't LRA ask the backend for frame_pointer_required_p and
> >    update frame_pointer_needed if it creates stack slots?
> I think so.
> >    3. Even if (2) works, I see that lra-eliminates.cc:update_reg_eliminate
> >    asserts that if the backend said elimination to SP is
> > ok first up, it
> >    cannot reject that elimination later (line 1165). If the only reason
> >    FP is required is because LRA created stack
> > slots, what should the backend do?
> > 
> I think it can be relaxed for avr on which sp can not be used access
> stack memory
> >    4. When simplify_subreg_regno bails for FP, lra-constraints.cc:1815
> >    sets rclass = NO_REGS and forces a spill to memory. The comment says
> >    it is to prevent infinite looping, but for this case, doesn't it
> >    make sense to look for other regs?
> > 
> I guess it can be relaxed for avr and frame-pointer too as avr does not
> use sp for accessing memory
> >    5. I can work around the problem by disabling elimination from FP to SP
> >    when lra_in_progress, but I think it pevents IRA/LRA from using
> >    R28:R29 even when FP is not required at all?
> Yes, it is better to avoid disabling elimination
> >    6. Basic question, but does FP to SP elimination mean any operation
> >    possible with FP should be doable with SP as well?
> 
> Hard to say. There are a lot of undocumented RA assumptions.
> 
> If you send me the preprocessed test, I could start to work on it to fix
> the problems.  I think it is hard to fix them right for a person having
> a little experience with LRA.
> 
> 
Ok, this is a reduced test case that reproduces the failure.

$ cat case.c
typedef int HItype __attribute__ ((mode (HI)));
HItype
__mulvhi3 (HItype a, HItype b)
{
  HItype w;

  if (__builtin_mul_overflow (a, b, &w))
    __builtin_trap ();

  return w;
}

On latest master, this trivial patch turns on LRA for avr
--- gcc/config/avr/avr.cc
+++ gcc/config/avr/avr.cc
@@ -15244,9 +15244,6 @@ avr_float_lib_compare_returns_bool (machine_mode mode, enum rtx_code)
 #undef  TARGET_CONVERT_TO_TYPE
 #define TARGET_CONVERT_TO_TYPE avr_convert_to_type
 
-#undef TARGET_LRA_P
-#define TARGET_LRA_P hook_bool_void_false
-
 #undef  TARGET_ADDR_SPACE_SUBSET_P
 #define TARGET_ADDR_SPACE_SUBSET_P avr_addr_space_subset_p
 
Then configuring and building for avr without attempting to build libgcc

$ configure --target=avr --prefix=<prefix_dir> --enable-languages=c && make all-host && make install-host

And finally to reproduce the failure
$ <prefix_dir>/bin/avr-gcc -mmcu=avr25 case.c -Os

Thanks for taking this up - please do let me know if you need more information.

Regards
Senthil

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: LRA for avr: help with FP and elimination
  2023-07-17  7:17   ` SenthilKumar.Selvaraj
@ 2023-07-18 15:04     ` Vladimir Makarov
  2023-08-10 11:33       ` SenthilKumar.Selvaraj
  0 siblings, 1 reply; 8+ messages in thread
From: Vladimir Makarov @ 2023-07-18 15:04 UTC (permalink / raw)
  To: SenthilKumar.Selvaraj, gcc


On 7/17/23 03:17, SenthilKumar.Selvaraj@microchip.com wrote:
> On Fri, 2023-07-14 at 09:29 -0400, Vladimir Makarov wrote:
>> If you send me the preprocessed test, I could start to work on it to fix
>> the problems.  I think it is hard to fix them right for a person having
>> a little experience with LRA.
>>
>>
> Ok, this is a reduced test case that reproduces the failure.
>
> $ cat case.c
> typedef int HItype __attribute__ ((mode (HI)));
> HItype
> __mulvhi3 (HItype a, HItype b)
> {
>    HItype w;
>
>    if (__builtin_mul_overflow (a, b, &w))
>      __builtin_trap ();
>
>    return w;
> }
>
> On latest master, this trivial patch turns on LRA for avr
> --- gcc/config/avr/avr.cc
> +++ gcc/config/avr/avr.cc
> @@ -15244,9 +15244,6 @@ avr_float_lib_compare_returns_bool (machine_mode mode, enum rtx_code)
>   #undef  TARGET_CONVERT_TO_TYPE
>   #define TARGET_CONVERT_TO_TYPE avr_convert_to_type
>   
> -#undef TARGET_LRA_P
> -#define TARGET_LRA_P hook_bool_void_false
> -
>   #undef  TARGET_ADDR_SPACE_SUBSET_P
>   #define TARGET_ADDR_SPACE_SUBSET_P avr_addr_space_subset_p
>   
> Then configuring and building for avr without attempting to build libgcc
>
> $ configure --target=avr --prefix=<prefix_dir> --enable-languages=c && make all-host && make install-host
>
> And finally to reproduce the failure
> $ <prefix_dir>/bin/avr-gcc -mmcu=avr25 case.c -Os

Thank you.  I've reproduced the bug and started to work on it 
yesterday.  The problem is a bit tricky than I initially thought but I 
believe I'll fix it on this week.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: LRA for avr: help with FP and elimination
  2023-07-14 13:29 ` Vladimir Makarov
  2023-07-17  7:17   ` SenthilKumar.Selvaraj
@ 2023-07-27 11:50   ` Maciej W. Rozycki
  2023-07-27 13:03     ` Paul Koning
  1 sibling, 1 reply; 8+ messages in thread
From: Maciej W. Rozycki @ 2023-07-27 11:50 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: SenthilKumar.Selvaraj, gcc

On Fri, 14 Jul 2023, Vladimir Makarov via Gcc wrote:

> >   On the avr, the stack pointer (SP)
> >    is not used to access stack slots
> It is very uncommon target then.

 Same with the VAX target.  SP is used for outgoing function arguments, 
function calls, alloca only.  AP is used for incoming function arguments 
and is set automatically by hardware at function entry.  FP is used for 
local variables and is likewise set by hardware at function entry.  The 
RET instruction sets SP from FP automatically as the first step of the 
function return sequence.

 I guess it'll affect LRA conversion of the target too.

  Maciej

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: LRA for avr: help with FP and elimination
  2023-07-27 11:50   ` Maciej W. Rozycki
@ 2023-07-27 13:03     ` Paul Koning
  0 siblings, 0 replies; 8+ messages in thread
From: Paul Koning @ 2023-07-27 13:03 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Vladimir Makarov, SenthilKumar.Selvaraj, gcc



> On Jul 27, 2023, at 7:50 AM, Maciej W. Rozycki <macro@orcam.me.uk> wrote:
> 
> On Fri, 14 Jul 2023, Vladimir Makarov via Gcc wrote:
> 
>>>  On the avr, the stack pointer (SP)
>>>   is not used to access stack slots
>> It is very uncommon target then.
> 
> Same with the VAX target.  SP is used for outgoing function arguments, 
> function calls, alloca only.  AP is used for incoming function arguments 
> and is set automatically by hardware at function entry.  FP is used for 
> local variables and is likewise set by hardware at function entry.

While most other targets maintain FP in software, doesn't the same description apply to any target that can have a frame pointer.  The frame pointer may be used only some of the time (PDP-11) or always (VAX) but when it's used local variable references and argument references would go through FP, not SP, right?

	paul



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: LRA for avr: help with FP and elimination
  2023-07-18 15:04     ` Vladimir Makarov
@ 2023-08-10 11:33       ` SenthilKumar.Selvaraj
  2023-08-11 16:02         ` Vladimir Makarov
  0 siblings, 1 reply; 8+ messages in thread
From: SenthilKumar.Selvaraj @ 2023-08-10 11:33 UTC (permalink / raw)
  To: gcc, vmakarov

On Tue, 2023-07-18 at 11:04 -0400, Vladimir Makarov wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> On 7/17/23 03:17, SenthilKumar.Selvaraj@microchip.com wrote:
> > On Fri, 2023-07-14 at 09:29 -0400, Vladimir Makarov wrote:
> > > If you send me the preprocessed test, I could start to work on it to fix
> > > the problems.  I think it is hard to fix them right for a person having
> > > a little experience with LRA.
> > > 
> > > 
> > Ok, this is a reduced test case that reproduces the failure.
> > 
> > $ cat case.c
> > typedef int HItype __attribute__ ((mode (HI)));
> > HItype
> > __mulvhi3 (HItype a, HItype b)
> > {
> >    HItype w;
> > 
> >    if (__builtin_mul_overflow (a, b, &w))
> >      __builtin_trap ();
> > 
> >    return w;
> > }
> > 
> > On latest master, this trivial patch turns on LRA for avr
> > --- gcc/config/avr/avr.cc
> > +++ gcc/config/avr/avr.cc
> > @@ -15244,9 +15244,6 @@ avr_float_lib_compare_returns_bool (machine_mode mode, enum rtx_code)
> >   #undef  TARGET_CONVERT_TO_TYPE
> >   #define TARGET_CONVERT_TO_TYPE avr_convert_to_type
> > 
> > -#undef TARGET_LRA_P
> > -#define TARGET_LRA_P hook_bool_void_false
> > -
> >   #undef  TARGET_ADDR_SPACE_SUBSET_P
> >   #define TARGET_ADDR_SPACE_SUBSET_P avr_addr_space_subset_p
> > 
> > Then configuring and building for avr without attempting to build libgcc
> > 
> > $ configure --target=avr --prefix=<prefix_dir> --enable-languages=c && make all-host && make install-host
> > 
> > And finally to reproduce the failure
> > $ <prefix_dir>/bin/avr-gcc -mmcu=avr25 case.c -Os
> 
> Thank you.  I've reproduced the bug and started to work on it
> yesterday.  The problem is a bit tricky than I initially thought but I
> believe I'll fix it on this week.
> 
> 
Hi Vlad,

  I can confirm your commit (https://gcc.gnu.org/git?p=gcc.git;a=commit;h=2971ff7b1d564ac04b537d907c70e6093af70832)
  fixes the above problem, thank you. However, I see execution failures if a 
  pseudo assigned to FP has to be spilled because of stack slot creation.

  To reproduce, build the compiler just like above, and then do

$ avr-gcc -mmcu=avr51 <gcc-src-dir>/gcc/testsuite/gcc.c-torture/execute/20050224-1.c -O2 -S -fdump-rtl-all

  The execution failure occurs at this point
	
	movw r24,r2
	sbiw r24,36
	brne .L8

   r2 is never set anywhere at all in the assembly.

The relevant insns (in the IRA dump) are

(insn 3 15 4 3 (set (reg/v:HI 51 [ j ])
        (const_int 0 [0])) "gcc/gcc/testsuite/gcc.c-torture/execute/20050224-1.c":19:21 101 {*movhi_split}
     (expr_list:REG_EQUAL (const_int 0 [0])
        (nil)))
...
(insn 28 27 67 8 (parallel [
            (set (reg/v:HI 51 [ j ])
                (plus:HI (reg/v:HI 51 [ j ])
                    (const_int 1 [0x1])))
            (clobber (scratch:QI))
        ]) "/home/i41766/code/personal/gcc/gcc/testsuite/gcc.c-torture/execute/20050224-1.c":28:8 175 {addhi3_clobber}
     (nil))
...
(jump_insn 44 43 45 13 (parallel [
            (set (pc)
                (if_then_else (ne (reg/v:HI 51 [ j ])
                        (const_int 36 [0x24]))
                    (label_ref:HI 103)
                    (pc)))
            (clobber (scratch:QI))
        ]) "/home/i41766/code/personal/gcc/gcc/testsuite/gcc.c-torture/execute/20050224-1.c":11:16 discrim 1 713 {cbranchhi4_insn}
     (expr_list:REG_DEAD (reg/v:HI 51 [ j ])
        (int_list:REG_BR_PROB 7 (nil)))
 -> 103)

  LRA deletes insns 3 and 28, and uses r2 in the jump_insn.

  In the reload dump, for pseudo r51, I'm seeing this
subreg regs:
	   Frame pointer can not be eliminated anymore
	   Spilling non-eliminable hard regs: 28 29
	 Spilling r51(28)
  Slot 0 regnos (width = 0):	 46
  Slot 1 regnos (width = 0):	 45

  lra_update_fp2sp_elimination calls spill_pseudos with
  HARD_FRAME_POINTER_REGNUM, and that sets reg_renumber[51] to -1.

  Later down the line, process_bb_lives is called with dead_insn_p=true from
  lra_create_lives_ranges_1 on the relevant BB (#8), and df_get_live_out
  on that BB does not contain 51 (even though previous calls to the same BB did).
  
Breakpoint 8, process_bb_lives (bb=0x7fffea570240, curr_point=@0x7fffffffd838: 25, dead_insn_p=true) at gcc/gcc/lra-lives.cc:664
664	  function_abi last_call_abi = default_function_abi;
(gdb) n
666	  reg_live_out = df_get_live_out (bb);
(gdb) 
667	  sparseset_clear (pseudos_live);
(gdb) p debug_bitmap(reg_live_out)

first = 0x321c128 current = 0x321c128 indx = 0
	0x321c128 next = (nil) prev = (nil) indx = 0
		bits = { 28 32 34 43 44 47 48 49 50 }

  process_bb_lives then considers the insn setting 51 (and
  the reload insns LRA created) as dead, and removes them.

BB 8
   Insn 67: point = 31, n_alt = -1
   Insn 114: point = 31, n_alt = 3
   Deleting dead insn 114
deleting insn with uid = 114.
   Insn 28: point = 31, n_alt = 1
   Deleting dead insn 28
deleting insn with uid = 28.
   Insn 113: point = 31, n_alt = 2
   Deleting dead insn 113

Same for insn 3 as well

BB 3
   Insn 92: point = 40, n_alt = -1
   Insn 5: point = 40, n_alt = 1
   Insn 4: point = 41, n_alt = 3
   Insn 3: point = 42, n_alt = 3
   Deleting dead insn 3
deleting insn with uid = 3.

  Yet when it prints "Global pseudo live data has been updated" after
  all this, r51 is live again :(

BB 8:
    livein: 8:

       43   44   47   48   49   50   51
    liveout: 8:

       28   32   34   43   44   47   48   49   50   51

  Eventually, it assigns 2 to r51, resulting in just the compare and
  branch instruction remaining in the assembly.

  Is this an LRA bug or is the target doing something wrong?

Regards
Senthil

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: LRA for avr: help with FP and elimination
  2023-08-10 11:33       ` SenthilKumar.Selvaraj
@ 2023-08-11 16:02         ` Vladimir Makarov
  0 siblings, 0 replies; 8+ messages in thread
From: Vladimir Makarov @ 2023-08-11 16:02 UTC (permalink / raw)
  To: SenthilKumar.Selvaraj, gcc


On 8/10/23 07:33, SenthilKumar.Selvaraj@microchip.com wrote:
>
> Hi Vlad,
>
>    I can confirm your commit (https://gcc.gnu.org/git?p=gcc.git;a=commit;h=2971ff7b1d564ac04b537d907c70e6093af70832)
>    fixes the above problem, thank you. However, I see execution failures if a
>    pseudo assigned to FP has to be spilled because of stack slot creation.
>
>    To reproduce, build the compiler just like above, and then do
>
> $ avr-gcc -mmcu=avr51 <gcc-src-dir>/gcc/testsuite/gcc.c-torture/execute/20050224-1.c -O2 -S -fdump-rtl-all
>
>    The execution failure occurs at this point
> 	
> 	movw r24,r2
> 	sbiw r24,36
> 	brne .L8
>
>     r2 is never set anywhere at all in the assembly.
>
> The relevant insns (in the IRA dump) are
>
> (insn 3 15 4 3 (set (reg/v:HI 51 [ j ])
>          (const_int 0 [0])) "gcc/gcc/testsuite/gcc.c-torture/execute/20050224-1.c":19:21 101 {*movhi_split}
>       (expr_list:REG_EQUAL (const_int 0 [0])
>          (nil)))
> ...
> (insn 28 27 67 8 (parallel [
>              (set (reg/v:HI 51 [ j ])
>                  (plus:HI (reg/v:HI 51 [ j ])
>                      (const_int 1 [0x1])))
>              (clobber (scratch:QI))
>          ]) "/home/i41766/code/personal/gcc/gcc/testsuite/gcc.c-torture/execute/20050224-1.c":28:8 175 {addhi3_clobber}
>       (nil))
> ...
> (jump_insn 44 43 45 13 (parallel [
>              (set (pc)
>                  (if_then_else (ne (reg/v:HI 51 [ j ])
>                          (const_int 36 [0x24]))
>                      (label_ref:HI 103)
>                      (pc)))
>              (clobber (scratch:QI))
>          ]) "/home/i41766/code/personal/gcc/gcc/testsuite/gcc.c-torture/execute/20050224-1.c":11:16 discrim 1 713 {cbranchhi4_insn}
>       (expr_list:REG_DEAD (reg/v:HI 51 [ j ])
>          (int_list:REG_BR_PROB 7 (nil)))
>   -> 103)
>
>    LRA deletes insns 3 and 28, and uses r2 in the jump_insn.
>
>    In the reload dump, for pseudo r51, I'm seeing this
> subreg regs:
> 	   Frame pointer can not be eliminated anymore
> 	   Spilling non-eliminable hard regs: 28 29
> 	 Spilling r51(28)
>    Slot 0 regnos (width = 0):	 46
>    Slot 1 regnos (width = 0):	 45
>
>    lra_update_fp2sp_elimination calls spill_pseudos with
>    HARD_FRAME_POINTER_REGNUM, and that sets reg_renumber[51] to -1.
>
>    Later down the line, process_bb_lives is called with dead_insn_p=true from
>    lra_create_lives_ranges_1 on the relevant BB (#8), and df_get_live_out
>    on that BB does not contain 51 (even though previous calls to the same BB did).
>    
> Breakpoint 8, process_bb_lives (bb=0x7fffea570240, curr_point=@0x7fffffffd838: 25, dead_insn_p=true) at gcc/gcc/lra-lives.cc:664
> 664	  function_abi last_call_abi = default_function_abi;
> (gdb) n
> 666	  reg_live_out = df_get_live_out (bb);
> (gdb)
> 667	  sparseset_clear (pseudos_live);
> (gdb) p debug_bitmap(reg_live_out)
>
> first = 0x321c128 current = 0x321c128 indx = 0
> 	0x321c128 next = (nil) prev = (nil) indx = 0
> 		bits = { 28 32 34 43 44 47 48 49 50 }
>
>    process_bb_lives then considers the insn setting 51 (and
>    the reload insns LRA created) as dead, and removes them.
>
> BB 8
>     Insn 67: point = 31, n_alt = -1
>     Insn 114: point = 31, n_alt = 3
>     Deleting dead insn 114
> deleting insn with uid = 114.
>     Insn 28: point = 31, n_alt = 1
>     Deleting dead insn 28
> deleting insn with uid = 28.
>     Insn 113: point = 31, n_alt = 2
>     Deleting dead insn 113
>
> Same for insn 3 as well
>
> BB 3
>     Insn 92: point = 40, n_alt = -1
>     Insn 5: point = 40, n_alt = 1
>     Insn 4: point = 41, n_alt = 3
>     Insn 3: point = 42, n_alt = 3
>     Deleting dead insn 3
> deleting insn with uid = 3.
>
>    Yet when it prints "Global pseudo live data has been updated" after
>    all this, r51 is live again :(
>
> BB 8:
>      livein: 8:
>
>         43   44   47   48   49   50   51
>      liveout: 8:
>
>         28   32   34   43   44   47   48   49   50   51
>
>    Eventually, it assigns 2 to r51, resulting in just the compare and
>    branch instruction remaining in the assembly.
>
>    Is this an LRA bug or is the target doing something wrong?
>
I've reproduced this.  Probably it is a bug with live info update when 
fp->sp elimination became invalid.

I'll start to work on this problem on the next week and hope to have a 
fix soon after that.



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-08-11 16:02 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-13  9:27 LRA for avr: help with FP and elimination SenthilKumar.Selvaraj
2023-07-14 13:29 ` Vladimir Makarov
2023-07-17  7:17   ` SenthilKumar.Selvaraj
2023-07-18 15:04     ` Vladimir Makarov
2023-08-10 11:33       ` SenthilKumar.Selvaraj
2023-08-11 16:02         ` Vladimir Makarov
2023-07-27 11:50   ` Maciej W. Rozycki
2023-07-27 13:03     ` Paul Koning

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).