public inbox for gcc@gcc.gnu.org
* Question on aarch64 prologue code.
@ 2023-09-06 10:04 Iain Sandoe
  2023-09-06 12:43 ` Richard Sandiford
  0 siblings, 1 reply; 4+ messages in thread
From: Iain Sandoe @ 2023-09-06 10:04 UTC (permalink / raw)
  To: GCC Development; +Cc: Richard Sandiford

Hi Folks,

On the Darwin aarch64 port, we have a number of cleanup test fails (pretty much corresponding to the [still open] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39244).  However, let’s assume that bug could be a red herring..

the underlying reason is missing CFI for the set of the FP which [with Darwin’s LLVM libunwind impl.] breaks the unwind through the function that triggers a signal.

———

taking one of the functions in cleanup-8.C (say fn1) which contains calls.

what I am seeing is something like:

__ZL3fn1v:
LFB28:
; BLOCK 2, count:1073741824 (estimated locally) seq:0
; PRED: ENTRY [always]  count:1073741824 (estimated locally, freq 1.0000) (FALLTHRU)
	stp	x29, x30, [sp, -32]!
// LCFI; or .cfi_xxx is present
	mov	x29, sp
// *** NO  LCFI (or .cfi_cfa_xxxx when that is enabled)
	str	x19, [sp, 16]
// LCFI / .cfi_xxxx is present.
	adrp	x19, __ZL3fn4i@PAGE
	add	x19, x19, __ZL3fn4i@PAGEOFF;momd
	mov	x1, x19
	mov	w0, 11
	bl	_signal
<snip>

———

The reason seems to be that, in aarch64_expand_prologue, emit_frame_chain is true (as we would expect, given that this function makes calls).  However, ‘frame_pointer_needed’ is false, so the call to aarch64_add_offset() [line aarch64.cc:10405] does not add CFA adjustments to the set of x29.

———

I have currently worked around this by defining a TARGET_FRAME_POINTER_REQUIRED which returns true unless the function is a leaf (if that’s the correct solution, then all is fine).
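
The workaround described above amounts to a single predicate. A minimal standalone sketch follows, assuming a hypothetical `FunctionInfo` struct in place of GCC's per-function state; it is a model of the idea, not the port's actual hook, and the wiring through TARGET_FRAME_POINTER_REQUIRED is not shown:

```cpp
// Standalone model of the workaround; this is not GCC source.
// FunctionInfo and its field are hypothetical stand-ins for the
// compiler's per-function state.
struct FunctionInfo {
  bool is_leaf;  // true when the function makes no calls
};

// Models a TARGET_FRAME_POINTER_REQUIRED-style predicate:
// require a frame pointer for every function except leaves.
bool frame_pointer_required(const FunctionInfo &fn) {
  return !fn.is_leaf;
}
```

With this predicate, fn1 (which calls signal) would get a frame pointer, while a function with no calls would not.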

———

However, it does seem odd that the existing code sets up the FP, but never produces any CFA for it.

So is this a possible bug, or just that I misunderstand the relevant set of circumstances?

thanks.
Iain



* Re: Question on aarch64 prologue code.
  2023-09-06 10:04 Question on aarch64 prologue code Iain Sandoe
@ 2023-09-06 12:43 ` Richard Sandiford
  2023-09-06 14:03   ` Iain Sandoe
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Sandiford @ 2023-09-06 12:43 UTC (permalink / raw)
  To: Iain Sandoe; +Cc: GCC Development

Iain Sandoe <idsandoe@googlemail.com> writes:
> Hi Folks,
>
> On the Darwin aarch64 port, we have a number of cleanup test fails (pretty much corresponding to the [still open] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39244).  However, let’s assume that bug could be a red herring..
>
> the underlying reason is missing CFI for the set of the FP which [with Darwin’s LLVM libunwind impl.] breaks the unwind through the function that triggers a signal.

Just curious, do you have more details about why that is?  If the unwinder
is sophisticated enough to process CFI, it seems odd that it requires the
CFA to be defined in terms of the frame pointer.
>
> ———
>
> taking one of the functions in cleanup-8.C (say fn1) which contains calls.
>
> what I am seeing is something like:
>
> __ZL3fn1v:
> LFB28:
> ; BLOCK 2, count:1073741824 (estimated locally) seq:0
> ; PRED: ENTRY [always]  count:1073741824 (estimated locally, freq 1.0000) (FALLTHRU)
> 	stp	x29, x30, [sp, -32]!
> // LCFI; or .cfi_xxx is present
> 	mov	x29, sp
> // *** NO  LCFI (or .cfi_cfa_xxxx when that is enabled)
> 	str	x19, [sp, 16]
> // LCFI / .cfi_xxxx is present.
> 	adrp	x19, __ZL3fn4i@PAGE
> 	add	x19, x19, __ZL3fn4i@PAGEOFF;momd
> 	mov	x1, x19
> 	mov	w0, 11
> 	bl	_signal
> <snip>
>
> ———
>
> The reason seems to be that, in aarch64_expand_prologue, emit_frame_chain is true (as we would expect, given that this function makes calls).  However, ‘frame_pointer_needed’ is false, so the call to aarch64_add_offset() [line aarch64.cc:10405] does not add CFA adjustments to the set of x29.

Right.

> ———
>
> I have currently worked around this by defining a TARGET_FRAME_POINTER_REQUIRED which returns true unless the function is a leaf (if that’s the correct solution, then all is fine).

I suppose it depends on why the frame-pointer-based CFA is important
for Darwin.  If it's due to a more general requirement for a frame
pointer to be used, then yeah, that's probably the right fix.  If it's
more a quirk of the unwinder, then we could probably expose whatever
that quirk is as a new status bit.  Target-independent code in
dwarf2cfi.cc would then need to be aware as well.

> ———
>
> However, it does seem odd that the existing code sets up the FP, but never produces any CFA for it.
>
> So is this a possible bug, or just that I misunderstand the relevant set of circumstances?

emit_frame_chain fulfills an ABI requirement that every non-leaf function
set up a frame-chain record.  When emit_frame_chain && !frame_pointer_needed,
we set up the FP for ABI purposes only.  GCC can still access everything
relative to the stack pointer, and it can still describe the CFI based
purely on the stack pointer.
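
As an illustration of the two cases, the sketch below lists the CFI directives one would expect for the fn1 prologue quoted earlier (`stp x29, x30, [sp, -32]!`, `mov x29, sp`, `str x19, [sp, 16]`). The function name and flag are hypothetical, and the directive text uses standard assembler CFI syntax derived from the offsets in that excerpt rather than from actual compiler output:

```cpp
#include <string>
#include <vector>

// Illustrative model only; prologue_cfi and its parameter are hypothetical.
// Returns the CFI directives expected after each instruction of
//   stp x29, x30, [sp, -32]! ; mov x29, sp ; str x19, [sp, 16]
std::vector<std::string> prologue_cfi(bool frame_pointer_needed) {
  std::vector<std::string> cfi;
  // stp x29, x30, [sp, -32]! : CFA = sp + 32; x29 at CFA-32, x30 at CFA-24.
  cfi.push_back(".cfi_def_cfa_offset 32");
  cfi.push_back(".cfi_offset 29, -32");
  cfi.push_back(".cfi_offset 30, -24");
  // mov x29, sp : the CFA is re-based on x29 only when the frame
  // pointer is genuinely needed; otherwise it stays sp-relative.
  if (frame_pointer_needed)
    cfi.push_back(".cfi_def_cfa_register 29");
  // str x19, [sp, 16] : x19 saved at CFA-16.
  cfi.push_back(".cfi_offset 19, -16");
  return cfi;
}
```

The only difference between the two lists is the `.cfi_def_cfa_register 29` entry, which corresponds to the missing annotation after `mov x29, sp` in the excerpt.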

glibc-based systems only need the CFA to be based on the frame pointer
if the stack pointer moves during the body of the function (usually due
to alloca or VLAs).
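
A small model of why this is so, assuming the 32-byte frame from the fn1 excerpt. All type and function names here are hypothetical; the point is only that an sp-based rule and an x29-based rule give the same CFA until the stack pointer moves:

```cpp
#include <cstdint>

// Hypothetical register snapshot seen by an unwinder.
struct Regs {
  std::uint64_t sp;
  std::uint64_t x29;
};

// A CFA rule of the form "base register + offset".
struct CfaRule {
  bool use_fp;           // true: base is x29, false: base is sp
  std::uint64_t offset;  // bytes added to the base register
};

std::uint64_t compute_cfa(const CfaRule &rule, const Regs &r) {
  return (rule.use_fp ? r.x29 : r.sp) + rule.offset;
}

// Register state after "stp x29, x30, [sp, -32]!; mov x29, sp".
Regs after_prologue(std::uint64_t entry_sp) {
  Regs r{entry_sp - 32, 0};
  r.x29 = r.sp;
  return r;
}

// Models alloca or a VLA: sp moves down, x29 stays where it was.
Regs after_alloca(Regs r, std::uint64_t bytes) {
  r.sp -= bytes;
  return r;
}
```

Right after the prologue both `CfaRule{false, 32}` and `CfaRule{true, 32}` recover the entry stack pointer. After `after_alloca`, only the x29-based rule still does, which is the case where a frame-pointer-based CFA becomes necessary.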

Thanks,
Richard


* Re: Question on aarch64 prologue code.
  2023-09-06 12:43 ` Richard Sandiford
@ 2023-09-06 14:03   ` Iain Sandoe
  2023-09-06 17:22     ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 4+ messages in thread
From: Iain Sandoe @ 2023-09-06 14:03 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: GCC Development

Hi Richard,

> On 6 Sep 2023, at 13:43, Richard Sandiford via Gcc <gcc@gcc.gnu.org> wrote:
> 
> Iain Sandoe <idsandoe@googlemail.com> writes:

>> On the Darwin aarch64 port, we have a number of cleanup test fails (pretty much corresponding to the [still open] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39244).  However, let’s assume that bug could be a red herring..
>> 
>> the underlying reason is missing CFI for the set of the FP which [with Darwin’s LLVM libunwind impl.] breaks the unwind through the function that triggers a signal.
> 
> Just curious, do you have more details about why that is?  If the unwinder
> is sophisticated enough to process CFI, it seems odd that it requires the
> CFA to be defined in terms of the frame pointer.

Let me see if I can answer that below.

<snip>

>> <——
>> 
>> I have currently worked around this by defining a TARGET_FRAME_POINTER_REQUIRED which returns true unless the function is a leaf (if that’s the correct solution, then all is fine).
> 
> I suppose it depends on why the frame-pointer-based CFA is important
> for Darwin.  If it's due to a more general requirement for a frame
> pointer to be used, then yeah, that's probably the right fix.

The Darwin ABI  mandates a frame pointer (although it is omitted by clang for leaf functions).

>  If it's
> more a quirk of the unwinder, then we could probably expose whatever
> that quirk is as a new status bit.  Target-independent code in
> dwarf2cfi.cc would then need to be aware as well.

(I suspect) it is the interaction between the mandatory FP and the fact that GCC lays out the stack differently from the other Darwin toolchains at present [port Issue #19].

For the system toolchain, x30 and x29 are always placed first, right below the SP (other callee saves are made below that in a specified order and always in pairs - presumably, with an unnecessary spill half the time).  Actually, I had a look at the weekend, but cannot find specific documentation on this particular aspect of the ABI (but, of course, the de facto ABI is what the system toolchain does, regardless of presence/absence of any such doc).
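
A rough sketch of that layout, strictly under assumptions: the frame record is taken to sit directly below the entry SP (x29 at CFA-16, x30 at CFA-8), the caller supplies the save order, and the first register of each pair is assumed to land at the higher address. None of this comes from an ABI document, and `Slot` and `frame_layout` are hypothetical names:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one 8-byte save slot.
struct Slot {
  std::string reg;  // register name, or "(pad)" for the unused half of a pair
  int cfa_offset;   // offset from the entry SP (the CFA); always negative
};

// Lays out the frame record first, then the remaining callee saves in pairs.
std::vector<Slot> frame_layout(std::vector<std::string> callee_saves) {
  std::vector<Slot> slots;
  // Frame record: assumed to occupy the 16 bytes just below the entry SP.
  slots.push_back({"x29", -16});
  slots.push_back({"x30", -8});
  // Saves are made in pairs, so an odd count costs one padding slot.
  if (callee_saves.size() % 2 != 0)
    callee_saves.push_back("(pad)");
  int offset = -16;
  for (std::size_t i = 0; i < callee_saves.size(); i += 2) {
    offset -= 16;
    slots.push_back({callee_saves[i], offset + 8});  // assumed: first of pair higher
    slots.push_back({callee_saves[i + 1], offset});
  }
  return slots;
}
```

For fn1, which saves only x19 besides the frame record, this model yields a padding slot, matching the "unnecessary spill" remark above.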

However (speculation), that means that the FP is not saved where the system tools expect it; maybe that is confusing the unwinder absent the FP-based CFA.  Of course, it could also just be an unwinder bug that is never triggered by clang’s codegen.

GCC’s different layout currently defeats compact unwinding on all but leaf frames, so one day I want to fix it ...
.. however making this change is quite heavy lifting and I think there are higher priorities for the port (so, as long as we have working unwind and no observable fallout, I am deferring that change).

Note that Darwin’s ABI also has a red zone (but we have not yet made any use of this, since there is no existing aarch64 impl. and I’ve not had time to get to it).  However, AFAICS that is an optimisation - we can still be correct without it.

>> ———
>> 
>> However, it does seem odd that the existing code sets up the FP, but never produces any CFA for it.
>> 
>> So is this a possible bug, or just that I misunderstand the relevant set of circumstances?
> 
> emit_frame_chain fulfills an ABI requirement that every non-leaf function
> set up a frame-chain record.  When emit_frame_chain && !frame_pointer_needed,
> we set up the FP for ABI purposes only.  GCC can still access everything
> relative to the stack pointer, and it can still describe the CFI based
> purely on the stack pointer.

Thanks, that makes sense.
- I guess libunwind is never used with aarch64 linux, even in a clang/llvm toolchain.
> 
> glibc-based systems only need the CFA to be based on the frame pointer
> if the stack pointer moves during the body of the function (usually due
> to alloca or VLAs).

I’d have to poke more at the unwinder code and do some more debugging - it seems reasonable that it could work for any unwinder that’s based on DWARF (although, if we have completely missing unwind info, then the different stack layout would surely defeat any fallback procedure).

thanks
Iain

> 
> Thanks,
> Richard



* Re: Question on aarch64 prologue code.
  2023-09-06 14:03   ` Iain Sandoe
@ 2023-09-06 17:22     ` Richard Earnshaw (lists)
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Earnshaw (lists) @ 2023-09-06 17:22 UTC (permalink / raw)
  To: Iain Sandoe, Richard Sandiford; +Cc: GCC Development

On 06/09/2023 15:03, Iain Sandoe wrote:
> Hi Richard,
> 
>> On 6 Sep 2023, at 13:43, Richard Sandiford via Gcc <gcc@gcc.gnu.org> wrote:
>>
>> Iain Sandoe <idsandoe@googlemail.com> writes:
> 
>>> On the Darwin aarch64 port, we have a number of cleanup test fails (pretty much corresponding to the [still open] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39244).  However, let’s assume that bug could be a red herring..
>>>
>>> the underlying reason is missing CFI for the set of the FP which [with Darwin’s LLVM libunwind impl.] breaks the unwind through the function that triggers a signal.
>>
>> Just curious, do you have more details about why that is?  If the unwinder
>> is sophisticated enough to process CFI, it seems odd that it requires the
>> CFA to be defined in terms of the frame pointer.
> 
> Let me see if I can answer that below.
> 
> <snip>
> 
>>> <——
>>>
>>> I have currently worked around this by defining a TARGET_FRAME_POINTER_REQUIRED which returns true unless the function is a leaf (if that’s the correct solution, then all is fine).
>>
>> I suppose it depends on why the frame-pointer-based CFA is important
>> for Darwin.  If it's due to a more general requirement for a frame
>> pointer to be used, then yeah, that's probably the right fix.
> 
> The Darwin ABI  mandates a frame pointer (although it is omitted by clang for leaf functions).
> 
>>  If it's
>> more a quirk of the unwinder, then we could probably expose whatever
>> that quirk is as a new status bit.  Target-independent code in
>> dwarf2cfi.cc would then need to be aware as well.
> 
> (I suspect) it is the interaction between the mandatory FP and the fact that GCC lays out the stack differently from the other Darwin toolchains at present [port Issue #19].
> 
> For the system toolchain, x30 and x29 are always placed first, right below the SP (other callee saves are made below that in a specified order and always in pairs - presumably, with an unnecessary spill half the time).  Actually, I had a look at the weekend, but cannot find specific documentation on this particular aspect of the ABI (but, of course, the de facto ABI is what the system toolchain does, regardless of presence/absence of any such doc).
> 
> However (speculation), that means that the FP is not saved where the system tools expect it; maybe that is confusing the unwinder absent the FP-based CFA.  Of course, it could also just be an unwinder bug that is never triggered by clang’s codegen.
> 
> GCC’s different layout currently defeats compact unwinding on all but leaf frames, so one day I want to fix it ...
> .. however making this change is quite heavy lifting and I think there are higher priorities for the port (so, as long as we have working unwind and no observable fallout, I am deferring that change).
> 
> Note that Darwin’s ABI also has a red zone (but we have not yet made any use of this, since there is no existing aarch64 impl. and I’ve not had time to get to it).  However, AFAICS that is an optimisation - we can still be correct without it.
> 
>>> ———
>>>
>>> However, it does seem odd that the existing code sets up the FP, but never produces any CFA for it.
>>>
>>> So is this a possible bug, or just that I misunderstand the relevant set of circumstances?
>>
>> emit_frame_chain fulfills an ABI requirement that every non-leaf function
>> set up a frame-chain record.  When emit_frame_chain && !frame_pointer_needed,
>> we set up the FP for ABI purposes only.  GCC can still access everything
>> relative to the stack pointer, and it can still describe the CFI based
>> purely on the stack pointer.
> 
> Thanks, that makes sense.
> - I guess libunwind is never used with aarch64 linux, even in a clang/llvm toolchain.
>>
>> glibc-based systems only need the CFA to be based on the frame pointer
>> if the stack pointer moves during the body of the function (usually due
>> to alloca or VLAs).
> 
> I’d have to poke more at the unwinder code and do some more debugging - it seems reasonable that it could work for any unwinder that’s based on DWARF (although, if we have completely missing unwind info, then the different stack layout would surely defeat any fallback procedure).
> 

This is only a guess, but it sounds to me like the issue might be that although we create a frame record, we don't use the frame pointer for accessing stack variables unless SP can't be used (eg: because the function calls alloca()).  This tends to be more efficient because offset addressing for SP is more flexible.  If we wanted to switch to making FP be the canonical frame address register we'd need to change all the code gen to use FP in addressing as well (or end up with some really messy translation when emitting debug information).

R.

> thanks
> Iain
> 
>>
>> Thanks,
>> Richard
> 

