public inbox for gcc-help@gcc.gnu.org
 help / color / mirror / Atom feed
* arm-none-eabi, nested function trampolines and caching
@ 2023-11-27 15:16 Ed Robbins
  2023-11-27 17:23 ` David Brown
       [not found] ` <8342aeef-4eef-231b-bf45-416660954fdb@marco.de>
  0 siblings, 2 replies; 11+ messages in thread
From: Ed Robbins @ 2023-11-27 15:16 UTC (permalink / raw)
  To: gcc-help

[-- Attachment #1: Type: text/plain, Size: 1191 bytes --]

Hello,
I am using gcc-arm-none-eabi with a cortex M7 device, with caches
(data/instruction) enabled. Nested function calls result in a usage fault
because there is no clear cache call for this platform.

Is there a way to provide the required functions without rebuilding gcc? I
have been looking at the source and, as far as I can tell, there is not.

But there also doesn't look to be a clean way to implement this: It appears
that this is done on an operating system basis, and when running bare metal
it is not clear where the code would live.

There are also at least two approaches to solve it, I guess:
1. Somehow indicate on the command line (via target or a dedicated option)
to emit the clear cache call for cortex M, and I guess that the function
itself should do nothing if both caches are disabled.
2. Define hooks or provide a command line option so that developers can
provide an implementation for their platform?

Assuming I were to do this the improper way (and just create a build that
works only for my particular target): Where should I define
CLEAR_INSN_CACHE?

I am not sure if there is already a way to do all this that I am just
unaware of?

Many thanks,
Ed Robbins

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: arm-none-eabi, nested function trampolines and caching
  2023-11-27 15:16 arm-none-eabi, nested function trampolines and caching Ed Robbins
@ 2023-11-27 17:23 ` David Brown
  2023-11-27 18:30   ` Richard Earnshaw
  2023-11-28  9:18   ` Ed Robbins
       [not found] ` <8342aeef-4eef-231b-bf45-416660954fdb@marco.de>
  1 sibling, 2 replies; 11+ messages in thread
From: David Brown @ 2023-11-27 17:23 UTC (permalink / raw)
  To: edd.robbins, gcc-help; +Cc: Ed Robbins

On 27/11/2023 16:16, Ed Robbins via Gcc-help wrote:
> Hello,
> I am using gcc-arm-none-eabi with a cortex M7 device, with caches
> (data/instruction) enabled. Nested function calls result in a usage fault
> because there is no clear cache call for this platform.
> 

I am not sure I understand you here.  Are you talking about trying to 
use gcc nested function extensions, implemented by trampolines (small 
function stubs on the stack)?  If so, then the simple answer is - don't 
do that.  It's a /really/ bad idea.  As far as I understand it, these 
are a left-over from the way nested functions were originally 
implemented in other gcc languages (Pascal, Ada, Modula-2), which now 
handle things differently and far more efficiently.  Trampolines were a 
convenient way to implement nested functions some 30 years ago, before 
caches were the norm, before anyone thought about security, before 
processors had prefetching, and before people realised what an 
appallingly bad idea self-modifying code is.

If you want to use nested functions, use a language that supports nested 
functions, such as Ada, or use C++ with lambdas (which are a bit like 
nested functions only much better).

> Is there a way to provide the required functions without rebuilding gcc? I
> have been looking at the source and, as far as I can tell, there is not.

I can think of at least four ways :

1. The SDK for your microcontroller, provided by the manufacturer, will 
have headers with cache clear functions.

2. The ARM CMSIS headers - also available from your manufacturer - has 
intrinsic functions, including cache clear functions.

3. gcc has a generic "__buitin__clear_cache" function :
<https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005f_005f_005fclear_005fcache>

4. gcc supports the "ARM C Language Extensions", which include cache 
control intrinsics:

<https://gcc.gnu.org/onlinedocs/gcc/ARM-C-Language-Extensions-_0028ACLE_0029.html>
<https://developer.arm.com/documentation/ihi0053/latest/>


> 
> But there also doesn't look to be a clean way to implement this: It appears
> that this is done on an operating system basis, and when running bare metal
> it is not clear where the code would live.

There is no "clean" way to handle the appropriate cache invalidation, 
because there is no clean way to get the addresses you need for 
invalidating the instruction cache.  (Cleanly invalidating the 
instruction cache for other purposes, such as during firmware upgrades, 
is no problem.)

> 
> There are also at least two approaches to solve it, I guess:
> 1. Somehow indicate on the command line (via target or a dedicated option)
> to emit the clear cache call for cortex M, and I guess that the function
> itself should do nothing if both caches are disabled.
> 2. Define hooks or provide a command line option so that developers can
> provide an implementation for their platform?
> 
> Assuming I were to do this the improper way (and just create a build that
> works only for my particular target): Where should I define
> CLEAR_INSN_CACHE?
> 
> I am not sure if there is already a way to do all this that I am just
> unaware of?
> 

Seriously - don't use nested functions in C.  Even if you get them 
working, it would be painfully inefficient.  You'd have to flush parts 
of the data cache (to make sure the stack data is written out to main 
memory), taking time.  You'd then have to invalidate the relevant parts 
of the instruction cache.  (Even calculating what parts of these caches 
to clear will take time and effort.)  Then everything needs to be read 
into the caches again to actually execute the function.

And what's the point?  So that you can write :

	void foo(...) {
		int bar(...) {
			...
		}
		bar();
	}

instead of

	static int foo_bar(...) {
		...
	}
	void foo(...) {
		foo_bar();
	}

or

	void foo(...) {
		auto bar = [](...) {
			...
		}
		bar();
	}

?



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: arm-none-eabi, nested function trampolines and caching
  2023-11-27 17:23 ` David Brown
@ 2023-11-27 18:30   ` Richard Earnshaw
  2023-11-28  9:18   ` Ed Robbins
  1 sibling, 0 replies; 11+ messages in thread
From: Richard Earnshaw @ 2023-11-27 18:30 UTC (permalink / raw)
  To: David Brown, edd.robbins, gcc-help; +Cc: Ed Robbins



On 27/11/2023 17:23, David Brown wrote:
> On 27/11/2023 16:16, Ed Robbins via Gcc-help wrote:
>> Hello,
>> I am using gcc-arm-none-eabi with a cortex M7 device, with caches
>> (data/instruction) enabled. Nested function calls result in a usage fault
>> because there is no clear cache call for this platform.
>>
> 
> I am not sure I understand you here.  Are you talking about trying to 
> use gcc nested function extensions, implemented by trampolines (small 
> function stubs on the stack)?  If so, then the simple answer is - don't 
> do that.  It's a /really/ bad idea.  As far as I understand it, these 
> are a left-over from the way nested functions were originally 
> implemented in other gcc languages (Pascal, Ada, Modula-2), which now 
> handle things differently and far more efficiently.  Trampolines were a 
> convenient way to implement nested functions some 30 years ago, before 
> caches were the norm, before anyone thought about security, before 
> processors had prefetching, and before people realised what an 
> appallingly bad idea self-modifying code is.
> 
> If you want to use nested functions, use a language that supports nested 
> functions, such as Ada, or use C++ with lambdas (which are a bit like 
> nested functions only much better).
> 
>> Is there a way to provide the required functions without rebuilding 
>> gcc? I
>> have been looking at the source and, as far as I can tell, there is not.
> 
> I can think of at least four ways :
> 
> 1. The SDK for your microcontroller, provided by the manufacturer, will 
> have headers with cache clear functions.
> 
> 2. The ARM CMSIS headers - also available from your manufacturer - has 
> intrinsic functions, including cache clear functions.
> 
> 3. gcc has a generic "__buitin__clear_cache" function :
> <https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005f_005f_005fclear_005fcache>
> 
> 4. gcc supports the "ARM C Language Extensions", which include cache 
> control intrinsics:
> 
> <https://gcc.gnu.org/onlinedocs/gcc/ARM-C-Language-Extensions-_0028ACLE_0029.html>
> <https://developer.arm.com/documentation/ihi0053/latest/>
> 

I completely agree with David's comments about nested functions.  Don't 
do it!

Cleaning the D-caches from user-level on Arm is practically impossible 
if there is no "OS" support; flushing the I-cache is equally difficult. 
This includes m-profile devices with secure and non-secure code, where 
only secure code can execute the cache management operations.  The same 
is true for some, if not all, a-profile devices as well.

Looking at the compiler sources the __clear_cache builtin is only 
implemented for Linux and even there it calls the kernel to do the work.

ACLE does not define a clear cache intrinsic operation (as far as I can 
see).  It does provide some of the primitives needed for a cache clear, 
such as __dmb() and __isb(), but on their own, these are not enough.

CMSIS does appear to provide some primitives (SCB_CleanDCache_by_Addr 
and SCB_InvalidateICache_by_Addr), but these will directly invoke the 
relevent secure-mode primitives.  If you want them in non-secure mode, 
you'll need to export a suitable API from your secure code and then 
arrange to use that.  The compiler knows nothing about CMSIS, so this 
isn't much help for trampolines, I'm afraid.

Microcontroller SDK's are almost certain to face similar issues, since 
the root issue is the same: you can't do this from non-secure mode.

R.

> 
>>
>> But there also doesn't look to be a clean way to implement this: It 
>> appears
>> that this is done on an operating system basis, and when running bare 
>> metal
>> it is not clear where the code would live.
> 
> There is no "clean" way to handle the appropriate cache invalidation, 
> because there is no clean way to get the addresses you need for 
> invalidating the instruction cache.  (Cleanly invalidating the 
> instruction cache for other purposes, such as during firmware upgrades, 
> is no problem.)
> 
>>
>> There are also at least two approaches to solve it, I guess:
>> 1. Somehow indicate on the command line (via target or a dedicated 
>> option)
>> to emit the clear cache call for cortex M, and I guess that the function
>> itself should do nothing if both caches are disabled.
>> 2. Define hooks or provide a command line option so that developers can
>> provide an implementation for their platform?
>>
>> Assuming I were to do this the improper way (and just create a build that
>> works only for my particular target): Where should I define
>> CLEAR_INSN_CACHE?
>>
>> I am not sure if there is already a way to do all this that I am just
>> unaware of?
>>
> 
> Seriously - don't use nested functions in C.  Even if you get them 
> working, it would be painfully inefficient.  You'd have to flush parts 
> of the data cache (to make sure the stack data is written out to main 
> memory), taking time.  You'd then have to invalidate the relevant parts 
> of the instruction cache.  (Even calculating what parts of these caches 
> to clear will take time and effort.)  Then everything needs to be read 
> into the caches again to actually execute the function.
> 
> And what's the point?  So that you can write :
> 
>      void foo(...) {
>          int bar(...) {
>              ...
>          }
>          bar();
>      }
> 
> instead of
> 
>      static int foo_bar(...) {
>          ...
>      }
>      void foo(...) {
>          foo_bar();
>      }
> 
> or
> 
>      void foo(...) {
>          auto bar = [](...) {
>              ...
>          }
>          bar();
>      }
> 
> ?
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: arm-none-eabi, nested function trampolines and caching
  2023-11-27 17:23 ` David Brown
  2023-11-27 18:30   ` Richard Earnshaw
@ 2023-11-28  9:18   ` Ed Robbins
  1 sibling, 0 replies; 11+ messages in thread
From: Ed Robbins @ 2023-11-28  9:18 UTC (permalink / raw)
  To: David Brown, gcc-help

Hi David!

On Mon, 27 Nov 2023 at 17:23, David Brown <david.brown@hesbynett.no> wrote:
>
> On 27/11/2023 16:16, Ed Robbins via Gcc-help wrote:
> > Hello,
> > I am using gcc-arm-none-eabi with a cortex M7 device, with caches
> > (data/instruction) enabled. Nested function calls result in a usage fault
> > because there is no clear cache call for this platform.
> >
>
> I am not sure I understand you here.  Are you talking about trying to
> use gcc nested function extensions, implemented by trampolines (small
> function stubs on the stack)?  If so, then the simple answer is - don't
> do that.  It's a /really/ bad idea.  As far as I understand it, these
> are a left-over from the way nested functions were originally
> implemented in other gcc languages (Pascal, Ada, Modula-2), which now
> handle things differently and far more efficiently.  Trampolines were a
> convenient way to implement nested functions some 30 years ago, before
> caches were the norm, before anyone thought about security, before
> processors had prefetching, and before people realised what an
> appallingly bad idea self-modifying code is.
>
> If you want to use nested functions, use a language that supports nested
> functions, such as Ada, or use C++ with lambdas (which are a bit like
> nested functions only much better).

I am aware of the alternative languages and the implementation
limitations, but I didn't ask if it is a good idea or not!

>
> > Is there a way to provide the required functions without rebuilding gcc? I
> > have been looking at the source and, as far as I can tell, there is not.
>
> I can think of at least four ways :
>
> 1. The SDK for your microcontroller, provided by the manufacturer, will
> have headers with cache clear functions.
>
> 2. The ARM CMSIS headers - also available from your manufacturer - has
> intrinsic functions, including cache clear functions.
>
> 3. gcc has a generic "__buitin__clear_cache" function :
> <https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005f_005f_005fclear_005fcache>
>
> 4. gcc supports the "ARM C Language Extensions", which include cache
> control intrinsics:
>
> <https://gcc.gnu.org/onlinedocs/gcc/ARM-C-Language-Extensions-_0028ACLE_0029.html>
> <https://developer.arm.com/documentation/ihi0053/latest/>
>

I'm aware of how to perform a cache invalidate/clean, but that isn't
the question.

>
> >
> > But there also doesn't look to be a clean way to implement this: It appears
> > that this is done on an operating system basis, and when running bare metal
> > it is not clear where the code would live.
>
> There is no "clean" way to handle the appropriate cache invalidation,
> because there is no clean way to get the addresses you need for
> invalidating the instruction cache.

Yes, there is, and that is what my question is about. The gcc
documentation on trampolines [1] details this to some extent. It says that
the target should define CLEAR_INSN_CACHE(beg, end) and it will be
called with the correct addresses when the trampoline is initialised.
Looking at the gcc source, arm.cc calls
maybe_emit_call_builtin___clear_cache with the correct addresses when
it initialises the trampoline, and
maybe_emit_call_builtin___clear_cache calls __clear_cache if
CLEAR_INSN_CACHE is defined. The question is about where to place this
clear cache definition for a "none" target.

(Cleanly invalidating the
> instruction cache for other purposes, such as during firmware upgrades,
> is no problem.)

Yes, we do this already during firmware upgrades.

Best regards,
Ed

[1]  https://gcc.gnu.org/onlinedocs/gccint/Trampolines.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: arm-none-eabi, nested function trampolines and caching
       [not found] ` <8342aeef-4eef-231b-bf45-416660954fdb@marco.de>
@ 2023-11-28  9:51   ` Ed Robbins
  2023-11-28 18:00     ` David Brown
  0 siblings, 1 reply; 11+ messages in thread
From: Ed Robbins @ 2023-11-28  9:51 UTC (permalink / raw)
  To: Matthias Pfaller, gcc-help

On Tue, 28 Nov 2023 at 07:21, Matthias Pfaller <leo@marco.de> wrote:
>
> On 2023-11-27 16:16, Ed Robbins via Gcc-help wrote:
> > Hello,
> > I am using gcc-arm-none-eabi with a cortex M7 device, with caches
> > (data/instruction) enabled. Nested function calls result in a usage fault
> > because there is no clear cache call for this platform.
> >
> > Is there a way to provide the required functions without rebuilding gcc? I
> > have been looking at the source and, as far as I can tell, there is not.
> >
> > But there also doesn't look to be a clean way to implement this: It appears
> > that this is done on an operating system basis, and when running bare metal
> > it is not clear where the code would live.
> >
> > There are also at least two approaches to solve it, I guess:
> > 1. Somehow indicate on the command line (via target or a dedicated option)
> > to emit the clear cache call for cortex M, and I guess that the function
> > itself should do nothing if both caches are disabled.
> > 2. Define hooks or provide a command line option so that developers can
> > provide an implementation for their platform?
> >
> > Assuming I were to do this the improper way (and just create a build that
> > works only for my particular target): Where should I define
> > CLEAR_INSN_CACHE?
> >
> > I am not sure if there is already a way to do all this that I am just
> > unaware of?
> >
> > Many thanks,
> > Ed Robbins
>
> I have lots of code with nested functions. When switching to gcc-12 I got random
> crashes on my cortex-m7 targets. In order to get that working again I had to patch
> gcc/config/arm/arm.h:
>
> --- ../orig/gcc/config/arm/arm.h
> +++ ./gcc/config/arm/arm.h
> @@ -2518,4 +2518,7 @@
>      representation for SHF_ARM_PURECODE in GCC.  */
>   #define SECTION_ARM_PURECODE SECTION_MACH_DEP
>
> +#if !defined CLEAR_INSN_CACHE
> +#define CLEAR_INSN_CACHE(BEG, END)
> +#endif
>   #endif /* ! GCC_ARM_H */
>
> In addition to that I have implemented __clear_cache in the attached file. Without
> the gcc patch __clear_cache is not getting called by gcc. All this will work only,
> when you operate the data cache in write through mode. But when you are using dma,
> this is pretty much mandatory because otherwise you have to maintain dma-memory
> completely separate from your data/bss segment. I think write back is pretty much
> useless on the m7. We are using an atsame70.
>

Thanks very much Mathias! This is roughly what I thought was required,
and should enable me to get nested functions working.

I think that you can use DMA with write back caching if you flush the
data cache before starting DMA (if the source is memory) and
invalidate the data cache when DMA is finished (if the destination is
memory). Of course, this has a performance impact...

I think that it would be good if gcc provided a way to handle this for
bare metal targets where it is relevant, perhaps by giving an option
that names the clear_cache function (-mclear-cache-func=f, or
something). I believe the MIPS target has -mflush-func= for this or a
similar purpose.

We are also using atsame70, as well as s70/v70/v71 (when chips were
scarce we re-designed the product to use an LCC plate so we can use
any of these in any package, as long as they have sufficient
RAM/flash).

Best regards,
Ed

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: arm-none-eabi, nested function trampolines and caching
  2023-11-28  9:51   ` Ed Robbins
@ 2023-11-28 18:00     ` David Brown
  2023-11-29  7:50       ` Matthias Pfaller
  2023-12-05 17:10       ` Ed Robbins
  0 siblings, 2 replies; 11+ messages in thread
From: David Brown @ 2023-11-28 18:00 UTC (permalink / raw)
  To: edd.robbins, Matthias Pfaller, gcc-help; +Cc: Ed Robbins

On 28/11/2023 10:51, Ed Robbins via Gcc-help wrote:
> On Tue, 28 Nov 2023 at 07:21, Matthias Pfaller <leo@marco.de> wrote:
>>
>> On 2023-11-27 16:16, Ed Robbins via Gcc-help wrote:
>>> Hello,
>>> I am using gcc-arm-none-eabi with a cortex M7 device, with caches
>>> (data/instruction) enabled. Nested function calls result in a usage fault
>>> because there is no clear cache call for this platform.
>>>

>> I have lots of code with nested functions. When switching to gcc-12 I got random
>> crashes on my cortex-m7 targets. In order to get that working again I had to patch
>> gcc/config/arm/arm.h:
>>


Can I ask (either or both of you) why you are using are using nested 
functions like this?  This is possibly the first time I have heard of 
anyone using them, certainly the first time in embedded development. 
Even when I programmed in Pascal, where nested functions are part of the 
language, I did not use them more than a couple of times.

What benefit do you see in nested functions in C, compared to having 
separate functions?  Have you considered moving to C++ and using 
lambdas, which are more flexible, standard, and can be very much more 
efficient?

This is, of course, straying from the topicality of this mailing list. 
But I am curious, and I doubt if I am the only one who is.

David


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: arm-none-eabi, nested function trampolines and caching
  2023-11-28 18:00     ` David Brown
@ 2023-11-29  7:50       ` Matthias Pfaller
  2023-11-29 11:52         ` David Brown
  2023-12-05 17:10       ` Ed Robbins
  1 sibling, 1 reply; 11+ messages in thread
From: Matthias Pfaller @ 2023-11-29  7:50 UTC (permalink / raw)
  To: David Brown, edd.robbins, gcc-help; +Cc: Ed Robbins

On 2023-11-28 19:00, David Brown wrote:
 > Can I ask (either or both of you) why you are using are using nested functions like
 > this?  This is possibly the first time I have heard of anyone using them, certainly
 > the first time in embedded development. Even when I programmed in Pascal, where
 > nested functions are part of the language, I did not use them more than a couple of
 > times.
 >
 > What benefit do you see in nested functions in C, compared to having separate
 > functions?  Have you considered moving to C++ and using lambdas, which are more
 > flexible, standard, and can be very much more efficient?
 >
 > This is, of course, straying from the topicality of this mailing list. But I am
 > curious, and I doubt if I am the only one who is.

- I'm maintaining our token threaded forth interpreter. In the inner loop there is a 
absurdly big switch for the primitives. I'm loading rp, sp and tos into local 
variables. pushing, popping  and memory access is done by nested functions (checking 
for stack over and under flows, managing tos, access violations, ...). Of course that 
could be done by macros. But when I'm calling C-functions from within the switch I'll 
sometimes pass pointers to the local functions (e.g. for catch/throw exception handling).

- When calling list iterators, I'm sometimes passing references to nested functions

- When locking is necessary and the function has multiple return points I'm doing 
something like:

void somefunction(void)
{
   void f(void)
   {
      ...
   }
   lock();
   f();
   unlock();
}

I know, in a lot of cases I could just define some outer static function or use 
gotos. But to my eye it just looks nicer that way. In most cases there will be no 
trampoline necessary anyway. Its not used that often and we could probably get rid of 
it in most cases by using macros and ({ ... }).

regards, Matthias



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: arm-none-eabi, nested function trampolines and caching
  2023-11-29  7:50       ` Matthias Pfaller
@ 2023-11-29 11:52         ` David Brown
  2023-11-29 12:33           ` Matthias Pfaller
  0 siblings, 1 reply; 11+ messages in thread
From: David Brown @ 2023-11-29 11:52 UTC (permalink / raw)
  To: Matthias Pfaller, edd.robbins, gcc-help; +Cc: Ed Robbins

On 29/11/2023 08:50, Matthias Pfaller wrote:
> On 2023-11-28 19:00, David Brown wrote:
>  > Can I ask (either or both of you) why you are using are using nested 
> functions like
>  > this?  This is possibly the first time I have heard of anyone using 
> them, certainly
>  > the first time in embedded development. Even when I programmed in 
> Pascal, where
>  > nested functions are part of the language, I did not use them more 
> than a couple of
>  > times.
>  >
>  > What benefit do you see in nested functions in C, compared to having 
> separate
>  > functions?  Have you considered moving to C++ and using lambdas, 
> which are more
>  > flexible, standard, and can be very much more efficient?
>  >
>  > This is, of course, straying from the topicality of this mailing 
> list. But I am
>  > curious, and I doubt if I am the only one who is.
> 
> - I'm maintaining our token threaded forth interpreter. In the inner 
> loop there is a absurdly big switch for the primitives. I'm loading rp, 
> sp and tos into local variables. pushing, popping  and memory access is 
> done by nested functions (checking for stack over and under flows, 
> managing tos, access violations, ...). Of course that could be done by 
> macros. But when I'm calling C-functions from within the switch I'll 
> sometimes pass pointers to the local functions (e.g. for catch/throw 
> exception handling).
> 
> - When calling list iterators, I'm sometimes passing references to 
> nested functions
> 
> - When locking is necessary and the function has multiple return points 
> I'm doing something like:
> 
> void somefunction(void)
> {
>    void f(void)
>    {
>       ...
>    }
>    lock();
>    f();
>    unlock();
> }
> 
> I know, in a lot of cases I could just define some outer static function 
> or use gotos. But to my eye it just looks nicer that way. In most cases 
> there will be no trampoline necessary anyway. Its not used that often 
> and we could probably get rid of it in most cases by using macros and ({ 
> ... }).
> 

Thanks for that.

I can appreciate that local functions can look nicer than macros or goto 
spaghetti.  In simple cases (which is probably the majority for your 
usage), the local functions will be inlined and will give pretty much 
exactly the same code as you'd get for macros, outer static functions, 
or other methods.  But I'd be very unhappy to see trampolines here, as 
you will need for more complicated cases.  The overheads are not 
something you'd want to see in the inner loop of an interpreter.

AFAIUI, the reason the compiler has to generate trampolines here is to 
make a function that has access to some of the local variables, while 
being shoe-horned into the appearance of a function with parameters that 
don't include any extra values or references.  If you were, as an 
alternative, to switch to C++ and use lambdas instead of nested 
functions that all disappears precisely because lambdas do not have to 
be forced to match the function signature - the generated lambda can 
take extra hidden parameters (and even extra hidden state) as needed.

Of course it's never easy to change these kinds of things in existing 
code.  And it is particularly difficult to get solutions that work 
efficiently on a wide range of compilers or versions.

David




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: arm-none-eabi, nested function trampolines and caching
  2023-11-29 11:52         ` David Brown
@ 2023-11-29 12:33           ` Matthias Pfaller
  0 siblings, 0 replies; 11+ messages in thread
From: Matthias Pfaller @ 2023-11-29 12:33 UTC (permalink / raw)
  To: David Brown, edd.robbins, gcc-help; +Cc: Ed Robbins

On 2023-11-29 12:52, David Brown wrote:
> On 29/11/2023 08:50, Matthias Pfaller wrote:
>> On 2023-11-28 19:00, David Brown wrote:
>>  > Can I ask (either or both of you) why you are using are using nested functions like
>>  > this?  This is possibly the first time I have heard of anyone using them, certainly
>>  > the first time in embedded development. Even when I programmed in Pascal, where
>>  > nested functions are part of the language, I did not use them more than a couple of
>>  > times.
>>  >
>>  > What benefit do you see in nested functions in C, compared to having separate
>>  > functions?  Have you considered moving to C++ and using lambdas, which are more
>>  > flexible, standard, and can be very much more efficient?
>>  >
>>  > This is, of course, straying from the topicality of this mailing list. But I am
>>  > curious, and I doubt if I am the only one who is.
>>
>> - I'm maintaining our token threaded forth interpreter. In the inner loop there is 
>> a absurdly big switch for the primitives. I'm loading rp, sp and tos into local 
>> variables. pushing, popping  and memory access is done by nested functions 
>> (checking for stack over and under flows, managing tos, access violations, ...). Of 
>> course that could be done by macros. But when I'm calling C-functions from within 
>> the switch I'll sometimes pass pointers to the local functions (e.g. for 
>> catch/throw exception handling).
>>
>> - When calling list iterators, I'm sometimes passing references to nested functions
>>
>> - When locking is necessary and the function has multiple return points I'm doing 
>> something like:
>>
>> void somefunction(void)
>> {
>>    void f(void)
>>    {
>>       ...
>>    }
>>    lock();
>>    f();
>>    unlock();
>> }
>>
>> I know, in a lot of cases I could just define some outer static function or use 
>> gotos. But to my eye it just looks nicer that way. In most cases there will be no 
>> trampoline necessary anyway. Its not used that often and we could probably get rid 
>> of it in most cases by using macros and ({ ... }).
>>
> 
> Thanks for that.
> 
> I can appreciate that local functions can look nicer than macros or goto spaghetti.  
> In simple cases (which is probably the majority for your usage), the local functions 
> will be inlined and will give pretty much exactly the same code as you'd get for 
> macros, outer static functions, or other methods.  But I'd be very unhappy to see 
> trampolines here, as you will need for more complicated cases.  The overheads are not 
> something you'd want to see in the inner loop of an interpreter.
> 
> AFAIUI, the reason the compiler has to generate trampolines here is to make a 
> function that has access to some of the local variables, while being shoe-horned into 
> the appearance of a function with parameters that don't include any extra values or 
> references.  If you were, as an alternative, to switch to C++ and use lambdas instead 
> of nested functions that all disappears precisely because lambdas do not have to be 
> forced to match the function signature - the generated lambda can take extra hidden 
> parameters (and even extra hidden state) as needed.
> 
> Of course it's never easy to change these kinds of things in existing code.  And it 
> is particularly difficult to get solutions that work efficiently on a wide range of 
> compilers or versions.
> 
> David

We are using (at the moment) two micro controllers with cache. The at91sam4e is a 
cortex-m4 device with two kilobytes of unified i/d-cache. Because of this cache must 
only be considered when using DMA.

The atsame7x/atsamv7x series is a cortex-m7 device with 16k i-cache and 16k d-cache. 
Here you have to worry about i-cache invalidates. Evicting a single i-cache (and the 
trampoline code is small) doesn't hurt too much. Especially if it happens very seldom 
(its not like every function passes pointers to nested functions...).

Besides that @300MHz (or @120MHz for the cortex-m4) the core is more than fast enough 
for our applications. The 384k of RAM on the at91sam[ev]7x and the 128k of RAM on the 
at91sam4e are a lot more of a hindrance...

I'm aware of the reason for the trampoline code and because of this I know that (as 
you wrote) in the majority of the cases trampoline code is not needed (because no 
outer arguments or variables are referenced and there is no passing of function 
pointers to other functions).

In the cases where the trampoline code is needed I'm willing to take the performance 
hit in exchange for the gain I get.

e.g. in the example with the interpreter inner loop I would need to pass along all 
kinds of state every time I call an external function needing access to local 
interpreter state. If I pass just a pointer to a callback function (that will then 
access local state) there is much less opportunity for errors... In most of the cases 
passing the callback is not necessary anyway.

Matthias


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: arm-none-eabi, nested function trampolines and caching
  2023-11-28 18:00     ` David Brown
  2023-11-29  7:50       ` Matthias Pfaller
@ 2023-12-05 17:10       ` Ed Robbins
  2023-12-05 18:40         ` Ed Robbins
  1 sibling, 1 reply; 11+ messages in thread
From: Ed Robbins @ 2023-12-05 17:10 UTC (permalink / raw)
  To: David Brown; +Cc: Matthias Pfaller, gcc-help

On Tue, 28 Nov 2023 at 18:00, David Brown <david.brown@hesbynett.no> wrote:
>
> On 28/11/2023 10:51, Ed Robbins via Gcc-help wrote:
> > On Tue, 28 Nov 2023 at 07:21, Matthias Pfaller <leo@marco.de> wrote:
> >>
> >> On 2023-11-27 16:16, Ed Robbins via Gcc-help wrote:
> >>> Hello,
> >>> I am using gcc-arm-none-eabi with a cortex M7 device, with caches
> >>> (data/instruction) enabled. Nested function calls result in a usage fault
> >>> because there is no clear cache call for this platform.
> >>>
>
> >> I have lots of code with nested functions. When switching to gcc-12 I got random
> >> crashes on my cortex-m7 targets. In order to get that working again I had to patch
> >> gcc/config/arm/arm.h:
> >>
>
>
> Can I ask (either or both of you) why you are using are using nested
> functions like this?  This is possibly the first time I have heard of
> anyone using them, certainly the first time in embedded development.
> Even when I programmed in Pascal, where nested functions are part of the
> language, I did not use them more than a couple of times.
>
> What benefit do you see in nested functions in C, compared to having
> separate functions?  Have you considered moving to C++ and using
> lambdas, which are more flexible, standard, and can be very much more
> efficient?
>
> This is, of course, straying from the topicality of this mailing list.
> But I am curious, and I doubt if I am the only one who is.
>
> David
>

In the simplest case you may want to do something like:

struct sometype* find_thing_with_value(struct list *things, uint32_t value) {
    bool match(void *thing) {
        return ((struct sometype*)thing)->field == value;
    }
    return (struct sometype*)list_find(things, &match);
}

In general though dependency inversion can be quite powerful. We use
nested functions that do not access local variables fairly frequently
in our codebase, mostly as a scoping mechanism. We would like to be
able to access locals, and are aware of the implications, but only
very recently decided to address the issue.

I think that rather than switch to C++ lambdas we would be more
inclined to move to clang blocks and stick with C. But there is no
impetus for either move: Nested functions are just fine for what we
need to do, and the syntax is very clean. We have thoughts about a HLL
shift but it wouldn't be to C++.

Security implications are not an issue for us because there is no way
for an outsider to communicate with the devices, let alone get a stack
overflow and code execution.

I think that it would be sensible if there was a -mflush_func option
for ARM targets, as for MIPS, which would resolve the caching issues
that require a patched gcc build in this case.

Thanks to Matthias for guidance. I have ended up implementing this a
bit differently, as in the patch below, as it's then totally
self-contained. I've tested by rebuilding the toolchain using the
script from [1] and it is working well.

Best regards,
Ed

[1] https://github.com/FreddieChopin/bleeding-edge-toolchain/tree/master

From 569177e29cdc777e84e5e8cf633ebef761e82f83 Mon Sep 17 00:00:00 2001
From: Ed Robbins <edd.robbins@gmail.com>
Date: Tue, 5 Dec 2023 16:56:36 +0000
Subject: [PATCH] Define CLEAR_INSN_CACHE in arm.h and implement for v7em
 targets

---
 gcc/config/arm/arm.h | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index f479540812a..666d98611bc 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -2518,4 +2518,37 @@ const char *arm_be8_option (int argc, const char **argv);
    representation for SHF_ARM_PURECODE in GCC.  */
 #define SECTION_ARM_PURECODE SECTION_MACH_DEP

+#ifndef CLEAR_INSN_CACHE
+/* When defined CLEAR_INSN_CACHE is called by __clear_cache in libgcc/libgcc2.c
+   It needs to always be _defined_, otherwise
maybe_emit_call_builtin___clear_cache
+   from gcc/builtins.cc will not generate a call to __clear_cache, however we
+   only want it _implemented_ for the multilib version of libgcc.a built for
+   v7em targets. */
+#ifdef __ARM_ARCH_7EM__
+#define CLEAR_INSN_CACHE(BEG, END) \
+ { \
+ const void *scb_base = (const void*)0xe000e000; \
+ const unsigned dccmvac_offset = 0x268; \
+ const unsigned icimvau_offset = 0x258; \
+ const unsigned cache_line_size = 32; \
+ void *addr = (void*)((unsigned)BEG & ~(cache_line_size - 1)); \
+ void *end = (void*)((unsigned)(END + cache_line_size - 1) &
~(cache_line_size - 1)); \
+ __asm__ __volatile__("dsb" : : : "memory"); \
+ while (addr < end) { \
+ *(unsigned**)(scb_base + dccmvac_offset) = addr; \
+ addr += cache_line_size; \
+ } \
+ __asm__ __volatile__("dsb; isb" : : : "memory"); \
+ addr = (void*)((unsigned)BEG & ~(cache_line_size - 1)); \
+ while (addr < end) { \
+ *(unsigned**)(scb_base + icimvau_offset) = addr; \
+ addr += cache_line_size; \
+ } \
+ __asm__ __volatile__("dsb; isb" : : : "memory"); \
+ }
+#else
+#define CLEAR_INSN_CACHE(BEG, END) ;
+#endif
+#endif
+
 #endif /* ! GCC_ARM_H */
-- 
2.34.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: arm-none-eabi, nested function trampolines and caching
  2023-12-05 17:10       ` Ed Robbins
@ 2023-12-05 18:40         ` Ed Robbins
  0 siblings, 0 replies; 11+ messages in thread
From: Ed Robbins @ 2023-12-05 18:40 UTC (permalink / raw)
  To: David Brown; +Cc: Matthias Pfaller, gcc-help

[-- Attachment #1: Type: text/plain, Size: 3232 bytes --]

On Tue, 5 Dec 2023 at 17:10, Ed Robbins <edd.robbins@googlemail.com> wrote:
>
> On Tue, 28 Nov 2023 at 18:00, David Brown <david.brown@hesbynett.no> wrote:
> >
> > On 28/11/2023 10:51, Ed Robbins via Gcc-help wrote:
> > > On Tue, 28 Nov 2023 at 07:21, Matthias Pfaller <leo@marco.de> wrote:
> > >>
> > >> On 2023-11-27 16:16, Ed Robbins via Gcc-help wrote:
> > >>> Hello,
> > >>> I am using gcc-arm-none-eabi with a cortex M7 device, with caches
> > >>> (data/instruction) enabled. Nested function calls result in a usage fault
> > >>> because there is no clear cache call for this platform.
> > >>>
> >
> > >> I have lots of code with nested functions. When switching to gcc-12 I got random
> > >> crashes on my cortex-m7 targets. In order to get that working again I had to patch
> > >> gcc/config/arm/arm.h:
> > >>
> >
> >
> > Can I ask (either or both of you) why you are using are using nested
> > functions like this?  This is possibly the first time I have heard of
> > anyone using them, certainly the first time in embedded development.
> > Even when I programmed in Pascal, where nested functions are part of the
> > language, I did not use them more than a couple of times.
> >
> > What benefit do you see in nested functions in C, compared to having
> > separate functions?  Have you considered moving to C++ and using
> > lambdas, which are more flexible, standard, and can be very much more
> > efficient?
> >
> > This is, of course, straying from the topicality of this mailing list.
> > But I am curious, and I doubt if I am the only one who is.
> >
> > David
> >
>
> In the simplest case you may want to do something like:
>
> struct sometype* find_thing_with_value(struct list *things, uint32_t value) {
>     bool match(void *thing) {
>         return ((struct sometype*)thing)->field == value;
>     }
>     return (struct sometype*)list_find(things, &match);
> }
>
> In general though dependency inversion can be quite powerful. We use
> nested functions that do not access local variables fairly frequently
> in our codebase, mostly as a scoping mechanism. We would like to be
> able to access locals, and are aware of the implications, but only
> very recently decided to address the issue.
>
> I think that rather than switch to C++ lambdas we would be more
> inclined to move to clang blocks and stick with C. But there is no
> impetus for either move: Nested functions are just fine for what we
> need to do, and the syntax is very clean. We have thoughts about a HLL
> shift but it wouldn't be to C++.
>
> Security implications are not an issue for us because there is no way
> for an outsider to communicate with the devices, let alone get a stack
> overflow and code execution.
>
> I think that it would be sensible if there was a -mflush_func option
> for ARM targets, as for MIPS, which would resolve the caching issues
> that require a patched gcc build in this case.
>
> Thanks to Matthias for guidance. I have ended up implementing this a
> bit differently, as in the patch below, as it's then totally
> self-contained. I've tested by rebuilding the toolchain using the
> script from [1] and it is working well.
>

I guess I should have attached that instead of pasting it... second time lucky

Ed

[-- Attachment #2: 0001-Define-CLEAR_INSN_CACHE-in-arm.h-and-implement-for-v.patch --]
[-- Type: text/x-patch, Size: 2003 bytes --]

From 569177e29cdc777e84e5e8cf633ebef761e82f83 Mon Sep 17 00:00:00 2001
From: Ed Robbins <edd.robbins@gmail.com>
Date: Tue, 5 Dec 2023 16:56:36 +0000
Subject: [PATCH] Define CLEAR_INSN_CACHE in arm.h and implement for v7em
 targets

---
 gcc/config/arm/arm.h | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index f479540812a..666d98611bc 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -2518,4 +2518,37 @@ const char *arm_be8_option (int argc, const char **argv);
    representation for SHF_ARM_PURECODE in GCC.  */
 #define SECTION_ARM_PURECODE SECTION_MACH_DEP
 
+#ifndef CLEAR_INSN_CACHE
+/* When defined CLEAR_INSN_CACHE is called by __clear_cache in libgcc/libgcc2.c
+   It needs to always be _defined_, otherwise maybe_emit_call_builtin___clear_cache
+   from gcc/builtins.cc will not generate a call to __clear_cache, however we
+   only want it _implemented_ for the multilib version of libgcc.a built for
+   v7em targets. */
+#ifdef __ARM_ARCH_7EM__
+#define CLEAR_INSN_CACHE(BEG, END) \
+	{ \
+		const void *scb_base = (const void*)0xe000e000; \
+		const unsigned dccmvac_offset = 0x268; \
+		const unsigned icimvau_offset = 0x258; \
+		const unsigned cache_line_size = 32; \
+		void *addr = (void*)((unsigned)BEG & ~(cache_line_size - 1)); \
+		void *end = (void*)((unsigned)(END + cache_line_size - 1) & ~(cache_line_size - 1)); \
+		__asm__ __volatile__("dsb" : : : "memory"); \
+		while (addr < end) { \
+			*(unsigned**)(scb_base + dccmvac_offset) = addr; \
+			addr += cache_line_size; \
+		} \
+		__asm__ __volatile__("dsb; isb" : : : "memory"); \
+		addr = (void*)((unsigned)BEG & ~(cache_line_size - 1)); \
+		while (addr < end) { \
+			*(unsigned**)(scb_base + icimvau_offset) = addr; \
+			addr += cache_line_size; \
+		} \
+		__asm__ __volatile__("dsb; isb" : : : "memory"); \
+	}
+#else
+#define CLEAR_INSN_CACHE(BEG, END) ;
+#endif
+#endif
+
 #endif /* ! GCC_ARM_H */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-12-05 18:40 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-27 15:16 arm-none-eabi, nested function trampolines and caching Ed Robbins
2023-11-27 17:23 ` David Brown
2023-11-27 18:30   ` Richard Earnshaw
2023-11-28  9:18   ` Ed Robbins
     [not found] ` <8342aeef-4eef-231b-bf45-416660954fdb@marco.de>
2023-11-28  9:51   ` Ed Robbins
2023-11-28 18:00     ` David Brown
2023-11-29  7:50       ` Matthias Pfaller
2023-11-29 11:52         ` David Brown
2023-11-29 12:33           ` Matthias Pfaller
2023-12-05 17:10       ` Ed Robbins
2023-12-05 18:40         ` Ed Robbins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).