[ECOS] eCos FP support suggestions.

public inbox for ecos-discuss@sourceware.org

* [ECOS] eCos FP support suggestions.
@ 1999-09-27  5:44 Sergei Organov
  1999-10-08  4:15 ` Nick Garnett
  0 siblings, 1 reply; 5+ messages in thread
From: Sergei Organov @ 1999-09-27  5:44 UTC
  To: ecos-discuss

Hello,

Here are my thoughts about floating point support implementation for
eCos. Your objections, comments and suggestions are welcome.

Support for three different configurations of FP context handling is
required:

1. All tasks are floating point.
2. Some tasks are floating point, immediate FP context switch.
3. Some tasks are floating point, lazy (deferred) FP context switch.

The following HAL macros will be affected:
  HAL_THREAD_INIT_CONTEXT( _sparg_, _thread_, _entry_, _id_ )
  HAL_THREAD_SWITCH_CONTEXT(_fspptr_,_tspptr_)
  HAL_THREAD_LOAD_CONTEXT(_tspptr_)

Additional arguments are required to pass information about the FP
context. We may either add parameters to the existing macros and let
non-FP HALs ignore them, or define a parallel set of macros that
contain the parameters and that, if defined by the HAL and if FP
support is enabled, are used by the kernel instead of the non-FP-aware
macros. I don't know which way is better; the latter is used below.

To illustrate the idea, the HAL_THREAD_SWITCH_CONTEXT_EXTENDED macro is
described below, along with the changes to 'Cyg_Scheduler::unlock_inner()'.

HAL_THREAD_SWITCH_CONTEXT_EXTENDED arguments (the current task is
called t1, the next task t2):

CYG_ADDRESS t1_sp_ptr
  The same as in HAL_THREAD_SWITCH_CONTEXT.

CYG_ADDRESS t2_sp_ptr
  The same as in HAL_THREAD_SWITCH_CONTEXT.

cyg_bool    t1_is_fp
  'true' if space for FP state is required in the context of t1.
  Determines how far '*t1_sp_ptr' is moved. Allows the HAL to avoid
  allocating FP space for non-FP tasks. The HAL may ignore this and
  always allocate FP space in the context.

cyg_bool    t2_is_fp
  'true' if space for FP state was allocated in the context of t2.
  Determines how far '*t2_sp_ptr' is moved. If the HAL doesn't
  allocate FP space for non-FP tasks, this tells it whether that space
  is present, so it doesn't have to store an FP attribute in the task
  context. If the HAL allocates FP space even for non-FP tasks, this
  argument can be ignored.

CYG_ADDRESS save_fp_sp_ptr
  Address of the stack pointer of the thread whose FP context is to be
  saved. May be equal to 't1_sp_ptr' (!). Zero if the FP context does
  not need to be saved.

cyg_bool    load_fp
  'true' if the FP context should actually be loaded from t2.
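The argument list above can be modelled in C to make the contract concrete. This is a minimal sketch, not HAL code: the frame sizes (TOY_INT_FRAME, TOY_FP_FRAME), the toy_thread type and the toy FPU array are hypothetical, stack pointers are plain numbers, and 'save_fp_owner' stands in for save_fp_sp_ptr by naming the thread whose save area receives the FPU contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame sizes; a real HAL derives these from its
   HAL_SavedRegisters layout. */
#define TOY_INT_FRAME 160u
#define TOY_FP_FRAME  256u   /* e.g. 32 x 64-bit FP registers */
#define TOY_NFPR      4      /* toy FPU with only 4 registers */

typedef struct {
    uintptr_t sp;                /* saved stack pointer (just a number here) */
    double    fp_save[TOY_NFPR]; /* stand-in for the FP part of its frame */
} toy_thread;

static double toy_fpu[TOY_NFPR]; /* stand-in for the FPU register file */

/* Toy model of the HAL_THREAD_SWITCH_CONTEXT_EXTENDED contract.
   save_fp_owner plays the role of save_fp_sp_ptr: the thread whose
   save area receives the FPU contents, or NULL for "don't save". */
static void toy_switch_context_extended(toy_thread *t1, toy_thread *t2,
                                        bool t1_is_fp, bool t2_is_fp,
                                        toy_thread *save_fp_owner,
                                        bool load_fp)
{
    /* Push t1's frame; FP space is reserved only if t1_is_fp. */
    size_t push = TOY_INT_FRAME + (t1_is_fp ? TOY_FP_FRAME : 0u);
    assert(t1->sp >= push);
    t1->sp -= push;

    if (save_fp_owner != NULL)
        memcpy(save_fp_owner->fp_save, toy_fpu, sizeof toy_fpu);
    if (load_fp)
        memcpy(toy_fpu, t2->fp_save, sizeof toy_fpu);

    /* Pop t2's frame; t2_is_fp says whether FP space was reserved. */
    t2->sp += TOY_INT_FRAME + (t2_is_fp ? TOY_FP_FRAME : 0u);
}
```

Passing t1_is_fp = false shrinks t1's frame by TOY_FP_FRAME bytes, which is the saving the proposal is after for non-FP tasks.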

Changes to 'Cyg_Scheduler::unlock_inner()'.

'fp_context_owner_thread' is a pointer to the thread that owns the
current FP context. It is a static member of the 'Cyg_Scheduler_Base'
class, initialized to zero. It is then set by 'Cyg_Scheduler::start()'
if the first task to execute is FP, or later by the first switch to a
task that has an FP context.

'is_fp' is a boolean attribute: a non-static const member of the
'Cyg_HardwareThread' class. I suggest initializing it by adding another
constructor to both the 'Cyg_HardwareThread' and 'Cyg_Thread' classes,
in which an 'is_fp' argument is inserted before 'stack_size' and
'stack_base'.

Current code from 'unlock_inner()':

  // Switch contexts
  HAL_THREAD_SWITCH_CONTEXT( &current->stack_ptr, &next->stack_ptr );

should be changed to:

  // Switch contexts

#if !defined(HAL_THREAD_SWITCH_CONTEXT_EXTENDED)

  HAL_THREAD_SWITCH_CONTEXT( &current->stack_ptr, &next->stack_ptr );

#else  // defined(HAL_THREAD_SWITCH_CONTEXT_EXTENDED)

#  if !defined(CYG_KERNEL_FP_SUPPORT)

  HAL_THREAD_SWITCH_CONTEXT_EXTENDED(
    &current->stack_ptr,
    &next->stack_ptr,
    false,
    false,
    0,
    false);

#  elif defined(CYGIMP_FP_LAZY_CONTEXT_SWITCH)

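  // Lazy switch: the FPU is reloaded only when an FP thread that does
  // not already own it is switched in, and only then does the previous
  // owner's FP context need to be saved.
  Cyg_Thread* fp_context_owner = fp_context_owner_thread;
  cyg_bool load_fp = next->is_fp && next != fp_context_owner;
  if(load_fp)
    fp_context_owner_thread = next;
  HAL_THREAD_SWITCH_CONTEXT_EXTENDED(
    &current->stack_ptr,
    &next->stack_ptr,
    current->is_fp,
    next->is_fp,
    (load_fp && fp_context_owner) ? &fp_context_owner->stack_ptr : 0,
    load_fp);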

#  elif defined(CYGIMP_FP_ALL_TASKS)

  HAL_THREAD_SWITCH_CONTEXT_EXTENDED(
    &current->stack_ptr,
    &next->stack_ptr,
    true,
    true,
    &current->stack_ptr,
    true);

#  else

  HAL_THREAD_SWITCH_CONTEXT_EXTENDED(
    &current->stack_ptr,
    &next->stack_ptr,
    current->is_fp,
    next->is_fp,
    current->is_fp ? &current->stack_ptr : 0,
    next->is_fp);

#  endif

#endif // defined(HAL_THREAD_SWITCH_CONTEXT_EXTENDED)
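The decision logic of the CYGIMP_FP_LAZY_CONTEXT_SWITCH branch can be exercised outside the kernel. In this minimal sketch lz_thread and its counters are hypothetical stand-ins for Cyg_Thread and the work the HAL would do. The previous owner's context is saved only when ownership actually changes (that is, when load_fp is true), so switching back to the thread that already owns the FPU costs nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for Cyg_Thread: only what the lazy policy reads. */
typedef struct {
    bool is_fp;
    int  fp_loads;   /* times its FP context was loaded into the FPU */
    int  fp_saves;   /* times its FP context was saved from the FPU */
} lz_thread;

/* Stand-in for Cyg_Scheduler_Base::fp_context_owner_thread. */
static lz_thread *fp_context_owner_thread = NULL;

/* Mirrors the lazy branch of unlock_inner(); current->is_fp would only
   affect the frame size, which is not modelled here. */
static void lz_switch(lz_thread *current, lz_thread *next)
{
    (void)current;
    assert(next != NULL);
    lz_thread *owner = fp_context_owner_thread;
    bool load_fp = next->is_fp && next != owner;
    lz_thread *save = (load_fp && owner != NULL) ? owner : NULL;

    if (load_fp)
        fp_context_owner_thread = next;
    if (save != NULL)
        save->fp_saves++;      /* HAL would store the FPU into owner's frame */
    if (load_fp)
        next->fp_loads++;      /* HAL would load the FPU from next's frame */
}
```

With one FP thread and any number of non-FP threads, the FP context is loaded once and never saved, however often the scheduler switches between them.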

Well, that seems to be enough for a first pass.

Regards,
Sergei.


* Re: [ECOS] eCos FP support suggestions.
  1999-09-27  5:44 [ECOS] eCos FP support suggestions Sergei Organov
@ 1999-10-08  4:15 ` Nick Garnett
  1999-10-11  7:58   ` Sergei Organov
  0 siblings, 1 reply; 5+ messages in thread
From: Nick Garnett @ 1999-10-08  4:15 UTC
  To: ecos-discuss

Sergei Organov <osv@Javad.RU> writes:

> Hello,
> 
> Here are my thoughts about floating point support implementation for
> eCos. Your objections, comments and suggestions are welcome.
> 
> Support for three different configurations of FP context handling is
> required:
> 
> 1. All tasks are floating point.
> 2. Some tasks are floating point, immediate FP context switch.
> 3. Some tasks are floating point, lazy (deferred) FP context switch.
> 

[Details snipped]

Sergei,

I have finally got around to taking a look at this; I've been busy on
more urgent things for the last week or so.

Your scheme does not correspond to the way in which I intended to
implement FP handling. In particular, I want to keep the FP stuff
entirely in the HAL and not make any changes to the kernel or HAL APIs
at all, which I do not believe are necessary. We also have to make
sure that the right things happen during interrupt and exception
handling and for debugging.

The options for FP support that I want to see are:

1. All threads are FP, with full save/restore on context switch.

2. Threads are non-FP until first use, then they do a full
   save/restore each context switch.

3. Threads are non-FP until first use, then the FP context is
   saved/restored lazily as necessary.

I think these correspond to your options.

For all options we need to extend the HAL_SavedRegisters structure to
contain the FP state, although for option 3 this may consist of a
pointer to a second structure allocated elsewhere on the stack.
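As a rough illustration of the two layouts, the hypothetical C structures below show FP state embedded in the saved register frame (options 1 and 2) versus a pointer to a separately allocated save area (option 3). The register counts and field names are assumptions for illustration, not the layout of any real eCos HAL.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NGPR 32
#define TOY_NFPR 32

/* Hypothetical FP state: FP data registers plus a status/control word. */
typedef struct {
    double   fpr[TOY_NFPR];
    uint32_t fpscr;
} toy_fp_state;

/* Options 1 and 2: the FP state lives inside the saved frame. */
typedef struct {
    uint32_t     gpr[TOY_NGPR];
    uint32_t     pc, sr;
    toy_fp_state fp;
} toy_saved_regs_embedded;

/* Option 3: the frame only points at an FP save area kept elsewhere,
   so the FP data can outlive the frame once it has been popped. */
typedef struct {
    uint32_t      gpr[TOY_NGPR];
    uint32_t      pc, sr;
    toy_fp_state *fp;
} toy_saved_regs_lazy;

static_assert(sizeof(toy_saved_regs_lazy) < sizeof(toy_saved_regs_embedded),
              "the pointer form keeps the per-switch frame small");
```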

Option 1 is easily implemented by adding code to the context switch,
interrupt and exception state save/restore code. This has already been
implemented in the MIPS HAL.

Option 2 requires an extra flag to be added to the HAL_SavedRegisters
structure to indicate whether the thread has a valid FP state. The
flag is set false on initialization, and the FPU is disabled whenever
such a thread is switched to. If the thread performs an FP operation,
the FP exception handler sets the flag true and initializes the FPU.
Subsequently, when the thread is switched out, the flag is checked and
the FPU context saved. Similarly, the FPU context is restored when the
thread is reloaded. Optimizations can be added to allow the HAL to
avoid allocating the FPU save area if the thread does not do FP
operations.
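The option 2 flow can be sketched with a toy FPU. This is a minimal, single-threaded simulation under stated assumptions: opt2_thread, toy_fpu and the counters are hypothetical, "disabling the FPU" is just a boolean, and the exception handler is called explicitly instead of being triggered by hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_NFPR 4

/* Hypothetical per-thread state for option 2; in a real HAL the flag
   would be the extra field in HAL_SavedRegisters. */
typedef struct {
    bool   fp_valid;             /* has this thread used the FPU yet? */
    double fp_save[TOY_NFPR];
} opt2_thread;

static double toy_fpu[TOY_NFPR];
static bool   toy_fpu_enabled;
static int    toy_fp_saves;

static void opt2_thread_init(opt2_thread *t)
{
    t->fp_valid = false;         /* every thread starts out non-FP */
}

/* Context switch: save the FPU only for threads that have used it,
   restore it only for threads that have used it, and otherwise leave
   the FPU disabled so that the first FP instruction traps. */
static void opt2_switch(opt2_thread *from, opt2_thread *to)
{
    assert(from != NULL && to != NULL);
    if (from->fp_valid) {
        memcpy(from->fp_save, toy_fpu, sizeof toy_fpu);
        toy_fp_saves++;
    }
    if (to->fp_valid)
        memcpy(toy_fpu, to->fp_save, sizeof toy_fpu);
    toy_fpu_enabled = to->fp_valid;
}

/* "FP unavailable" exception handler: the thread becomes FP-using and
   gets a freshly initialized (zeroed) FPU. */
static void opt2_fp_unavailable(opt2_thread *current)
{
    current->fp_valid = true;
    for (int i = 0; i < TOY_NFPR; i++)
        toy_fpu[i] = 0.0;
    toy_fpu_enabled = true;
}
```

A thread that never touches the FPU never reaches opt2_fp_unavailable(), so no FP state is ever saved or restored on its behalf.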

Option 3 presents something of a problem. Notionally it is a
development of option 2 where the actual FPU state swap is done in the
FP exception handler, and only if necessary. However, here's the
problem: by the time we decide to load the current thread's FPU state,
we will have destroyed it, since it is simply stored on the stack as
part of the CPU state we have already loaded, and that stack space is
free to be reused once the frame has been popped.

A solution to this is to allocate a per-thread "static" FP save area
at the base of the stack, since it must persist after the rest of the
thread's CPU state is loaded. However, this would prevent us from using
FP in exception or interrupt routines: an unreasonable and
unenforceable restriction (although we could make that a configuration
option if the user is prepared to exert the correct level of
self-restraint).

A development of this idea, and probably the right way to do it, is to
have an initial "static" save area that is used during normal thread
switching, and to allocate a new FPU save area during exception and
interrupt handling. These areas are chained together and pointed to by
a field in HAL_SavedRegisters, which is used to maintain a HAL-level
pointer to the current thread's FPU save area. An additional pointer
points to the save area of the current owner of the FPU contents. A
thread context switch involves disabling the FPU, saving the current
thread's FPU context pointer into the CPU save state and loading the
pointer from the next thread. If the new thread performs an FPU
operation, the exception handler saves the FPU contents to the save
area pointed to by the FPU owner pointer, loads the FPU context from
the current thread's context pointer, and copies that pointer into the
FPU owner pointer. Exceptions and interrupts must also disable the
FPU, and in addition create a new FPU save area which replaces the
current thread's save area pointer. Any FP operations will then cause
a fresh FPU context to be created, rather than using the current
thread's existing context. Exception or interrupt return just destroys
the new context and restores the original current thread's pointer.

The only addition we will need to the HAL API will be a new macro,
HAL_CPU_FLUSH_CONTEXT() or something similar, to force the FPU
contents out to the save area for the benefit of debugging.
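The chained-save-area scheme and the flush macro described above can be simulated in C to check the bookkeeping. This is a minimal sketch under stated assumptions: fp_area, toy_thread, hal_fp_current, hal_fp_owner and the lazy_* functions are hypothetical names, the "FPU" is an array, and exception entry and exit are explicit calls. It also shows one detail that seems to follow from the description: if an interrupt handler owns the FPU when it returns, the owner pointer must be cleared, because that save area is being discarded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_NFPR 4

/* One FPU save area; areas created for interrupts are chained back to
   the area of the context they interrupted. */
typedef struct fp_area {
    double          fpr[TOY_NFPR];
    struct fp_area *prev;
} fp_area;

/* Hypothetical thread record: the pointer kept in its HAL_SavedRegisters. */
typedef struct {
    fp_area *fp_ctx;
} toy_thread;

static double   toy_fpu[TOY_NFPR];
static int      toy_fpu_enabled;
static fp_area *hal_fp_current;  /* save area of the running context */
static fp_area *hal_fp_owner;    /* save area whose data is in the FPU */

/* Thread switch: disable the FPU and swap the current-area pointer;
   the FPU contents themselves are left alone. */
static void lazy_switch(toy_thread *from, toy_thread *to)
{
    toy_fpu_enabled = 0;
    from->fp_ctx = hal_fp_current;
    hal_fp_current = to->fp_ctx;
}

/* "FP unavailable" handler: hand the FPU over to the current area. */
static void lazy_fp_unavailable(void)
{
    assert(hal_fp_current != NULL);
    if (hal_fp_owner != hal_fp_current) {
        if (hal_fp_owner != NULL)
            memcpy(hal_fp_owner->fpr, toy_fpu, sizeof toy_fpu);
        memcpy(toy_fpu, hal_fp_current->fpr, sizeof toy_fpu);
        hal_fp_owner = hal_fp_current;
    }
    toy_fpu_enabled = 1;
}

/* Interrupt/exception entry: disable the FPU and chain in a fresh,
   zeroed area so FP use here cannot disturb the thread's context. */
static void lazy_interrupt_enter(fp_area *fresh)
{
    toy_fpu_enabled = 0;
    for (int i = 0; i < TOY_NFPR; i++)
        fresh->fpr[i] = 0.0;
    fresh->prev = hal_fp_current;
    hal_fp_current = fresh;
}

/* Interrupt/exception return: drop the fresh area and restore the
   interrupted context's pointer. If the handler owned the FPU, its
   contents are dead; the previous owner was saved when it took over. */
static void lazy_interrupt_exit(void)
{
    fp_area *fresh = hal_fp_current;
    assert(fresh != NULL && fresh->prev != NULL);
    hal_fp_current = fresh->prev;
    if (hal_fp_owner == fresh)
        hal_fp_owner = NULL;
    toy_fpu_enabled = (hal_fp_owner == hal_fp_current);
}

/* Model of the proposed HAL_CPU_FLUSH_CONTEXT(): write the live FPU
   contents back to the owner's area, e.g. before a debugger reads it. */
static void lazy_flush(void)
{
    if (hal_fp_owner != NULL)
        memcpy(hal_fp_owner->fpr, toy_fpu, sizeof toy_fpu);
}
```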

I intend to experimentally implement these mechanisms in the MIPS HAL,
to check that it all works properly, although I don't know exactly
when I will find the time to do this. In the meantime I suggest that
you try implementing Option 1 only, which will allow you to at least
use the FPU, although at slightly reduced performance. Once I am happy
that I have not missed any subtleties in the MIPS HAL, you can then
use that as a model to implement the PowerPC version. That way we will
hopefully get a uniform implementation across all HALs.

Suggestions, criticism and comments are of course welcome.

-- 
Nick Garnett           mailto:nickg@cygnus.co.uk
Cygnus Solutions, UK   http://www.cygnus.co.uk


* Re: [ECOS] eCos FP support suggestions.
  1999-10-08  4:15 ` Nick Garnett
@ 1999-10-11  7:58   ` Sergei Organov
  1999-10-27  8:15     ` Nick Garnett
  0 siblings, 1 reply; 5+ messages in thread
From: Sergei Organov @ 1999-10-11  7:58 UTC
  To: Nick Garnett; +Cc: ecos-discuss

Nick,

I had already heard from Bart about the idea of deciding whether a
thread needs an FP context by handling the "FP unavailable" exception,
and thus not adding extra support to the HAL interface. This approach
has its drawbacks, and I think it should be taken only if changes to
the HAL API (even backward-compatible ones) are strictly prohibited.

I believe that it could be implemented in the way you described.
However, here are the things that bother me about this approach:

1. What should be done on an architecture where an "FP enable" bit
simply doesn't exist, so there is no way to get an exception on the
first FP instruction?

2. In any case, it would be good to have a way to define a non-FP task
explicitly (and get an "FP not available" exception if it performs an
FP operation). That would also avoid the "static" FP area at the base
of the stack for such light-weight tasks.

3. Porting FP support to new targets seems more difficult with this
approach, because all the common logic (which is also more complex)
has to be implemented in the HAL instead of the kernel.

4. The potentially time-consuming operations of handling the exception
and initializing the FP context occur at hard-to-predict moments (when
the first FP instruction is executed) instead of at the well-defined
moment of task creation.

5. Are there any benefits of this approach besides the unchanged HAL
interface? The programmer doesn't need to decide whether a particular
task needs an FP context. What else?

Well, you asked for criticism :-) What do you think?

Regards,
Sergei.

Nick Garnett <nickg@cygnus.co.uk> writes:
> [Full quote of the previous message snipped]


* Re: [ECOS] eCos FP support suggestions.
  1999-10-11  7:58   ` Sergei Organov
@ 1999-10-27  8:15     ` Nick Garnett
  1999-10-27 11:10       ` Sergei Organov
  0 siblings, 1 reply; 5+ messages in thread
From: Nick Garnett @ 1999-10-27  8:15 UTC
  To: ecos-discuss

Sergei Organov <osv@Javad.RU> writes:

Sergei,

Sorry that it's taken me a bit of time to get back to this. 

> Nick,
> 
> I had already heard from Bart about the idea of deciding whether a
> thread needs an FP context by handling the "FP unavailable" exception,
> and thus not adding extra support to the HAL interface. This approach
> has its drawbacks, and I think it should be taken only if changes to
> the HAL API (even backward-compatible ones) are strictly prohibited.
>
> I believe that it could be implemented in the way you described.
> However, here are the things that bother me about this approach:
>
> 1. What should be done on an architecture where an "FP enable" bit
> simply doesn't exist, so there is no way to get an exception on the
> first FP instruction?

In this case we really have no alternative but to assume that all
threads are FP-using and switch FP state on each context
switch. Unfortunately we cannot really rely on the user telling us
which threads use FP and which don't, since the issue is often
orthogonal to the threadedness of the application. It is not always
easy to cleanly divide the code of the app into FP and non-FP parts
and ensure that threads stay in their own halves.

However, this is largely an academic issue: most of the architectures
we support have this facility, since their designers expect operating
systems to switch FPU state lazily using exactly the mechanism I
propose.

> 
> 2. In any case, it would be good to have a way to define a non-FP task
> explicitly (and get an "FP not available" exception if it performs an
> FP operation). That would also avoid the "static" FP area at the base
> of the stack for such light-weight tasks.

This would be a reasonable optional enhancement to the basic
mechanism. However, see my comments about the impact on the kernel
interfaces later.

> 
> 3. Porting FP support to new targets seems more difficult with this
> approach, because all the common logic (which is also more complex)
> has to be implemented in the HAL instead of the kernel.
>

There is not really very much common code here. Nearly all of it has
to be implemented in assembler for performance and because it works
very close to the machine. Even in your design, the common code really
only consists of a few per-thread variables and a few tests, nothing
very complex. With a suitable model to code from, porting to a new
architecture is often a simple matter of just translating instruction
for instruction. I have done this many times and it is very easy.

Often it is cleaner and simpler to reimplement a small piece of common
code than to provide a more complex interface simply so that it may be
shared.

Also, remember that the whole thing does not need to be implemented
from the start. It is acceptable to implement only option 1 and to add
the others as enhancements as they are required.

> 4. The potentially time-consuming operations of handling the exception
> and initializing the FP context occur at hard-to-predict moments (when
> the first FP instruction is executed) instead of at the well-defined
> moment of task creation.

FPU exception handling should not be very expensive. Beyond the code
to save and restore the FPU contexts, it should not be more than a
handful of instructions. Remember, in their "real" incarnations as
workstation CPUs, these processors are doing this kind of thing all
the time and hardware support for these exceptions is quite slick.

For simplicity, I left many details out of my description of how
things work. When I talk about initializing the FPU, this may just mean
loading an FPU context full of zeroes. However, on many architectures
it may be cheaper to do an FPU initialization (which may just be to
load zeroes into all the registers) than to load an FPU context, so we
should do this when the option is available (such as on a thread's
first use of FP).

Initializing the static per-thread FP context will happen at thread
initialization. It should be set up so that it can just be loaded into
the FPU as if it were a saved context, or marked invalid if it is
cheaper to initialize the FPU directly.
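The choice between a preloaded zero context and an "invalid" marker might look like this. A minimal sketch: static_fp_area, the reset_is_cheaper parameter and the counters are hypothetical, and zeroing the toy FPU stands in for whatever cheap FPU initialization the architecture provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NFPR 4

/* Hypothetical per-thread "static" FP save area. */
typedef struct {
    bool   valid;              /* false: no saved state yet, reset FPU instead */
    double fpr[TOY_NFPR];
} static_fp_area;

static double toy_fpu[TOY_NFPR];
static int    toy_fpu_resets, toy_fpu_loads;

/* Called at thread initialization. */
static void fp_area_init(static_fp_area *a, bool reset_is_cheaper)
{
    if (reset_is_cheaper) {
        a->valid = false;                 /* defer the work to first FP use */
    } else {
        for (int i = 0; i < TOY_NFPR; i++)
            a->fpr[i] = 0.0;              /* loadable like any saved context */
        a->valid = true;
    }
}

/* Bring this area's context into the FPU (e.g. from the FP handler). */
static void fp_area_restore(const static_fp_area *a)
{
    assert(a != NULL);
    for (int i = 0; i < TOY_NFPR; i++)
        toy_fpu[i] = a->valid ? a->fpr[i] : 0.0;  /* load vs. cheap reset */
    if (a->valid)
        toy_fpu_loads++;
    else
        toy_fpu_resets++;
}

/* Save the FPU into this area; from now on it holds a real context. */
static void fp_area_save(static_fp_area *a)
{
    for (int i = 0; i < TOY_NFPR; i++)
        a->fpr[i] = toy_fpu[i];
    a->valid = true;
}
```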

I agree that having any work done in the FPU exception handler makes
the first FPU instruction after a context switch take a much longer
time. However, this does have a fixed maximum duration (the time to
save and load a whole FPU context plus a few instructions). Any
mechanism that either lazily switches FPU contexts, or allows threads
to be optionally FP using or not, will introduce non-determinism. This
is true whether the switch is done in an FPU exception handler or in
the context switch code. The only way of ensuring determinism is to
switch FPU contexts on every thread switch and factor the extra time
for this into your calculations as a constant overhead.

> 
> 5. Are there any benefits of this approach besides the unchanged HAL
> interface? The programmer doesn't need to decide whether a particular
> task needs an FP context. What else?
>

We would also have to extend or change the kernel interface, since it
would be necessary to either specify that a thread was FP-using on
creation, or notify the kernel of that fact later. This would either
require the constructor for the Cyg_Thread class to be changed, or
some new member functions to be added. These changes would then have
to be reflected in the C API. All of this would have major effects
on existing code and documentation. In general I want to avoid having
to make such far-reaching changes if an alternative solution exists.

-- 
Nick Garnett           mailto:nickg@cygnus.co.uk
Cygnus Solutions, UK   http://www.cygnus.co.uk


* Re: [ECOS] eCos FP support suggestions.
  1999-10-27  8:15     ` Nick Garnett
@ 1999-10-27 11:10       ` Sergei Organov
  0 siblings, 0 replies; 5+ messages in thread
From: Sergei Organov @ 1999-10-27 11:10 UTC
  To: ecos-discuss

Nick Garnett <nickg@cygnus.co.uk> writes:
>
> Sergei,
>
> Sorry that it's taken me a bit of time to get back to this.

Please don't mention it. I was busy as well :-)

Thank you so much for your explanations. You have almost convinced me
that the approach you are going to use is better, so I have snipped
most of our discussion below. While many of the implementation details
are still not clear to me, I think I had better not bother you with
questions about them, to leave more time for real work.

Just one comment below.

> > 2. In any case, it would be good to have a way to define a non-FP task
> > explicitly (and get an "FP not available" exception if it performs an
> > FP operation). That would also avoid the "static" FP area at the base
> > of the stack for such light-weight tasks.
>
> This would be a reasonable optional enhancement to the basic
> mechanism. However, see my comments about the impact on the kernel
> interfaces later.
>
[...]
> We would also have to extend or change the kernel interface, since it
> would be necessary to either specify that a thread was FP-using on
> creation, or notify the kernel of that fact later. This would either
> require the constructor for the Cyg_Thread class to be changed, or
> some new member functions to be added. These changes would then have
> to be reflected in the C API. All of this would have major effects
> on existing code and documentation. In general I want to avoid having
> to make such far-reaching changes if an alternative solution exists.

Well, I agree that changes to the C interface and documentation seem
to be unavoidable. However, it seems there is a way to make them
without affecting existing user code (e.g. add another task creation
routine that creates such a task, or make the task creation routine
take an additional optional argument). On the other hand, without it I
must pay extra, say, up to 256 bytes of stack (for a full PowerPC FPU
context) for every task in the system that I know will never use FP.
Doesn't that outweigh the extra work of adding this support to the
system and changing the documentation?

It is also possible that other cases will come up later where it's
necessary to pass some optional arguments to the task creation
routine. Don't you think it's better to have some mechanism in the C
interface that allows such additions in the future?
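One possible shape for such a mechanism, in the spirit of POSIX pthread_attr_t, is an attribute structure with defaults plus a wrapper that keeps the old call working. Everything here (toy_thread_attr, toy_thread_create, toy_thread_create_attr, the 256-byte reservation) is hypothetical and not part of the eCos C API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_FP_AREA 256u   /* hypothetical FP save area size */

/* Hypothetical extensible creation attributes; new fields can be added
   later without changing any function signature. */
typedef struct {
    bool   uses_fp;        /* false: no FP save area is reserved */
    size_t stack_size;
} toy_thread_attr;

typedef struct {
    bool   uses_fp;
    size_t usable_stack;   /* stack left after reserving FP space */
} toy_thread;

static void toy_thread_attr_init(toy_thread_attr *attr)
{
    attr->uses_fp    = true;   /* default keeps today's behaviour */
    attr->stack_size = 4096;
}

/* New-style creation call: the caller supplies attributes. */
static int toy_thread_create_attr(toy_thread *t, const toy_thread_attr *attr)
{
    assert(t != NULL && attr != NULL);
    size_t reserve = attr->uses_fp ? TOY_FP_AREA : 0u;
    if (attr->stack_size < reserve)
        return -1;
    t->uses_fp = attr->uses_fp;
    t->usable_stack = attr->stack_size - reserve;
    return 0;
}

/* Old-style call kept as a wrapper, so existing callers are unaffected. */
static int toy_thread_create(toy_thread *t, size_t stack_size)
{
    toy_thread_attr attr;
    toy_thread_attr_init(&attr);
    attr.stack_size = stack_size;
    return toy_thread_create_attr(t, &attr);
}
```

A light-weight task opts out with attr.uses_fp = false and gets its whole stack back; code written against the old call never sees the new field.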

Another solution would be to put the FP-related static data at the top
of the stack, not at the bottom. This would allow simply allocating
less stack space for a non-FP task, in the hope that the task will
never execute an FP operation. It has the obvious drawback, though, of
losing the ability to get an "FP unavailable" exception if the task
does execute an FP instruction.

BTW, if you have a guess when lazy FP switch support might be
implemented in the MIPS HAL, please let me know.

Best Regards,
Sergei.


Thread overview: 5+ messages
1999-09-27  5:44 [ECOS] eCos FP support suggestions Sergei Organov
1999-10-08  4:15 ` Nick Garnett
1999-10-11  7:58   ` Sergei Organov
1999-10-27  8:15     ` Nick Garnett
1999-10-27 11:10       ` Sergei Organov
