From mboxrd@z Thu Jan  1 00:00:00 1970
From: Nick Garnett <nickg@cygnus.co.uk>
To: ecos-discuss@sourceware.cygnus.com
Subject: Re: [ECOS] eCos FP support suggestions.
Date: Wed, 27 Oct 1999 08:15:00 -0000
Message-id: <po904oyh2w.fsf@balti.cygnus.co.uk>
References: <87puz4zfno.fsf@osv.javad.ru> <po670i85sc.fsf@balti.cygnus.co.uk> <87aepqx7mx.fsf@osv.javad.ru>
X-SW-Source: 1999-10/msg00100.html

Sergei Organov <osv@Javad.RU> writes:

Sergei,

Sorry that it's taken me a bit of time to get back to this. 

> Nick,
> 
> I already heard before from Bart about the idea to decide if thread
> needs FP context by handling "FP unavailable" exception and thus
> don't add additional support in the HAL interface. This approach has
> its drawbacks, and I think that it should be taken only if (even
> backward compatible) changes in HAL API are strictly prohibited.
> 
> I believe that it could be implemented in the way you described.
> However, here are things that bother me in this approach:
> 
> 1. What to do if architecture appears where "FP enable bit" just
> doesn't exist and thus there is no way to get exception on first FP
> instruction?

In this case we really have no alternative but to assume that all
threads are FP-using and switch FP state on each context
switch. Unfortunately we cannot really rely on the user telling us
which threads use FP and which don't since the issue is often
orthogonal to the threadedness of the application. It is not always
easy to cleanly divide the code of the app into FP and non-FP parts
and ensure that threads stay in their own halves.

However, this is largely an academic issue: most of the architectures
we support have this facility, since their designers expect operating
systems to switch FPU state lazily using exactly the mechanism I
propose.
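
To make the mechanism concrete, here is a minimal sketch of lazy FPU
switching, simulated in portable C rather than written as real HAL
assembler. Everything in it is an assumption made for illustration:
the names (fp_context, fpu_owner, fp_unavailable_trap, fp_store,
fp_load), the register count, and the use of a boolean to stand in
for the hardware "FP enable bit". None of it is existing eCos code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FP_REGS 32                      /* assumed register count */

typedef struct { double regs[FP_REGS]; } fp_context;  /* hypothetical layout */

typedef struct thread {
    fp_context fp;                      /* static per-thread FP save area */
} thread;

/* Simulated machine and kernel state (stand-ins for real hardware). */
static fp_context fpu;                  /* the FPU register file */
static bool       fpu_enabled;          /* models the "FP enable bit" */
static thread    *fpu_owner;            /* thread whose state is in the FPU */
static thread    *current;              /* currently running thread */
static unsigned   fp_switches;          /* number of FP context loads */

/* Context switch: no FP state is moved. The FPU stays enabled only if
   the incoming thread already owns its contents. */
static void context_switch(thread *next)
{
    current = next;
    fpu_enabled = (fpu_owner == next);
}

/* "FP unavailable" handler: runs on the first FP instruction a thread
   executes while the FPU holds some other thread's state. */
static void fp_unavailable_trap(void)
{
    assert(current != NULL && !fpu_enabled);
    if (fpu_owner != NULL)
        fpu_owner->fp = fpu;            /* save the previous owner's state */
    fpu = current->fp;                  /* load the current thread's state */
    fpu_owner = current;
    fpu_enabled = true;
    fp_switches++;
}

/* Simulated FP instructions: they trap when the FPU is disabled. */
static void fp_store(int reg, double value)
{
    if (!fpu_enabled)
        fp_unavailable_trap();
    fpu.regs[reg] = value;
}

static double fp_load(int reg)
{
    if (!fpu_enabled)
        fp_unavailable_trap();
    return fpu.regs[reg];
}
```

In this model a thread that never executes an FP instruction never
causes an FP save or load, however often it is scheduled, which is the
point of doing the switch lazily.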

> 
> 2. Anyway it'd be fine to have a way to define non-FP task
> explicitly (and get "FP not available" exception if FP operation is
> used). It will also allow to don't have "static" FP area at the base
> of the stack for such light-weight task.

This would be a reasonable optional enhancement to the basic
mechanism. However, see my comments about the impact on the kernel
interfaces later.

> 
> 3. Porting of FP support to new targets seems to be more difficult
> with this approach, because all common logic (that is in turn more
> complex) is to be implemented in the HAL instead of kernel.
>

There is not really very much common code here. Nearly all of it has
to be implemented in assembler for performance and because it works
very close to the machine. Even in your design, the common code really
only consists of a few per-thread variables and a few tests, nothing
very complex. With a suitable model to code from, porting to a new
architecture is often a simple matter of just translating instruction
for instruction. I have done this many times and it is very easy.

Often it is cleaner and simpler to reimplement a small piece of common
code than to provide a more complex interface simply so that it may be
shared.

Also, remember that the whole thing does not need to be implemented
from the start. It is acceptable to implement only option 1 and to add
the others as enhancements as they are required.

> 4. Potentially time-consuming operations of handling exception and
> initializing of FP context occurs at hardly predictable time moments
> (when first FP instruction is executed) instead of well defined moment 
> of task creation.

FPU exception handling should not be very expensive. Beyond the code
to save and restore the FPU contexts, it should not be more than a
handful of instructions. Remember, in their "real" incarnations as
workstation CPUs, these processors are doing this kind of thing all
the time and hardware support for these exceptions is quite slick.

I left many details out of my description of how things work for
simplicity. When I talk about initializing the FPU, this may just mean
loading an FPU context full of zeroes. However, for many architectures
it may be cheaper to do an FPU initialization (which may just be to
load zeroes into all the registers) than to load an FPU context, so we
should do this when the option is available (such as first use of FP
by a thread).

Initializing the static per-thread FP context will happen at thread
initialization. It should be set up so that it can just be loaded into
the FPU as if it were a saved context, or marked invalid if it is
cheaper to initialize the FPU directly.
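
The two ways of setting up the per-thread save area can be sketched
as follows. This is only an illustration in portable C: the names
(fp_save_area, fp_area_init, fp_area_restore, fp_area_save), the
register count, and the fpu_regs array standing in for the real FPU
are all hypothetical, and a real port would do this in assembler.

```c
#include <assert.h>
#include <stdbool.h>

#define FP_REGS 32                      /* assumed register count */

typedef struct {
    bool   valid;                       /* false: nothing worth loading yet */
    double regs[FP_REGS];
} fp_save_area;

static double fpu_regs[FP_REGS];        /* stand-in for the FPU registers */

/* Called at thread initialization. Either build a context of zeroes
   that can be loaded like any saved context, or mark the area invalid
   so that first use initializes the FPU directly. */
static void fp_area_init(fp_save_area *area, bool preload_zeroes)
{
    area->valid = preload_zeroes;
    if (preload_zeroes)
        for (int i = 0; i < FP_REGS; i++)
            area->regs[i] = 0.0;
}

/* Called when the thread takes over the FPU. */
static void fp_area_restore(const fp_save_area *area)
{
    for (int i = 0; i < FP_REGS; i++)
        fpu_regs[i] = area->valid ? area->regs[i]   /* load saved context */
                                  : 0.0;            /* direct FPU init */
}

/* Called when another thread takes the FPU away. */
static void fp_area_save(fp_save_area *area)
{
    for (int i = 0; i < FP_REGS; i++)
        area->regs[i] = fpu_regs[i];
    area->valid = true;                 /* from now on it must be loaded */
}
```

Which option to pick per architecture depends on whether a direct
initialization really is cheaper than a context load there.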

I agree that having any work done in the FPU exception handler makes
the first FPU instruction after a context switch take a much longer
time. However, this does have a fixed maximum duration (the time to
save and load a whole FPU context plus a few instructions). Any
mechanism that either lazily switches FPU contexts, or allows threads
to be optionally FP-using or not, will introduce non-determinism. This
is true whether the switch is done in an FPU exception handler or the
context switch code. The only way of ensuring determinism is to switch
FPU contexts on every thread switch and reckon the extra time for this
into your calculations as a constant overhead.
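
The timing argument can be written down as a small cost model. This is
a sketch only: the fp_costs structure and both function names are
hypothetical, and the cycle counts must come from measurement on the
actual target.

```c
#include <assert.h>

typedef struct {
    unsigned save_cycles;   /* time to save a whole FPU context */
    unsigned load_cycles;   /* time to load a whole FPU context */
    unsigned trap_cycles;   /* the "few instructions" of handler overhead */
} fp_costs;

/* Eager switching: every thread switch pays this, so it is a constant
   that can be added to the context switch time. */
static unsigned eager_switch_cost(const fp_costs *c)
{
    return c->save_cycles + c->load_cycles;
}

/* Lazy switching: upper bound on the extra time taken by the first FP
   instruction after a context switch. The cost may also be zero when
   the thread still owns the FPU, and that variation is the
   non-determinism. */
static unsigned lazy_first_use_worst_case(const fp_costs *c)
{
    return c->save_cycles + c->load_cycles + c->trap_cycles;
}
```

So the lazy scheme is bounded, but only the eager scheme gives the same
figure on every switch.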

> 
> 5. Are there any benefits of this approach besides unchanged HAL
> interface? Programmer doesn't need to decide if particular task needs
> FP context. What else?
>

With your approach we would also have to extend or change the kernel
interface, since it would be necessary either to specify that a thread
was FP-using on creation, or to notify the kernel of that fact later.
This would either
require the constructor for the Cyg_Thread class to be changed, or
some new member functions to be added. These changes would then have
to be reflected in the C API. All of this would have major effects
on existing code and documentation. In general I want to avoid having
to make such far-reaching changes if an alternative solution exists.


-- 
Nick Garnett           mailto:nickg@cygnus.co.uk
Cygnus Solutions, UK   http://www.cygnus.co.uk