Patch_or_Contribution granted: [Bug 1001607] Cortex-M4F architectural Floating Point Support

public inbox for ecos-patches@sourceware.org
 help / color / mirror / Atom feed

From: bugzilla-daemon@bugs.ecos.sourceware.org
To: ecos-patches@sourceware.org
Subject: Patch_or_Contribution granted: [Bug 1001607] Cortex-M4F architectural Floating Point Support
Date: Fri, 22 Jun 2012 04:56:00 -0000	[thread overview]
Message-ID: <20120622045526.265282F78004@mail.ecoscentric.com> (raw)
In-Reply-To: <bug-1001607-0@http.bugs.ecos.sourceware.org/>

Jonathan Larmour <jifl@ecoscentric.com> has granted  Patch_or_Contribution:
Bug 1001607: Cortex-M4F architectural Floating Point Support
http://bugs.ecos.sourceware.org/show_bug.cgi?id=1001607

------- Additional Comments from Jonathan Larmour <jifl@ecoscentric.com>
Hi Ilija,

Thanks for all this work. It looks really good! I've gone through it all now,
and unsurprisingly have some observations. It's quite a big patch so it ends up
being quite long...

Architecture HAL:

- The CDL seems to be getting much more complex than it really needs to be.
There is now CDL for: CYGHWR_HAL_CORTEXM (== M3 or M4),
CYGINT_HAL_CORTEXM4_CODE, CYGOPT_HAL_CORTEXM4_CODE, CYGOPT_HAL_CORTEXM3_CODE,
CYGINT_HAL_CORTEXM_FPU, CYGINT_HAL_CORTEXM_FPV4_SP_D16, CYGHWR_HAL_CORTEXM_FPU,
and to an extent CYGHWR_HAL_CORTEXM_FPU_SWITCH=="NONE", which all have
overlapping behaviour with respect to each other and effects on each other.
There is unnecessary redundancy here.

I think we can simply remove CYGINT_HAL_CORTEXM4_CODE, CYGOPT_HAL_CORTEXM4_CODE
and CYGOPT_HAL_CORTEXM3_CODE because even if building with -mcpu=cortex-m4, gcc
defaults to soft floating-point. So that means if CYGHWR_HAL_CORTEXM == "M4"
then it's safe to use -mcpu=cortex-m4. So CYGBLD_GLOBAL_XFLAGS_ARCH can go too.


This line can just go away:
+	 requires { !is_active(CYGOPT_HAL_CORTEXM4_CODE) implies
CYGHWR_HAL_CORTEXM_FPU == 0 }

- CYGHWR_HAL_CORTEXM_FPU should be just:
	    active_if	    CYGINT_HAL_CORTEXM_FPU
	    default_value   1

You don't need to set the default value specially when its inactive anyway.

- In CYGHWR_HAL_CORTEXM_FPU's description there's a "\n". Also I wouldn't call
it "ordinary" behaviour. Perhaps "simplest".

- Here's a possible outline of a simplification. What do you think?
  cdl_option CYGHWR_HAL_CORTEXM_FPU_FLAGS {
     display	"FPU build flags"
     calculated { (CYGHWR_HAL_CORTEXM_FPU ? " -mfloat-abi=hard" : "" ) .
(CYGINT_HAL_CORTEXM_FPV4_SP_D16 ? " -mfpu=fpv4-sp-d16" : "" ) }
   }
   requires is_substr(CYGBLD_GLOBAL_CFLAGS, CYGHWR_HAL_CORTEXM_FPU_FLAGS )
   requires is_substr(CYGBLD_GLOBAL_LDFLAGS, CYGHWR_HAL_CORTEXM_FPU_FLAGS )
  requires { !CYGHWR_HAL_CORTEXM_FPU implies !is_substr(CYGBLD_GLOBAL_CFLAGS,
"-mfpu=fpv4-sp-d16") && !is_substr(CYGBLD_GLOBAL_CFLAGS, "-mfloat-abi=hard") }
  requires { !CYGHWR_HAL_CORTEXM_FPU implies !is_substr(CYGBLD_GLOBAL_LDFLAGS,
"-mfpu=fpv4-sp-d16") && !is_substr(CYGBLD_GLOBAL_LDFLAGS, "-mfloat-abi=hard") }


- There is a quite important design issue. A lot of things assume the only
processor with an FPU is the Cortex-M4 and that FPU has the fpv4-sp-d16
arrangement. This is unlikely to remain the case, and it's easy to anticipate.
The M4 FPU is a bit cut-down after all, so it's easy to see a full VFP4 unit
being added in a later Cortex-M.  So we should be wrapping up much more
functionality in macros that can be supplied according to the FPU design, the
most obvious alternative possibility being a full VFP4 unit, to give an idea of
what to abstract.

With that in mind, I think there needs to be a fpv4-sp-d16.h header file,
probably included from cortexm_fpu.h (which could in future then instead
include alternatives e.g. a vfp4.h header once that became relevant). That
FPU-specific header will define a number of FPU-model-specific properties and
macros used for the implementation, such as from cortexm_stub.h and context.S. 


So for example, cortexm_stub.h would just have:
#ifndef NUMREGS
# define NUMREGS (16+8+2)  // 16 GPR, 8 FPR (unused), 2 PS
#endif

and that definition would have been overriden by fpv4-sp-d16.h (or in future by
vfp4.h).

For consistency with the practice of other HALs, something like 'enum regnames'
would become:
#ifndef ENUM_REGNAMES_DEFINED
enum regnames {
    R0, R1, R2, R3, R4, R5, R6, R7,
    R8, R9, R10, FP, IP, SP, LR, PC,
    F0, F1, F2, F3, F4, F5, F6, F7,
    FPS, PS
};
#endif

Of course we only need to override where necessary - things which are the same
in the FPv4 implementation, such as PS_N, PS_Z, PS_C, PS_V, target_register_t,
CYGARC_STUB_REGISTER_ACCESS_DEFINED, TARGET_HAS_LARGE_REGISTERS,
HAL_STUB_REGISTERS_SIZE, etc. are just set once in cortexm_stub.h.

- I wondered whether the current contents of cortexm_fpu.h would be shared with
any future potential alternative FPU unit in future Cortex-M's. I think the
current cortexm_fpu.h probably would be common between them, but do you agree?
So again with the most obvious example, would a VFP4 implementation still use
the same cortexm_fpu.h contents (after including a future vfp4.h header).

- In cortexm_fpu.h, you define hal_cortexm_fpu_enable/disable as essentially
"extern inline". You have to be careful with this, because using extern inline
means no non-inline implementation is provided. So if the compiler cannot
inline (or chooses not to, due to side-effects of optimisation), they would
fail to link. You could either play some tricks using the C preprocessor so you
can have this both as an inline and also define it in a linkable way from a C
file, but I think it may be better to just replace these inlines with
HAL_CORTEXM_FPU_ENABLE/DISABLE macros really. I do prefer inlines myself, but
using them easily from C can be a pain.

- In cortexm_regs.h, there has been the addition of
CYGARC_DSB()/CYGARC_CORTEXM_DATA_SYNC_BARRIER() and
CYGARC_ISB()/CYGARC_CORTEXM_INSTR_SYNC_BARRIER(). I think you should choose one
name for each and stick with it.

Also for consistency with some other architectural ports, there should be a
HAL_MEMORY_BARRIER define:

#define HAL_MEMORY_BARRIER() \
  CYGARC_CORTEXM_DATA_SYNC_BARRIER(); CYGARC_CORTEXM_INSTR_SYNC_BARRIER()

( This could then be used in cortexm_fpu.h, hal_fpu.c etc, to replace existing
calls which do exactly that, although that isn't vital.)

OOI strictly that cortexm_regs.h doesn't need #ifndef __ASSEMBLER__ as its all
macro definitions, but you can keep it like that if you want.

- Looking at the top of hal_arch.h, I think you should always be able to
include <cyg/hal/cortexm_fpu.h>, and we then have cortexm_fpu.h handling the
FPU config settings in a single place (including the check for which sort of
FPU unit), rather than each of the users of the header.

- Again in hal_arch.h, there are defines here which are specific to
fpv4-sp-d16. e.g. HAL_SAVEDREGISTERS_FPU_CONTEXT_N. Or the HAL saved context
members. These should instead be defined by fpv4-sp-d16.h

Although things like HAL_THREAD_INIT_FPU_REGS could be left in hal_arch.h
because once HAL_SAVEDREGISTERS_FPU_CONTEXT_N is abstracted, it will probably
work for everything. Although, that said, there's no harm putting an #ifndef
HAL_THREAD_INIT_FPU_REGS/#endif block around it for future-proofing to be on
the safe side. Similarly HAL_THREAD_INIT_FPU_CONTEXT().

- You don't save the floating point context for interrupts, but do for
exceptions. There's an argument that since both interrupt and exception
handlers could call arbitrary code which therefore may use FP, the context
should be saved. There's also an argument that unless this is a ROM monitor and
including stubs, you don't need to save the exception info either. I think
there may be merit in having two config options to govern whether to save FP
state for exceptions and interrupts. The default value for the exceptions one
would be CYGSEM_HAL_ROM_MONITOR || CYGDBG_HAL_DEBUG_GDB_INCLUDE_STUBS, the
default for the interrupt one would be 0.

- Just as a general coding standards thing, remember to put brackets round uses
of __regs_p in the new HAL_THREAD_INIT_FPU_* macros.

- FYI you can use CYG_EMPTY_STATEMENT instead of " CYG_MACRO_START
CYG_MACRO_END"

- I wouldn't bother with CYGARC_CORTEXM_GDB_REG_FPA, and instead, similar to
things above, have something like:
#ifndef HAL_CORTEXM_GDB_REGISTERS_DEFINED
typedef struct
{
    cyg_uint32	gpr[16];
[etc.]
} HAL_CORTEXM_GDB_Registers;
#endif

which is overridden from fpv4-sp-d16.h.

and CYGNUM_HAL_STACK_CONTEXT_SIZE can be:
#ifndef CYGNUM_HAL_STACK_FPU_CONTEXT_SIZE
# define CYGNUM_HAL_STACK_FPU_CONTEXT_SIZE 0
#endif
#define CYGNUM_HAL_STACK_CONTEXT_SIZE \
   ((4 * 20) + CYGNUM_HAL_STACK_FPU_CONTEXT_SIZE)

I think you're probably getting the idea about these macro overrides by now
:-).

- hal_io.h defines UFSR. I think you should give the comment as "Usage fault
status reg" rather than just "Misc".

- In context.S, CYGARC_FPU_CPACR_ENABLE could usefully be put in cortexm_fpu.h
where it could also be used by hal_cortexm_fpu_enable/disable (or their
replacement macros, as above).

- The title at the top of hal_fpu.c should be something like "HAL FPU support".


- Why does it include <pkgconf/kernel.h> ?

- Typo's: autometic -> automatic and ececute -> execute

- hal_usagefault_check_fpu() should only enable the FPU if
CYGHWR_HAL_CORTEXM_FPU_SWITCH_LAZY. Otherwise it implies something has gone
wrong and should return false for an error.

- cortexm_stub.c: There's a diag_printf left over in reg_offset().

- Again with reg_offset(), more abstraction of FPU model needed.

- hal_misc.c: You should ignore the startup type for the call to hal_init_fpu.
We can't assume for definite a previous ROM monitor was built with FPU support
enabled, so we can't rely on it (and equally hal_init_fpu() should not just
enable the FPU if needed, but also explicitly disable it if this is lazy
switching mode and it's RAM startup).

- hal_misc.c: hal_get/set_gdb_registers. This duplication can definitely be
avoided with the use of macros (including a default one for the pretend FPA
regs). And then the details would be provided via cortexm_fpu.h by my proposed
fpv4-sp-d16.h.

- In vectors.S: This change is accidental:
  -// NOTE: At present this implementation does not permit an exception
  +// NOTE:p At present this implementation does not permit an exception

- Again we should definitely aim to replace FPU-specific parts with macros,
rather than duplicating entire functions such as hal_default_exception_vsr.
Either with the C preprocessor, or GAS .macro. If the latter you could have an
include/fpu.inc file, which will #include fpv4-sp-d16.h again (which would then
need to be asm-include safe) to configure itself - up to you. You can again
look at the i386 arch HAL to see what it did in the same situation -
specifically arch.inc, although it is more complex than we would need to be in
cortex-m.

But I think something needs to be done otherwise it becomes much harder to
follow the asm.

- Given what I suggested above about being able to save FP context from ISRs,
something would be needed with the hal_default_interrupt_vsr here, although
with some care, I would expect the same macros should be able to be reused for
both exceptions and interrupts.

- hal_usagefault_exception_vsr is almost identical to
hal_default_exception_vsr. I don't see why hal_default_exception_vsr can't be
used, and instead treated specially in hal_deliver_exception() instead, which
would be less code overall.

I had been also going to say about only setting the usagefault VSR (in
hal_misc.c) if CYGHWR_HAL_CORTEXM_FPU_SWITCH_LAZY, but indeed if we use
hal_deafult_exception_vsr then we could now just call
hal_usagefault_check_fpu() from hal_deliver_exception(), still only if
CYGHWR_HAL_CORTEXM_FPU_SWITCH_LAZY. And if hal_usagefault_check_fpu() returns
an error, then that allows us to enter the stub which seems like a good idea so
the user can debug the cause.

In other words hal_deliver_exception() would be like this (with some
comments+whitespace stripped for brevity):
void hal_deliver_exception( HAL_SavedRegisters *regs )
{
#ifdef CYGDBG_HAL_DEBUG_GDB_INCLUDE_STUBS
    if (__mem_fault_handler )
    {
	regs->u.exception.pc = (unsigned long)__mem_fault_handler;
	return; // Caught an exception inside stubs
    }
#endif
#ifdef CYGHWR_HAL_CORTEXM_FPU_SWITCH_LAZY
    if ( hal_usagefault_check_fpu() )
	return; // Successfully handled, so return
#endif

#if defined(CYGDBG_HAL_DEBUG_GDB_INCLUDE_STUBS)
    _hal_registers = regs;
    __handle_exception();
#elif defined(CYGPKG_KERNEL_EXCEPTIONS)
    cyg_hal_deliver_exception( regs->u.exception.vector, (CYG_ADDRWORD)regs );
#else
    CYG_FAIL("Exception!!!");
#endif
}

- Alternatively, if there's a reason to keep hal_usagefault_exception_vsr
separate from hal_default_exception_vsr, then it's still the case it only needs
to be included if CYGHWR_HAL_CORTEXM_FPU_SWITCH_LAZY I think.

Tests:

- For fptestf.c and fpinttest.c: The test for whether the test is NA should
also include <pkgconf/isoinfra.h> and check CYGINT_ISO_C_CLOCK_FUNCS is
defined, and also CYGPKG_LIBM for fabsf() (which should then not be
prototyped). Note this uses a libm implementation which hasn't been checked in
yet. Overall it might be easier to just do our own fabsf() since we can do a
simple one since we're not concerned with NaNs or infinity. I think this:
float my_fabsf(float f)
{
    if ( f < 0.0 )
	return -f;
    if ( f == -0.0 )
	return 0.0;
    return f;
}

 I think stdio.h isn't needed? And the include of math.h should be within the
#if block for the test requirements otherwise the test won't build because
math.h doesn't exist. You should include <string.h> for strlen and check for
CYGINT_ISO_STRING_STRFUNCS.

- In thread_switch_fpu.cxx you have a FIXME in thread_fp() about gcc maybe
optimising things away. I think it could. A simple solution is to make op1 and
op2 be global arrays, indexed by the thread index (the indx variable). That
should make gcc treat them more carefully. At the very end,
thread_swotch_fpu.cxx -> thread_switch_fpu.cxx :-).

Don't let the length of this make you think that there are big problems
here.... overall this is great work!

Jifl

          parent reply	other threads:[~2012-06-22  4:56 UTC|newest]

Thread overview: expand[flat|nested]  mbox.gz  Atom feed
 [parent not found: <bug-1001607-0@http.bugs.ecos.sourceware.org/>]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120622045526.265282F78004@mail.ecoscentric.com \
    --to=bugzilla-daemon@bugs.ecos.sourceware.org \
    --cc=ecos-patches@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).