Re: [PATCH v3] libgcc: Thumb-1 Floating-Point Library for Cortex M0

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Earnshaw <Richard.Earnshaw@foss.arm.com>
To: Daniel Engel <libgcc@danielengel.com>,
	Christophe Lyon <christophe.lyon@linaro.org>
Cc: gcc Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH v3] libgcc: Thumb-1 Floating-Point Library for Cortex M0
Date: Thu, 7 Jan 2021 12:56:54 +0000	[thread overview]
Message-ID: <3ba1767e-26f3-96d2-e5c7-987ca0a10f1d@foss.arm.com> (raw)
In-Reply-To: <7e0f7bb7-c2c4-4c77-8dc2-5f5c20a7bcec@www.fastmail.com>

On 07/01/2021 00:59, Daniel Engel wrote:
> --snip--
> 
> On Wed, Jan 6, 2021, at 9:05 AM, Richard Earnshaw wrote:
> 
>>
>> Thanks for working on this, Daniel.
>>
>> This is clearly stage1 material, so we've got time for a couple of
>> iterations to sort things out.
> 
> I appreciate your feedback.  I had been hoping that with no regressions
> this might still be eligible for stage2.  Christophe never indicated
> either way. but the fact that he was looking at it seemed positive.
> I thought I would be a couple of weeks faster with this last
> iteration, but holidays got in the way.

GCC doesn't have a stage 2 any more (historical wart).  We were in
(late) stage3 when this was first posted, and because of the significant
impact this might have on not just CM0 but other targets as well, I
don't think it's something we should try to squeeze in at the last
minute.  We're now in stage 4, so that is doubly the case.

Christophe is a very valuable member of our community, but he's not a
port maintainer and thus cannot really rule on what can go into the
tools, or when.

> 
> I actually think your comments below could all be addressable within a
> couple of days.  But, I'm not accounting for the review process.
>  
>> Firstly, the patch is very large, but contains a large number of
>> distinct changes, so it would really benefit from being broken down into
>> a number of distinct patches.  This will make reviewing the individual
>> changes much more straight-forward.  
> 
> I have no context for "large" or "small" with respect to gcc.  This
> patch comprises about 30% of a previously-monolithic library that's
> been shipping since ~2016 (the rest is libm material).  Other than
> (1) the aforementioned change to div0(), (2) a nascent adaptation
> for __truncdfsf2() (not enabled), and (3) the gratuitous addition of
> the bitwise functions, the library remains pretty much as it was
> originally released.

Large, like many other terms is relative.  For assembler file changes,
which this is primarily, the overall size can be much smaller and still
be considered 'large'.

> 
> The driving force in the development of this library was small size,
> which of course was never possible with the softfp routines.  It's not
> half-slow, either, for the limitations of the M0 architecture.   And,
> it's IEEE compliant.  But, that means that most of the functions are
> highly interconnected.  So, some of it can be broken up as you outline
> below, but that last patch is still worth more than half of the total.

Nevertheless, having the floating-point code separated out will make
reviewing more straight forward.  I'll likely need to ask one of our FP
experts to have a specific look at that part and that will be easier if
it is disentangled from the other changes.

> 
> I also have ~70k lines of test vectors that seem mostly redundant, but
> not completely.  I haven't decided what to do here.  For example, I have
> coverage for __aeabi_u/ldivmod, while GCC does not.  If I do anything
> with this code it will be in a separate thread.

Publishing the test code, even if it isn't integrated into the GCC
testsuite would be useful.  Perhaps someone else could then help with that.

> 
>> I'd suggest:
>>
>> 1) Some basic makefile cleanups to ease initial integration - in
>> particular where we have things like
>>
>> LIB1FUNCS += <long list of functions>
>>
>> that this be rewritten with one function per line (and sorted
>> alphabetically) - then we can see which functions are being changed in
>> subsequent patches.  It makes the Makefile fragments longer, but the
>> improvement in clarity for makes this worthwhile.
> 
> I know next to nothing about Makefiles, particularly ones as complex as
> GCC's.  I was just trying to work with the existing style to avoid
> breaking something.  However, I can certainly adopt this suggestion.
>  
>> 2) The changes for the existing integer functions - preferably one
>> function per patch.
>>
>> 3) The new integer functions that you're adding
> 
> These wouldn't be too hard to do, but what are the expectations for
> testing?  A clean build of GCC takes about 6 hours in my VM, and
> regression testing takes about 4 hours per architecture.  You would want
> a full regression report for each incremental patch?  I have no idea how
> to target regression tests that apply to particular runtime functions
> without the risk of missing something.
> 

Most of this can be tested in a cross-compile environment using qemu as
a model.  A cross build shouldn't take that long (especially if you
restrict the compiler to just C and C++ - other languages are
vanishingly unlikely to pick up errors in the parts of the compiler
you're changing).  But build checks will be mostly sufficient for most
of the intermediate patches.

>> 4) The floating-point support.
>>
>> Some more general observations:
>>
>> - where functions are already in lib1funcs.asm, please leave them there.
> 
> I guess I have a different vision here.  I have had a really hard time
> following all of the nested #ifdefs in lib1funcs, so I thought it would
> be helpful to begin breaking it up into logical units.

Agreed, it's not easy.  But the restructuring, if any, should be done
separately from other changes, not mixed up with them.

> 
> The functions removed were all functions for which I had THUMB1
> sequences faster/smaller than lib1funcs:  __clzsi2, __clzdi2, __ctzsi2,
> __ashrdi3, __lshrdi3, __ashldi3.  In fact, the new THUMB1 of __clzsi2 is
> the same number of instructions as the previous ARM/THUMB2 version.
> 
> You will find all of the previous ARM versions of these functions merged
> into the new files (with attribution) and the same preprocessor
> selection path.  So no architecture variant should be any worse off than
> before this patch, and some beyond v6m should benefit.
> 
> In the future, I think that my versions of __divsi3 and __divdi3 will
> prove faster/smaller than the existing THUMB2 versions.  I know that my
> float routines are less than half the compiled size of THUMB2 versions
> in 'ieee754-sf.S'.  However, I haven't profiled the exact performance
> differences so I have left all this work for future patches. (It's also
> quite likely that my version can be further-refined with a few judicious
> uses of THUMB2 alternatives.)
> 
> My long-term vision would be use lib1funcs as an architectural wrapper
> distinct from the implementation code.
> 
>> - lets avoid having the cm0 subdirectory - in particular we should do
>> this when there is existing code for other targets in the same source
>> files.  It's OK to have any new files in the main 'arm' directory of the
>> source tree - just name the files appropriately if really needed.
> 
> Fair point on the name.  In v1 of this patch, all these files were all
> preprocessor-selected for v6m only.  However, as I've stumbled through
> the finer points of integration, that line has blurred.  Name aside,
> the subdirectory does still represent a standalone library.   I think
> I've managed to add enough integration hooks that it works well in
> a libgcc context, but it still has a very distinct implementation style.
> 
> I don't have a strong opinion on this, just preference.  But, keeping
> the subdirectory with a neutral name will probably make maintenance
> easier in the short term.  I would suggest "lib0" (since it caters to
> the lowest common denominator) or "eabi" (since that was the original
> target).  There are precedents in other architectures (e.g. avr).

The issue here is that the selection of code from the various
subdirectories is not consistent.  In some cases we might be pulling in
a thumb1 implementation into a thumb2 environment, so having the code in
a directory that doesn't reflect this makes maintaining the code harder.
 I don't mind too much if some new files are introduced and their names
reflect both their function and the architecture they support - eg
t1-di-shift.S would obviously contain code for di-mode shifts in thumb1.

> 
>> - let's avoid the CM0 prefix - this is 'thumb1' code, for want of a
>> better term, and that is used widely elsewhere in the compiler.  So if
>> you really need a term just use THUMB1, or even T1.
> 
> Maybe.  The Cortex M0 includes a subset of THUMB2 instructions.  Most
> of this is probably THUMB1 clean, but it wasn't a design requirement.

It's particularly the Thumb1 issue, just more the name is for a specific
CPU which might cause confusion later.  v6m would be preferable to that
if there really is a dependency on the instructions that are not in the
original Thumb1 ISA.

> 
> The CM0_FUNC_START exists so that I can specify subsections of ".text"
> for each function.  This was a fairly fundamental design decision that
> allowed me to make a number of branch optimizations between functions.
> The other macros are just duplicates for naming symmetry.

This is something we'll have to get to during the main review of the
code - we used to have support for PE-COFF object files.  That might now
be obsolete, wince support is certainly deprecated - but we can't assume
that ELF is the only object format we'll ever have to support.

> 
> The existing  FUNC_START macro inserts extra conflicting ".text"
> directives that would break the build.  Of course, the prefix was
> arbitrary; I just took CM0 from the library name.  But, there's nothing
> architecturally significant about this macro at all, so THUMB1 and T1
> seems just about as wrong.  Maybe define a FUNC_START_SECTION macro with
> two parameters? For example:
> 
>     FUNC_START_SECTION clzdi2 .text.sorted.libgcc.clz2.clzdi2
> 
> Instead of: 
> 
>     .section .text.sorted.libgcc.clz2.clzdi2,"x"
>     CM0_FUNC_START clzdi2
> 
>> - For the 64-bit shift functions, I expect the existing code to be
>> preferable whenever possible - I think it can even be tweaked to build
>> for thumb2 by inserting a suitable IT instruction.  So your code should
>> only be used when
>>
>>  #if !__ARM_ARCH_ISA_ARM && __ARM_ARCH_ISA_THUMB == 1
> 
> That is the definition of NOT_ISA_TARGET_32BIT, which I am respecting.
> (The name doesn't seem quite right for Cortex M0, since it does support
> some 32 bit instructions, but that's beside the point.)

The terms Thumb1 and Thumb2 predate the arm-v8m architecture
specifications - even then the term Thumb1 was interpreted as "mostly
16-bit instructions" and thumb2 as "a mix of 16- and 32-bit".  Yes, the
16/32-bit spilt has become more blurred and that will likely continue in
future since the 16-bit encoding space is pretty full.

> 
> The current lib1funcs ARM code path still exists, as described above. My
> THUMB1 implementations were 1 - 3 instructions shorter than the current
> versions, which is why I took the time to merge the files.
> 
> Unfortunately, the Cortex M0 THUMB2 subset does not provide IT.  I don't
> see an advantage to eliminating the branch unless these functions were
> written with cryptographic side channel attacks in mind.

On high performance cores branches are predicted - if the branch is
predictable then the common path will be taken and the unneeded
instructions will never be used.  But library functions like this tend
to have very unpredictable values used for calling them, so it's much
less likely that the hardware will predict the right path - at this
point conditional instructions tend to win (especially if there aren't
very many of them) because the cost (on average) of not executing the
unneeded instructions is much lower than the cost (on average) of
unwinding the processor state to execute the other arm of the
conditional branch.

> 
>> - most, if not all, of your LSYM symbols should not be needed after
>> assembly, so should start with a captial 'L' (and no leading
>> underscores), the assembler will then automatically discard any that are
>> not needed for relocations.
> 
> You don't want debugging symbols for libgcc internals :) ?  I sort of
> understand that, but those symbols have been useful to me in the past.
> The "." by itself seems to keep visibility local, so the extra symbols
> won't cause linker issuess. Would you object to a macro variant (e.g.
> LLSYM) that prepends the "L" but is easier to disable?

It is a matter of taste, but I really prefer the local symbols to
disappear entirely once the file is compiled - it makes things like
backtrace gdb show the proper call heirarchy.

> 
>> - you'll need to write suitable commit messages for each patch, which
>> also contain a suitable ChangeLog style entry.
> 
> OK.
> 
>> - finally, your popcount implementations have data in the code segment.
>>  That's going to cause problems when we have compilation options such as
>> -mpure-code.
> 
> I am just following the precedent of existing lib1funcs (e.g. __clz2si).
> If this matters, you'll need to point in the right direction for the
> fix.  I'm not sure it does matter, since these functions are PIC anyway.

That might be a bug in the clz implementations - Christophe: Any thoughts?

> 
>> I strongly suggest that, rather than using gcc snapshots (I'm assuming
>> this based on the diff style and directory naming in your patches), you
>> switch to using a git tree, then you'll be able to use tools such as
>> rebasing and the git posting tools to send the patch series for
>> subsequent review.
> 
> Your assumption is correct.  I didn't think that I would have to get so
> deep into the gcc development process for this contribution.  I used
> this library as a bare metal alternative for libgcc/libm in the product
> for years, so I thought it would just drop in.  But, the libgcc compile
> mechanics have proved much more 'interesting'. I'm assuming this
> architecture was created years before the introduction of -ffunction-
> sections...
> 

I don't think I've time to write a history lesson, even if you wanted
it.  Suffice to say, this does date back to the days of a.out format
object files (with 4 relocation types, STABS debugging, and one code,
one data and one bss section).

>>
>> Richard.
>>
> 
> Thanks again,
> Daniel
> 

R.

next prev parent reply	other threads:[~2021-01-07 12:56 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-12 23:04 [PATCH] " Daniel Engel
2020-11-26  9:14 ` Christophe Lyon
2020-12-02  3:32   ` Daniel Engel
2020-12-16 17:15     ` Christophe Lyon
2021-01-06 11:20       ` [PATCH v3] " Daniel Engel
2021-01-06 17:05         ` Richard Earnshaw
2021-01-07  0:59           ` Daniel Engel
2021-01-07 12:56             ` Richard Earnshaw [this message]
2021-01-07 13:27               ` Christophe Lyon
2021-01-07 16:44                 ` Richard Earnshaw
2021-01-09 12:28               ` Daniel Engel
2021-01-09 13:09                 ` Christophe Lyon
2021-01-09 18:04                   ` Daniel Engel
2021-01-11 14:49                     ` Richard Earnshaw
2021-01-09 18:48                   ` Daniel Engel
2021-01-11 16:07                   ` Christophe Lyon
2021-01-11 16:18                     ` Daniel Engel
2021-01-11 16:39                       ` Christophe Lyon
2021-01-15 11:40                         ` Daniel Engel
2021-01-15 12:30                           ` Christophe Lyon
2021-01-16 16:14                             ` Daniel Engel
2021-01-21 10:29                               ` Christophe Lyon
2021-01-21 20:35                                 ` Daniel Engel
2021-01-22 18:28                                   ` Christophe Lyon
2021-01-25 17:48                                     ` Christophe Lyon
2021-01-25 23:36                                       ` Daniel Engel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3ba1767e-26f3-96d2-e5c7-987ca0a10f1d@foss.arm.com \
    --to=richard.earnshaw@foss.arm.com \
    --cc=christophe.lyon@linaro.org \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=libgcc@danielengel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).