public inbox for libc-alpha@sourceware.org
* Old compiler optimizations in installed headers
@ 2015-05-22 17:06 Joseph Myers
  2015-05-22 21:36 ` Paul Eggert
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Joseph Myers @ 2015-05-22 17:06 UTC (permalink / raw)
  To: libc-alpha

We've recently had a discussion of whether it makes sense to keep various 
macros and inline functions in bits/string2.h that are only used for old 
GCC versions and only serve to optimize cases of small / constant 
arguments with those versions.  A similar issue applies to such things as 
inline __signbit* definitions in bits/mathinline.h - given 
<https://sourceware.org/ml/libc-alpha/2015-05/msg00521.html> could we just 
remove the various inlines on the basis that optimization for compilers 
before GCC 4.0 isn't a concern - and probably to various other inlines.

There was previously a discussion with a proposal in 
<https://sourceware.org/ml/libc-alpha/2013-01/msg00157.html> and with 
<https://sourceware.org/ml/libc-alpha/2013-01/msg00270.html> saying to 
avoid optimization regressions for old compilers.

Regarding what's supported at all, the baseline is C90 or C++98 plus long 
long support (although there are various places where headers in fact 
depend on other extensions to provide particular functionality).

Meanwhile, lots of architecture-specific .S function implementations in 
NPTL (especially, but maybe also elsewhere) have been removed, with 
benchmark evidence showing they don't actually provide better performance 
but do cause significant trouble in maintenance.

I'd like to propose that:

(a) if an optimization could clearly be done in the compiler - if it only 
depends on the standard semantics of standard functions, or fully-defined 
semantics of glibc functions that it would be reasonable to encode into 
GCC, rather than e.g. generating calls to a glibc-internal function - then 
we should be wary of adding it to glibc's headers in the first place: in 
accordance with principles of GNU projects working together, it's better 
to add the optimization to GCC; and

(b) where such optimizations are present in glibc headers and only 
relevant for GCC versions before some baseline (maybe 4.1 or 4.3), we 
should be willing to remove them to simplify the code and remove variants 
that aren't covered by normal glibc testing at all, without being 
concerned about worse code possibly being generated for users of old 
compilers.  Applying that principle to signbit and is* macros would allow 
complete removal of some mathinline.h implementations and removal of 
significant amounts of code from others.
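
As a rough illustration of (b), not taken from glibc's actual headers, a
version-gated definition of the kind under discussion might look like the
sketch below.  The name example_signbit and the 64-bit IEEE double layout
in the fallback are assumptions made for the example; __GNUC_PREREQ is
the glibc macro from <features.h>, stubbed here in case the libc in use
does not define it.  Under the proposal, the #else branch is the part
that would be deleted.

```c
#include <string.h>

/* glibc's <features.h> defines __GNUC_PREREQ; stub it out elsewhere so
   the sketch still compiles (the stub always selects the fallback).  */
#ifndef __GNUC_PREREQ
# define __GNUC_PREREQ(maj, min) 0
#endif

#if __GNUC_PREREQ (4, 0)
/* Baseline or newer compiler: rely on the builtin and let the compiler
   choose the expansion.  */
# define example_signbit(x) (__builtin_signbit (x) != 0)
#else
/* Old-compiler variant (hypothetical): assumes a 64-bit IEEE double
   whose byte order matches that of unsigned long long.  */
static int
example_signbit (double x)
{
  unsigned long long bits;
  memcpy (&bits, &x, sizeof bits);
  return (int) (bits >> 63);
}
#endif
```

Either branch gives the same answer for a double argument, e.g.
example_signbit (-0.0) is nonzero; only the first one leaves the choice
of expansion to the compiler.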

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: Old compiler optimizations in installed headers
  2015-05-22 17:06 Old compiler optimizations in installed headers Joseph Myers
@ 2015-05-22 21:36 ` Paul Eggert
  2015-05-22 22:10   ` Joseph Myers
  2015-05-22 22:19 ` Roland McGrath
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Paul Eggert @ 2015-05-22 21:36 UTC (permalink / raw)
  To: Joseph Myers, libc-alpha

On 05/22/2015 08:03 AM, Joseph Myers wrote:
> (b) where such optimizations are present in glibc headers and only
> relevant for GCC versions before some baseline (maybe 4.1 or 4.3), we
> should be willing to remove them to simplify the code

This all sounds reasonable.  How much of a maintenance difference would 
it be to select 4.1 vs 4.3 for the baseline?  If that's significant, the 
last GCC 4.3.x release was in 2009 which is quite a while ago if we're 
talking about development environments, so I suggest going with 4.3.

* Re: Old compiler optimizations in installed headers
  2015-05-22 21:36 ` Paul Eggert
@ 2015-05-22 22:10   ` Joseph Myers
  0 siblings, 0 replies; 11+ messages in thread
From: Joseph Myers @ 2015-05-22 22:10 UTC (permalink / raw)
  To: Paul Eggert; +Cc: libc-alpha

On Fri, 22 May 2015, Paul Eggert wrote:

> On 05/22/2015 08:03 AM, Joseph Myers wrote:
> > (b) where such optimizations are present in glibc headers and only
> > relevant for GCC versions before some baseline (maybe 4.1 or 4.3), we
> > should be willing to remove them to simplify the code
> 
> This all sounds reasonable.  How much of a maintenance difference would it be
> to select 4.1 vs 4.3 for the baseline?  If that's significant, the last GCC
> 4.3.x release was in 2009 which is quite a while ago if we're talking about
> development environments, so I suggest going with 4.3.

There are some 4.3 conditionals in byteswap.h headers.  Other than that it 
doesn't look like there would be much advantage in 4.3 over 4.0 (3.4 would 
be sufficient for bits/string2.h, 4.0 for bits/mathinline.h on various 
architectures).

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: Old compiler optimizations in installed headers
  2015-05-22 17:06 Old compiler optimizations in installed headers Joseph Myers
  2015-05-22 21:36 ` Paul Eggert
@ 2015-05-22 22:19 ` Roland McGrath
  2015-05-24 17:10 ` Ondřej Bílka
  2015-05-25  2:37 ` Ondřej Bílka
  3 siblings, 0 replies; 11+ messages in thread
From: Roland McGrath @ 2015-05-22 22:19 UTC (permalink / raw)
  To: Joseph Myers; +Cc: libc-alpha

I think that all makes perfect sense.  Starting with 4.1 or 4.3 is a
completely reasonable conservative starting place.

I think we should consider the minimum GCC version required for
building libc itself as the only clear upper bound on the minimum
GCC version for which we bother to maintain any optimizations in
headers.  Keeping the minimum for header optimizations at a lower
version requires continual proof that it matters to anyone, any time
it looks like raising it would ease the maintenance burden.


Thanks,
Roland

* Re: Old compiler optimizations in installed headers
  2015-05-22 17:06 Old compiler optimizations in installed headers Joseph Myers
  2015-05-22 21:36 ` Paul Eggert
  2015-05-22 22:19 ` Roland McGrath
@ 2015-05-24 17:10 ` Ondřej Bílka
  2015-05-27 20:04   ` Richard Henderson
  2015-05-25  2:37 ` Ondřej Bílka
  3 siblings, 1 reply; 11+ messages in thread
From: Ondřej Bílka @ 2015-05-24 17:10 UTC (permalink / raw)
  To: Joseph Myers; +Cc: libc-alpha

On Fri, May 22, 2015 at 03:03:01PM +0000, Joseph Myers wrote:
> We've recently had a discussion of whether it makes sense to keep various 
> macros and inline functions in bits/string2.h that are only used for old 
> GCC versions and only serve to optimize cases of small / constant 
> arguments with those versions.  A similar issue applies to such things as 
> inline __signbit* definitions in bits/mathinline.h - given 
> <https://sourceware.org/ml/libc-alpha/2015-05/msg00521.html> could we just 
> remove the various inlines on the basis that optimization for compilers 
> before GCC 4.0 isn't a concern - and probably to various other inlines.
> 
While this is a good idea in general, I would be wary of removing the
signbit et al. macros.  Relying on gcc will cause performance
regressions in some cases, as it generates suboptimal branchless code.
Adding branches is correct here: the performance savings from branchless
code come from avoiding branch mispredictions, which don't happen in
this case.  If more than 5% of your inputs are NaN, enough to offset the
single-cycle penalty of going branchless, your problem is that you are
producing garbage, not performance.

* Re: Old compiler optimizations in installed headers
  2015-05-22 17:06 Old compiler optimizations in installed headers Joseph Myers
                   ` (2 preceding siblings ...)
  2015-05-24 17:10 ` Ondřej Bílka
@ 2015-05-25  2:37 ` Ondřej Bílka
  2015-05-28 16:43   ` Joseph Myers
  3 siblings, 1 reply; 11+ messages in thread
From: Ondřej Bílka @ 2015-05-25  2:37 UTC (permalink / raw)
  To: Joseph Myers; +Cc: libc-alpha

On Fri, May 22, 2015 at 03:03:01PM +0000, Joseph Myers wrote:
> We've recently had a discussion of whether it makes sense to keep various 
> macros and inline functions in bits/string2.h that are only used for old 
> GCC versions and only serve to optimize cases of small / constant 
> arguments with those versions.  A similar issue applies to such things as 
> inline __signbit* definitions in bits/mathinline.h - given 
> <https://sourceware.org/ml/libc-alpha/2015-05/msg00521.html> could we just 
> remove the various inlines on the basis that optimization for compilers 
> before GCC 4.0 isn't a concern - and probably to various other inlines.
> 
> There was previously a discussion with a proposal in 
> <https://sourceware.org/ml/libc-alpha/2013-01/msg00157.html> and with 
> <https://sourceware.org/ml/libc-alpha/2013-01/msg00270.html> saying to 
> avoid optimization regressions for old compilers.
> 
> Regarding what's supported at all, the baseline is C90 or C++98 plus long 
> long support (although there are various places where headers in fact 
> depend on other extensions to provide particular functionality).
> 
> Meanwhile, lots of architecture-specific .S function implementations in 
> NPTL (especially, but maybe also elsewhere) have been removed, with 
> benchmark evidence showing they don't actually provide better performance 
> but do cause significant trouble in maintenance.
> 
> I'd like to propose that:
> 
> (a) if an optimization could clearly be done in the compiler - if it only 
> depends on the standard semantics of standard functions, or fully-defined 
> semantics of glibc functions that it would be reasonable to encode into 
> GCC, rather than e.g. generating calls to a glibc-internal function - then 
> we should be wary of adding it to glibc's headers in the first place: in 
> accordance with principles of GNU projects working together, it's better 
> to add the optimization to GCC; and
>
I disagree, for the simple reason of cost.  It's considerably easier to
write an inline function that does a transformation than to write the
equivalent gcc pass.

The result is that a lot of builtins cause performance regressions.  I
would need to review them separately, as they don't inspire a lot of
confidence.

I just found that gcc also seriously messes up strcmp with constant
strings.  It uses rep cmpsb, which is around three times slower than the
libcall in the following simple test: compile a.c and b.c separately,
then link each in turn with main.c.

a.c:

#include <string.h>
int strcmp2(char *c)
{
  return strcmp(c, "ahtusntueoahsntsnthusnthueoasnth");

}


b.c:

#include <string.h>
extern char *str;
int strcmp2(char *c)
{
  return strcmp(c, str);

}

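main.c:

/* Defined in a.c or b.c; link exactly one of them with this file.  */
int strcmp2(char *c);

char *str  = "ahtusntueoahsntsnthusnthueoasnth";
char *str2 = "bhtusntueoahsntsnthusnthueoasnti";

int main()
{
  int i;
  for (i=0;i<10000000;i++)
    strcmp2 (str2);
  return 0;
}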

* Re: Old compiler optimizations in installed headers
  2015-05-24 17:10 ` Ondřej Bílka
@ 2015-05-27 20:04   ` Richard Henderson
  2015-05-27 20:13     ` Ondřej Bílka
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Henderson @ 2015-05-27 20:04 UTC (permalink / raw)
  To: Ondřej Bílka, Joseph Myers; +Cc: libc-alpha

On 05/24/2015 06:31 AM, Ondřej Bílka wrote:
> While good idea in general I would be wary of signbit et al. macros.
> Using gcc will cause performance regression in some cases as they
> generate suboptimal branchless code. Adding branches is correct as
> performance savings from branchless code are in branch misprediction
> which doesn't happen. If you have more than 5% inputs NaN to offset
> single cycle penalty of going branchless your problem is that you are
> producing garbage instead performance.

I beg your pardon?  Why would you *ever* have a branch implementing signbit?


r~

* Re: Old compiler optimizations in installed headers
  2015-05-27 20:04   ` Richard Henderson
@ 2015-05-27 20:13     ` Ondřej Bílka
  0 siblings, 0 replies; 11+ messages in thread
From: Ondřej Bílka @ 2015-05-27 20:13 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Joseph Myers, libc-alpha

On Wed, May 27, 2015 at 08:15:51AM -0700, Richard Henderson wrote:
> On 05/24/2015 06:31 AM, Ondřej Bílka wrote:
> > While good idea in general I would be wary of signbit et al. macros.
> > Using gcc will cause performance regression in some cases as they
> > generate suboptimal branchless code. Adding branches is correct as
> > performance savings from branchless code are in branch misprediction
> > which doesn't happen. If you have more than 5% inputs NaN to offset
> > single cycle penalty of going branchless your problem is that you are
> > producing garbage instead performance.
> 
> I beg your pardon?  Why would you *ever* have a branch implementing signbit?
> 
Sorry, I meant that you must be careful with the others, like isinf.
Not signbit; that's OK.
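
To make the distinction concrete, here is a minimal sketch (hypothetical
names, assuming a 64-bit IEEE 754 double; not the code gcc or glibc
actually emit).  signbit reduces to extracting one bit, so there is
nothing for a branch to do.  A classification such as isinf involves a
comparison, and it is that comparison which can be expanded either with
or without a branch; the version below is written without one.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the object representation of a double as an integer.  */
static inline uint64_t
sketch_double_bits (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  return bits;
}

/* The sign is the top bit: one shift, no comparison at all.  */
static inline int
sketch_signbit (double x)
{
  return (int) (sketch_double_bits (x) >> 63);
}

/* Infinity: all exponent bits set and a zero mantissa.  Clearing the
   sign bit first lets one comparison cover both +inf and -inf.  */
static inline int
sketch_isinf (double x)
{
  uint64_t magnitude = sketch_double_bits (x) & UINT64_C (0x7fffffffffffffff);
  return magnitude == UINT64_C (0x7ff0000000000000);
}
```

Whether the comparison in sketch_isinf ends up as a branch or as a
flag-setting instruction depends on how the caller uses the result, which
is the part that needs care when benchmarking.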

* Re: Old compiler optimizations in installed headers
  2015-05-25  2:37 ` Ondřej Bílka
@ 2015-05-28 16:43   ` Joseph Myers
  2015-05-29 11:03     ` Ondřej Bílka
  0 siblings, 1 reply; 11+ messages in thread
From: Joseph Myers @ 2015-05-28 16:43 UTC (permalink / raw)
  To: Ondřej Bílka; +Cc: libc-alpha

On Mon, 25 May 2015, Ondřej Bílka wrote:

> > I'd like to propose that:
> > 
> > (a) if an optimization could clearly be done in the compiler - if it only 
> > depends on the standard semantics of standard functions, or fully-defined 
> > semantics of glibc functions that it would be reasonable to encode into 
> > GCC, rather than e.g. generating calls to a glibc-internal function - then 
> > we should be wary of adding it to glibc's headers in the first place: in 
> > accordance with principles of GNU projects working together, it's better 
> > to add the optimization to GCC; and
> >
> I disagree for simple reason of cost. Its considerably easier to write a
> inline function that does transformation than to write equivalent gcc
> pass.

The first question should be to determine the right way to implement 
something rather than the quick way.  And I think the right way is 
generally compiler optimization - which (for example) allows for use in 
kernel space, for optimization based on the function semantics (e.g. as 
regards aliasing) rather than just the semantics of a particular 
implementation in a header, and for different expansions depending on 
whether the compiler thinks the code in question is hot or cold 
(information possibly obtained from profile feedback - much code is 
generally cold, so expansions that increase code size should only be used 
in those bits of code determined to be hot, which is information simply 
not available at all in the headers).

Then, if putting an optimization (or compiler bug workaround, etc.) in 
glibc's headers when a compiler approach would also be possible, it should 
always be accompanied by a comment pointing to the GCC bug report 
requesting the optimization, and the bug should have a comment pointing 
back to that glibc header comment and saying to inform the glibc 
developers when resolving the bug so they know to insert appropriate 
__GNUC_PREREQ conditionals in the header.
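
A sketch of what that convention might look like in a header, under
stated assumptions: the bug number NNNNN is a placeholder rather than a
real GCC Bugzilla entry, example_str_is_empty is a hypothetical function,
and version 9999.0 stands in for "not fixed in any release yet".

```c
#include <string.h>

/* glibc's <features.h> defines __GNUC_PREREQ; stub it out elsewhere so
   the sketch still compiles.  */
#ifndef __GNUC_PREREQ
# define __GNUC_PREREQ(maj, min) 0
#endif

/* First GCC release containing the fix.  While the bug is open this is
   a sentinel; replace it with the real version once the bug is closed.  */
#define EXAMPLE_FIX_MAJOR 9999
#define EXAMPLE_FIX_MINOR 0

#if !__GNUC_PREREQ (EXAMPLE_FIX_MAJOR, EXAMPLE_FIX_MINOR)
/* Header-level optimization requested as GCC bug NNNNN (placeholder).
   The bug report links back to this comment; when it is resolved,
   update EXAMPLE_FIX_MAJOR/MINOR so newer compilers skip this variant.  */
static inline int
example_str_is_empty (const char *s)
{
  return s[0] == '\0';
}
#else
/* Compiler is new enough to do the transformation itself.  */
static inline int
example_str_is_empty (const char *s)
{
  return strlen (s) == 0;
}
#endif
```

The point of the sentinel is that the conditional is already in place
when the workaround is added, so retiring it later is a one-line change.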

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: Old compiler optimizations in installed headers
  2015-05-28 16:43   ` Joseph Myers
@ 2015-05-29 11:03     ` Ondřej Bílka
  2015-05-29 12:00       ` Joseph Myers
  0 siblings, 1 reply; 11+ messages in thread
From: Ondřej Bílka @ 2015-05-29 11:03 UTC (permalink / raw)
  To: Joseph Myers; +Cc: libc-alpha

On Thu, May 28, 2015 at 04:18:35PM +0000, Joseph Myers wrote:
> On Mon, 25 May 2015, Ondřej Bílka wrote:
> 
> > > I'd like to propose that:
> > > 
> > > (a) if an optimization could clearly be done in the compiler - if it only 
> > > depends on the standard semantics of standard functions, or fully-defined 
> > > semantics of glibc functions that it would be reasonable to encode into 
> > > GCC, rather than e.g. generating calls to a glibc-internal function - then 
> > > we should be wary of adding it to glibc's headers in the first place: in 
> > > accordance with principles of GNU projects working together, it's better 
> > > to add the optimization to GCC; and
> > >
> > I disagree for simple reason of cost. Its considerably easier to write a
> > inline function that does transformation than to write equivalent gcc
> > pass.
> 
> The first question should be to determine the right way to implement 
> something rather than the quick way.

For several simple optimizations both approaches are correct, so the
question is which is easier to implement.  Current gcc is evidence that
compiler optimization isn't the easier one.  You could have performance
regressions with almost all functions due to underlying bugs in memcmp
and memcpy generation.

As these are outstanding bugs, let's be practical here.  Joseph, how
long do you think it will take to fix them?  As they have been present
for five years, I wouldn't be surprised to wait another five years.

So I would set a deadline of three months; if gcc cannot produce a patch
within that time, going the compiler optimization way takes too long.

>  And I think the right way is 
> generally compiler optimization - which (for example) allows for use in 
> kernel space, for optimization based on the function semantics (e.g. as 

For kernel space you could just surround these with #ifdef _GCC_USE_SSE
or equivalent.

> regards aliasing) rather than just the semantics of a particular 
> implementation in a header, and for different expansions depending on 
> whether the compiler thinks the code in question is hot or cold 
> (information possibly obtained from profile feedback - much code is 
> generally cold, so expansions that increase code size should only be used 
> in those bits of code determined to be hot, which is information simply 
> not available at all in the headers).
> 
You shouldn't use the pattern: Something needs to be done. X is
something. So X should be done.

Since you mentioned profiling: you would need userspace-based profiling
instead of the generic profiling done by gcc.  You could access
userspace counters from a header.

Since you mentioned hotness/coldness: first you need to make gcc measure
the real thing, which is the number of icache misses, instead of trying
to guess hotness/coldness from frequencies.  Code in a tight loop with a
high iteration count is hot no matter how rarely it's executed.

Then, where you said that expansions that increase code size should only
be used in hot bits of code, you are wrong again.

You also need to profile the library function, and skip the
transformation only when the function is cache resident.  If it's not,
then the expansion would improve performance, as you need to fetch only
several bytes into the icache instead of wasting time fetching the whole
function into the cache.

Then you have the problem that optimizing for size is a misnomer.  You
optimize knowing that each byte of code carries some penalty in cache
misses.  So for each optimization you need to check whether it is
cost-effective, instead of making a broad generalization about hot/cold
code and whether the size increases or not.

Then, as I mentioned with userspace profiling, you need to collect the
correct data to get the optimization right.  Keeping the focus on
strcmp/memcmp, the question is whether inlining the first-byte check
helps.

#define strcmp(x, y) ({          \
  profile.iterations++;          \
  if (x[0] - y[0])               \
    (x[0] - y[0]);               \
  else                           \
    {                            \
       profile.second_byte++;    \
...

From my profiling, this helps for strcmp and strncmp.  However, it's a
mistake for memcmp, which is used to compare structures, so a mismatch
likely occurs much later.  That also means that you discard information
by converting strcmp->memcmp unless you do profiling.
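
For illustration only, the counting idea can be sketched as a standalone
function rather than a macro.  This is not a completion of the fragment
above: the names example_counters and example_strcmp_profiled are
hypothetical, and the fallback to the library strcmp after the first byte
is an assumption of the sketch.

```c
#include <string.h>

/* Hypothetical userspace counters, readable by the program itself.  */
struct example_profile
{
  unsigned long iterations;   /* total calls */
  unsigned long second_byte;  /* calls where the first bytes matched */
};

static struct example_profile example_counters;

/* Compare the first byte inline; only fall through to the library call
   when that byte does not decide the result.  */
static int
example_strcmp_profiled (const char *x, const char *y)
{
  unsigned char a = (unsigned char) x[0];
  unsigned char b = (unsigned char) y[0];

  example_counters.iterations++;
  if (a != b)
    return a - b;

  example_counters.second_byte++;
  return strcmp (x, y);
}
```

The ratio second_byte / iterations then says how often the inline check
decided nothing, which is the data needed to judge whether the expansion
pays for its code size in a given workload.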


> Then, if putting an optimization (or compiler bug workaround, etc.) in 
> glibc's headers when a compiler approach would also be possible, it should 
> always be accompanied by a comment pointing to the GCC bug report 
> requesting the optimization, and the bug should have a comment pointing 
> back to that glibc header comment and saying to inform the glibc 
> developers when resolving the bug so they know to insert appropriate 
> __GNUC_PREREQ conditionals in the header.
> 
Of course I will describe the bugs.

You also have another problem with the gcc/header issue.  I was asked
whether it's possible to use functions with a partially expanded header,
so you would call a new symbol like memcmp_aligned to save the cost of
checking the header again.  I don't think that adding such an expansion
in gcc is sane.

* Re: Old compiler optimizations in installed headers
  2015-05-29 11:03     ` Ondřej Bílka
@ 2015-05-29 12:00       ` Joseph Myers
  0 siblings, 0 replies; 11+ messages in thread
From: Joseph Myers @ 2015-05-29 12:00 UTC (permalink / raw)
  To: Ondřej Bílka; +Cc: libc-alpha

On Fri, 29 May 2015, Ondřej Bílka wrote:

> On several simple optimizations both are correct so question is which is
> easier to implement. Current gcc shown evidence that compiler
> optimization isn't one. You could have performance regression with
> almost all functions due to underlying bugs in memcmp and memcpy
> generation.

In most cases you haven't given enough information (such as test cases, 
compiler options and confirmation this is using current GCC 6) to convince 
me that there are actually problems (an awful lot of people complaining 
about compiler performance are using inappropriate options or compiler 
configurations).

> As these are outstanding bugs lets be practical here. Joseph how long do
> you think it will take to fix them. As these are present for five years
> I wouldn't be surprised to wait another five years.
> 
> So I would make deadline of three months, if gcc cannot produce a patch
> within that going compiler optimization way does take too long.

I don't think you can meaningfully count time without starting with a 
constructive proposal in the right place.  That means bugs in GCC Bugzilla 
for each issue, constructively engaging with the points of view other 
people present in response to those bugs, a meta-bug depending on all such 
bugs, and starting a discussion on the GCC mailing list pointing to that 
bug and offering your advice and expertise on e.g. how common different 
sorts of string function inputs are in different workloads.  The message 
in such discussions should be one of seeking collaborators to improve 
string function optimization in the GNU system as a whole, not "X sucks" 
or "only this particular form of benchmarking is valid" or "changing Y is 
easier than changing Z".

> Also you have other problem with gcc/header issue. I was asked if its
> possible to use functions with partially expanded header, so you would
> call new symbol like memcmp_aligned to save cost of checking header
> again. I don't think that adding a expansion in gcc is sane.

It seems perfectly reasonable for glibc to define *reserved-namespace* 
ABIs for such cases that GCC can then generate calls to (noting that GCC 
often has compile-time knowledge of alignment).  Indeed, the ARM EABI 
defines functions such as __aeabi_memcpy4 and __aeabi_memclr8 for aligned 
inputs - right now, those functions in glibc are wrappers for the generic 
ones, so slower than the generic ones, and GCC doesn't generate calls to 
them.  Changing just one without the other wouldn't be that useful - but 
as a collaboration, making the functions faster in glibc *and* making GCC 
generate calls using the alignment information it has, given new enough 
glibc on the target, would make sense.
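
A minimal sketch of the division of labour, with loud caveats:
example_memcpy4 and example_copy_object are hypothetical names, not the
ARM EABI functions or anything glibc or GCC provides, and the macro only
emulates in source what the compiler would decide internally from its
own alignment information.  It assumes GCC 4.7 or later, or Clang, for
__builtin_assume_aligned and __alignof__ on expressions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reserved-namespace entry point: the caller promises that
   both pointers are at least 4-byte aligned.  A real implementation
   would exploit that promise; this stand-in only forwards the hint.  */
static void *
example_memcpy4 (void *dst, const void *src, size_t n)
{
  return memcpy (__builtin_assume_aligned (dst, 4),
                 __builtin_assume_aligned (src, 4), n);
}

/* Use the aligned entry point only when the static types guarantee the
   alignment; otherwise fall back to plain memcpy.  */
#define example_copy_object(dstp, srcp)                         \
  (__alignof__ (*(dstp)) >= 4 && __alignof__ (*(srcp)) >= 4     \
   ? example_memcpy4 ((dstp), (srcp), sizeof *(dstp))           \
   : memcpy ((dstp), (srcp), sizeof *(dstp)))
```

With int arrays the condition is a compile-time constant that selects
example_memcpy4; with char arrays it selects memcpy.  Neither half is
useful alone, which is the argument for coordinating the two projects.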

Again, the principle is to work out collaboratively what's best for the 
GNU system as a whole (respecting that different people use the system in 
different ways and what's useful for one person may be problematic for 
others, so there is a need to compromise with other requirements than just 
the greatest optimization of particular programs), and to work as needed 
with experts on each part of the system to achieve those goals.  Not to 
focus only on the part of the system you're most familiar with.

-- 
Joseph S. Myers
joseph@codesourcery.com
