public inbox for libc-alpha@sourceware.org
* Re: [PATCH] bind.2, mount_setattr.2, openat2.2, perf_event_open.2, pidfd_send_signal.2, recvmmsg.2, seccomp_unotify.2, select_tut.2, sendmmsg.2, set_thread_area.2, sysctl.2, bzero.3, getaddrinfo.3, getaddrinfo_a.3, getutent.3, mbrtowc.3, mbsinit.3, rti...
@ 2023-01-06  0:02 Wilco Dijkstra
  2023-01-06  0:22 ` Alejandro Colomar
  0 siblings, 1 reply; 11+ messages in thread
From: Wilco Dijkstra @ 2023-01-06  0:02 UTC (permalink / raw)
  To: Alejandro Colomar, Paul Eggert, Adhemerval Zanella Netto, linux-man
  Cc: Alejandro Colomar, libc-alpha, G. Branden Robinson

Hi Alex,

> There are many users of bzero(3) in the wild, and it is a fine API from a 
> usability point of view. 

Since you repeatedly claim lots of use of these functions, I did a quick search
on https://codesearch.debian.net/

bzero: 21440
memset: 563054

mempcpy: 4489
memcpy: 692873

I used "memcpy(" and "memcpy (" and added the results. These overestimate
usage due to prototypes and comments, and don't include memcpy and memset
calls emitted by compilers so in reality the results are even more skewed.

There may be other repositories which can be easily searched, but these results
are clear enough to conclude these functions are dead.

Cheers,
Wilco

* Re: [PATCH] bind.2, mount_setattr.2, openat2.2, perf_event_open.2, pidfd_send_signal.2, recvmmsg.2, seccomp_unotify.2, select_tut.2, sendmmsg.2, set_thread_area.2, sysctl.2, bzero.3, getaddrinfo.3, getaddrinfo_a.3, getutent.3, mbrtowc.3, mbsinit.3, rti...
  2023-01-06  0:02 [PATCH] bind.2, mount_setattr.2, openat2.2, perf_event_open.2, pidfd_send_signal.2, recvmmsg.2, seccomp_unotify.2, select_tut.2, sendmmsg.2, set_thread_area.2, sysctl.2, bzero.3, getaddrinfo.3, getaddrinfo_a.3, getutent.3, mbrtowc.3, mbsinit.3, rti Wilco Dijkstra
@ 2023-01-06  0:22 ` Alejandro Colomar
  2023-01-06  0:57   ` Alejandro Colomar
  0 siblings, 1 reply; 11+ messages in thread
From: Alejandro Colomar @ 2023-01-06  0:22 UTC (permalink / raw)
  To: Wilco Dijkstra, Paul Eggert, Adhemerval Zanella Netto, linux-man
  Cc: Alejandro Colomar, libc-alpha, G. Branden Robinson


Hi Wilco,

On 1/6/23 01:02, Wilco Dijkstra wrote:
> Hi Alex,
> 
>> There are many users of bzero(3) in the wild, and it is a fine API from a
>> usability point of view.
> 
> Since you repeatedly claim lots of use of these functions, I did a quick search
> on https://codesearch.debian.net/
> 
> bzero: 21440
> memset: 563054
> 
> mempcpy: 4489
> memcpy: 692873
> 
> I used "memcpy(" and "memcpy (" and added the results. These overestimate
> usage due to prototypes and comments, and don't include memcpy and memset
> calls emitted by compilers so in reality the results are even more skewed.

Many projects define those functions themselves under alternative names, so 
it's hard to count how many projects intend to use such an API, as opposed to 
how many actually call the libc function.  Since the standards don't guarantee 
such functions, projects that care a lot use a portable name (one that isn't 
reserved; sometimes they don't even know that there's a GNU extension with that 
name and use a weird one, such as cpymem() by nginx).

Projects that prefer portability and don't care about using good APIs so much 
will fall back to the standard APIs, which is most projects, so of course the 
numbers are not comparable in the wild.

The thing is that those APIs are better (imagine that they were all standard 
and equally well known by programmers; which ones would you use?).  Some 
programmers will want to use the better APIs, regardless of whether libc 
provides them.  In some cases, for high-performance programs, good APIs are even 
more relevant.  Not implementing them in libc will only mean that projects roll 
their own.

I'm not saying that's bad either.  If we want to simplify libc, and add some 
extension libraries that are independent of libc, that would provide such 
functions, that's fine by me.  And maybe it's the better thing to do.

So, whether you like it or not, a relevant number of programs (although, as you 
proved, a small one compared to the entire universe of programs) will keep using 
an API that asks for a pointer and a size and zeroes it.  We can call it 
bzero(3), or memzero(3), or ext_memzero(3) (ext_ being a prefix for a library 
providing useful extensions to libc), but the function will be there.
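
A minimal sketch of such a function, written as a thin wrapper over the
standard memset(3).  The name ext_memzero() is the hypothetical one floated
above; nothing here is an existing libc interface.

```c
#include <stddef.h>
#include <string.h>

/* ext_memzero() is a hypothetical name taken from the discussion; it is
   not provided by any libc.  It zeroes n bytes starting at p by
   delegating to the standard memset(3) and returns p.  */
static inline void *
ext_memzero (void *p, size_t n)
{
  return memset (p, 0, n);
}
```

Because the wrapper is a single standard call, it could live in a header-only
extension library as easily as in libc.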

Where do you suggest that we put such a function?  In or out of libc?

> 
> There may be other repositories which can be easily searched, but these results
> are clear enough to conclude these functions are dead.
I'd say dead is too much.

Cheers,

Alex

> 
> Cheers,
> Wilco

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH] bind.2, mount_setattr.2, openat2.2, perf_event_open.2, pidfd_send_signal.2, recvmmsg.2, seccomp_unotify.2, select_tut.2, sendmmsg.2, set_thread_area.2, sysctl.2, bzero.3, getaddrinfo.3, getaddrinfo_a.3, getutent.3, mbrtowc.3, mbsinit.3, rti...
  2023-01-06  0:22 ` Alejandro Colomar
@ 2023-01-06  0:57   ` Alejandro Colomar
  0 siblings, 0 replies; 11+ messages in thread
From: Alejandro Colomar @ 2023-01-06  0:57 UTC (permalink / raw)
  To: Wilco Dijkstra, Paul Eggert, Adhemerval Zanella Netto, linux-man
  Cc: Alejandro Colomar, libc-alpha, G. Branden Robinson


Hi Wilco,

On 1/6/23 01:22, Alejandro Colomar wrote:
> Hi Wilco,
> 
> On 1/6/23 01:02, Wilco Dijkstra wrote:
>> Hi Alex,
>>
>>> There are many users of bzero(3) in the wild, and it is a fine API from a
>>> usability point of view.
>>
>> Since you repeatedly claim lots of use of these functions, I did a quick search
>> on https://codesearch.debian.net/
>>
>> bzero: 21440
>> memset: 563054
>>
>> mempcpy: 4489
>> memcpy: 692873

For comparison:

strcat: 130785
stpcpy: 8960

They compete for the same functionality.  stpcpy(3) is a lot less used than 
strcat(3).  Not because it's dead, but because it only became standard in 
POSIX.1-2008, while the other has been there since forever, and has been 
"promoted" by ISO and POSIX for a long time.  There's no clear winner on which 
API is better, assuming an optimizing compiler; it depends on what you do with them.
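
For readers who have not used both, here is a small sketch of the same
concatenation written each way.  The helper names are made up for
illustration, and dst is assumed to be large enough for both strings plus the
terminating null byte.

```c
#define _POSIX_C_SOURCE 200809L  /* for stpcpy(3) */
#include <string.h>

/* strcat(3) has to rescan dst from the start to find its end.  */
static char *
join_strcat (char *dst, const char *a, const char *b)
{
  strcpy (dst, a);
  strcat (dst, b);
  return dst;
}

/* stpcpy(3) returns a pointer to the terminating null byte it wrote, so
   the next copy can continue from there.  */
static char *
join_stpcpy (char *dst, const char *a, const char *b)
{
  char *end = stpcpy (dst, a);

  end = stpcpy (end, b);
  return end;  /* points at the terminating null byte */
}
```

The stpcpy(3) version also hands back the end of the result, which is handy
when the caller needs the final length.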

Another one:

strncat: 17091
strlcat: 13989

strlcat(3) is 99% of the time what users should call.  Yet they call it less 
than strncat(3).

Portability seems to be the main driver of those numbers.  Luckily, the 
strlcat(3) numbers are not so bad compared to strncat(3).  However, I still 
wonder if all those uses of strcat(3) are really safe.
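
One reason strncat(3) is easy to misuse is that its size argument is the
maximum number of bytes to append, not the size of the destination, whereas
strlcat(3) takes the full destination size.  A minimal sketch of a bounded
append built on strncat(3) alone; the helper name is hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Illustrative helper, not a libc function.  dst must already hold a
   null-terminated string shorter than dstsize.  Appends as much of src
   as fits and always leaves dst terminated.  */
static char *
append_truncating (char *dst, size_t dstsize, const char *src)
{
  size_t used = strlen (dst);

  /* strncat(3) appends at most this many bytes and then writes one
     extra null byte, hence the "- 1".  */
  return strncat (dst, src, dstsize - used - 1);
}
```

Note that this sketch truncates silently; the caller gets no indication that
src did not fit.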

Of course, POSIX trying to kill bzero(3), and not yet having considered 
mempcpy(3), hasn't helped the numbers; but that doesn't mean that programmers 
wouldn't be happy with them being blessed by ISO/POSIX.

Cheers,

Alex


* Re: [PATCH] bind.2, mount_setattr.2, openat2.2, perf_event_open.2, pidfd_send_signal.2, recvmmsg.2, seccomp_unotify.2, select_tut.2, sendmmsg.2, set_thread_area.2, sysctl.2, bzero.3, getaddrinfo.3, getaddrinfo_a.3, getutent.3, mbrtowc.3, mbsinit.3, rti...
  2023-01-06 16:20 ` Alejandro Colomar
@ 2023-01-06 17:01   ` Joseph Myers
  0 siblings, 0 replies; 11+ messages in thread
From: Joseph Myers @ 2023-01-06 17:01 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Wilco Dijkstra, Paul Eggert, Adhemerval Zanella Netto, linux-man,
	Alejandro Colomar, libc-alpha, G. Branden Robinson,
	Christian Brauner

On Fri, 6 Jan 2023, Alejandro Colomar via Libc-alpha wrote:

> P.S.:  To compensate for the time I'm taking from you, I'm preparing a patch
> to remove rawmemchr(3) from glibc.  I'll send it when it's "ready".  Although

Don't waste your time.  The criterion for removing a function is something 
close to "never did anything meaningful or useful; no binary could ever 
have successfully used the function with the ABI with which it was 
defined"; otherwise we'd need a compelling case for how removing the 
function can't break any existing binaries or sources we'd care about.  
Even turning a function into a compat symbol, so making it unavailable to 
new programs (while keeping it for existing binaries and keeping glibc 
tests of that compat symbol), requires a much stronger deprecation basis 
than "there are standard functions that are just as good".  And marking 
functions as deprecated with an attribute in headers - normally desirable 
several releases before making a function into a compat symbol, at least - 
also needs such a stronger deprecation basis; nothing in this thread 
suggests any basis for marking *any* string functions as deprecated to me.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH] bind.2, mount_setattr.2, openat2.2, perf_event_open.2, pidfd_send_signal.2, recvmmsg.2, seccomp_unotify.2, select_tut.2, sendmmsg.2, set_thread_area.2, sysctl.2, bzero.3, getaddrinfo.3, getaddrinfo_a.3, getutent.3, mbrtowc.3, mbsinit.3, rti...
  2023-01-06 15:53 Wilco Dijkstra
@ 2023-01-06 16:20 ` Alejandro Colomar
  2023-01-06 17:01   ` Joseph Myers
  0 siblings, 1 reply; 11+ messages in thread
From: Alejandro Colomar @ 2023-01-06 16:20 UTC (permalink / raw)
  To: Wilco Dijkstra, Paul Eggert, Adhemerval Zanella Netto, linux-man
  Cc: Alejandro Colomar, libc-alpha, G. Branden Robinson, Christian Brauner


Hi Wilco,

On 1/6/23 16:53, Wilco Dijkstra wrote:
> Hi Alex,
> 
>> Which C libraries never supported bzero(3)?  It was in POSIX once, so I guess
>> it's supported everywhere in Unix(-like) systems (you can see that I don't care
>> at all about other systems).  Even if only for backwards compatibility, the
> removal from POSIX will not have affected the portability of bzero(3) in source
>> code (even where libc has removed it, the compiler will provide support).
> 
> These functions have caused portability issues since many UNIX systems didn't
> support them, neither did Windows nor most of the embedded world. So they
> always required extra work - if you do the codesearch on bzero you will find
> many examples of those hacks in existing code.
> 
> You may not care about anything outside Linux,

I do care about non-Linux.  What I don't care about is non-POSIX.

> but many libcs that support
> bzero are not optimized. Even GLIBC used a slow C implementation for bzero
> until we changed it to call memset. I have no idea what all other libs do (and
> given bzero is dead, it doesn't even matter), but bad performance is also a
> portability issue.
> 
>> So, I don't think that's a real problem yet.  We're not yet (or I believe so) at
>> a point where bzero(3) is non-portable in source code.
> 
> It never was portable or well optimized, which were reasons to deprecate it.

Okay, I guess I'll drop the bzero(3) patch for the man-pages, and maybe write an 
alternative patch documenting at least some of this discussion.

> 
>> Even simpler: it is unconditionally defined to memcpy() + len in a macro.
>> The reason (I guess) is that they didn't even know that mempcpy() exists.
> 
> But that means it will never use mempcpy - not in the source, not in the binary.
> So they have done the right thing, and there is no argument that adding or
> optimizing mempcpy in C libraries will improve nginx.
> 
>> Actually, gcc optimizes differently.  When you call mempcpy(3), since it knows
>> it exists, it calls it or replaces it by memcpy(3)+len, depending on
>> optimization flags.  When you call memcpy(3)+len, since it doesn't know if
>> mempcpy(3) is available, it keeps the memcpy(3) call always.
> 
> I don't care what -O0 does, what matters is that in almost all cases mempcpy
> gets optimized into memcpy, and that is the right thing to do.
> 
>> src/nxt_h1proto.c:2290:    p = nxt_cpymem(p, " HTTP/1.1\r\n", 11);
>> src/nxt_h1proto.c:2291:    p = nxt_cpymem(p, "Connection: close\r\n", 19);
> 
> Sure but one could equally argue that returning src + len is useful too:
> 
> p = memscpy (dest1, p, size1);
> p = memscpy (dest2, p, size2);

Be honest, how much do you think this would be used? ;)

> 
> Or return the size so you can easily keep track of how many bytes were copied:
> 
> copied += memncpy (dest1, src1, size1);
> copied += memncpy (dest2, src2, size2);

This kind of interface is error-prone.  In memcpy(3) it's not a problem, because 
there's no truncation, but in other functions it has been the cause of bugs.

snprintf(3) has that interface, and that one causes bugs.  p += snprintf() is 
always wrong.  If there's truncation (and you can't know it beforehand, or 
you'd be calling sprintf(3)), 'p' will overflow, and the behavior is undefined.

That's why I proposed stpeprintf() recently, to prevent such issues.

Having more interfaces promoting that kind of return value is wrong.  I'd rather 
remove all interfaces that return the length of the resulting string, when a 
pointer can be returned.
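
The snprintf(3) pitfall can be sketched as follows.  The helper below is
hypothetical (it is not stpeprintf() or any existing libc function); it only
shows the check that has to happen before the pointer is advanced.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper.  Formats s at p without writing at or past end.
   Returns the new write position, or NULL if the output was truncated
   or an error occurred.  */
static char *
append_str (char *p, char *end, const char *s)
{
  size_t room;
  int len;

  if (p == NULL || p >= end)
    return NULL;

  room = (size_t) (end - p);
  len = snprintf (p, room, "%s", s);

  /* snprintf(3) returns the length the full output would have had, so
     a value >= room means truncation; adding it to p unchecked would
     step outside the buffer.  */
  if (len < 0 || (size_t) len >= room)
    return NULL;

  return p + len;
}
```

Returning a pointer (or NULL) lets calls be chained without the caller ever
doing arithmetic on an unchecked length.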

> 
> And there are lots of other possibilities.

Actually not many.

For an interface that copies n bytes from s to d, the possible return values are 
(ignoring the possibility of returning the result of arc4random(3) :P):

-  (void)
-  s
-  d
-  s + n
-  d + n
-  n

Ignoring void for obvious reasons...

s or s+n are useless, because memcpy(3) is way more frequently used to create a 
new string by catenation of many source strings, compared to the few times it is 
used to copy a single source string into many destination strings.

n interfaces are error-prone, and difficult to use; I wouldn't recommend adding 
yet another one.

d is useless, because that's a value that's already there before the call, and 
most calls to memcpy(3) that don't ignore the return value are going to add the 
length to it.  Isn't that a good hint that the good choice would have been d+n 
from the beginning?
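
As a rough illustration of the d, d + n, and n conventions, the sketch below
builds the same string three ways.  The wrapper names are hypothetical, and
each output buffer is assumed to hold at least 7 bytes.

```c
#include <stddef.h>
#include <string.h>

/* Returns d + n, like the GNU mempcpy(3).  */
static void *
copy_ret_end (void *d, const void *s, size_t n)
{
  return (char *) memcpy (d, s, n) + n;
}

/* Returns n, the number of bytes copied.  */
static size_t
copy_ret_len (void *d, const void *s, size_t n)
{
  memcpy (d, s, n);
  return n;
}

/* Builds "foobar" once per convention.  */
static void
build_all (char *out_d, char *out_end, char *out_len)
{
  char *p;
  size_t off = 0;

  /* d: the caller adds the length by hand after every call.  */
  p = (char *) memcpy (out_d, "foo", 3) + 3;
  p = (char *) memcpy (p, "bar", 3) + 3;
  *p = '\0';

  /* d + n: the return value is already the next write position.  */
  p = copy_ret_end (out_end, "foo", 3);
  p = copy_ret_end (p, "bar", 3);
  *p = '\0';

  /* n: the caller tracks an offset separately.  */
  off += copy_ret_len (out_len + off, "foo", 3);
  off += copy_ret_len (out_len + off, "bar", 3);
  out_len[off] = '\0';
}
```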

> So who is to say that mempcpy is better
> than all these options?

I do.  And considering that there are projects that reimplement mempcpy(), but I 
haven't seen any that implements 'memscpy()' or 'memncpy()' (as suggested 
above), I'd say some other programmers also do.

> 
>>  From a source code point of view, they let programmers write better/simpler
>> source code than memcpy(3) or memset(3).  That's sugar... yes.  IMO, it's worth it.
> 
> Exactly, it's an opinion/personal preference. As I showed, there are other
> possible return values, so should we add all of these interfaces just because
> some people might like them?

I'd first wait to see where those people are :)

> 
>> Having it in libc rather than an external library has the benefit that it will
>> have support from the compiler (better warnings and optimizations).
> 
> No. Compiler and libc support are totally different things. If your library is
> deemed useful and used in lots of projects, it may be reasonable to add the
> headers to GLIBC. But this will not affect compiler optimization - it would
> use the same header and produce the same code.

GCC can see patterns in code and replace them by the function, but only if that 
function is standard (being in glibc is not enough).  If the function is in an 
external library, such optimizations can't be performed.

> 
> Cheers,
> Wilco

Cheers,

Alex

P.S.:  To compensate for the time I'm taking from you, I'm preparing a patch to 
remove rawmemchr(3) from glibc.  I'll send it when it's "ready".  Although since 
I haven't sent many patches to glibc, I guess it will still be far from ready ;)


* Re: [PATCH] bind.2, mount_setattr.2, openat2.2, perf_event_open.2, pidfd_send_signal.2, recvmmsg.2, seccomp_unotify.2, select_tut.2, sendmmsg.2, set_thread_area.2, sysctl.2, bzero.3, getaddrinfo.3, getaddrinfo_a.3, getutent.3, mbrtowc.3, mbsinit.3, rti...
@ 2023-01-06 15:53 Wilco Dijkstra
  2023-01-06 16:20 ` Alejandro Colomar
  0 siblings, 1 reply; 11+ messages in thread
From: Wilco Dijkstra @ 2023-01-06 15:53 UTC (permalink / raw)
  To: Alejandro Colomar, Paul Eggert, Adhemerval Zanella Netto, linux-man
  Cc: Alejandro Colomar, libc-alpha, G. Branden Robinson

Hi Alex,

> Which C libraries never supported bzero(3)?  It was in POSIX once, so I guess 
> it's supported everywhere in Unix(-like) systems (you can see that I don't care 
> at all about other systems).  Even if only for backwards compatibility, the 
> removal from POSIX will not have affected the portability of bzero(3) in source 
> code (even where libc has removed it, the compiler will provide support).

These functions have caused portability issues since many UNIX systems didn't
support them, neither did Windows nor most of the embedded world. So they
always required extra work - if you do the codesearch on bzero you will find
many examples of those hacks in existing code.

You may not care about anything outside Linux, but many libcs that support
bzero are not optimized. Even GLIBC used a slow C implementation for bzero
until we changed it to call memset. I have no idea what all other libs do (and
given bzero is dead, it doesn't even matter), but bad performance is also a
portability issue.

> So, I don't think that's a real problem yet.  We're not yet (or I believe so) at 
> a point where bzero(3) is non-portable in source code.

It never was portable or well optimized, which were reasons to deprecate it.

> Even simpler: it is unconditionally defined to memcpy() + len in a macro.
> The reason (I guess) is that they didn't even know that mempcpy() exists.

But that means it will never use mempcpy - not in the source, not in the binary.
So they have done the right thing, and there is no argument that adding or
optimizing mempcpy in C libraries will improve nginx.

> Actually, gcc optimizes differently.  When you call mempcpy(3), since it knows 
> it exists, it calls it or replaces it by memcpy(3)+len, depending on 
> optimization flags.  When you call memcpy(3)+len, since it doesn't know if 
> mempcpy(3) is available, it keeps the memcpy(3) call always.

I don't care what -O0 does, what matters is that in almost all cases mempcpy
gets optimized into memcpy, and that is the right thing to do.

> src/nxt_h1proto.c:2290:    p = nxt_cpymem(p, " HTTP/1.1\r\n", 11);
> src/nxt_h1proto.c:2291:    p = nxt_cpymem(p, "Connection: close\r\n", 19);

Sure but one could equally argue that returning src + len is useful too:

p = memscpy (dest1, p, size1);
p = memscpy (dest2, p, size2);

Or return the size so you can easily keep track of how many bytes were copied:

copied += memncpy (dest1, src1, size1);
copied += memncpy (dest2, src2, size2);

And there are lots of other possibilities. So who is to say that mempcpy is better
than all these options?

> From a source code point of view, they let programmers write better/simpler 
> source code than memcpy(3) or memset(3).  That's sugar... yes.  IMO, it's worth it.

Exactly, it's an opinion/personal preference. As I showed, there are other
possible return values, so should we add all of these interfaces just because
some people might like them?

> Having it in libc rather than an external library has the benefit that it will 
> have support from the compiler (better warnings and optimizations).

No. Compiler and libc support are totally different things. If your library is
deemed useful and used in lots of projects, it may be reasonable to add the
headers to GLIBC. But this will not affect compiler optimization - it would
use the same header and produce the same code.

Cheers,
Wilco

* Re: [PATCH] bind.2, mount_setattr.2, openat2.2, perf_event_open.2, pidfd_send_signal.2, recvmmsg.2, seccomp_unotify.2, select_tut.2, sendmmsg.2, set_thread_area.2, sysctl.2, bzero.3, getaddrinfo.3, getaddrinfo_a.3, getutent.3, mbrtowc.3, mbsinit.3, rti...
  2023-01-06  2:26 Wilco Dijkstra
@ 2023-01-06 13:49 ` Alejandro Colomar
  0 siblings, 0 replies; 11+ messages in thread
From: Alejandro Colomar @ 2023-01-06 13:49 UTC (permalink / raw)
  To: Wilco Dijkstra, Paul Eggert, Adhemerval Zanella Netto, linux-man
  Cc: Alejandro Colomar, libc-alpha, G. Branden Robinson


Hi Wilco,

On 1/6/23 03:26, Wilco Dijkstra wrote:
> Hi Alex,
> 
>> Many projects redefine those functions themselves, with alternative names, so
>> it's hard to really count how much is the intention of projects to use it,
>> rather than actual use.  Since the standards don't guarantee such functions,
>> projects that care a lot, use a portable name (one that isn't reserved;
>> sometimes they don't even know that there's a GNU extension with that name and
>> use a weird one, such as cpymem() by nginx).
> 
> Yeah portability is a big issue with these non-standard functions. So even if you
> aren't considering the large cost of supporting these functions in C libraries, there
> are also costs in making applications portable, precisely because not all C libraries
> will support it...
> 
>> The thing is that those APIs are better (imagine that they were all standard,
>> and were all equally known by programmers; which ones would you use?).  Some
>> programmers will want to use the better APIs, independently of libc providing it
>> or not.  In some cases, for high performance programs, good APIs are even more
>> relevant.  Not implementing them in libc, will only mean that projects will roll
>> their own.
> 
> No, the use of non-standard functions is the problem here. bzero was deprecated
> more than 20 years ago, do you think C libraries will add support and optimize it
> even if they never supported it before?

Which C libraries never supported bzero(3)?  It was in POSIX once, so I guess 
it's supported everywhere in Unix(-like) systems (you can see that I don't care 
at all about other systems).  Even if only for backwards compatibility, the 
removal from POSIX will not have affected the portability of bzero(3) in source 
code (even where libc has removed it, the compiler will provide support).

> If it's non-standard, it's never going to
> happen.

So, I don't think that's a real problem yet.  We're not yet (or I believe so) at 
a point where bzero(3) is non-portable in source code.

> 
> If we continue with the mempcpy vs memcpy example of nginx, I presume
> nginx implements cpymem() similar to this:
> 
> #if HAVE_MEMPCPY_SUPPORT
>    return mempcpy (p, q, n);
> #else
>    return (char *) memcpy (p, q, n) + n;
> #endif
> 
> The define would be set by a special configure check.

Even simpler: it is unconditionally defined to memcpy() + len in a macro.

The reason (I guess) is that they didn't even know that mempcpy() exists.

> 
> Now if nginx got say 10% faster from using mempcpy then that would
> be great and it would be worth the trouble. However there is no difference
> since compilers typically generate identical code for these cases.

Actually, gcc optimizes differently.  When you call mempcpy(3), since it knows 
it exists, it calls it or replaces it by memcpy(3)+len, depending on 
optimization flags.  When you call memcpy(3)+len, since it doesn't know if 
mempcpy(3) is available, it keeps the memcpy(3) call always.

> So what's
> the point of mempcpy exactly?

The point of mempcpy(3) is that it's the simplest libc API to catenate strings 
when you know the length of the source strings, and you also want to know the 
length of the resulting string, and you know there will be no truncation.

Example:

src/nxt_h1proto.c:2287:    p = nxt_cpymem(p, r->method->start, r->method->length);
src/nxt_h1proto.c-2288-    *p++ = ' ';
src/nxt_h1proto.c:2289:    p = nxt_cpymem(p, r->target.start, r->target.length);
src/nxt_h1proto.c:2290:    p = nxt_cpymem(p, " HTTP/1.1\r\n", 11);
src/nxt_h1proto.c:2291:    p = nxt_cpymem(p, "Connection: close\r\n", 19);


Any other function will either be slower (stpcpy(3) will likely be slower), or 
make the code more complex (memcpy(3) will require adding +... everywhere).

I'm not saying that this will be significantly faster than memcpy(3), but it 
will be at least as fast (and, if libc optimized mempcpy(3), faster by a 
negligible amount).
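
A self-contained sketch of the pattern in the excerpt above.  The cpymem()
helper here is a hypothetical stand-in defined as memcpy(3) plus the length;
it is an assumption for illustration, not the actual nginx definition.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a cpymem()-style helper: copy n bytes and
   return the position just past them.  */
static char *
cpymem (char *dst, const void *src, size_t n)
{
  return (char *) memcpy (dst, src, n) + n;
}

/* Writes "<method> <target> HTTP/1.1\r\n" into buf and returns its
   length.  buf must be large enough; no truncation check is done.  */
static size_t
build_request_line (char *buf, const char *method, size_t method_len,
                    const char *target, size_t target_len)
{
  char *p = buf;

  p = cpymem (p, method, method_len);
  *p++ = ' ';
  p = cpymem (p, target, target_len);
  p = cpymem (p, " HTTP/1.1\r\n", 11);

  return (size_t) (p - buf);
}
```

The length of the result falls out of a single pointer subtraction at the
end, which is the property being argued for.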

> 
> By all means, create your own special copy interface function - it's just sugar.
> But deciding that mempcpy is great and then being forced to do extra work
> to make it portable for no gain is what I find insane...

From a source code point of view, they let programmers write better/simpler 
source code than memcpy(3) or memset(3).  That's sugar... yes.  IMO, it's worth it.

> 
>> Where do you suggest that we put such function?  In or out of libc?
> 
> Well you mentioned that nginx and many other programs already define their
> own memcpy variants. It's perfectly reasonable to do what you proposed and
> create a library of inline string functions using standard calls as primitives.
> If it is freely usable and portable, any project that likes it could just add it.

Having it in libc rather than an external library has the benefit that it will 
have support from the compiler (better warnings and optimizations).

But yes, for the time being, I'll keep developing such an external library.

> 
> Cheers,
> Wilco


* Re: [PATCH] bind.2, mount_setattr.2, openat2.2, perf_event_open.2, pidfd_send_signal.2, recvmmsg.2, seccomp_unotify.2, select_tut.2, sendmmsg.2, set_thread_area.2, sysctl.2, bzero.3, getaddrinfo.3, getaddrinfo_a.3, getutent.3, mbrtowc.3, mbsinit.3, rti...
@ 2023-01-06  2:26 Wilco Dijkstra
  2023-01-06 13:49 ` Alejandro Colomar
  0 siblings, 1 reply; 11+ messages in thread
From: Wilco Dijkstra @ 2023-01-06  2:26 UTC (permalink / raw)
  To: Alejandro Colomar, Paul Eggert, Adhemerval Zanella Netto, linux-man
  Cc: Alejandro Colomar, libc-alpha, G. Branden Robinson

Hi Alex,

> Many projects redefine those functions themselves, with alternative names, so 
> it's hard to really count how much is the intention of projects to use it, 
> rather than actual use.  Since the standards don't guarantee such functions, 
> projects that care a lot, use a portable name (one that isn't reserved; 
> sometimes they don't even know that there's a GNU extension with that name and 
> use a weird one, such as cpymem() by nginx).

Yeah portability is a big issue with these non-standard functions. So even if you
aren't considering the large cost of supporting these functions in C libraries, there
are also costs in making applications portable, precisely because not all C libraries
will support it...

> The thing is that those APIs are better (imagine that they were all standard, 
> and were all equally known by programmers; which ones would you use?).  Some 
> programmers will want to use the better APIs, independently of libc providing it 
> or not.  In some cases, for high performance programs, good APIs are even more 
> relevant.  Not implementing them in libc, will only mean that projects will roll 
> their own.

No, the use of non-standard functions is the problem here. bzero was deprecated
more than 20 years ago, do you think C libraries will add support and optimize it
even if they never supported it before? If it's non-standard, it's never going to
happen.

If we continue with the mempcpy vs memcpy example of nginx, I presume
nginx implements cpymem() similar to this:

#if HAVE_MEMPCPY_SUPPORT
  return mempcpy (p, q, n);
#else
  return (char *) memcpy (p, q, n) + n;
#endif

The define would be set by a special configure check.

Now if nginx got say 10% faster from using mempcpy then that would
be great and it would be worth the trouble. However there is no difference
since compilers typically generate identical code for these cases. So what's
the point of mempcpy exactly?

By all means, create your own special copy interface function - it's just sugar.
But deciding that mempcpy is great and then being forced to do extra work
to make it portable for no gain is what I find insane...

> Where do you suggest that we put such function?  In or out of libc?

Well you mentioned that nginx and many other programs already define their
own memcpy variants. It's perfectly reasonable to do what you proposed and
create a library of inline string functions using standard calls as primitives.
If it is freely usable and portable, any project that likes it could just add it.

Cheers,
Wilco

* Re: [PATCH] bind.2, mount_setattr.2, openat2.2, perf_event_open.2, pidfd_send_signal.2, recvmmsg.2, seccomp_unotify.2, select_tut.2, sendmmsg.2, set_thread_area.2, sysctl.2, bzero.3, getaddrinfo.3, getaddrinfo_a.3, getutent.3, mbrtowc.3, mbsinit.3, rti...
  2023-01-05 21:12     ` [PATCH] bind.2, mount_setattr.2, openat2.2, perf_event_open.2, pidfd_send_signal.2, recvmmsg.2, seccomp_unotify.2, select_tut.2, sendmmsg.2, set_thread_area.2, sysctl.2, bzero.3, getaddrinfo.3, getaddrinfo_a.3, getutent.3, mbrtowc.3, mbsinit.3, rti Wilco Dijkstra
  2023-01-05 21:33       ` Alejandro Colomar
@ 2023-01-05 23:30       ` Wilco Dijkstra
  1 sibling, 0 replies; 11+ messages in thread
From: Wilco Dijkstra @ 2023-01-05 23:30 UTC (permalink / raw)
  To: Paul Eggert, Adhemerval Zanella Netto, Alejandro Colomar, linux-man
  Cc: Alejandro Colomar, libc-alpha, G. Branden Robinson

Hi Alex,

> That's a good counterargument for the silly mistakes point.  But the cognitive 
> load that programmers need to care about the extra useless argument for no good 
> reason is still a problem of the memset(3) API that bzero(3) simply hasn't.

There is also the cognitive load of having to learn yet another interface. There is
also the overhead of libraries having to implement yet another function. Plus
compilers optimizing it. Maintaining, testing, and documenting it. And so on.

Why would anyone invest all this effort if there isn't a significant gain after all that?
So a new interface must be significantly and measurably better to be worth all this
work. More than 20 years ago people decided that it is not worth it for bzero and
various other functions given they have almost identical equivalents in the C
standard which were already supported on all targets and in most cases better
optimized. One of the most common portability issues was the lack of bcopy and
bzero which led to hacky and buggy workarounds.

> I'd like to get a rationale for why we should promote strnlen(3) but not 
> bzero(3) that doesn't reduce to "it is standard".  Why would the standard cover 
> one and not the other?

Firstly, memchr is not an equivalent of strnlen; the equivalent would be something like:

tmp = memchr (p, '\0', n);
len = (tmp == NULL) ? n : tmp - p;

Be honest, would you really prefer writing that over strnlen (p, n)?
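
For reference, the two lines above wrapped into a complete function might
look like the following sketch. The name strnlen_via_memchr() is made up for
this example; it is meant to return the same value as strnlen (p, n).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the string at p, capped at n bytes,
   built on memchr as in the two-line version above. */
static size_t
strnlen_via_memchr (const char *p, size_t n)
{
  const char *tmp = memchr (p, '\0', n);

  /* No terminator within the first n bytes: the answer is n. */
  return (tmp == NULL) ? n : (size_t) (tmp - p);
}
```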

And neither does memchr have the same performance. Searching for zero is typically
faster than searching for any character, so a well optimized strnlen should beat memchr.
Note that doesn't mean it is unreasonable for a generic strnlen to call memchr - one
typically starts optimizing the C standard functions, and generic string functions use
those as primitives if no optimized version is available (yet).

An optimized bzero function wouldn't be faster than memset - while you might need
a check for zero (or duplicate the input byte), that is a small overhead that is hard to
measure on modern hardware. We had a proposal for adding memzero/memclr/
memclear/memset0/memset_zero/... a while back so I measured it and concluded
there is just no benefit. A few decades ago I was programming on an 8MHz in-order
core and every single cycle and byte mattered then, but it's a very different world today!

Cheers,
Wilco

* Re: [PATCH] bind.2, mount_setattr.2, openat2.2, perf_event_open.2, pidfd_send_signal.2, recvmmsg.2, seccomp_unotify.2, select_tut.2, sendmmsg.2, set_thread_area.2, sysctl.2, bzero.3, getaddrinfo.3, getaddrinfo_a.3, getutent.3, mbrtowc.3, mbsinit.3, rti...
  2023-01-05 21:12     ` [PATCH] bind.2, mount_setattr.2, openat2.2, perf_event_open.2, pidfd_send_signal.2, recvmmsg.2, seccomp_unotify.2, select_tut.2, sendmmsg.2, set_thread_area.2, sysctl.2, bzero.3, getaddrinfo.3, getaddrinfo_a.3, getutent.3, mbrtowc.3, mbsinit.3, rti Wilco Dijkstra
@ 2023-01-05 21:33       ` Alejandro Colomar
  2023-01-05 23:30       ` Wilco Dijkstra
  1 sibling, 0 replies; 11+ messages in thread
From: Alejandro Colomar @ 2023-01-05 21:33 UTC (permalink / raw)
  To: Wilco Dijkstra, Paul Eggert, Adhemerval Zanella Netto, linux-man
  Cc: Alejandro Colomar, libc-alpha, G. Branden Robinson


Hi Wilco, Paul, and Adhemerval,

On 1/5/23 22:12, Wilco Dijkstra wrote:
> Hi,
> 
>>> bzero is deprecated by POSIX.1-2001, removed by POSIX.1-2008, and on glibc
>>> implementation now calls memset (previously some architecture added ifunc
>>> redirection to optimized bzero to avoid the extra function call, it was
>>> removed from all architectures).

Sure, POSIX prefers memset(3).  But why?  "Because it is standard" isn't valid
reasoning: POSIX decides what is standard, so the argument would be circular.

Anyway, the fact that libc doesn't provide it is not a problem for callers: the
compiler provides it.  And even if the compiler dropped support, one can write
it as a one-liner, which even POSIX once recommended[1]:

     #define bzero(b,len) (memset((b), '\0', (len)), (void) 0)


[1]:  <https://pubs.opengroup.org/onlinepubs/009695399/functions/bzero.html>
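
A small sketch of how that fallback could be used in a program.  The struct
conn type and the conn_reset() function are hypothetical and exist only for
illustration; the #ifndef guard is an added assumption so that a libc which
already supplies bzero as a macro is left alone.

```c
#include <string.h>

/* Fallback from the POSIX text quoted above, used only if the
   implementation does not already provide bzero as a macro. */
#ifndef bzero
#define bzero(b,len) (memset((b), '\0', (len)), (void) 0)
#endif

/* Hypothetical structure, for illustration only. */
struct conn {
  int  fd;
  char name[16];
};

/* Clear every byte of *c; only two arguments to get right. */
static void
conn_reset (struct conn *c)
{
  bzero (c, sizeof *c);
}
```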

>>>
>>> Also, GCC for some time also replaces bzero with memset so there is no gain
>>> in actually call bzero (check glibc commit 9403b71ae97e3f1a91c796ddcbb4e6f044434734).

No gain in generated code; but there's a gain in source code: less cognitive 
load.  Instead of 3 arguments, the order of which is not easy to remember, the 
programmer only needs to care about 2, and they're obvious.

> 
> Agreed, there is no benefit from using it, and certainly no reason to try to undo
> its removal. We should promote standards, not try to subvert them...

I generally promote POSIX; but at the same time I'm not blindly following it, 
and when its decisions don't make sense I will deviate from it.

> A more productive way would be to propose new functions to the C/C++
> committees.

I believe in existing practice as a way of improving the standards.  IMO, it 
doesn't make sense to ask POSIX to add a function, if no-one uses it.

So I think it should be fine to recommend using a well designed API, and use 
that as a way to increase its use, hopefully resulting long-term in a 
reincorporation to POSIX.

I also defend mempcpy(3) over any POSIX or ISO alternatives.  It's simply 
superior to the alternatives, and POSIX should standardize it (and I'd say even 
deprecate memcpy(3), which is a misdesigned API, but I won't go that far for now).

> 
>> In addition, gcc -Wall warns if you mistakenly pass 0 as memset's 3rd
>> arg, which undercuts the argument that bzero avoids silly mistakes.
> 
> Also I think GCC should give a deprecated/obsolete warning (or perhaps error?)
> when using bzero, bcmp, bcopy, index, rindex etc so people can start removing
> the few remaining uses from ancient code.

There are many users of bzero(3) in the wild, and it is a fine API from a 
usability point of view.  Not being promoted by POSIX or ISO is not enough to 
warn about its use.  With that reasoning, we should warn about many GNU 
extensions, and some of them are really fine (so much that many ended up in POSIX).

However, it would be fine to warn about those that are error-prone:

-  bcopy(3)

And also warn about those that have a drop-in replacement in POSIX:

-  bcmp(3)
-  index(3)
-  rindex(3)

I would endorse warning about those.  BTW, I'll rewrite the bcmp(3) page to say 
it's identical to memcmp(3).

> 
> Cheers,
> Wilco

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH] bind.2, mount_setattr.2, openat2.2, perf_event_open.2, pidfd_send_signal.2, recvmmsg.2, seccomp_unotify.2, select_tut.2, sendmmsg.2, set_thread_area.2, sysctl.2, bzero.3, getaddrinfo.3, getaddrinfo_a.3, getutent.3, mbrtowc.3, mbsinit.3, rti...
  2023-01-05 20:55   ` Paul Eggert
@ 2023-01-05 21:12     ` Wilco Dijkstra
  2023-01-05 21:33       ` Alejandro Colomar
  2023-01-05 23:30       ` Wilco Dijkstra
  0 siblings, 2 replies; 11+ messages in thread
From: Wilco Dijkstra @ 2023-01-05 21:12 UTC (permalink / raw)
  To: Paul Eggert, Adhemerval Zanella Netto, Alejandro Colomar, linux-man
  Cc: Alejandro Colomar, libc-alpha, G. Branden Robinson

Hi,

>> bzero is deprecated by POSIX.1-2001, removed by POSIX.1-2008, and on glibc
>> implementation now calls memset (previously some architecture added ifunc
>> redirection to optimized bzero to avoid the extra function call, it was
>> removed from all architectures).
>> 
>> Also, GCC for some time also replaces bzero with memset so there is no gain
>> in actually call bzero (check glibc commit 9403b71ae97e3f1a91c796ddcbb4e6f044434734).

Agreed, there is no benefit from using it, and certainly no reason to try to undo
its removal. We should promote standards, not try to subvert them...
A more productive way would be to propose new functions to the C/C++
committees.

> In addition, gcc -Wall warns if you mistakenly pass 0 as memset's 3rd 
> arg, which undercuts the argument that bzero avoids silly mistakes.

Also I think GCC should give a deprecated/obsolete warning (or perhaps error?)
when using bzero, bcmp, bcopy, index, rindex etc so people can start removing
the few remaining uses from ancient code.

Cheers,
Wilco
