public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Alejandro Colomar <alx.manpages@gmail.com>
To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>,
	Paul Eggert <eggert@cs.ucla.edu>,
	Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>,
	"linux-man@vger.kernel.org" <linux-man@vger.kernel.org>
Cc: Alejandro Colomar <alx@kernel.org>,
	"libc-alpha@sourceware.org" <libc-alpha@sourceware.org>,
	"G. Branden Robinson" <g.branden.robinson@gmail.com>,
	Christian Brauner <brauner@kernel.org>
Subject: Re: [PATCH] bind.2, mount_setattr.2, openat2.2, perf_event_open.2, pidfd_send_signal.2, recvmmsg.2, seccomp_unotify.2, select_tut.2, sendmmsg.2, set_thread_area.2, sysctl.2, bzero.3, getaddrinfo.3, getaddrinfo_a.3, getutent.3, mbrtowc.3, mbsinit.3, rti...
Date: Fri, 6 Jan 2023 17:20:30 +0100	[thread overview]
Message-ID: <91de4293-9237-68d5-7743-9519f4259ecb@gmail.com> (raw)
In-Reply-To: <PAWPR08MB89821938473B0BCE72D99B9083FB9@PAWPR08MB8982.eurprd08.prod.outlook.com>


[-- Attachment #1.1: Type: text/plain, Size: 6275 bytes --]

Hi Wilco,

On 1/6/23 16:53, Wilco Dijkstra wrote:
> Hi Alex,
> 
>> Which C libraries never supported bzero(3)?  It was in POSIX once, so I guess
>> it's supported everywhere in Unix(-like) systems (you can see that I don't care
>> at all about other systems).  Even if only for backwards compatibility, the
>> removal from POSIX will have not affected the portability of bzero(3) in source
>> code (even where libc has removed it, the compiler will provide support).
> 
> These functions have caused portability issues since many UNIX systems didn't
> support them, neither did Windows nor most of the embedded world. So they
> always required extra work - if you do the codesearch on bzero you will find
> many examples of those hacks in existing code.
> 
> You may not care about anything outside Linux,

I do care about non-Linux.  What I don't care about is non-POSIX.

> but many libcs that support
> bzero are not optimized. Even GLIBC used a slow C implementation for bzero
> until we changed it to call memset. I have no idea what all other libs do (and
> given bzero is dead, it doesn't even matter), but bad performance is also a
> portability issue.
> 
>> So, I don't think that's a real problem yet.  We're not yet (or I believe so) in
>> a point where bzero(3) is non-portable in source code.
> 
> It never was portable or well optimized, which were reasons to deprecate it.

Okay, I guess I'll drop the bzero(3) patch for the man-pages, and maybe write an 
alternative patch documenting at least some of this discussion.

> 
>> Even simpler: it is unconditionally defined to memcpy() + len in a macro.
>> The reason (I guess) is that they didn't even know that mempcpy() exists.
> 
> But that means it will never use mempcpy - not in the source, not in the binary.
> So they have done the right thing, and there is no argument that adding or
> optimizing mempcpy in C libraries will improve nginx.
> 
>> Actually, gcc optimizes differently.  When you call mempcpy(3), since it knows
>> it exists, it calls it or replaces it by memcpy(3)+len, depending on
>> optimization flags.  When you call memcpy(3)+len, since it doesn't know if
>> mempcpy(3) is available, it keeps the memcpy(3) call always.
> 
> I don't care what -O0 does, what matters is that in almost all cases mempcpy
> gets optimized into memcpy, and that is the right thing to do.
> 
>> src/nxt_h1proto.c:2290:    p = nxt_cpymem(p, " HTTP/1.1\r\n", 11);
>> src/nxt_h1proto.c:2291:    p = nxt_cpymem(p, "Connection: close\r\n", 19);
> 
> Sure but one could equally argue that returning src + len is useful too:
> 
> p = memscpy (dest1, p, size1);
> p = memscpy (dest2, p, size2);

Be honest, how much do you think this would be used? ;)

> 
> Or return the size so you can easily keep track of how much bytes were copied:
> 
> copied += memncpy (dest1, src1, size1);
> copied += memncpy (dest2, src2, size2);

This kind of interface is error-prone.  In memcpy(3) it's not a problem, because 
there's not truncation, but in other functions it has been the cause of bugs.

snprintf(3) has that interface, and that one causes bugs.  p += snprintf() is 
always wrong.  If there's truncation (and you can't know it before-hand, or 
you'd be calling sprintf(3)), 'p' will overflow, and the behavior is undefined.

That's why I proposed stpeprintf() recently, to prevent such issues.

Having more interfaces promoting that kind of return value is wrong.  I'd rather 
remove all interfaces that return the length of the resulting string, when a 
pointer can be returned.

> 
> And there are lots of other possibilities.

Actually not many.

For an interface that copies n bytes from s to d, the possible return values are 
(ignoring the possibility of returning the result of arc4random(3) :P):

-  (void)
-  s
-  d
-  s + n
-  d + n
-  n

Ignoring void for obvious reasons...

s or s+n are useless, because memcpy(3) is way more frequently used to create a 
new string by catenation of many source strings, compared to the few times it is 
used to copy a single source string into many destination strings.

n interfaces are error-prone, and difficult to use; I wouldn't recommend adding 
yet another one.

d is useless, because that's a value that's already there before the call, and 
most calls to memcpy(3) that don't ignore the return value are going to add the 
length to it.  Isn't that a good hint that the good choice would have been d+n 
from the beginning?

> So who is to say that mempcpy is better
> than all these options?

I do.  And considering that there are projects that reimplement mempcpy(), but I 
haven't seen any that implements 'memscpy()' or 'memncpy()' (as suggested 
above), I'd say some other programmers also do.

> 
>>  From a source code point of view, they let programmers write better/simpler
>> source code than memcpy(3) or memset(3).  That's sugar... yes.  IMO, it's worth it.
> 
> Exactly, it's an opinion/personal preference. As I showed, there are other
> possible return values, so should we add all of these interfaces just because
> some people might like them?

I'd first wait to see where are those people :)

> 
>> Having it in libc rather than an external library has the benefit that it will
>> have support from the compiler (better warnings and optimizations).
> 
> No. Compiler and libc support are totally different things. If your library is
> deemed useful and used in lots of projects, it may be reasonable to add the
> headers to GLIBC. But this will not affect compiler optimization - it would
> use the same header and produce the same code.

GCC can see patterns in code and replace them by the function, but only if that 
function is standard (being in glibc is not enough).  If the function is in an 
external library, such optimizations can't be performed.

> 
> Cheers,
> Wilco

Cheers,

Alex

P.S.:  To compensate for the time I'm taking from you, I'm preparing a patch to 
remove rawmemchr(3) from glibc.  I'll send it when it's "ready".  Although since 
I haven't sent many patches to glibc, I guess it will still be far from ready ;)

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2023-01-06 16:20 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-06 15:53 Wilco Dijkstra
2023-01-06 16:20 ` Alejandro Colomar [this message]
2023-01-06 17:01   ` Joseph Myers
  -- strict thread matches above, loose matches on Subject: below --
2023-01-06  2:26 Wilco Dijkstra
2023-01-06 13:49 ` Alejandro Colomar
2023-01-06  0:02 Wilco Dijkstra
2023-01-06  0:22 ` Alejandro Colomar
2023-01-06  0:57   ` Alejandro Colomar
2023-01-05 19:37 [PATCH] bind.2, mount_setattr.2, openat2.2, perf_event_open.2, pidfd_send_signal.2, recvmmsg.2, seccomp_unotify.2, select_tut.2, sendmmsg.2, set_thread_area.2, sysctl.2, bzero.3, getaddrinfo.3, getaddrinfo_a.3, getutent.3, mbrtowc.3, mbsinit.3, rtime.3, rtnetlink.3, strptime.3, NULL.3const, size_t.3type, void.3type, aio.7, netlink.7, unix.7: Prefer bzero(3) over memset(3) Alejandro Colomar
2023-01-05 20:48 ` Adhemerval Zanella Netto
2023-01-05 20:55   ` Paul Eggert
2023-01-05 21:12     ` [PATCH] bind.2, mount_setattr.2, openat2.2, perf_event_open.2, pidfd_send_signal.2, recvmmsg.2, seccomp_unotify.2, select_tut.2, sendmmsg.2, set_thread_area.2, sysctl.2, bzero.3, getaddrinfo.3, getaddrinfo_a.3, getutent.3, mbrtowc.3, mbsinit.3, rti Wilco Dijkstra
2023-01-05 21:33       ` Alejandro Colomar
2023-01-05 23:30       ` Wilco Dijkstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=91de4293-9237-68d5-7743-9519f4259ecb@gmail.com \
    --to=alx.manpages@gmail.com \
    --cc=Wilco.Dijkstra@arm.com \
    --cc=adhemerval.zanella@linaro.org \
    --cc=alx@kernel.org \
    --cc=brauner@kernel.org \
    --cc=eggert@cs.ucla.edu \
    --cc=g.branden.robinson@gmail.com \
    --cc=libc-alpha@sourceware.org \
    --cc=linux-man@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).