From: Alejandro Colomar <alx.manpages@gmail.com>
To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Cc: "'GNU C Library'" <libc-alpha@sourceware.org>,
"Cristian Rodríguez" <crrodriguez@opensuse.org>,
"Damian McGuckin" <damianm@esi.com.au>,
"G. Branden Robinson" <g.branden.robinson@gmail.com>,
Alexis <flexibeast@gmail.com>
Subject: Re: [manual]: rawmemchr(3) and UB
Date: Fri, 30 Dec 2022 15:16:51 +0100
Message-ID: <40eb3667-b8a2-e250-0004-92ba8ca34e70@gmail.com>
In-Reply-To: <PAWPR08MB89824F0238DB1A045E0B9DF983F09@PAWPR08MB8982.eurprd08.prod.outlook.com>
Hi Wilco,
On 12/30/22 14:13, Wilco Dijkstra via Libc-alpha wrote:
> Hi Alex,
>
>> It seems I misunderstood your email. I've seen that glibc implements
>> rawmemchr(3) in terms of strlen(3) and memchr(3). So it seems better to just
>> not implement this function in my library, and optimize strlen(3) directly. The
>> non-'\0' case seems useless, so probably not worth this function unless I see a
>> use for it.
>
> The idea is that compilers should treat it like mempcpy, bcopy etc and replace
> all uses with standard strlen/memchr. GCC/LLVM don't do this yet for rawmemchr.
>
> Since it is not in any standard and there is no benefit of having it, we should
> obsolete this function along with all the other GNU extensions.
I'm fine deprecating rawmemchr(3) in the manual page if you suggest it. I don't
see any use cases for it. If you confirm, I'll do it.
However, I wouldn't obsolete many functions indiscriminately, since most can be
very useful for users:
bzero(3) is much more useful than memset(3). I've only used memset(3) for
something non-zero once in my life, IIRC. Writing bzero(p, n) is easier to get
right, and simpler.
mempcpy(3) is also much more useful than memcpy(3) (and in fact, it would be
great if glibc optimized mempcpy(3) and then implemented memcpy(3) in terms of it).
bcopy(3) is already deprecated in the manual page. That function is dead.
My opinion is that moving the responsibility of providing inline versions of
functions to the compiler is just a consequence of the mess that libc is. Not
glibc, but every libc. It's a consequence of the turbulent design of the C
library, with huge headers that provide monolithic libraries, which have many
problems.
The solution would be to completely redesign the C library, without any regard
for backwards compatibility (but continue reading, it gets better). If we had,
let's say, several dozen micro-libraries that each provide just a few headers
and functions, making most of the functions inline, the compiler wouldn't need
to know what substitutions to perform, with few exceptions.
For example, a libc-mem microlibrary could implement:
<c/mem/chr/memchr.h>
      c_memchr()     // in terms of c_memchrend()
    * c_memchrend()  // like memchr(3), but return mem + size instead of NULL
<c/mem/chr/memrchr.h>
    * c_memrchr()
<c/mem/cmp/memcmp.h>
    * c_memcmp()
<c/mem/cpy/memcpy.h>
    * c_mempcpy()
      c_memcpy()     // in terms of c_mempcpy()
<c/mem/mem/memmem.h>
    * c_memmem()
<c/mem/mv/memmove.h>
    * c_mempmove()
      c_memmove()    // in terms of c_mempmove()
<c/mem/set/memset.h>
    * c_mempset()
      c_memset()     // in terms of c_mempset()
<c/mem/set/memzero.h>
      c_mempzero()   // in terms of c_mempset()
      c_bzero()      // in terms of c_mempzero()
Functions with a '*' would be the primitives, the ones that are optimized, and
the others just wrappers around them.
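
A minimal sketch of that primitive/wrapper split, for two of the headers listed
above. The function names come from the list; the signatures and bodies are
assumptions for illustration, with the primitives written as calls to the
standard memcpy(3) and memchr(3) rather than as optimized code, and they are
not taken from the repository linked below:

```c
#include <stddef.h>
#include <string.h>

/* Primitive (assumed signature): copy n bytes and return a pointer one
 * past the last byte written.  Here it simply calls memcpy(3). */
static inline void *
c_mempcpy(void *restrict dst, const void *restrict src, size_t n)
{
	return (char *) memcpy(dst, src, n) + n;
}

/* Wrapper: same copy, but return dst, as memcpy(3) does. */
static inline void *
c_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
	c_mempcpy(dst, src, n);
	return dst;
}

/* Primitive (assumed signature): like memchr(3), but return s + n
 * instead of NULL when c is not found.  Here it simply calls memchr(3). */
static inline void *
c_memchrend(const void *s, int c, size_t n)
{
	const void *p = memchr(s, c, n);

	return (void *) (p ? p : (const char *) s + n);
}

/* Wrapper: map the "end" result back to NULL, as memchr(3) does. */
static inline void *
c_memchr(const void *s, int c, size_t n)
{
	const char *end = (const char *) s + n;
	const char *p = c_memchrend(s, c, n);

	return (p == end) ? NULL : (void *) p;
}
```

Since every definition is visible in the header, a compiler can inline
c_memcpy() down to the primitive without knowing anything special about it.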
See <http://www.alejandro-colomar.es/src/alx/alx/libc/libc-mem.git/tree/include>
Then, you could write a compatibility layer for the standard organization of
headers, which would just include the necessary headers in the common ones
(e.g., <string.h>), and alias the functions without the c_* prefix (or with
__*, if you prefer it).
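
As a rough illustration of such a compatibility layer, the sketch below defines
the memset family from the list above and then exposes two of the functions
without the c_ prefix. Using function-like macros for the aliasing is an
assumption made here to keep the example self-contained; a real layer might use
another mechanism. The signatures are assumed as well, and the primitive is
again written as a call to the standard memset(3):

```c
#include <stddef.h>
#include <string.h>

/* Primitive (assumed signature): fill n bytes with c and return a
 * pointer one past the last byte written. */
static inline void *
c_mempset(void *s, int c, size_t n)
{
	return (char *) memset(s, c, n) + n;
}

/* Wrapper: return s, as memset(3) does. */
static inline void *
c_memset(void *s, int c, size_t n)
{
	c_mempset(s, c, n);
	return s;
}

/* Wrapper: zero n bytes and return a pointer past them. */
static inline void *
c_mempzero(void *s, size_t n)
{
	return c_mempset(s, 0, n);
}

/* Wrapper: zero n bytes and return nothing, as bzero(3) does. */
static inline void
c_bzero(void *s, size_t n)
{
	c_mempzero(s, n);
}

/* Compatibility aliases (illustrative only): expose the functions
 * without the c_ prefix.  bzero is undefined first in case the system
 * headers already provide a macro with that name. */
#undef bzero
#define bzero(s, n)     c_bzero((s), (n))
#define mempzero(s, n)  c_mempzero((s), (n))
```

With these in place, a caller that writes bzero(p, n) ends up in c_mempset()
after inlining, without the compiler needing to know about bzero(3) at all.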
Having the function definitions inline allows the compiler to see the entire
dependency chain down to the primitive definitions, and optimize as much as
possible.
That would even allow the kernel to use a large portion of the userspace C
library: just link statically to the micro-libraries, but the inline definitions
are fine to use inside a kernel.
I wrote a proof of concept, with already half a dozen of those micro-libraries
just for fun here:
<http://www.alejandro-colomar.es/src/alx/alx/libc/>
For now, I wrote the primitives as calls to glibc, but it would be easy to flip
the dependency so that glibc depends on the microlibraries, if they were
extended enough for that.
In a benchmark I wrote recently for a string-copying function that I proposed
for glibc (stpecpy(3)), the definition in this library of mine outperforms any
other definition of it by a very large margin, just by being inline. Of course,
I expect that if enough code is added to GCC, using the normal definition would
be as fast, but the point is that this doesn't require optimizing code in the
compiler to get really fast code.
Another advantage of that model of a C library is that it allows replacing a
single micro-library, instead of having to replace the entire libc. If I prefer
the string-copying functions of library X, but prefer the rest from library Y,
I could mix'n'match them easily.
Disadvantages: C89 is forbidden (no inline, or GNU inline, which is worse).
So, it has a long list of advantages over the traditional libc. Maybe it's
worth thinking about it for the future. :)
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>