public inbox for libc-alpha@sourceware.org
From: Alejandro Colomar <alx.manpages@gmail.com>
To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Cc: "'GNU C Library'" <libc-alpha@sourceware.org>,
	"Cristian Rodríguez" <crrodriguez@opensuse.org>,
	"Damian McGuckin" <damianm@esi.com.au>,
	"G. Branden Robinson" <g.branden.robinson@gmail.com>,
	Alexis <flexibeast@gmail.com>
Subject: Re: [manual]: rawmemchr(3) and UB
Date: Fri, 30 Dec 2022 15:16:51 +0100
Message-ID: <40eb3667-b8a2-e250-0004-92ba8ca34e70@gmail.com>
In-Reply-To: <PAWPR08MB89824F0238DB1A045E0B9DF983F09@PAWPR08MB8982.eurprd08.prod.outlook.com>


Hi Wilco,

On 12/30/22 14:13, Wilco Dijkstra via Libc-alpha wrote:
> Hi Alex,
> 
>> It seems I misunderstood your email.  I've seen that glibc implements
>> rawmemchr(3) in terms of strlen(3) and memchr(3).  So it seems better to just
>> not implement this function in my library, and optimize strlen(3) directly.  The
>> non-'\0' case seems useless, so probably not worth this function unless I see a
>> use for it.
> 
> The idea is that compilers should treat it like mempcpy, bcopy etc and replace
> all uses with standard strlen/memchr. GCC/LLVM don't do this yet for rawmemchr.
> 
> Since it is not in any standard and there is no benefit of having it, we should
> obsolete this function along with all the other GNU extensions.

I'm fine deprecating rawmemchr(3) in the manual page if you suggest it.  I don't 
see any use cases for it.  If you confirm, I'll do it.
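
For reference, the only case that seems to matter (searching for the
terminating '\0') can be written with standard strlen(3) alone.  A minimal
sketch, where str_nul() is a hypothetical name used only for illustration:

```c
#include <string.h>

/* Hypothetical helper (not part of any library): returns a pointer to the
   terminating '\0' of s, which is what rawmemchr(s, '\0') would return. */
static inline char *
str_nul(const char *s)
{
	return (char *) s + strlen(s);
}
```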

However,

I wouldn't obsolete many functions indiscriminately, since most can be very 
useful for users:

bzero(3) is much more useful than memset(3).  I've only used memset(3) for 
something non-zero once in my life, IIRC.  Writing bzero(p, n) is easier to get 
right, and simpler.

mempcpy(3) is also much more useful than memcpy(3) (and in fact, it would be
great if glibc optimized mempcpy(3) and then implemented memcpy(3) in terms of it).

bcopy(3) is already deprecated in the manual page.  That function is dead.


My opinion is that moving the responsibility for providing inline versions of
functions to the compiler is just a consequence of the mess that libc is.  Not
glibc in particular, but every libc.  It's a consequence of the turbulent design
of the C library, with huge headers that expose monolithic libraries, which have
many problems.

The solution would be to completely redesign the C library, without any regard
for backwards compatibility (but continue reading, it gets better).  If we had,
let's say, several dozen micro-libraries that each provide just a few headers
and functions, making most of the functions inline, the compiler wouldn't need
to know what substitutions to perform, with few exceptions.

For example, a libc-mem microlibrary could implement:

<c/mem/chr/memchr.h>
	c_memchr()    // in terms of c_memchrend()
*	c_memchrend() // *like memchr(3), but return mem + size instead of NULL
<c/mem/chr/memrchr.h>
*	c_memrchr()
<c/mem/cmp/memcmp.h>
*	c_memcmp()
<c/mem/cpy/memcpy.h>
*	c_mempcpy()
	c_memcpy()    // in terms of c_mempcpy()
<c/mem/mem/memmem.h>
*	c_memmem()
<c/mem/mv/memmove.h>
*	c_mempmove()
	c_memmove()   // in terms of c_mempmove()
<c/mem/set/memset.h>
*	c_mempset()
	c_memset()    // in terms of c_mempset()
<c/mem/set/memzero.h>
	c_mempzero()  // in terms of c_mempset()
	c_bzero()     // in terms of c_mempzero()

Functions with a '*' would be the primitives, the ones that are optimized, and 
the others just wrappers around them.
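
To illustrate the primitive/wrapper split, here is a minimal sketch of a few
of the functions listed above.  The signatures are my assumption, modeled on
the standard functions, and are not necessarily those in the repository.  The
primitives are written as calls to the host libc, and everything is static
inline so the compiler can see through the wrappers:

```c
#include <stddef.h>
#include <string.h>

/* Primitive: like memchr(3), but returns s + n instead of NULL when c is
   not found.  Assumed signature; implemented here via the host memchr(3). */
static inline void *
c_memchrend(const void *s, int c, size_t n)
{
	const void *p = memchr(s, c, n);

	return (void *) (p != NULL ? p : (const char *) s + n);
}

/* Wrapper: recovers memchr(3) semantics from c_memchrend(). */
static inline void *
c_memchr(const void *s, int c, size_t n)
{
	const char *end = (const char *) s + n;
	void *p = c_memchrend(s, c, n);

	return ((const char *) p == end) ? NULL : p;
}

/* Primitive: copies n bytes and returns dst + n.  Assumed signature;
   implemented here via the host memcpy(3). */
static inline void *
c_mempcpy(void *restrict dst, const void *restrict src, size_t n)
{
	return (char *) memcpy(dst, src, n) + n;
}

/* Wrapper: memcpy(3) semantics, discarding the end pointer. */
static inline void *
c_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
	c_mempcpy(dst, src, n);
	return dst;
}

/* Primitive: fills n bytes with c and returns s + n.  Assumed signature;
   implemented here via the host memset(3). */
static inline void *
c_mempset(void *s, int c, size_t n)
{
	return (char *) memset(s, c, n) + n;
}

/* Wrapper: zero-fills n bytes and returns s + n. */
static inline void *
c_mempzero(void *s, size_t n)
{
	return c_mempset(s, 0, n);
}

/* Wrapper: bzero(3) semantics, built on c_mempzero(). */
static inline void
c_bzero(void *s, size_t n)
{
	c_mempzero(s, n);
}
```

Since every wrapper is a one-liner over its primitive, an optimizing compiler
can inline a call such as c_bzero(p, n) all the way down to the primitive.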

See <http://www.alejandro-colomar.es/src/alx/alx/libc/libc-mem.git/tree/include>

Then, you could write a compatibility layer for the standard organization of
headers, which would just include the necessary headers in the common ones
(e.g., <string.h>) and alias the functions without the c_* prefix (or __*, if
you prefer it).

Having the function definitions inline allows the compiler to see the entire
dependency chain down to the primitive definitions, and optimize as much as
possible.

That would even allow the kernel to use a large portion of the userspace C 
library: just link statically to the micro-libraries, but the inline definitions 
are fine to use inside a kernel.

Just for fun, I wrote a proof of concept, which already has half a dozen of
those micro-libraries, here:

<http://www.alejandro-colomar.es/src/alx/alx/libc/>

For now, I wrote the primitives as calls to glibc, but it would be easy to flip 
the dependency so that glibc depends on the microlibraries, if they were 
extended enough for that.

In a benchmark I wrote recently for a string-copying function that I proposed
for glibc (stpecpy(3)), this library of mine outperforms any other definition of 
it by a very large margin, just by making it inline.  Of course, I expect that 
if enough code is added to GCC, using the normal definition would be as fast, 
but the point is that this doesn't require optimizing code in the compiler to 
get really fast code.

Another advantage of that model of a C library is that it allows replacing a
single microlibrary, instead of having to replace the entire libc.  If I prefer
the string-copying functions of library X, but prefer the rest from library Y,
I could mix'n'match them easily.

Disadvantages:  C89 is forbidden (no inline, or GNU inline, which is worse).

So, it has a long list of advantages over the traditional libc.  Maybe it's 
worth thinking about it for the future.  :)


Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>

