Re: [manual]: rawmemchr(3) and UB

public inbox for libc-alpha@sourceware.org

* Re: [manual]: rawmemchr(3) and UB
@ 2023-01-04 19:41 Wilco Dijkstra
  2023-01-04 20:05 ` Alejandro Colomar
  2023-01-04 20:19 ` G. Branden Robinson
  0 siblings, 2 replies; 12+ messages in thread
From: Wilco Dijkstra @ 2023-01-04 19:41 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: 'GNU C Library',
	Cristian Rodríguez, Damian McGuckin, G. Branden Robinson,
	Alexis

Hi Alex,

> I'm fine deprecating rawmemchr(3) in the manual page if you suggest it.  I don't 
> see any use cases for it.  If you confirm, I'll do it.

Yes, I don't see the point, the main use-case appears to be for s + strlen (s) but
is often slower since targets may implement it like memchr (which is more
complex than searching just for zero).
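
For illustration, a minimal sketch of that equivalence.  naive_rawmemchr()
is a hypothetical byte-at-a-time stand-in for rawmemchr(3), not glibc's
implementation; like the real function it takes no size argument, so it
may only be called when the byte is known to be present.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for rawmemchr(3): scan forward until the byte
 * is found.  There is no bound, so the caller must guarantee a match.
 */
static const void *
naive_rawmemchr(const void *s, int c)
{
	const unsigned char *p = s;

	assert(s != NULL);
	while (*p != (unsigned char) c)
		p++;
	return p;
}

/* End of a string via the unbounded search for '\0'. */
static const char *
end_by_rawmemchr(const char *s)
{
	return naive_rawmemchr(s, '\0');
}

/* The same pointer using only ISO C strlen(3). */
static const char *
end_by_strlen(const char *s)
{
	return s + strlen(s);
}
```

Both helpers return a pointer to the terminating null byte, so the
strlen(3) form covers this use case with a standard function.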

> bzero(3) is much more useful than memset(3).  I've only used memset(3) for 
> something non-zero once in my life, IIRC.  Writing bzero(p, n) is easier to get 
> right, and simpler.

It may save a few keypresses but it's dead so all you're doing is confuse people
who have never heard of it...

> mempcpy(3) is also much more useful than memcpy(3) (and in fact, It would be 
> great if glibc optimized mempcpy(3) and then implemented memcpy(3) in terms of it).

It makes no sense at all to support memcpy in terms of mempcpy since the latter is
rarely used. Even if it is supported in a libc, it's often not optimized... So a redesigned
memcpy would obviously return 'void' as very few calls use a return value.

> bcopy(3) is already deprecated in the manual page.  That function is dead.

Good. Progress!

> My opinion is that moving the responsibility of providing inline versions of 
> functions to the compiler, is just a consequence of the mess that libc is.

Compilers aren't just inlining, they are *optimizing*. GLIBC string.h used to have
thousands of lines of complex macros and inline functions trying to hand-optimize
every special case in many string functions. I ripped it all out and the generated
code is now better since the compiler optimizes it.

> For example, a libc-mem microlibrary could implement:

I don't get what the supposed benefit would be of having both 'p' and
non-'p' variants of most of these. Yes, for some str* variants it is better to return
the end rather than the start since that can avoid an extra strlen call. However
I can't see any benefit for the mem* ones that don't return a value. It's simply
extra overhead to return a value, which in almost all cases won't be used...

> Having the function definitions inline allows the compiler to see the entire 
> dependency until the primitive definitions, and optimize as much as is possible.

That's fine for simple syntactic sugar, but it doesn't work in complex cases.

> In some benchmark I wrote recently for a string-copying function that I proposed 
> for glibc (stpecpy(3)), this library of mine outperforms any other definition of 
> it by a very large margin, just by making it inline.  Of course, I expect that 
> if enough code is added to GCC, using the normal definition would be as fast, 
> but the point is that this doesn't require optimizing code in the compiler to 
> get really fast code.

Again this works fine for syntactic sugar where the compiler will always inline and
optimize the underlying primitives. If you define new functionality or something
more complex (say memrchr before it existed) then how are you going to
implement it efficiently in an inline function?

Also you still need to add the symbol to a library as well or pay the cost of having
multiple outline copies of the same static inline function when the compiler isn't
able to inline for whatever reason. So yeah we've been there done that with GLIBC
headers, and it was a total mess. There are still exported symbols for internal inline
functions that we have to keep supporting for backwards compatibility...

So there are good reasons we do stuff the way we do - it works!

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [manual]: rawmemchr(3) and UB
  2023-01-04 19:41 [manual]: rawmemchr(3) and UB Wilco Dijkstra
@ 2023-01-04 20:05 ` Alejandro Colomar
  2023-01-04 20:19 ` G. Branden Robinson
  1 sibling, 0 replies; 12+ messages in thread
From: Alejandro Colomar @ 2023-01-04 20:05 UTC (permalink / raw)
  To: Wilco Dijkstra
  Cc: 'GNU C Library',
	Cristian Rodríguez, Damian McGuckin, G. Branden Robinson,
	Alexis



Hi Wilco,

On 1/4/23 20:41, Wilco Dijkstra wrote:
> Hi Alex,
> 
>> I'm fine deprecating rawmemchr(3) in the manual page if you suggest it.  I don't
>> see any use cases for it.  If you confirm, I'll do it.
> 
> Yes, I don't see the point, the main use-case appears to be for s + strlen (s) but
> is often slower since targets may implement it like memchr (which is more
> complex than searching just for zero).

Okay; will mark it as deprecated.

> 
>> bzero(3) is much more useful than memset(3).  I've only used memset(3) for
>> something non-zero once in my life, IIRC.  Writing bzero(p, n) is easier to get
>> right, and simpler.
> 
> It may save a few keypresses but it's dead so all you're doing is confuse people
> who have never heard of it...

It's not about the keypresses.  It's more about how easy it is to insert a bug 
accidentally using memset(3), while it is very hard with bzero(3).

> 
>> mempcpy(3) is also much more useful than memcpy(3) (and in fact, It would be
>> great if glibc optimized mempcpy(3) and then implemented memcpy(3) in terms of it).
> 
> It makes no sense at all to support memcpy in terms of mempcpy since the latter is
> rarely used. Even if it is supported in a libc, it's often not optimized... So a redesigned
> memcpy would obviously return 'void' as very few calls use a return value.

Not at all.  mempcpy(3) is _very_ widely used, at least in projects where I work.
Also, you could efficiently implement all string-copying functions with just 3 
functions: strlen(3), memchr(3), and mempcpy(3).
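
A minimal sketch of that idea, for illustration only.  The names
my_mempcpy(), my_stpcpy(), and my_copy_trunc() are hypothetical, and
my_mempcpy() is built on memcpy(3) so the example does not depend on the
GNU extension.  The truncating copy assumes memchr(3) stops reading at
the first match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mempcpy(3) equivalent: copy n bytes, return dst + n. */
static void *
my_mempcpy(void *dst, const void *src, size_t n)
{
	return (char *) memcpy(dst, src, n) + n;
}

/*
 * stpcpy(3)-like copy: returns a pointer to the terminating '\0'
 * written in dst.  dst must have room for strlen(src) + 1 bytes.
 */
static char *
my_stpcpy(char *dst, const char *src)
{
	char *end;

	assert(dst != NULL && src != NULL);
	end = my_mempcpy(dst, src, strlen(src));
	*end = '\0';
	return end;
}

/*
 * Truncating copy into a buffer of dsize bytes (dsize > 0): memchr(3)
 * bounds the search for the terminator.  Returns a pointer to the
 * terminating '\0' written in dst.
 */
static char *
my_copy_trunc(char *dst, size_t dsize, const char *src)
{
	const char *nul;
	size_t len;
	char *end;

	assert(dst != NULL && src != NULL && dsize > 0);
	nul = memchr(src, '\0', dsize - 1);
	len = (nul != NULL) ? (size_t) (nul - src) : dsize - 1;
	end = my_mempcpy(dst, src, len);
	*end = '\0';
	return end;
}
```

Because each helper returns the end of what it wrote, calls can be
chained without recomputing the length of the destination.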

In NGINX it is used very extensively.  There it is used for copying strings 
where we know the length (since we store strings in structures where we store 
the length).  See:

alx@asus5775:~/src/nginx/nginx$ grep -rn ngx_memcpy | wc -l
256
alx@asus5775:~/src/nginx/nginx$ grep -rn ngx_cpymem | wc -l
259

ngx_memcpy() is NGINX's name for memcpy(3), and ngx_cpymem() is NGINX's name for 
mempcpy(3).

In NGINX Unit, which you can think of as NGINX 2.0 (that was the original 
intention, but it was repurposed to have more features and drop some others), 
the use of mempcpy(3) is even more pronounced compared to memcpy(3):

alx@asus5775:~/src/nginx/unit/master$ grep -rn nxt_memcpy | wc -l
97
alx@asus5775:~/src/nginx/unit/master$ grep -rn nxt_cpymem | wc -l
145

So use of mempcpy(3) compared to memcpy(3) is increasing in these high 
performance programs.
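
As a rough sketch of the pattern described above (copying strings whose
length is already stored), assuming a hypothetical struct lstr and a
hypothetical cpymem() helper with mempcpy(3) semantics.  Neither is
NGINX's actual definition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying string, in the spirit of the text above. */
struct lstr {
	size_t      len;
	const char  *data;
};

/* Hypothetical mempcpy(3) equivalent: copy n bytes, return dst + n. */
static char *
cpymem(char *dst, const void *src, size_t n)
{
	return (char *) memcpy(dst, src, n) + n;
}

/*
 * Join dir, "/", and file into dst, which must have room for
 * dir->len + 1 + file->len bytes.  Returns the number of bytes written.
 */
static size_t
join_path(char *dst, const struct lstr *dir, const struct lstr *file)
{
	char *p = dst;

	assert(dst != NULL && dir != NULL && file != NULL);
	p = cpymem(p, dir->data, dir->len);
	p = cpymem(p, "/", 1);
	p = cpymem(p, file->data, file->len);
	return (size_t) (p - dst);
}
```

With memcpy(3) each step would need a separate "p += len" after the
copy; returning the end pointer folds that into the call.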

> 
>> bcopy(3) is already deprecated in the manual page.  That function is dead.
> 
> Good. Progress!
> 
>> My opinion is that moving the responsibility of providing inline versions of
>> functions to the compiler, is just a consequence of the mess that libc is.
> 
> Compilers aren't just inlining, they are *optimizing*. GLIBC string.h used to have
> thousands of lines of complex macros and inline functions trying to hand-optimize
> every special case in many string functions. I ripped it all out and the generated
> code is now better since the compiler optimizes it.
> 
>> For example, a libc-mem microlibrary could implement:
> 
> I don't get what the supposed benefit would be of having both 'p' and
> non-'p' variants of most of these. Yes, for some str* variants it is better to return
> the end rather than the start since that can avoid an extra strlen call. However
> I can't see any benefit for the mem* ones that don't return a value. It's simply
> extra overhead to return a value, which in almost all cases won't be used...
> 
>> Having the function definitions inline allows the compiler to see the entire
>> dependency until the primitive definitions, and optimize as much as is possible.
> 
> That's fine for simple syntactic sugar, but it doesn't work in complex cases.
> 
>> In some benchmark I wrote recently for a string-copying function that I proposed
>> for glibc (stpecpy(3)), this library of mine outperforms any other definition of
>> it by a very large margin, just by making it inline.  Of course, I expect that
>> if enough code is added to GCC, using the normal definition would be as fast,
>> but the point is that this doesn't require optimizing code in the compiler to
>> get really fast code.
> 
> Again this works fine for syntactic sugar where the compiler will always inline and
> optimize the underlying primitives. If you define new functionality or something
> more complex (say memrchr before it existed) then how are you going to
> implement it efficiently in an inline function?

C99 inline is a hybrid where the compiler can see the code, but it's not forced 
to inline, and still the symbol is only in the library.  So you can implement it 
in the same way as you'd implement it in extern functions, but switching the 
implementation to the .h file, and the prototype to the .c file.

You could even implement the actual symbol in assembly, but provide inline 
definitions that don't produce any new symbol.

> 
> Also you still need to add the symbol to a library as well or pay the cost of having
> multiple outline copies of the same static inline function when the compiler isn't
> able to inline for whatever reason.

I use C99 inline, which avoids the cost of static inline.  It provides a single 
symbol per function.
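
A minimal single-file sketch of the C99 inline model, for illustration.
The function name my_memzero() and the file names in the comments are
hypothetical; in a real project the two parts would live in separate
files as marked.  It assumes a compiler using C99 (not GNU89) inline
semantics.

```c
#include <assert.h>
#include <stddef.h>

/* ---- would live in a header, e.g. a hypothetical "my_memzero.h" ---- */

/*
 * C99 inline definition: visible to every translation unit that includes
 * the header, so the compiler may inline it, but by itself it does not
 * emit an external symbol.
 */
inline void
my_memzero(void *p, size_t n)
{
	unsigned char *b = p;

	assert(p != NULL || n == 0);
	while (n-- > 0)
		*b++ = 0;
}

/* ---- would live in exactly one .c file, e.g. "my_memzero.c" ---- */

/*
 * Declaring the function with extern in one translation unit makes the
 * definition above an external definition there, so a single out-of-line
 * copy of the symbol exists for calls that are not inlined.
 */
extern inline void my_memzero(void *p, size_t n);
```

Unlike static inline, a call that the compiler declines to inline
resolves to that one external symbol instead of a per-file copy.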

> So yeah we've been there done that with GLIBC
> headers, and it was a total mess. There are still exported symbols for internal inline
> functions that we have to keep supporting for backwards compatibility...
> 
> So there are good reasons we do stuff the way we do - it works!

I guess.  Although in some cases it feels like some things would be better if 
C89/GNU89 were unsupported.

<https://www.greenend.org.uk/rjk/tech/inline.html>

Cheers,

Alex

> 
> Cheers,
> Wilco

-- 
<http://www.alejandro-colomar.es/>


* Re: [manual]: rawmemchr(3) and UB
  2023-01-04 19:41 [manual]: rawmemchr(3) and UB Wilco Dijkstra
  2023-01-04 20:05 ` Alejandro Colomar
@ 2023-01-04 20:19 ` G. Branden Robinson
  2023-01-04 20:34   ` Alejandro Colomar
  2023-01-05 12:21   ` G. Branden Robinson
  1 sibling, 2 replies; 12+ messages in thread
From: G. Branden Robinson @ 2023-01-04 20:19 UTC (permalink / raw)
  To: Wilco Dijkstra
  Cc: Alejandro Colomar, 'GNU C Library',
	Cristian Rodríguez, Damian McGuckin, Alexis


Since I'm CCed on this I'll chuck in two cents...

At 2023-01-04T19:41:30+0000, Wilco Dijkstra wrote:
> > bzero(3) is much more useful than memset(3).  I've only used
> > memset(3) for something non-zero once in my life, IIRC.  Writing
> > bzero(p, n) is easier to get right, and simpler.
> 
> It may save a few keypresses but it's dead so all you're doing is
> confuse people who have never heard of it...

I agree with Wilco here.  My understanding is that memset(), memcpy(),
and memmove() are almost C language primitives masquerading as function
calls, in that the language is _unimplementable_, even in a freestanding
environment, without them.  (How are you going to copy a struct?  How
are you going to work safely with hunks of allocated memory?[1])

The line between language runtime services and operating system
services is fuzzy in C, at least as the language is presented and
taught.  Slowly, over time, that line is being clarified, to the horror
of those who remember writing C on a PDP-11.

You can always have your own static function: memclear() or something.

The b in bzero() is for Bad BSD Bogosity.  Ban it.  :P  Like index() and
rindex() it is duplicative.

Regards,
Branden

[1] That last point may be contrived.  For many years, no one cared.


* Re: [manual]: rawmemchr(3) and UB
  2023-01-04 20:19 ` G. Branden Robinson
@ 2023-01-04 20:34   ` Alejandro Colomar
  2023-01-05 12:21   ` G. Branden Robinson
  1 sibling, 0 replies; 12+ messages in thread
From: Alejandro Colomar @ 2023-01-04 20:34 UTC (permalink / raw)
  To: G. Branden Robinson, Wilco Dijkstra
  Cc: 'GNU C Library',
	Cristian Rodríguez, Damian McGuckin, Alexis



Hi Branden, Wilco,

On 1/4/23 21:19, G. Branden Robinson wrote:
> Since I'm CCed on this I'll chuck in two cents...
> 
> At 2023-01-04T19:41:30+0000, Wilco Dijkstra wrote:
>>> bzero(3) is much more useful than memset(3).  I've only used
>>> memset(3) for something non-zero once in my life, IIRC.  Writing
>>> bzero(p, n) is easier to get right, and simpler.
>>
>> It may save a few keypresses but it's dead so all you're doing is
>> confuse people who have never heard of it...

I forgot to add a reference in my previous message.  I wanted to link to this 
stackoverflow answer:

<https://stackoverflow.com/a/17097978>

Some relevant quote from there:

"""
In a comment to another answer here, Aaron Newton cited the following from Unix 
Network Programming, Volume 1, 3rd Edition by Stevens, et al., Section 1.2 
(emphasis added):

---
     bzero is not an ANSI C function. It is derived from early Berkeley 
networking code. Nevertheless, we use it throughout the text, instead of the 
ANSI C memset function, because bzero is easier to remember (with only two 
arguments) than memset (with three arguments). Almost every vendor that supports 
the sockets API also provides bzero, and if not, we provide a macro definition 
in our unp.h header.

     *Indeed, the author of TCPv3 [TCP/IP Illustrated, Volume 3 - Stevens 1996] 
made the mistake of swapping the second and third arguments to memset in 10 
occurrences in the first printing.* A C compiler cannot catch this error because 
both arguments are of the same type. (Actually, the second argument is an int 
and the third argument is size_t, which is typically an unsigned int, but the 
values specified, 0 and 16, respectively, are still acceptable for the other 
type of argument.) The call to memset still worked, because only a few of the 
socket functions actually require that the final 8 bytes of an Internet socket 
address structure be set to 0. Nevertheless, it was an error, and one that could 
be avoided by using bzero, because swapping the two arguments to bzero will 
always be caught by the C compiler if function prototypes are used.
---
"""

> 
> I agree with Wilco here.  My understanding is that memset(), memcpy(),
> and memmove() are almost C language primitives masquerading as function
> calls, in that the language is _unimplementable_, even in a freestanding
> environment, without them.  (How are you going to copy a struct?  How
> are you going to work safely with hunks of allocated memory?[1])

memcpy() is a useless function.  You could replace _every_ single call to it by 
mempcpy(3), and you wouldn't lose a single bit of performance (assuming 
equally-optimized implementations).  And mempcpy(3) has use cases that memcpy(3) 
can't cover without adding an extra +len operation.

memset(3):  alright, it allows doing more things than bzero(3).  So, yes, in the 
end, you need to implement memset(3) unconditionally in glibc, but can implement 
bzero(3) either in glibc or in the compiler, and can be implemented as a thin 
wrapper around memset(3).

However, when was the last time you used it for setting "mem" to anything other 
than 0?  I've only done that exactly once in my life.  I don't remember the 
exact details, but I needed to set something to 1s.

> 
> The line between language runtime services and operating system
> services is fuzzy in C, at least as the language is presented and
> taught.  Slowly, over time, that line is being clarified, to the horror
> of those who remember writing C on a PDP-11.
> 
> You can always have your own static function: memclear() or something.

static?  Do you mean static inline?  Or static within a .c file?

Most projects I've worked in, either call bzero(3) directly, or define a macro 
memzero() that is effectively memset(mem, 0, size) or memset(mem, size, 0) (I 
didn't care to check the manual page, because my point is exactly that: I don't 
remember, both look reasonably good).
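
For reference, memset(3) takes the fill byte second and the length third.
A minimal sketch of such a wrapper, and of what the swapped order does,
follows.  memzero() and fill_swapped() are hypothetical names used only
for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical two-argument zeroing helper.  The only memset(3) call
 * lives here, so the argument order is written (and reviewed) once.
 */
static inline void
memzero(void *p, size_t n)
{
	assert(p != NULL);
	memset(p, 0, n);
}

/*
 * Deliberately wrong: the fill byte and the length are swapped.  Both
 * arguments convert silently, so this compiles, but with c == 0 it
 * writes zero bytes and leaves the buffer untouched.
 */
static inline void
fill_swapped(void *p, int c, size_t n)
{
	memset(p, (int) n, (size_t) c);
}
```

With a two-argument helper, swapping the pointer and the size is a type
error, which is the property attributed to bzero(3) in the quote above.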

> 
> The b in bzero() is for Bad BSD Bogosity.  Ban it.  :P  Like index() and
> rindex() it is duplicative.

Okay, let's call it memzero() if you prefer :)

> 
> Regards,
> Branden
> 
> [1] That last point may be contrived.  For many years, no one cared.

Cheers,

Alex


-- 
<http://www.alejandro-colomar.es/>


* Re: [manual]: rawmemchr(3) and UB
  2023-01-04 20:19 ` G. Branden Robinson
  2023-01-04 20:34   ` Alejandro Colomar
@ 2023-01-05 12:21   ` G. Branden Robinson
  1 sibling, 0 replies; 12+ messages in thread
From: G. Branden Robinson @ 2023-01-05 12:21 UTC (permalink / raw)
  To: Alejandro Colomar, 'GNU C Library'
  Cc: Wilco Dijkstra, Cristian Rodríguez, Damian McGuckin, Alexis


My punishment for smarting off to the libc-alpha list is that I have to
admit an error.

At 2023-01-04T14:19:23-0600, G. Branden Robinson wrote:
> The b in bzero() [is] for Bad BSD Bogosity.  [...] Like index() and
> rindex() it is duplicative.

While BSD did hang on to {r,}index for a long time, I was digging around
in libc history while amiably reading C. D. Perez's "A Guide to the C
Library for Unix Users" (ca. 1981)[1] and found that I have to correct
myself...and possibly the Linux man-pages documents for them.

They date all the way back to V7 Unix and antedate str{r,}chr().

https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/libc/gen/index.c
https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/libc/gen/rindex.c

Mea culpa.

Alex, in your current mission you might find entertainment of the
cerebral hemorrhage-causing variety in some of the following files.

https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/libc/gen/strcat.c
https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/libc/gen/strcmp.c
https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/libc/gen/strcpy.c
https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/libc/gen/strlen.c
https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/libc/gen/strncat.c
https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/libc/gen/strncmp.c
https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/libc/gen/strncpy.c

Regards,
Branden

[1] https://www.tuhs.org/Archive/Documentation/Manuals/Unix_4.0/Volume_1/D.1.2_A_Guide_to_the_C_Library.pdf


* Re: [manual]: rawmemchr(3) and UB
  2022-12-30 13:13 Wilco Dijkstra
@ 2022-12-30 14:16 ` Alejandro Colomar
  0 siblings, 0 replies; 12+ messages in thread
From: Alejandro Colomar @ 2022-12-30 14:16 UTC (permalink / raw)
  To: Wilco Dijkstra
  Cc: 'GNU C Library',
	Cristian Rodríguez, Damian McGuckin, G. Branden Robinson,
	Alexis


Hi Wilco,

On 12/30/22 14:13, Wilco Dijkstra via Libc-alpha wrote:
> Hi Alex,
> 
>> It seems I misunderstood your email.  I've seen that glibc implements
>> rawmemchr(3) in terms of strlen(3) and memchr(3).  So it seems better to just
>> not implement this function in my library, and optimize strlen(3) directly.  The
>> non-'\0' case seems useless, so probably not worth this function unless I see a
>> use for it.
> 
> The idea is that compilers should treat it like mempcpy, bcopy etc and replace
> all uses with standard strlen/memchr. GCC/LLVM don't do this yet for rawmemchr.
> 
> Since it is not in any standard and there is no benefit of having it, we should
> obsolete this function along with all the other GNU extensions.

I'm fine deprecating rawmemchr(3) in the manual page if you suggest it.  I don't 
see any use cases for it.  If you confirm, I'll do it.

However,

I wouldn't obsolete many functions indiscriminately, since most can be very 
useful for users:

bzero(3) is much more useful than memset(3).  I've only used memset(3) for 
something non-zero once in my life, IIRC.  Writing bzero(p, n) is easier to get 
right, and simpler.

mempcpy(3) is also much more useful than memcpy(3) (and in fact, It would be 
great if glibc optimized mempcpy(3) and then implemented memcpy(3) in terms of it).

bcopy(3) is already deprecated in the manual page.  That function is dead.

My opinion is that moving the responsibility of providing inline versions of 
functions to the compiler, is just a consequence of the mess that libc is.  Not 
glibc, but every libc.  It's a consequence of the turbulent design of the C 
library, with huge headers that provide monolithic libraries, which have many 
problems.

The solution would be to completely redesign the C library, without any regard 
for backwards compatibility (but continue reading, it gets better).  If 
we had let's say several dozens of micro libraries that each provide just a few 
headers and functions, making most of the functions inline, the compiler 
wouldn't need to know what substitutions to perform, with few exceptions.

For example, a libc-mem microlibrary could implement:

<c/mem/chr/memchr.h>
	c_memchr()    // in terms of c_memchrend()
*	c_memchrend() // *like memchr(3), but return mem + size instead of NULL
<c/mem/chr/memrchr.h>
*	c_memrchr()
<c/mem/cmp/memcmp.h>
*	c_memcmp()
<c/mem/cpy/memcpy.h>
*	c_mempcpy()
	c_memcpy()    // in terms of c_mempcpy()
<c/mem/mem/memmem.h>
*	c_memmem()
<c/mem/mv/memmove.h>
*	c_mempmove()
	c_memmove()   // in terms of c_mempmove()
<c/mem/set/memset.h>
*	c_mempset()
	c_memset()    // in terms of c_mempset()
<c/mem/set/memzero.h>
	c_mempzero()  // in terms of c_mempset()
	c_bzero()     // in terms of c_mempzero()

Functions with a '*' would be the primitives, the ones that are optimized, and 
the others just wrappers around them.
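
A rough sketch of how a few of the listed functions could layer on each
other.  The c_* names come from the list above, but the bodies are
assumptions made for illustration: the primitives simply forward to the
standard memcpy(3) and memset(3), and static inline is used only to keep
the example in one file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Primitive: copy n bytes and return one past the last byte written. */
static inline void *
c_mempcpy(void *restrict dst, const void *restrict src, size_t n)
{
	assert(dst != NULL && src != NULL);
	return (char *) memcpy(dst, src, n) + n;
}

/* Wrapper: same copy, but return dst as memcpy(3) does. */
static inline void *
c_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
	c_mempcpy(dst, src, n);
	return dst;
}

/* Primitive: fill n bytes with c and return one past the last byte. */
static inline void *
c_mempset(void *s, int c, size_t n)
{
	assert(s != NULL);
	return (char *) memset(s, c, n) + n;
}

/* Wrapper: same fill, but return s as memset(3) does. */
static inline void *
c_memset(void *s, int c, size_t n)
{
	c_mempset(s, c, n);
	return s;
}

/* Wrapper: zero n bytes and return one past the last byte. */
static inline void *
c_mempzero(void *s, size_t n)
{
	return c_mempset(s, 0, n);
}

/* Wrapper: zero n bytes, returning nothing, like bzero(3). */
static inline void
c_bzero(void *s, size_t n)
{
	c_mempzero(s, n);
}
```

Since every wrapper is visible to the compiler, an unused return value
can be dropped after inlining.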

See <http://www.alejandro-colomar.es/src/alx/alx/libc/libc-mem.git/tree/include>

Then, you could write a compatibility layer for the standard organization of 
headers, which would just include the necessary headers in the common ones (e.g., 
<string.h>), and alias the functions without the c_* prefix (or __*, if you prefer it).

Having the function definitions inline allows the compiler to see the entire 
dependency until the primitive definitions, and optimize as much as is possible.

That would even allow the kernel to use a large portion of the userspace C 
library: just link statically to the micro-libraries, but the inline definitions 
are fine to use inside a kernel.

I wrote a proof of concept, with already half a dozen of those micro-libraries 
just for fun here:

<http://www.alejandro-colomar.es/src/alx/alx/libc/>

For now, I wrote the primitives as calls to glibc, but it would be easy to flip 
the dependency so that glibc depends on the microlibraries, if they were 
extended enough for that.

In some benchmark I wrote recently for a string-copying function that I proposed 
for glibc (stpecpy(3)), this library of mine outperforms any other definition of 
it by a very large margin, just by making it inline.  Of course, I expect that 
if enough code is added to GCC, using the normal definition would be as fast, 
but the point is that this doesn't require optimizing code in the compiler to 
get really fast code.

And another advantage of that model of a C library is that it allows replacing a 
single microlibrary, instead of having to replace the entire libc.  If I prefer 
the string-copying functions of library X, but the rest I prefer it from library 
Y, I could mix'n'match them easily.

Disadvantages:  C89 is forbidden (no inline, or GNU inline, which is worse).

So, it has a long list of advantages over the traditional libc.  Maybe it's 
worth thinking about it for the future.  :)

Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>


* [manual]: rawmemchr(3) and UB
@ 2022-12-30 13:13 Wilco Dijkstra
  2022-12-30 14:16 ` Alejandro Colomar
  0 siblings, 1 reply; 12+ messages in thread
From: Wilco Dijkstra @ 2022-12-30 13:13 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages)
  Cc: 'GNU C Library', Cristian Rodríguez

Hi Alex,

> It seems I misunderstood your email.  I've seen that glibc implements 
> rawmemchr(3) in terms of strlen(3) and memchr(3).  So it seems better to just 
> not implement this function in my library, and optimize strlen(3) directly.  The 
> non-'\0' case seems useless, so probably not worth this function unless I see a 
> use for it.

The idea is that compilers should treat it like mempcpy, bcopy etc and replace
all uses with standard strlen/memchr. GCC/LLVM don't do this yet for rawmemchr.

Since it is not in any standard and there is no benefit of having it, we should
obsolete this function along with all the other GNU extensions.

Cheers,
Wilco


* Re: [manual]: rawmemchr(3) and UB
  2022-12-29 19:50   ` Alejandro Colomar
@ 2022-12-30 10:31     ` Alejandro Colomar
  0 siblings, 0 replies; 12+ messages in thread
From: Alejandro Colomar @ 2022-12-30 10:31 UTC (permalink / raw)
  To: Cristian Rodríguez; +Cc: GNU C Library, linux-man



Hi Cristian,

On 12/29/22 20:50, Alejandro Colomar wrote:
> On 12/29/22 20:45, Cristian Rodríguez wrote:
>> On Thu, Dec 29, 2022 at 4:20 PM Alejandro Colomar via Libc-alpha
>> <libc-alpha@sourceware.org> wrote:

[...]

>>
>> The library itself uses this function mostly to find the null byte as an
>> optimization. This is all before GCC handled all of this so it is
>> mostly obsolete.
>> gcc replaces null byte searches that use str*chr with s + strlen(s)
>> and expands memchr c=null  and rawmemchr-like patterns inline.
> 
> You mean that GCC does the following?:
> 
> 
> inline size_t
> strlen(const char *s)
> {
>      return rawmemchr(s, '\0');
Obvious typo here: I forgot to subtract s.
> }
> 
> 
> If so, great, because I am writing a libc replacement, and was implementing 
> strlen(3) exactly like that, which is why I needed the docs.  It may be 
> something not very useful, but I guess it's still very useful for libc internals.
> 

It seems I misunderstood your email.  I've seen that glibc implements 
rawmemchr(3) in terms of strlen(3) and memchr(3).  So it seems better to just 
not implement this function in my library, and optimize strlen(3) directly.  The 
non-'\0' case seems useless, so probably not worth this function unless I see a 
use for it.

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>


* Re: [manual]: rawmemchr(3) and UB
  2022-12-29 19:45 ` Cristian Rodríguez
@ 2022-12-29 19:50   ` Alejandro Colomar
  2022-12-30 10:31     ` Alejandro Colomar
  0 siblings, 1 reply; 12+ messages in thread
From: Alejandro Colomar @ 2022-12-29 19:50 UTC (permalink / raw)
  To: Cristian Rodríguez; +Cc: GNU C Library, linux-man



Hi Cristian,

On 12/29/22 20:45, Cristian Rodríguez wrote:
> On Thu, Dec 29, 2022 at 4:20 PM Alejandro Colomar via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
>>
>> Hi,
>>
>> I was reading rawmemchr(3), and found some funny text:
>>
>> RETURN VALUE
>>          The  memchr()  and memrchr() functions return a pointer to the matching
>>          byte or NULL if the character does not occur in the given memory area.
>>
>>          The rawmemchr() function returns a pointer to the matching byte, if one
>>          is found.  If no matching byte is found, the result is unspecified.
>>
>>
>> Of course, if the byte is not found, the result is not unspecified, but rather
>> undefined, and a crash is very likely so maybe there's not even a result.  I
>> thought this might be a thinko of the manual page, but the glibc manual seems to
>> have similar text:
>>
> 
> The library itself uses this function mostly to find the null byte as an
> optimization. This is all before GCC handled all of this so it is
> mostly obsolete.
> gcc replaces null byte searches that use str*chr with s + strlen(s)
> and expands memchr c=null  and rawmemchr-like patterns inline.

You mean that GCC does the following?:


inline size_t
strlen(const char *s)
{
	return rawmemchr(s, '\0');
}


If so, great, because I am writing a libc replacement, and was implementing 
strlen(3) exactly like that, which is why I needed the docs.  It may be 
something not very useful, but I guess it's still very useful for libc internals.

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>


* Re: [manual]: rawmemchr(3) and UB
  2022-12-29 19:19 Alejandro Colomar
  2022-12-29 19:27 ` Alejandro Colomar
@ 2022-12-29 19:45 ` Cristian Rodríguez
  2022-12-29 19:50   ` Alejandro Colomar
  1 sibling, 1 reply; 12+ messages in thread
From: Cristian Rodríguez @ 2022-12-29 19:45 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: GNU C Library, linux-man

On Thu, Dec 29, 2022 at 4:20 PM Alejandro Colomar via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> Hi,
>
> I was reading rawmemchr(3), and found some funny text:
>
> RETURN VALUE
>         The  memchr()  and memrchr() functions return a pointer to the matching
>         byte or NULL if the character does not occur in the given memory area.
>
>         The rawmemchr() function returns a pointer to the matching byte, if one
>         is found.  If no matching byte is found, the result is unspecified.
>
>
> Of course, if the byte is not found, the result is not unspecified, but rather
> undefined, and a crash is very likely so maybe there's not even a result.  I
> thought this might be a thinko of the manual page, but the glibc manual seems to
> have similar text:
>

The library itself uses this function mostly to find the null byte as an
optimization. This is all before GCC handled all of this so it is
mostly obsolete.
gcc replaces null byte searches that use str*chr with s + strlen(s)
and expands memchr c=null  and rawmemchr-like patterns inline.


* Re: [manual]: rawmemchr(3) and UB
  2022-12-29 19:19 Alejandro Colomar
@ 2022-12-29 19:27 ` Alejandro Colomar
  2022-12-29 19:45 ` Cristian Rodríguez
  1 sibling, 0 replies; 12+ messages in thread
From: Alejandro Colomar @ 2022-12-29 19:27 UTC (permalink / raw)
  To: GNU C Library; +Cc: linux-man





On 12/29/22 20:19, Alejandro Colomar via Libc-alpha wrote:
> Hi,
> 
> I was reading rawmemchr(3), and found some funny text:
> 
> RETURN VALUE
>         The  memchr()  and memrchr() functions return a pointer to the matching
>         byte or NULL if the character does not occur in the given memory area.
> 
>         The rawmemchr() function returns a pointer to the matching byte, if one
>         is found.  If no matching byte is found, the result is unspecified.
> 
> 
> Of course, if the byte is not found, the result is not unspecified, but rather 
> undefined, and a crash is very likely so maybe there's not even a result.  I 
> thought this might be a thinko of the manual page, but the glibc manual seems to 
> have similar text:
> 
> 
> <https://www.gnu.org/software/libc/manual/html_mono/libc.html#index-rawmemchr>
> "
> The rawmemchr function exists for just this situation which is surprisingly 
> frequent. The interface is similar to memchr except that the size parameter is 
> missing. The function will look beyond the end of the block pointed to by block 
> in case the programmer made an error in assuming that the byte c is present in 
> the block. In this case the result is unspecified. Otherwise the return value is 
> a pointer to the located byte.
> "
> 
> 
> That test can't be true, and the result of that function when there's no match 

s/test/text/

> can't be anything other than UB, and likely a crash.  Please fix the doc.
> 
> Cheers,
> 
> Alex

-- 
<http://www.alejandro-colomar.es/>



* [manual]: rawmemchr(3) and UB
@ 2022-12-29 19:19 Alejandro Colomar
  2022-12-29 19:27 ` Alejandro Colomar
  2022-12-29 19:45 ` Cristian Rodríguez
  0 siblings, 2 replies; 12+ messages in thread
From: Alejandro Colomar @ 2022-12-29 19:19 UTC (permalink / raw)
  To: GNU C Library; +Cc: linux-man

Hi,

I was reading rawmemchr(3), and found some funny text:

RETURN VALUE
        The  memchr()  and memrchr() functions return a pointer to the matching
        byte or NULL if the character does not occur in the given memory area.

        The rawmemchr() function returns a pointer to the matching byte, if one
        is found.  If no matching byte is found, the result is unspecified.

Of course, if the byte is not found, the result is not unspecified, but rather 
undefined, and a crash is very likely so maybe there's not even a result.  I 
thought this might be a thinko of the manual page, but the glibc manual seems to 
have similar text:

<https://www.gnu.org/software/libc/manual/html_mono/libc.html#index-rawmemchr>
"
The rawmemchr function exists for just this situation which is surprisingly 
frequent. The interface is similar to memchr except that the size parameter is 
missing. The function will look beyond the end of the block pointed to by block 
in case the programmer made an error in assuming that the byte c is present in 
the block. In this case the result is unspecified. Otherwise the return value is 
a pointer to the located byte.
"

That test can't be true, and the result of that function when there's no match 
can't be anything other than UB, and likely a crash.  Please fix the doc.
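
To make the distinction concrete, here is a minimal sketch.  The names
find_bounded and scan_unbounded are hypothetical, and scan_unbounded is
only an illustration of a search with no size parameter, not glibc's
actual rawmemchr implementation:

```c
#include <stddef.h>
#include <string.h>

/* Bounded search: memchr() inspects at most n bytes and returns NULL
   when c does not occur, so "not found" is a well-defined result. */
const void *find_bounded(const void *buf, int c, size_t n)
{
    return memchr(buf, c, n);
}

/* Unbounded search in the style of rawmemchr(): there is no size
   parameter, so the loop stops only when it meets c.  If c is absent
   from the object, the loop reads past its end, which is undefined
   behavior rather than an unspecified return value. */
void *scan_unbounded(const void *s, int c)
{
    const unsigned char *p = s;

    while (*p != (unsigned char)c)
        p++;            /* nothing limits how far this advances */
    return (void *)p;
}
```

The unbounded version is safe to call only when the caller already
knows the byte is present, for example when searching a valid string
for its terminating null byte.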

Cheers,

Alex
-- 
<http://www.alejandro-colomar.es/>



end of thread, other threads:[~2023-01-05 12:21 UTC | newest]

Thread overview: 12+ messages
2023-01-04 19:41 [manual]: rawmemchr(3) and UB Wilco Dijkstra
2023-01-04 20:05 ` Alejandro Colomar
2023-01-04 20:19 ` G. Branden Robinson
2023-01-04 20:34   ` Alejandro Colomar
2023-01-05 12:21   ` G. Branden Robinson
  -- strict thread matches above, loose matches on Subject: below --
2022-12-30 13:13 Wilco Dijkstra
2022-12-30 14:16 ` Alejandro Colomar
2022-12-29 19:19 Alejandro Colomar
2022-12-29 19:27 ` Alejandro Colomar
2022-12-29 19:45 ` Cristian Rodríguez
2022-12-29 19:50   ` Alejandro Colomar
2022-12-30 10:31     ` Alejandro Colomar
