public inbox for libc-alpha@sourceware.org
* Re: [manual]: rawmemchr(3) and UB
@ 2023-01-04 19:41 Wilco Dijkstra
  2023-01-04 20:05 ` Alejandro Colomar
  2023-01-04 20:19 ` G. Branden Robinson
  0 siblings, 2 replies; 12+ messages in thread
From: Wilco Dijkstra @ 2023-01-04 19:41 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: 'GNU C Library',
	Cristian Rodríguez, Damian McGuckin, G. Branden Robinson,
	Alexis

Hi Alex,

> I'm fine deprecating rawmemchr(3) in the manual page if you suggest it.  I don't 
> see any use cases for it.  If you confirm, I'll do it.

Yes, I don't see the point. The main use case appears to be computing
s + strlen (s), but rawmemchr is often slower, since targets may implement it
like memchr (which is more complex than searching just for zero).
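
To make the comparison concrete, here is a minimal sketch of the standard
spelling of that use case. The helper name str_end is hypothetical and not
part of glibc; it returns the same pointer that rawmemchr (s, '\0') would,
using only ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a glibc function): pointer to the terminating
 * NUL of s.  Same result as rawmemchr (s, '\0'), but spelled with the
 * standard strlen, which only ever has to search for zero. */
static const char *str_end(const char *s)
{
    assert(s != NULL);
    return s + strlen(s);
}
```

A caller appending to a buffer would write to str_end (buf) directly instead
of calling the GNU extension.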

> bzero(3) is much more useful than memset(3).  I've only used memset(3) for 
> something non-zero once in my life, IIRC.  Writing bzero(p, n) is easier to get 
> right, and simpler.

It may save a few keypresses but it's dead, so all you're doing is confusing people
who have never heard of it...
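
For readers who have not met bzero(3), a minimal sketch of what it amounts
to in terms of the standard function. The wrapper name zero_bytes is
hypothetical; it only mirrors bzero's (pointer, length) argument order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper with bzero-style arguments, built on ISO C memset.
 * The fill value 0 is the only thing the shorter call leaves out. */
static void zero_bytes(void *p, size_t n)
{
    assert(p != NULL || n == 0);
    if (n != 0)
        memset(p, 0, n);
}
```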

> mempcpy(3) is also much more useful than memcpy(3) (and in fact, it would be 
> great if glibc optimized mempcpy(3) and then implemented memcpy(3) in terms of it).

It makes no sense at all to support memcpy in terms of mempcpy since the latter is
rarely used. Even if it is supported in a libc, it's often not optimized... So a redesigned
memcpy would obviously return 'void' as very few calls use a return value.

> bcopy(3) is already deprecated in the manual page.  That function is dead.

Good. Progress!

> My opinion is that moving the responsibility of providing inline versions of 
> functions to the compiler, is just a consequence of the mess that libc is.

Compilers aren't just inlining, they are *optimizing*. GLIBC string.h used to
have thousands of lines of complex macros and inline functions trying to
hand-optimize every special case in many string functions. I ripped it all out
and the generated code is now better since the compiler optimizes it.

> For example, a libc-mem microlibrary could implement:

I don't get what the supposed benefit would be of having both 'p' and
non-'p' variants of most of these. Yes, for some str* variants it is better to return
the end rather than the start since that can avoid an extra strlen call. However
I can't see any benefit for the mem* ones that don't return a value. It's simply
extra overhead to return a value, which in almost all cases won't be used...

> Having the function definitions inline allows the compiler to see the entire 
> dependency until the primitive definitions, and optimize as much as is possible.

That's fine for simple syntactic sugar, but it doesn't work in complex cases.
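
As an illustration of the "simple syntactic sugar" case, a 'p' variant can be
sketched as a one-line inline wrapper over the standard primitive. The names
my_mempcpy and join2 are hypothetical and chosen for this example only; this
is not how glibc implements mempcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical inline equivalent of GNU mempcpy: copy n bytes and return
 * a pointer just past the last byte written.  The compiler sees a plain
 * memcpy call and can optimize it as usual. */
static inline void *my_mempcpy(void *dst, const void *src, size_t n)
{
    return (char *)memcpy(dst, src, n) + n;
}

/* Hypothetical usage: join two byte ranges and NUL-terminate the result.
 * The caller guarantees that out has room for alen + blen + 1 bytes.
 * Returns the length of the joined string. */
static size_t join2(char *out, const char *a, size_t alen,
                    const char *b, size_t blen)
{
    char *p = out;

    assert(out != NULL && a != NULL && b != NULL);
    p = my_mempcpy(p, a, alen);
    p = my_mempcpy(p, b, blen);
    *p = '\0';
    return (size_t)(p - out);
}
```

The returned pointer is only useful when calls are chained like this; a
caller that ignores it gets nothing over memcpy.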

> In some benchmark I wrote recently for a string-copying function that I proposed 
> for glibc (stpecpy(3)), this library of mine outperforms any other definition of 
> it by a very large margin, just by making it inline.  Of course, I expect that 
> if enough code is added to GCC, using the normal definition would be as fast, 
> but the point is that this doesn't require optimizing code in the compiler to 
> get really fast code.

Again this works fine for syntactic sugar where the compiler will always inline and
optimize the underlying primitives. If you define new functionality or something
more complex (say memrchr before it existed) then how are you going to
implement it efficiently in an inline function?
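
To illustrate the question, this is roughly what a portable inline
definition would look like. The name naive_memrchr is hypothetical. It
follows the memrchr contract but scans one byte at a time, whereas an
optimized out-of-line libc routine is typically free to use word-sized or
vector accesses that plain C in a header cannot easily express.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte-at-a-time reverse search with the memrchr contract:
 * returns a pointer to the last occurrence of c in the first n bytes of
 * s, or NULL if c does not occur there. */
static inline void *naive_memrchr(const void *s, int c, size_t n)
{
    const unsigned char *p;

    assert(s != NULL);
    p = (const unsigned char *)s + n;   /* one past the last byte */
    while (n-- > 0) {
        if (*--p == (unsigned char)c)
            return (void *)p;
    }
    return NULL;
}
```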

Also you still need to add the symbol to a library as well or pay the cost of having
multiple outline copies of the same static inline function when the compiler isn't
able to inline for whatever reason. So yeah we've been there done that with GLIBC
headers, and it was a total mess. There are still exported symbols for internal inline
functions that we have to keep supporting for backwards compatibility...

So there are good reasons we do stuff the way we do - it works!

Cheers,
Wilco

* [manual]: rawmemchr(3) and UB
@ 2022-12-30 13:13 Wilco Dijkstra
  2022-12-30 14:16 ` Alejandro Colomar
  0 siblings, 1 reply; 12+ messages in thread
From: Wilco Dijkstra @ 2022-12-30 13:13 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages)
  Cc: 'GNU C Library', Cristian Rodríguez

Hi Alex,

> It seems I misunderstood your email.  I've seen that glibc implements 
> rawmemchr(3) in terms of strlen(3) and memchr(3).  So it seems better to just 
> not implement this function in my library, and optimize strlen(3) directly.  The 
> non-'\0' case seems useless, so probably not worth this function unless I see a 
> use for it.

The idea is that compilers should treat it like mempcpy, bcopy etc and replace
all uses with standard strlen/memchr. GCC/LLVM don't do this yet for rawmemchr.

Since it is not in any standard and there is no benefit of having it, we should
obsolete this function along with all the other GNU extensions.

Cheers,
Wilco

* [manual]: rawmemchr(3) and UB
@ 2022-12-29 19:19 Alejandro Colomar
  2022-12-29 19:27 ` Alejandro Colomar
  2022-12-29 19:45 ` Cristian Rodríguez
  0 siblings, 2 replies; 12+ messages in thread
From: Alejandro Colomar @ 2022-12-29 19:19 UTC (permalink / raw)
  To: GNU C Library; +Cc: linux-man



Hi,

I was reading rawmemchr(3), and found some funny text:

RETURN VALUE
        The  memchr()  and memrchr() functions return a pointer to the matching
        byte or NULL if the character does not occur in the given memory area.

        The rawmemchr() function returns a pointer to the matching byte, if one
        is found.  If no matching byte is found, the result is unspecified.


Of course, if the byte is not found, the result is not unspecified, but rather 
undefined, and a crash is very likely so maybe there's not even a result.  I 
thought this might be a thinko of the manual page, but the glibc manual seems to 
have similar text:


<https://www.gnu.org/software/libc/manual/html_mono/libc.html#index-rawmemchr>
"
The rawmemchr function exists for just this situation which is surprisingly 
frequent. The interface is similar to memchr except that the size parameter is 
missing. The function will look beyond the end of the block pointed to by block 
in case the programmer made an error in assuming that the byte c is present in 
the block. In this case the result is unspecified. Otherwise the return value is 
a pointer to the located byte.
"


That text can't be true, and the result of that function when there's no match 
can't be anything other than UB, and likely a crash.  Please fix the doc.
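
The difference comes down to the missing size parameter. A minimal sketch of
the bounded search, which can report "not found" because it knows where the
block ends (the helper name byte_offset is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: offset of the first byte equal to c within the
 * first size bytes of block, or -1 if there is none.  memchr never reads
 * past block + size, so the no-match case has a defined result.
 * rawmemchr has no such bound and keeps reading until it finds c. */
static ptrdiff_t byte_offset(const void *block, int c, size_t size)
{
    const unsigned char *start = block;
    const unsigned char *hit;

    assert(block != NULL);
    hit = memchr(block, c, size);
    return hit != NULL ? hit - start : -1;
}
```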

Cheers,

Alex
-- 
<http://www.alejandro-colomar.es/>



end of thread, other threads:[~2023-01-05 12:21 UTC | newest]

Thread overview: 12+ messages
2023-01-04 19:41 [manual]: rawmemchr(3) and UB Wilco Dijkstra
2023-01-04 20:05 ` Alejandro Colomar
2023-01-04 20:19 ` G. Branden Robinson
2023-01-04 20:34   ` Alejandro Colomar
2023-01-05 12:21   ` G. Branden Robinson
  -- strict thread matches above, loose matches on Subject: below --
2022-12-30 13:13 Wilco Dijkstra
2022-12-30 14:16 ` Alejandro Colomar
2022-12-29 19:19 Alejandro Colomar
2022-12-29 19:27 ` Alejandro Colomar
2022-12-29 19:45 ` Cristian Rodríguez
2022-12-29 19:50   ` Alejandro Colomar
2022-12-30 10:31     ` Alejandro Colomar
