* Re: strlen [not found] ` <5566b180-1333-d73b-22ee-6c6d32053921@jguk.org> @ 2021-07-08 11:06 ` Alejandro Colomar (man-pages) 2021-07-08 12:05 ` strlen johnsfine ` (3 more replies) 0 siblings, 4 replies; 15+ messages in thread From: Alejandro Colomar (man-pages) @ 2021-07-08 11:06 UTC (permalink / raw) To: Jonny Grant; +Cc: linux-man, Michael Kerrisk, Florian Weimer, gcc-help On 7/8/21 12:07 PM, Jonny Grant wrote: > Thank you for your reply. > > We can't guarantee safestrlen() won't be called with NULL. So because strlen() itself doesn't check for NULL in C standard we'd need to call the wrapper so that NULL can be checked for. > > I'd like to avoid the compiler removing certain execution paths. > I'd rather keep all code paths, even if they are not taken, just in case a NULL pointer creeps in due to an external device that is connected to an embedded system. > > > Probably this would work: > > size_t __attribute__((optimize("O0"))) safestrlen(const char * s) > { > if (NULL == s) return 0; > else return strlen(s); > } I don't think you don't need that. Unless there's a bug in GCC, it shouldn't optimize that path unless it is 100% sure that it will never be called. Moreover, I recommend you to optimize as much as possible. Even though NULL is possible in your code, I guess it's unlikely. Also, calling a function safe is too generic. I'd call it with the suffix null, as it act different on null. Also, I recommend avoiding 'size_t' (and any other unsigned types, BTW). See <https://google.github.io/styleguide/cppguide.html#Integer_Types>. Use the POSIX type 'ssize_t'. That also allows differentiating a length of 0 (i.e., "") from an invalid string (i.e., NULL), by returning -1 for NULL. Recommended implementation (requires C99 or later due to the usage of inline): [ // compiler.h #define likely(x) __builtin_expect((x), 1) #define unlikely(x) __builtin_expect((x), 0) ] [ // strlennull.h #include <string.h> #include <sys/types.h> #include "compiler.h" [[gnu::pure]] inline ssize_t strlennull(const char *s); inline ssize_t strlennull(const char *s) { if (unlikely(!s)) return -1; return strlen(s); } ] [ // strlennull.c #include "strlennull.h" #include <sys/types.h> extern inline ssize_t strlennull(const char *s); ] BTW, if the input is so untrusted that NULL may be possible, I guess it might as well contain invalid strings (non-NUL-terminated), for which strlen(3) would behave as bad or worse. So I recommend having a look at strnlen(3) and maybe implement strnlennull() instead of strlennull(). Unless you _know_ there's a compiler bug that doesn't allow you to optimize, please optimize. Otherwise, if it's just a bit of paranoia, you could as well not optimize any code at all. Specifically not optimizing this code by the use of pragmas or attributes is *wrong*. Just revise the preprocessor output, the compiler output, and also try introducing a NULL at run time, to check that everything's fine. > > I also use 'volatile' for reads/writes to addresses that I don't want to be optimized out. Are you sure you don't want to optimize? Are those addresses hardware I/O or interrupts? Maybe you just want membarrier(2). > >> >> What I recommend you to do from time to time, to make sure you don't miss any warnings, is compile the whole project first with '-O3' and then with '-O0'. If you are a bit paranoic, sporadically you can try all of them : '-Og', '-O0', '-O1', '-Os', '-O2', '-O3', '-Ofast' but I don't think that is necessary. Of course, always use '-fanalyzer' (GCC 10 and above). > > Yes, I am looking forward to David Malcom's -fanalyzer when Ubuntu LTS next upgrades, I'm on gcc 9.3 today. But -fanalyzer is only for C anyway.. much of of code base I work with is compiled as C++ so I can't use -fanalyzer yet. You can install gcc-10 for Bionic: apt-get install gcc-10 I recommend it. It finds bugs very deep in the code. Cheers, Alex -- Alejandro Colomar Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: strlen 2021-07-08 11:06 ` strlen Alejandro Colomar (man-pages) @ 2021-07-08 12:05 ` johnsfine 2021-07-09 2:08 ` strlen LIU Hao 2021-07-08 12:13 ` strlen Xi Ruoyao ` (2 subsequent siblings) 3 siblings, 1 reply; 15+ messages in thread From: johnsfine @ 2021-07-08 12:05 UTC (permalink / raw) To: gcc-help, jg; +Cc: gcc-help, linux-man, fw, mtk.manpages This is not the forum for such a discussion. But I want to make people reading this aware that many expert C and C++ programmers (likely a majority) consider that advice to avoid unsigned types to be horrible advice. I advise people to avoid signed types and I do so myself. If an integer value won't be negative, it shouldn't be signed. That makes your intent clearer to anyone reading your code, and (especially in x86-64) lets the compiler generate smaller and faster code. If you aren't an expert, I strongly suggest letting the library functions guide you in choice of data type. Where it takes a size_t, your variables that exist primarily to be passed that way should be size_t. Where it returns std::size_t, your variables that exist primarily to take that return should be size_t and especially, typical functions you write to parallel those functions should have the same return type. This is a rule I usually break myself, because I write a lot of code in which speed is critical and in which various cache issues dominate speed (rather than actual computations dominating speed), because I know when values will reliably be less than 2 to 32, and because there are many usages in which 32 bit unsigned is the most efficient x86-64 data type (more efficient that size_t and much more efficient than int). But for most programmers, the safety and clarity and portability of using size_t (rather than 32 bit unsigned) in such situations outweighs the performance difference. Obviously, others disagree. I hope I'm not starting a big discussion here (I don't mind a big discussion somewhere, but not here). I just wanted to balance the one sided opinion (that I have seen too many times recently). -----Original Message----- From: Alejandro Colomar (man-pages) via Gcc-help <gcc-help@gcc.gnu.org> To: Jonny Grant <jg@jguk.org> Cc: gcc-help@gcc.gnu.org; linux-man <linux-man@vger.kernel.org>; Florian Weimer <fw@deneb.enyo.de>; Michael Kerrisk <mtk.manpages@gmail.com> Sent: Thu, Jul 8, 2021 7:06 am Subject: Re: strlen On 7/8/21 12:07 PM, Jonny Grant wrote: > Thank you for your reply. > > We can't guarantee safestrlen() won't be called with NULL. So because strlen() itself doesn't check for NULL in C standard we'd need to call the wrapper so that NULL can be checked for. > > I'd like to avoid the compiler removing certain execution paths. > I'd rather keep all code paths, even if they are not taken, just in case a NULL pointer creeps in due to an external device that is connected to an embedded system. > > > Probably this would work: > > size_t __attribute__((optimize("O0"))) safestrlen(const char * s) > { > if (NULL == s) return 0; > else return strlen(s); > } I don't think you don't need that. Unless there's a bug in GCC, it shouldn't optimize that path unless it is 100% sure that it will never be called. Moreover, I recommend you to optimize as much as possible. Even though NULL is possible in your code, I guess it's unlikely. Also, calling a function safe is too generic. I'd call it with the suffix null, as it act different on null. Also, I recommend avoiding 'size_t' (and any other unsigned types, BTW). See <https://google.github.io/styleguide/cppguide.html#Integer_Types>. Use the POSIX type 'ssize_t'. That also allows differentiating a length of 0 (i.e., "") from an invalid string (i.e., NULL), by returning -1 for NULL. Recommended implementation (requires C99 or later due to the usage of inline): [ // compiler.h #define likely(x) __builtin_expect((x), 1) #define unlikely(x) __builtin_expect((x), 0) ] [ // strlennull.h #include <string.h> #include <sys/types.h> #include "compiler.h" [[gnu::pure]] inline ssize_t strlennull(const char *s); inline ssize_t strlennull(const char *s) { if (unlikely(!s)) return -1; return strlen(s); } ] [ // strlennull.c #include "strlennull.h" #include <sys/types.h> extern inline ssize_t strlennull(const char *s); ] BTW, if the input is so untrusted that NULL may be possible, I guess it might as well contain invalid strings (non-NUL-terminated), for which strlen(3) would behave as bad or worse. So I recommend having a look at strnlen(3) and maybe implement strnlennull() instead of strlennull(). Unless you _know_ there's a compiler bug that doesn't allow you to optimize, please optimize. Otherwise, if it's just a bit of paranoia, you could as well not optimize any code at all. Specifically not optimizing this code by the use of pragmas or attributes is *wrong*. Just revise the preprocessor output, the compiler output, and also try introducing a NULL at run time, to check that everything's fine. > > I also use 'volatile' for reads/writes to addresses that I don't want to be optimized out. Are you sure you don't want to optimize? Are those addresses hardware I/O or interrupts? Maybe you just want membarrier(2). > >> >> What I recommend you to do from time to time, to make sure you don't miss any warnings, is compile the whole project first with '-O3' and then with '-O0'. If you are a bit paranoic, sporadically you can try all of them : '-Og', '-O0', '-O1', '-Os', '-O2', '-O3', '-Ofast' but I don't think that is necessary. Of course, always use '-fanalyzer' (GCC 10 and above). > > Yes, I am looking forward to David Malcom's -fanalyzer when Ubuntu LTS next upgrades, I'm on gcc 9.3 today. But -fanalyzer is only for C anyway.. much of of code base I work with is compiled as C++ so I can't use -fanalyzer yet. You can install gcc-10 for Bionic: apt-get install gcc-10 I recommend it. It finds bugs very deep in the code. Cheers, Alex -- Alejandro Colomar Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: strlen 2021-07-08 12:05 ` strlen johnsfine @ 2021-07-09 2:08 ` LIU Hao 2021-07-09 17:26 ` strlen Martin Sebor 0 siblings, 1 reply; 15+ messages in thread From: LIU Hao @ 2021-07-09 2:08 UTC (permalink / raw) To: johnsfine, gcc-help, jg; +Cc: linux-man, fw, mtk.manpages [-- Attachment #1.1: Type: text/plain, Size: 1256 bytes --] 在 7/8/21 8:05 PM, johnsfine--- via Gcc-help 写道: > This is not the forum for such a discussion. But I want to make people reading this aware that many expert C and C++ programmers (likely a majority) consider that advice to avoid unsigned types to be horrible advice. > I advise people to avoid signed types and I do so myself. If an integer value won't be negative, it shouldn't be signed. That makes your intent clearer to anyone reading your code, and (especially in x86-64) lets the compiler generate smaller and faster code. That makes no sense. Would you prefer unsigned integers to signed ones, for something that can only be one of {1,2,3,4}? Just because something can't be negative, does not mean it should be unsigned. Conversion between `uint64_t` and `double` is much slower than `int64_t` on some platforms, e.g. x86, so 'smaller and faster code' is also groundless [1]. A signed integer is a value that denotes the number or amount of something, such as bytes, characters, files, seconds, kilometers, or even dollars. An unsigned integer is merely a fixed-sized collection of bits. That's the only fundamental difference. [1] https://gcc.godbolt.org/z/6YY61rhGe -- Best regards, LIU Hao [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 840 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: strlen 2021-07-09 2:08 ` strlen LIU Hao @ 2021-07-09 17:26 ` Martin Sebor 2021-07-09 19:51 ` strlen johnsfine 2021-07-09 20:19 ` strlen Alejandro Colomar (man-pages) 0 siblings, 2 replies; 15+ messages in thread From: Martin Sebor @ 2021-07-09 17:26 UTC (permalink / raw) To: LIU Hao, johnsfine, gcc-help, jg; +Cc: linux-man, fw, mtk.manpages On 7/8/21 8:08 PM, LIU Hao via Gcc-help wrote: > 在 7/8/21 8:05 PM, johnsfine--- via Gcc-help 写道: >> This is not the forum for such a discussion. But I want to make >> people reading this aware that many expert C and C++ programmers >> (likely a majority) consider that advice to avoid unsigned types to be >> horrible advice. >> I advise people to avoid signed types and I do so myself. If an >> integer value won't be negative, it shouldn't be signed. That makes >> your intent clearer to anyone reading your code, and (especially in >> x86-64) lets the compiler generate smaller and faster code. > > That makes no sense. Would you prefer unsigned integers to signed ones, > for something that can only be one of {1,2,3,4}? Just because something > can't be negative, does not mean it should be unsigned. There are benefits to making that explicit by choosing an unsigned type: the result of converting a narrower unsigned type to a wider unsigned type is unchanged. The result of converting it to a wider signed type may change. This has an impact on value range propagation which in turn can influence other optimizations as well as warnings. Choosing an unsigned type has liabilities as well. Because unsigned arithmetic wraps around zero whereas signed arithmetic does not (it overflows which is undefined), its results are less constrained than those of signed arithmetic. Martin > > Conversion between `uint64_t` and `double` is much slower than `int64_t` > on some platforms, e.g. x86, so 'smaller and faster code' is also > groundless [1]. > > > A signed integer is a value that denotes the number or amount of > something, such as bytes, characters, files, seconds, kilometers, or > even dollars. An unsigned integer is merely a fixed-sized collection of > bits. That's the only fundamental difference. > > > [1] https://gcc.godbolt.org/z/6YY61rhGe > > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: strlen 2021-07-09 17:26 ` strlen Martin Sebor @ 2021-07-09 19:51 ` johnsfine 2021-07-09 20:19 ` strlen Alejandro Colomar (man-pages) 1 sibling, 0 replies; 15+ messages in thread From: johnsfine @ 2021-07-09 19:51 UTC (permalink / raw) To: msebor, lh_mouse, gcc-help, jg; +Cc: linux-man, fw, mtk.manpages > Choosing an unsigned type has liabilities as well. Because unsigned > arithmetic wraps around zero whereas signed arithmetic does not (it > overflows which is undefined), its results are less constrained than > those of signed arithmetic. In the vast majority of real programming cases in which the defined overflow of unsigned causes a significant performance weakness, signed has a different but typically worse performance weakness. I wish gcc had some attribute to correct this flaw. (If that has been added since I last checked, please tell me). The common example (that an expert will recognize as an example of a more common issue) is: p[x+1] where x is a 32 bit unsigned. p[ size_t(x) + 1 ] very often generates smaller/faster code. But that is too tedious for the programmer to insert and the programmer should not need to know when that would help. The programmer often knows that size_t(x)+1 would behave the same as size_t(x+1), but the compiler does not know that.The compiler knows which of size_t(x)+1 vs. size_t(x+1) is better in a specific usage. The programmer may know that one or the other can be better, but shouldn't need to worry about which. Usually those who want this feature express it as a data type like uint32_t but with undefined overflow. So the compiler can then know that size_t(x)+1 would behave the same as size_t(x+1) because the code behavior would be undefined if that were not true. I would prefer an "unspecified promotion" attribute to an "undefined overflow" attribute, so a type that is stored as 32 bit unsigned, but at the compiler's convenience MIGHT be promoted to a larger type anywhere up the path (within the tree) from where it occurs to where it would necessarily stop being a 32 bit unsigned anyway. Example p[x*q+r] in the common case that p is of some type such that this is actually an inlined call to operator[](size_t) assuming q and r are also smaller than size_t, and x is of the type with unspecified promotion, we have size_t(x*q+r) but the compiler should freely be able to choose size_t(x*q)+r or size_t(x)*q+r, whichever of those optimizes best. In almost all cases that really matter, such an unspecified promotion vs. undefined overflow would allow exactly the same optimizations. In the obscure cases where one would allow an optimization that the other would not, I think the programmer would be much less surprised by the "unspecified promotion" behavior than "undefined overflow". But all of this is just a wish for slightly better. In the world as it exists now, choosing a 32 bit unsigned as an index type produces smaller faster code than either a 64 bit unsigned or either size signed. The cases where a 64 bit index type (either signed or unsigned) is smaller/faster than 32 bit would be fixed by such an attribute, but even without fixing that, they are dwarfed by the cases where a 32 bit unsigned index type is better than 64 bit. Through all of that, the beginner choice of 32 bit signed for an indexed type produces the least efficient code. All of this also only applies in the case where you are really trying to push performance and you are really sure there won't be more than 4 billion items in one level of one container. Otherwise (and for beginners, regardless) your index type should be size_t because that is the library's index type. I have done a lot programming in situations where the performance of the index type is critical, and even though the total number of objects exceeds 4 billion, the total number of objects visible to a single indexing operation cannot. I always do something like:typedef unsigned index_t;and then use index_t for all indexes.That allows a test build changing just that typedef to use std::size_t to verify both that the results are the same and that those same results take significantly longer. I think there are only a dozen places in a massive amount of code where profiling then told me to do just a bit better by changing p[x+1] to (p+1)[x] where p was an iterator and x was an index. The optimizer would get to the same place with p[size_t(x)+1], which also works for vectors, but I find (p+1)[x] clearer and I tended to access through iterators even when the containers were vectors. -----Original Message----- From: Martin Sebor <msebor@gmail.com> To: LIU Hao <lh_mouse@126.com>; johnsfine@verizon.net; gcc-help@gcc.gnu.org <gcc-help@gcc.gnu.org>; jg@jguk.org <jg@jguk.org> Cc: linux-man@vger.kernel.org <linux-man@vger.kernel.org>; fw@deneb.enyo.de <fw@deneb.enyo.de>; mtk.manpages@gmail.com <mtk.manpages@gmail.com> Sent: Fri, Jul 9, 2021 1:26 pm Subject: Re: strlen ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: strlen 2021-07-09 17:26 ` strlen Martin Sebor 2021-07-09 19:51 ` strlen johnsfine @ 2021-07-09 20:19 ` Alejandro Colomar (man-pages) 1 sibling, 0 replies; 15+ messages in thread From: Alejandro Colomar (man-pages) @ 2021-07-09 20:19 UTC (permalink / raw) To: Martin Sebor, LIU Hao, johnsfine, gcc-help, jg Cc: linux-man, fw, mtk.manpages On 7/9/21 7:26 PM, Martin Sebor wrote: > On 7/8/21 8:08 PM, LIU Hao via Gcc-help wrote: >> 在 7/8/21 8:05 PM, johnsfine--- via Gcc-help 写道: >>> This is not the forum for such a discussion. But I want to make >>> people reading this aware that many expert C and C++ programmers >>> (likely a majority) consider that advice to avoid unsigned types to >>> be horrible advice. I'm sorry if it's not the right place, I could remove the lists from the CC if it's too noisy, but I think an important point is here, and a couple of emails won't damage too much the mailing lists. On the other hand, I consider bad etiquette removing CCs from a discussion. >>> I advise people to avoid signed types and I do so myself. If an >>> integer value won't be negative, it shouldn't be signed. That makes >>> your intent clearer to anyone reading your code, and (especially in >>> x86-64) lets the compiler generate smaller and faster code. As others have showed with facts, and the Google style guide also hints, using unsigned types just removes the opportunity of the compiler to optimize on overflow, because it has to account for wrapping around. >> >> That makes no sense. Would you prefer unsigned integers to signed >> ones, for something that can only be one of {1,2,3,4}? Just because >> something can't be negative, does not mean it should be unsigned. That. The same way that because a number is never going to be greater than 100 is it any better to use [u]int256_t. Just use int, unless you have a reason not to. Please John, read the paper from Bjarne about unsigned types, it really covers a lot. If you still disagree after reading that, feel free to argument it. <http://open-std.org/JTC1/SC22/WG21/docs/papers/2019/p1428r0.pdf> > > There are benefits to making that explicit by choosing an unsigned > type: the result of converting a narrower unsigned type to a wider > unsigned type is unchanged. The result of converting it to a wider > signed type may change. This has an impact on value range propagation > which in turn can influence other optimizations as well as warnings. That problem with implicit conversions to larger types is not a problem of signed integers, not even a problem of unsigned integers. It's a problem of mixing both. If you use signed integers for everything, you won't have that problem. Alex -- Alejandro Colomar Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: strlen 2021-07-08 11:06 ` strlen Alejandro Colomar (man-pages) 2021-07-08 12:05 ` strlen johnsfine @ 2021-07-08 12:13 ` Xi Ruoyao 2021-07-08 23:49 ` strlen Segher Boessenkool 2021-07-09 10:50 ` strlen Jonny Grant 3 siblings, 0 replies; 15+ messages in thread From: Xi Ruoyao @ 2021-07-08 12:13 UTC (permalink / raw) To: Alejandro Colomar (man-pages), Jonny Grant Cc: gcc-help, linux-man, Florian Weimer, Michael Kerrisk On Thu, 2021-07-08 at 13:06 +0200, Alejandro Colomar (man-pages) via Gcc-help wrote: > On 7/8/21 12:07 PM, Jonny Grant wrote: > > Thank you for your reply. > > > > We can't guarantee safestrlen() won't be called with NULL. So because > > strlen() itself doesn't check for NULL in C standard we'd need to call > > the wrapper so that NULL can be checked for. > > > > I'd like to avoid the compiler removing certain execution paths. > > I'd rather keep all code paths, even if they are not taken, just in > > case a NULL pointer creeps in due to an external device that is > > connected to an embedded system. If you are taking a pointer from external device "correctly", gcc won't delete your NULL checking path. For example: // defined by linker script extern volatile char *an_io_port_providing_a_pointer; int f() { char *ptr = an_io_port_providing_a_pointer; // C standard disallows to remove it if (ptr == NULL) { gracefully_report_bug("some message"); return -EINVAL; } return g(ptr); } Or // in assembly extern char *read_pointer_from_io_port(int io_port_id); int f() { char *ptr = read_pointer_from_io_port(IO_PORT_A); // C standard disallows to remove it if (ptr == NULL) { gracefully_report_bug("some message"); return -EINVAL; } return g(ptr); } OTOH, if you are taking the pointer from external input incorrectly (i. e. violating C standard and invoking some UB), even if you used some way to enforce the compiler to keep the NULL checking, it would be still unsafe. Even if you want to be "careful" (I'd rather call this "paranoid"), you can use -fno-delete-null-pointer-checks, instead of turning off all optimizations. And, GCC "optimize" attribute/pragma is somewhat buggy and only intended for debugging GCC. If you need to turn off some optmization for a function, it's better to put the function into a seperate TU and use command line option to disable the optimization. By the way, if C can't provide the safety feature you need (for example programming something launching a nuclear missile :), maybe it's better to use Ada or something. -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: strlen 2021-07-08 11:06 ` strlen Alejandro Colomar (man-pages) 2021-07-08 12:05 ` strlen johnsfine 2021-07-08 12:13 ` strlen Xi Ruoyao @ 2021-07-08 23:49 ` Segher Boessenkool 2021-07-09 13:54 ` strlen Jonny Grant 2021-07-09 10:50 ` strlen Jonny Grant 3 siblings, 1 reply; 15+ messages in thread From: Segher Boessenkool @ 2021-07-08 23:49 UTC (permalink / raw) To: Alejandro Colomar (man-pages) Cc: Jonny Grant, gcc-help, linux-man, Florian Weimer, Michael Kerrisk On Thu, Jul 08, 2021 at 01:06:17PM +0200, Alejandro Colomar (man-pages) via Gcc-help wrote: > On 7/8/21 12:07 PM, Jonny Grant wrote: > >We can't guarantee safestrlen() won't be called with NULL. So because > >strlen() itself doesn't check for NULL in C standard we'd need to call the > >wrapper so that NULL can be checked for. > >size_t __attribute__((optimize("O0"))) safestrlen(const char * s) > >{ > > if (NULL == s) return 0; > > else return strlen(s); > >} > That also allows differentiating a length of 0 (i.e., "") from an > invalid string (i.e., NULL), by returning -1 for NULL. It is incorrect to return any particular value for strlen(0); not 0, not -1, not anything. Since there *is* no string, it doesn't have a length either. So instead of making some function for this, I recommend just writing something like bla = s ? strlen(s) : 0; wherever you need it. If a function name isn't self-explanatory, and even *cannot* be, your factoring is most likely not ideal. Code is primarily there for humans to read, it should be optimised for that. Segher ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: strlen 2021-07-08 23:49 ` strlen Segher Boessenkool @ 2021-07-09 13:54 ` Jonny Grant 2021-07-09 14:17 ` strlen Alejandro Colomar (man-pages) 0 siblings, 1 reply; 15+ messages in thread From: Jonny Grant @ 2021-07-09 13:54 UTC (permalink / raw) To: Segher Boessenkool, Alejandro Colomar (man-pages) Cc: gcc-help, linux-man, Florian Weimer, Michael Kerrisk On 09/07/2021 00:49, Segher Boessenkool wrote: > On Thu, Jul 08, 2021 at 01:06:17PM +0200, Alejandro Colomar (man-pages) via Gcc-help wrote: >> On 7/8/21 12:07 PM, Jonny Grant wrote: >>> We can't guarantee safestrlen() won't be called with NULL. So because >>> strlen() itself doesn't check for NULL in C standard we'd need to call the >>> wrapper so that NULL can be checked for. > >>> size_t __attribute__((optimize("O0"))) safestrlen(const char * s) >>> { >>> if (NULL == s) return 0; >>> else return strlen(s); >>> } > >> That also allows differentiating a length of 0 (i.e., "") from an >> invalid string (i.e., NULL), by returning -1 for NULL. > > It is incorrect to return any particular value for strlen(0); not 0, not > -1, not anything. Since there *is* no string, it doesn't have a length > either. > > So instead of making some function for this, I recommend just writing > something like > > bla = s ? strlen(s) : 0; Hi Segher Yes, this could work. But it does rely on programmer typing it like that every time... Maybe an inline function better. inline size_t safestrlen(const char * s) {return s?strlen(s) : 0} Perhaps there are too many email addresses on this cc list now. I'd prefer a Annex K of C11 style function ISO/IEC TR 24731-1 for strlen() - but there isn't one such as strnlen_s. > > wherever you need it. If a function name isn't self-explanatory, and > even *cannot* be, your factoring is most likely not ideal. Code is > primarily there for humans to read, it should be optimised for that. > > > Segher > . Good point Jonny ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: strlen 2021-07-09 13:54 ` strlen Jonny Grant @ 2021-07-09 14:17 ` Alejandro Colomar (man-pages) 2021-07-09 16:11 ` strlen Xi Ruoyao 2021-07-10 1:00 ` strlen Segher Boessenkool 0 siblings, 2 replies; 15+ messages in thread From: Alejandro Colomar (man-pages) @ 2021-07-09 14:17 UTC (permalink / raw) To: Jonny Grant, Segher Boessenkool Cc: gcc-help, linux-man, Florian Weimer, Michael Kerrisk Hi Jonny, Segher, On 7/9/21 3:54 PM, Jonny Grant wrote: > Yes, this could work. But it does rely on programmer typing it like that every time... Maybe an inline function better. I agree on that. > > inline size_t safestrlen(const char * s) {return s?strlen(s) : 0} > > Perhaps there are too many email addresses on this cc list now. > > I'd prefer a Annex K of C11 style function ISO/IEC TR 24731-1 for strlen() - but there isn't one such as strnlen_s. Please, consider not calling some function safesomething() or similar, as it isn't 100% safe. It's like calling some thing "the new X". How will you call the next version? "the nova X"? And the next? "the supernew X"? As I said before, unsigned types are unsafe, you may want to accept it or not, but they are. Now, the day you realize that and develop an even safer function that doesn't use unsigned size_t, what will you call it? supersafestrlen()? Use names that define your functions as closely as possible. > > > On 09/07/2021 00:49, Segher Boessenkool wrote: >> wherever you need it. If a function name isn't self-explanatory, and Agree on this >> even *cannot* be, your factoring is most likely not ideal. Code is But not on this. You could call it strlennull(), that is, a strlen() that special-cases NULL. I find it a good enough name, as long as you document your function. It saves the problem of repeating yourself every time. >> primarily there for humans to read, it should be optimised for that. >> Agree on this again, but I think the following is readable: len = strlennull(maybenull); Regards, Alex -- Alejandro Colomar Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: strlen 2021-07-09 14:17 ` strlen Alejandro Colomar (man-pages) @ 2021-07-09 16:11 ` Xi Ruoyao 2021-07-10 1:00 ` strlen Segher Boessenkool 1 sibling, 0 replies; 15+ messages in thread From: Xi Ruoyao @ 2021-07-09 16:11 UTC (permalink / raw) To: Alejandro Colomar (man-pages), Jonny Grant, Segher Boessenkool Cc: gcc-help, linux-man, Florian Weimer, Michael Kerrisk On Fri, 2021-07-09 at 16:17 +0200, Alejandro Colomar (man-pages) via Gcc-help wrote: > Hi Jonny, Segher, > > On 7/9/21 3:54 PM, Jonny Grant wrote: > > Yes, this could work. But it does rely on programmer typing it like > > that every time... Maybe an inline function better. > > I agree on that. > > > > > inline size_t safestrlen(const char * s) {return s?strlen(s) : 0} > > > > Perhaps there are too many email addresses on this cc list now. I think the discussion at least has nothing to do with linux-man or gcc- help: man pages just describe the existing API (C or POSIX or Linux specific), and GCC just compiles code and doesn't care what the API is. Neither is a place to discuss "how to design an API". And I think Jonny should discuss the API design with the users of his API (maybe his collegue or downstream developers), instead of some random guys in mail list. The users are the ones who will call his function anyway so it's better to choose an API they like. Yes, Jonny can "force" the users to do something for safety, but this decision should also be discussed with them and documented. Or they won't understand the decision, and may "invent" or "improvise" some "new wheels", breaking Jonny's design. For example, I don't like a function silently treats NULL as an empty string. I prefer a function to abort() or print a log "strlen_checked() is called with NULL, there is a bug in your code" when I (mis)use NULL. But it's just my 2 cents: if the potential users of the API agree the function to act as that, then it's good to go. -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: strlen 2021-07-09 14:17 ` strlen Alejandro Colomar (man-pages) 2021-07-09 16:11 ` strlen Xi Ruoyao @ 2021-07-10 1:00 ` Segher Boessenkool 1 sibling, 0 replies; 15+ messages in thread From: Segher Boessenkool @ 2021-07-10 1:00 UTC (permalink / raw) To: Alejandro Colomar (man-pages) Cc: Jonny Grant, gcc-help, linux-man, Florian Weimer, Michael Kerrisk Fine, I'll bite :-) On Fri, Jul 09, 2021 at 04:17:08PM +0200, Alejandro Colomar (man-pages) wrote: > On 7/9/21 3:54 PM, Jonny Grant wrote: > >Yes, this could work. But it does rely on programmer typing it like that > >every time... Maybe an inline function better. > > I agree on that. A function (or any other abstraction) can be fine for this, *iff* you can make people use it correctly. Since it is pretty much impossible to give a good succinct name to this function, I posit that is not the case. Please feel free to prove me wrong (by coming up with a decent name for it). > >I'd prefer a Annex K of C11 style function ISO/IEC TR 24731-1 for strlen() > >- but there isn't one such as strnlen_s. > > Please, consider not calling some function safesomething() or similar, > as it isn't 100% safe. It's like calling some thing "the new X". How > will you call the next version? "the nova X"? And the next? "the > supernew X"? > > As I said before, unsigned types are unsafe, you may want to accept it > or not, but they are. I thought Annex K was great entertainment, but calling unsigned types "unsafe" takes the cake. Unsigned types are Z/nZ with n some power of two. Signed types are not even Z (overflow is undefined). Unsigned types are useful for describing many machine things. They are useful for sizes, not only because sizes cannot be negative, but also because sizes can be bigger than the maximum positive number that can fit in the same size signed number. They are useful for bitty things, registers maybe, stuff in memory, or device I/O registers. And they are much more useful than C signed numbers for holding memory addresses, where you need that (you can do sane aritmetic on it). Using unsigned types without range checking is often wrong ("unsafe" in your words). Using signed types without range checking is just as wrong in the same cases, if not more (overflow is undefined). At least in the "unsigned" case it is *possible* its behaviour is what the programmer intended! > Agree on this again, but I think the following is readable: > > len = strlennull(maybenull); If you use it a million times, of course you can give it a short and non-sensical name, and expect the users to learn what it means. If not, it is better to be slightly more verbose, and reduce the mental load. Segher ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: strlen 2021-07-08 11:06 ` strlen Alejandro Colomar (man-pages) ` (2 preceding siblings ...) 2021-07-08 23:49 ` strlen Segher Boessenkool @ 2021-07-09 10:50 ` Jonny Grant 2021-07-09 11:27 ` strlen Alejandro Colomar (man-pages) 3 siblings, 1 reply; 15+ messages in thread From: Jonny Grant @ 2021-07-09 10:50 UTC (permalink / raw) To: Alejandro Colomar (man-pages) Cc: linux-man, Michael Kerrisk, Florian Weimer, gcc-help On 08/07/2021 12:06, Alejandro Colomar (man-pages) wrote: > On 7/8/21 12:07 PM, Jonny Grant wrote: >> Thank you for your reply. >> >> We can't guarantee safestrlen() won't be called with NULL. So because strlen() itself doesn't check for NULL in C standard we'd need to call the wrapper so that NULL can be checked for. >> >> I'd like to avoid the compiler removing certain execution paths. >> I'd rather keep all code paths, even if they are not taken, just in case a NULL pointer creeps in due to an external device that is connected to an embedded system. >> >> >> Probably this would work: >> >> size_t __attribute__((optimize("O0"))) safestrlen(const char * s) >> { >> if (NULL == s) return 0; >> else return strlen(s); >> } > > I don't think you don't need that. Unless there's a bug in GCC, it shouldn't optimize that path unless it is 100% sure that it will never be called. That is good, so the code will always be kept! As compiler will never find all calls to strlen() and be sure those calls are never NULL. > Moreover, I recommend you to optimize as much as possible. > Even though NULL is possible in your code, I guess it's unlikely. > > Also, calling a function safe is too generic. > I'd call it with the suffix null, as it act different on null. > > Also, I recommend avoiding 'size_t' (and any other unsigned types, BTW). > See <https://google.github.io/styleguide/cppguide.html#Integer_Types>. > Use the POSIX type 'ssize_t'. > That also allows differentiating a length of 0 (i.e., "") from an invalid string (i.e., NULL), by returning -1 for NULL. > https://man7.org/linux/man-pages/man3/strlen.3.html size_t strlen(const char *s); I'd rather not change the return type from POSIX size_t in any wrapper of strlen. Unless it is part of C11 Annex K style standards improvement. Cheers, Jonny ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: strlen 2021-07-09 10:50 ` strlen Jonny Grant @ 2021-07-09 11:27 ` Alejandro Colomar (man-pages) 2021-07-09 11:43 ` strlen Alejandro Colomar (man-pages) 0 siblings, 1 reply; 15+ messages in thread From: Alejandro Colomar (man-pages) @ 2021-07-09 11:27 UTC (permalink / raw) To: Jonny Grant Cc: linux-man, Michael Kerrisk, Florian Weimer, gcc-help, Segher Boessenkool, Xi Ruoyao Hi Jonny, On 7/9/21 12:50 PM, Jonny Grant wrote: > > > On 08/07/2021 12:06, Alejandro Colomar (man-pages) wrote: >> On 7/8/21 12:07 PM, Jonny Grant wrote: >>> Thank you for your reply. >>> >>> We can't guarantee safestrlen() won't be called with NULL. So because strlen() itself doesn't check for NULL in C standard we'd need to call the wrapper so that NULL can be checked for. >>> >>> I'd like to avoid the compiler removing certain execution paths. >>> I'd rather keep all code paths, even if they are not taken, just in case a NULL pointer creeps in due to an external device that is connected to an embedded system. >>> >>> >>> Probably this would work: >>> >>> size_t __attribute__((optimize("O0"))) safestrlen(const char * s) >>> { >>> if (NULL == s) return 0; >>> else return strlen(s); >>> } >> >> I don't think you don't need that. Unless there's a bug in GCC, it shouldn't optimize that path unless it is 100% sure that it will never be called. > > That is good, so the code will always be kept! As compiler will never find all calls to strlen() and be sure those calls are never NULL. Not always. If you inline that function, that path may be removed in some calls, if the compiler knows better than you that it can. My point is that you shouldn't care; your code is completely legal, and whatever the compiler decides to do will also be legal (no undefined behavior, and no crashes). If it optimizes, it will be a good thing that you shouldn't prevent. If the compiler does otherwise, that's a bug in the compiler, and something you can't solve by writing different code or preventing optimizations, or at least there's no guarantee about it, since it's a bug. But I don't believe you'll find a bug in this case. So please, trust the compiler, at least when using perfectly defined behavior. And if you don't trust the compiler, which is perfectly reasonable, test it, but don't try to workaround bugs that don't exist. > >> Moreover, I recommend you to optimize as much as possible. >> Even though NULL is possible in your code, I guess it's unlikely. >> >> Also, calling a function safe is too generic. >> I'd call it with the suffix null, as it act different on null. >> >> Also, I recommend avoiding 'size_t' (and any other unsigned types, BTW). >> See <https://google.github.io/styleguide/cppguide.html#Integer_Types>. >> Use the POSIX type 'ssize_t'. >> That also allows differentiating a length of 0 (i.e., "") from an invalid string (i.e., NULL), by returning -1 for NULL. >> > > https://man7.org/linux/man-pages/man3/strlen.3.html > size_t strlen(const char *s); > > I'd rather not change the return type from POSIX size_t in any wrapper of strlen. Unless it is part of C11 Annex K style standards improvement. That's a historical accident. A long time ago (much before I was born, and much before the first standard, I mean in the early times of K&R C and Unix), unsigned types were used more than they should, and the first C standards (I mean ANSI/ISO standards (i.e., C89, C99, ...), not POSIX), with a lot of already existing code, didn't attempt to change the language, but to annotate common usage. In POSIX.1 there's a mix, because POSIX has the type 'ssize_t', which is not defined by the C standards. POSIX in general tends to use the signed type 'ssize_t' for its POSIX-only functions (i.e., not in the C standards). Annex K has been an attempt of Microsoft to provide safer functions, but while there are some functions there that have good intentions, most of them are just badly designed. That annex K is DOA, and will probably be marked as deprecated in C22 (currently C2x). I think that a standard should not try to design new functions, and instead just annotate common usage, as they did in the first ones. Problems like the ones Annex K suffers could have been detected early if they had been implemented as an extension to some compiler(s) decade(s) before being standardized. Therefore, if the implementation passes the test of time, you standardize it, else not, IMO. Otherwise, we have a standard that is declared deprecated in the next version of the standard, similar to what is happening with the C++ standards (which, guess what, BTW I recently read that they are undeprecating a lot of C stuff they deprecated in the first standards). Using unsigned for anything else than bitfileds and similar stuff is just *wrong*, as you can read from the Google C++ style guide I linked before. Another source you can read is this paper from Bjarne Stroustrup: <http://open-std.org/JTC1/SC22/WG21/docs/papers/2019/p1428r0.pdf> This is one of the few cases where I agree with something coming from him. I hope some day we get a ssizeof operator :-) Cheers, Alex -- Alejandro Colomar Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: strlen 2021-07-09 11:27 ` strlen Alejandro Colomar (man-pages) @ 2021-07-09 11:43 ` Alejandro Colomar (man-pages) 0 siblings, 0 replies; 15+ messages in thread From: Alejandro Colomar (man-pages) @ 2021-07-09 11:43 UTC (permalink / raw) To: Jonny Grant Cc: linux-man, Michael Kerrisk, Florian Weimer, gcc-help, Segher Boessenkool, Xi Ruoyao On 7/9/21 1:27 PM, Alejandro Colomar (man-pages) wrote: > Annex K has been an attempt of Microsoft to provide safer functions, but > while there are some functions there that have good intentions, most of > them are just badly designed. That annex K is DOA, and will probably be > marked as deprecated in C22 (currently C2x). > > I think that a standard should not try to design new functions, and > instead just annotate common usage, as they did in the first ones. > Problems like the ones Annex K suffers could have been detected early if > they had been implemented as an extension to some compiler(s) decade(s) > before being standardized. Therefore, if the implementation passes the > test of time, you standardize it, else not, IMO. Otherwise, we have a > standard that is declared deprecated in the next version of the > standard, similar to what is happening with the C++ standards (which, > guess what, BTW I recently read that they are undeprecating a lot of C > stuff they deprecated in the first standards). But let's analyze Annex K (C11): It has been designed by Microsoft. MS's compiler (MS Visual Studio) doesn't even fully support C99 yet (and by that trend, I doubt it never will). At most it supports C89. Visual Studio has a long history of not supporting C except for those parts required to implement their C++ compiler. Would you buy a car designed by a bike manufacturer? -- Alejandro Colomar Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/ ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2021-07-10 1:02 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <0bf239e9-cfc7-8889-ca00-b1d1566bd174@jguk.org> [not found] ` <87lfhpgxf8.fsf@mid.deneb.enyo.de> [not found] ` <017a5a66-ba66-7cc8-c607-f851c2e54fc4@jguk.org> [not found] ` <87363whf2z.fsf@mid.deneb.enyo.de> [not found] ` <48e874cb-2b95-2783-ddfc-ae12d9aaf8f5@jguk.org> [not found] ` <87bl7fozxi.fsf@mid.deneb.enyo.de> [not found] ` <23679a28-5986-0af2-4d98-7de68ed0deec@jguk.org> [not found] ` <53b3666b-d200-ef97-b050-cc38669c61cd@gmail.com> [not found] ` <b6fccca1-6e2b-fb20-d9d6-9df94cd3f05f@gmail.com> [not found] ` <564825ed-1e1f-b344-da35-1b83c551ed5f@jguk.org> [not found] ` <b71170df-7c6b-4582-c3d1-84b811fe5259@gmail.com> [not found] ` <5566b180-1333-d73b-22ee-6c6d32053921@jguk.org> 2021-07-08 11:06 ` strlen Alejandro Colomar (man-pages) 2021-07-08 12:05 ` strlen johnsfine 2021-07-09 2:08 ` strlen LIU Hao 2021-07-09 17:26 ` strlen Martin Sebor 2021-07-09 19:51 ` strlen johnsfine 2021-07-09 20:19 ` strlen Alejandro Colomar (man-pages) 2021-07-08 12:13 ` strlen Xi Ruoyao 2021-07-08 23:49 ` strlen Segher Boessenkool 2021-07-09 13:54 ` strlen Jonny Grant 2021-07-09 14:17 ` strlen Alejandro Colomar (man-pages) 2021-07-09 16:11 ` strlen Xi Ruoyao 2021-07-10 1:00 ` strlen Segher Boessenkool 2021-07-09 10:50 ` strlen Jonny Grant 2021-07-09 11:27 ` strlen Alejandro Colomar (man-pages) 2021-07-09 11:43 ` strlen Alejandro Colomar (man-pages)
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).