[CC: Added those who contributed to the discussion in linux-man@, and also the authors of N2861 for C2x] Hi David, On 2/16/23 16:15, David Malcolm wrote: > On Thu, 2023-02-16 at 15:35 +0100, Alejandro Colomar via Gcc wrote: >> Hi! >> >> I was preparing an example program of a use-after-realloc bug, >> when I found that GCC doesn't warn in a case where it should. >> >> >> alx@debian:~/tmp$ cat realloc.c >> #include >> #include >> #include >> #include >> #include >> >> static inline char * >> xstrdup(const char *s) >> { >>         char  *p; >> >>         p = strdup(s); >>         if (p == NULL) >>                 exit(EXIT_FAILURE); >>         return p; >> } >> >> static inline char * >> strnul(const char *s) >> { >>         return (char *) s + strlen(s); >> } >> >> int >> main(void) >> { >>         char  *p, *q; >> >>         p = xstrdup(""); >>         q = strnul(p); >> >>         if (p == q) >>                 puts("equal before"); >>         else >>                 exit(EXIT_FAILURE); // It's an empty string; this >> won't happen >> >>         printf("p = %p; q = %p\n", p, q); >> >>         p = realloc(p, UINT16_MAX); >>         if (p == NULL) >>                 exit(EXIT_FAILURE); >>         puts("realloc()"); >> >>         if (p == q) {  // Use after realloc.  I'd expect a warning >> here. >>                 puts("equal after"); >>         } else { >>                 /* Can we get here? >>                    Let's see the options: >> >>                         - realloc(3) fails: >>                                 We exit immediately.  We don't arrive >> here. >> >>                         - realloc(3) doesn't move the memory: >>                                 p == q, as before >> >>                         - realloc(3) moved the memory: >>                                 p is guaranteed to be a unique >> pointer, >>                                 and q is now an invalid pointer.  It >> is >>                                 Undefined Behavior to read `q`, so `p >> == q` >>                                 is UB. >> >>                    As we see, there's no _defined_ path where this >> can happen >>                  */ >>                 printf("PID = %i\n", (int) getpid()); >>         } >> >>         printf("p = %p; q = %p\n", p, q); >> } >> alx@debian:~/tmp$ cc -Wall -Wextra realloc.c -O3 -fanalyzer >> realloc.c: In function ‘main’: >> realloc.c:67:9: warning: pointer ‘p’ may be used after ‘realloc’ [- >> Wuse-after-free] >>    67 |         printf("p = %p; q = %p\n", p, q); >>       |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> realloc.c:39:13: note: call to ‘realloc’ here >>    39 |         p = realloc(p, UINT16_MAX); >>       |             ^~~~~~~~~~~~~~~~~~~~~~ >> alx@debian:~/tmp$ ./a.out >> equal before >> p = 0x55bff80802a0; q = 0x55bff80802a0 >> realloc() >> PID = 25222 >> p = 0x55bff80806d0; q = 0x55bff80802a0 >> >> >> Did I miss anything? > > GCC's -fanalyzer will warn if you dereference q, so e.g. adding: > printf("*q = %i\n", *q); > gives a warning: > https://godbolt.org/z/6qx4afb3E > > : In function 'main': > :65:29: warning: use after 'free' of 'q' [CWE-416] [-Wanalyzer-use-after-free] > 65 | printf("*q = %i\n", *q); > | ^~ [...] > > I'm not convinced that it's useful to the end-user to warn about the > "use of q itself" case. I didn't quote the standard because I couldn't find it. I was searching in C11, and it seems that it was only implicitly Undefined Behavior, without explicit spelling (the value of the pointer was indeterminate, according to C11). Now C23 will better clarify that reading such a pointer value (not even dereferencing) is Undefined Behavior. There was a discussion in linux-man@ some years ago, which now I realize it didn't end up being applied (I thought we had applied a patch, but it seems we didn't). I'll check if we still need such a patch (and I guess we do, since we're having this conversation). Using the pointer is _wrong_. And by wrong, I mean that it's Undefined Behavior. I think that alone is enough to issue a warning. Especially, since the compiler already has that information; otherwise, it couldn't have warned about line 67 of my example program. I could understand if due to optimizations the compiler lost that information, so it couldn't warn, but in this case, there's no excuse. The benefit for users? They'll realize that the code they wrote is bad. Not even suspicious, as some warnings warn about suspicious code. This case is uncontroversially wrong. That code has no valid reason to be written that way, under ISO C. Cheers, Alex > > Dave -- GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5