Hi,

On 11/18/22 00:04, Alejandro Colomar wrote:
>>> The main advantage of this code compared to the equivalent ssize_t
>>> or ptrdiff_t or idx_t code is that if you somehow write an
>>> off-by-one error, and manage to access the array at [-1], if i is
>>> unsigned you'll access [SIZE_MAX], which will definitely crash your
>>> program.
>>
>> That's not true on the vast majority of today's platforms, which
>> don't have subscript checking, and for which a[-1] is treated the
>> same way a[SIZE_MAX] is.  On my platform (Fedora 36 x86-64) the same
>> machine code is generated for 'a' and 'b' for the following C code.
>>
>>    #include <stdint.h>
>>    int a(int *p) { return p[-1]; }
>>    int b(int *p) { return p[SIZE_MAX]; }
>
> Hmm, this seems to be true on my platform (amd64), per the experiment
> I just did:
>
> $ cat s.c
> #include <sys/types.h>
>
> char
> f(char *p, ssize_t i)
> {
>     return p[i];
> }
> $ cat u.c
> #include <stddef.h>
>
> char
> f(char *p, size_t i)
> {
>     return p[i];
> }
> $ cc -Wall -Wextra -Werror -S -O3 s.c u.c
> $ diff -u u.s s.s
> --- u.s    2022-11-17 23:41:47.773805041 +0100
> +++ s.s    2022-11-17 23:41:47.761805265 +0100
> @@ -1,15 +1,15 @@
> -    .file    "u.c"
> +    .file    "s.c"
>      .text
>      .p2align 4
>      .globl    f
>      .type    f, @function
>  f:
> -.LFB0:
> +.LFB6:
>      .cfi_startproc
>      movzbl    (%rdi,%rsi), %eax
>      ret
>      .cfi_endproc
> -.LFE0:
> +.LFE6:
>      .size    f, .-f
>      .ident    "GCC: (Debian 12.2.0-9) 12.2.0"
>      .section    .note.GNU-stack,"",@progbits
>
>
> That seems like a violation of the standard, doesn't it?
>
> The operator [] doesn't have a type, and an argument to it should be
> treated with whatever type it has after default promotions.  If I
> pass a size_t to it, the type should be unsigned, and that should be
> preserved, by accessing the array at a high value, which the compiler
> has no way of knowing whether it exists or not, given that function
> definition.  The extreme case of -1 and SIZE_MAX might not be the
> best one, since the pointer would need to be 0 for [SIZE_MAX] to be
> accessible, but if you replace those by -RANDOM and (size_t)-RANDOM,
> then the compiler definitely needs to generate different code, yet it
> doesn't.
>
> I'm guessing this is a GCC optimization that relies on knowing we
> will never come close to using the whole 64-bit address space.  If we
> use int and unsigned, things change:
>
> $ cat s.c
> char
> f(char *p, int i)
> {
>     return p[i];
> }
> $ cat u.c
> char
> f(char *p, unsigned i)
> {
>     return p[i];
> }
> $ cc -Wall -Wextra -Werror -S -O3 s.c u.c
> $ diff -u u.s s.s
> --- u.s    2022-11-17 23:44:54.446318186 +0100
> +++ s.s    2022-11-17 23:44:54.434318409 +0100
> @@ -1,4 +1,4 @@
> -    .file    "u.c"
> +    .file    "s.c"
>      .text
>      .p2align 4
>      .globl    f
> @@ -6,7 +6,7 @@
>  f:
>  .LFB0:
>      .cfi_startproc
> -    movl    %esi, %esi
> +    movslq    %esi, %rsi
>      movzbl    (%rdi,%rsi), %eax
>      ret
>      .cfi_endproc
>
>
> I'm guessing that GCC doesn't make that assumption here, so I'd
> expect the unsigned version to crash, while the signed version would
> cause nasal demons.  Anyway, now that I'm here, I'll test it:
>
>
> $ cat s.c
> [[gnu::noipa]]
> char
> f(char *p, int i)
> {
>     return p[i];
> }
>
> int main(void)
> {
>     int i = -1;
>     char c[4];
>
>     return f(c, i);
> }
> $ cc -Wall -Wextra -Werror -O3 s.c
> $ ./a.out
> $ echo $?
> 0
>
>
> $ cat u.c
> [[gnu::noipa]]
> char
> f(char *p, unsigned i)
> {
>     return p[i];
> }
>
> int main(void)
> {
>     unsigned i = -1;
>     char c[4];
>
>     return f(c, i);
> }
> $ cc -Wall -Wextra -Werror -O3 u.c
> $ ./a.out
> Segmentation fault
>
>
> I get this SEGV difference consistently.  I CCed gcc@ in case they
> consider this to be something they want to address.  Maybe the
> optimization is important for size_t-sized indices, but if it is not,
> I'd prefer getting the SEGV for SIZE_MAX.

After some thought, of course the compiler can't produce any different
code, since pointers are 64 bits: adding a 64-bit index to a 64-bit
pointer wraps modulo 2^64, so p + (size_t)-1 and p + (ssize_t)-1 are
the very same address.

A different story would be if pointers were 128 bits, but that might
cause its own issues: should sizes still be 64 bits, or 128 bits?
Maybe a configurable size_t would be interesting for debugging.

Anyway, it's good to know that tweaking size_t to be 32 bits in some
debug builds might help catch some off-by-one errors; a sketch of the
idea follows below.
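As a minimal sketch of that idea (the type name idx_dbg_t and the
macro DEBUG_NARROW_IDX are made up for this example, and whether the
wild access actually faults depends on the address-space layout):

#include <stddef.h>
#include <stdint.h>

#ifdef DEBUG_NARROW_IDX
typedef uint32_t idx_dbg_t;  /* debug: 32-bit index; underflow wraps
                                to 2^32 - 1, ~4 GiB past the buffer */
#else
typedef size_t idx_dbg_t;    /* release: full-width index */
#endif

[[gnu::noipa]]
char
f(char *p, idx_dbg_t i)
{
    return p[i];
}

int main(void)
{
    idx_dbg_t i = 0;
    char c[4];

    i--;  /* simulated off-by-one; wraps, since the type is unsigned */
    return f(c, i);
}

On a typical LP64 system I'd expect building with -DDEBUG_NARROW_IDX
to give the SIGSEGV, like the unsigned case above, while the release
build computes the same address as c[-1] and reads it silently, like
the int case; that's an expectation, though, not a guarantee.

Cheers,
Alex

--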