Hi,

On 11/18/22 00:04, Alejandro Colomar wrote:
>>> The main advantage of this code compared to the equivalent ssize_t
>>> or ptrdiff_t or idx_t code is that if you somehow write an
>>> off-by-one error, and manage to access the array at [-1], if i is
>>> unsigned you'll access [SIZE_MAX], which will definitely crash your
>>> program.
>>
>> That's not true on the vast majority of today's platforms, which
>> don't have subscript checking, and for which a[-1] is treated the
>> same way a[SIZE_MAX] is.  On my platform (Fedora 36 x86-64) the same
>> machine code is generated for 'a' and 'b' for the following C code.
>>
>>    #include <stdint.h>
>>    int a(int *p) { return p[-1]; }
>>    int b(int *p) { return p[SIZE_MAX]; }
>
> Hmm, this seems to be true on my platform (amd64), per the experiment
> I just did:
>
> $ cat s.c
> #include <sys/types.h>
>
> char
> f(char *p, ssize_t i)
> {
>     return p[i];
> }
> $ cat u.c
> #include <stddef.h>
>
> char
> f(char *p, size_t i)
> {
>     return p[i];
> }
> $ cc -Wall -Wextra -Werror -S -O3 s.c u.c
> $ diff -u u.s s.s
> --- u.s    2022-11-17 23:41:47.773805041 +0100
> +++ s.s    2022-11-17 23:41:47.761805265 +0100
> @@ -1,15 +1,15 @@
> -    .file    "u.c"
> +    .file    "s.c"
>      .text
>      .p2align 4
>      .globl    f
>      .type    f, @function
>  f:
> -.LFB0:
> +.LFB6:
>      .cfi_startproc
>      movzbl    (%rdi,%rsi), %eax
>      ret
>      .cfi_endproc
> -.LFE0:
> +.LFE6:
>      .size    f, .-f
>      .ident    "GCC: (Debian 12.2.0-9) 12.2.0"
>      .section    .note.GNU-stack,"",@progbits
>
>
> That seems like a violation of the standard, doesn't it?
>
> The operator [] doesn't have a type, and an argument to it should be
> treated with whatever type it has after default promotions.  If I
> pass a size_t to it, the type should be unsigned, and that should be
> preserved, by accessing the array at a high value, which the compiler
> has no way of knowing whether it exists or not, given that function
> definition.  The extreme case of -1 and SIZE_MAX might not be the
> best one, since the pointer would need to be 0 for [SIZE_MAX] to be
> accessible, but if you replace those by -RANDOM and (size_t)-RANDOM,
> then the compiler definitely needs to generate different code, yet it
> doesn't.
>
> I'm guessing this is a GCC optimization that relies on knowing we
> will never come close to using the whole 64-bit address space.  If we
> use int and unsigned, things change:
>
> $ cat s.c
> char
> f(char *p, int i)
> {
>     return p[i];
> }
> $ cat u.c
> char
> f(char *p, unsigned i)
> {
>     return p[i];
> }
> $ cc -Wall -Wextra -Werror -S -O3 s.c u.c
> $ diff -u u.s s.s
> --- u.s    2022-11-17 23:44:54.446318186 +0100
> +++ s.s    2022-11-17 23:44:54.434318409 +0100
> @@ -1,4 +1,4 @@
> -    .file    "u.c"
> +    .file    "s.c"
>      .text
>      .p2align 4
>      .globl    f
> @@ -6,7 +6,7 @@
>  f:
>  .LFB0:
>      .cfi_startproc
> -    movl    %esi, %esi
> +    movslq    %esi, %rsi
>      movzbl    (%rdi,%rsi), %eax
>      ret
>      .cfi_endproc
>
>
> I'm guessing that GCC doesn't make that assumption here, so I'd
> expect the unsigned version to crash, while the signed version would
> cause nasal demons.  Anyway, now that I'm here, I'll test it:
>
>
> $ cat s.c
> [[gnu::noipa]]
> char
> f(char *p, int i)
> {
>     return p[i];
> }
>
> int main(void)
> {
>     int i = -1;
>     char c[4];
>
>     return f(c, i);
> }
> $ cc -Wall -Wextra -Werror -O3 s.c
> $ ./a.out
> $ echo $?
> 0
>
>
> $ cat u.c
> [[gnu::noipa]]
> char
> f(char *p, unsigned i)
> {
>     return p[i];
> }
>
> int main(void)
> {
>     unsigned i = -1;
>     char c[4];
>
>     return f(c, i);
> }
> $ cc -Wall -Wextra -Werror -O3 u.c
> $ ./a.out
> Segmentation fault
>
>
> I get this SEGV difference consistently.  I CCed gcc@ in case they
> consider this to be something they want to address.  Maybe the
> optimization is important for size_t-sized indices, but if it is not,
> I'd prefer getting the SEGV for SIZE_MAX.

After some thought, of course the compiler can't produce any different
code, since pointers are 64 bits: adding a 64-bit index to a 64-bit
pointer wraps modulo 2^64, so p + (size_t)-1 and p + (ssize_t)-1 are
the very same address.

A different story would be if pointers were 128 bits, but that might
cause its own issues: should sizes still be 64 bits, or 128 bits?
Maybe a configurable size_t would be interesting for debugging.

Anyway, it's good to know that tweaking size_t to be 32 bits in some
debug builds might help catch some off-by-one errors; a sketch of the
idea follows below.
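As a minimal sketch of that idea (the type name idx_dbg_t and the
macro DEBUG_NARROW_IDX are made up for this example, and whether the
wild access actually faults depends on the address-space layout):

#include <stddef.h>
#include <stdint.h>

#ifdef DEBUG_NARROW_IDX
typedef uint32_t idx_dbg_t;  /* debug: 32-bit index; underflow wraps
                                to 2^32 - 1, ~4 GiB past the buffer */
#else
typedef size_t idx_dbg_t;    /* release: full-width index */
#endif

[[gnu::noipa]]
char
f(char *p, idx_dbg_t i)
{
    return p[i];
}

int main(void)
{
    idx_dbg_t i = 0;
    char c[4];

    i--;  /* simulated off-by-one; wraps, since the type is unsigned */
    return f(c, i);
}

On a typical LP64 system I'd expect building with -DDEBUG_NARROW_IDX
to give the SIGSEGV, like the unsigned case above, while the release
build computes the same address as c[-1] and reads it silently, like
the int case; that's an expectation, though, not a guarantee.

Cheers,
Alex

--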