Re: size_t vs long. - Paul Eggert

From: Paul Eggert <eggert@cs.ucla.edu>
To: Alejandro Colomar <alx.manpages@gmail.com>,
	A <amit234234234234@gmail.com>,
	libc-alpha@sourceware.org
Subject: Re: size_t vs long.
Date: Thu, 17 Nov 2022 13:39:08 -0800
Message-ID: <27229b18-673b-d038-9a4c-c32c50ca547c@cs.ucla.edu>
In-Reply-To: <380b196e-b78e-3b0e-7399-ee106b0e716c@gmail.com>

>> Second and more important, that code is bogus. Nobody should ever write code like that. If I wrote code like that, I'd *want* a trap.
> 
> for (size_t i = 41; i < sizeof A / sizeof A[0]; --i) {
>    A[i] = something_nice;
> }
> 
> The code above looks like a bug only if you are not used to it.  Once you get used to it, it can become natural, but let's go for the more natural:
> 
> 
> for (size_t i = 0; i < sizeof A / sizeof A[0]; ++i) {
>    A[i] = something_nice;
> } 

Those loops do not mean the same thing. The first is bogus; the second 
one is OK (notice, the bogus loop has a "41", the OK loop doesn't).
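
To make the difference concrete, here is a minimal sketch that counts how many times each loop body runs for an array of n elements. The function names are hypothetical and not from the thread. The "41" loop visits indexes 41 down to 0, after which --i wraps to SIZE_MAX and the test i < n fails; if n is 41 or less, the body never runs at all.

```c
#include <stddef.h>

/* Iterations of: for (size_t i = 41; i < n; --i) */
size_t count_down_from_41(size_t n)
{
    size_t count = 0;
    for (size_t i = 41; i < n; --i)
        ++count;    /* i takes 41, 40, ..., 0; then wraps to SIZE_MAX */
    return count;
}

/* Iterations of: for (size_t i = 0; i < n; ++i) */
size_t count_up_from_0(size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        ++count;    /* i takes 0, 1, ..., n - 1 */
    return count;
}
```

For a 100-element array the first loop touches only 42 elements (indexes 0 through 41), while the second touches all 100.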

I'm not surprised you didn't notice how bogus the first loop was - most 
people wouldn't notice it either. And it's Gustedt's main point! I don't 
know why he went off the rails with that overly-clever code, but he did.

> The main advantage of this code compared to the equivalent ssize_t or ptrdiff_t or idx_t code is that if you somehow write an off-by-one error, and manage to access the array at [-1], if i is unsigned you'll access [SIZE_MAX], which will definitely crash your program.

That's not true on the vast majority of today's platforms, which don't 
have subscript checking, and for which a[-1] is treated the same way 
a[SIZE_MAX] is. On my platform (Fedora 36 x86-64) the same machine code 
is generated for 'a' and 'b' for the following C code.

   #include <stdint.h>
   int a(int *p) { return p[-1]; }
   int b(int *p) { return p[SIZE_MAX]; }

Yes, debugging implementations might catch p[SIZE_MAX], but the ones 
that do will likely catch p[-1] as well.
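
A rough sketch of why the two accesses coincide on a flat-address platform: it models the address arithmetic with well-defined unsigned operations instead of dereferencing anything. It assumes uintptr_t and size_t have the same width (true on x86-64, not guaranteed by the C standard), and the helper names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the address computed for p[i] with a signed index:
   the index is converted modulo 2**N, scaled, and added to the base. */
uintptr_t addr_signed_index(uintptr_t base, ptrdiff_t i, size_t elem_size)
{
    return base + (uintptr_t)i * elem_size;
}

/* Same model for an unsigned index. */
uintptr_t addr_unsigned_index(uintptr_t base, size_t i, size_t elem_size)
{
    return base + (uintptr_t)i * elem_size;
}
```

Under that assumption, addr_signed_index(base, -1, sizeof(int)) and addr_unsigned_index(base, SIZE_MAX, sizeof(int)) both yield base - sizeof(int), so neither index is more likely to land on an unmapped page.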

In short, there's little advantage to using size_t for indexes, and 
there are real disadvantages due to comparison confusion and lack of 
signed integer overflow checking.
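
As one illustration of the comparison confusion, here is a small sketch with hypothetical function names. It assumes the usual case where size_t has at least the rank of int, so the int operand is converted to size_t before the comparison.

```c
#include <stddef.h>

/* Mixed comparison: i is converted to size_t, so -1 becomes SIZE_MAX. */
int less_mixed(int i, size_t n)
{
    return i < n;
}

/* Signed comparison: both operands are ptrdiff_t, so -1 stays negative. */
int less_signed(ptrdiff_t i, ptrdiff_t n)
{
    return i < n;
}
```

On such platforms less_mixed(-1, 10) returns 0 while less_signed(-1, 10) returns 1. GCC can flag the mixed form with -Wsign-compare, but only as a warning.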

>> First, Gustedt is technically incorrect, because the code *can* trap on 
>> platforms where SIZE_MAX <= INT_MAX,

> I honestly don't know of any existing platforms where that is true

They're a dying breed. The main problem from my point of view is that C 
and POSIX allow these oddballs, so if you want to write really portable 
code you have to worry about them - and this understandably discourages 
people from writing really portable code. (What's the point of coding to 
the standards if it's just a bunch of make-work?)

Anyway, one example is Unisys Clearpath C, in which INT_MAX and SIZE_MAX 
both equal 2**39 - 1. This is allowed by the current POSIX and C 
standards, and this compiler is still for sale and supported. (I doubt 
whether they'll port it to C23, so there's that....)
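
Code that cares about such platforms can at least detect them at compile time. This is a minimal sketch that uses only the standard SIZE_MAX and INT_MAX macros; the function name is hypothetical, and the sketch says nothing about how a given compiler behaves once the condition holds.

```c
#include <limits.h>
#include <stdint.h>

/* Returns 1 on a platform where every size_t value fits in an int,
   0 otherwise. The decision is made by the preprocessor. */
int size_max_fits_in_int(void)
{
#if SIZE_MAX <= INT_MAX
    return 1;
#else
    return 0;
#endif
}
```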

> C23 will require that signed integers are 2's complement, which I guess 
> removes the possibility of a trap

It doesn't remove the possibility, since signed integers can have trap 
representations. But we are straying from the more important point.
