From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=7UV0=3R=cs.ucla.edu=eggert@sourceware.org>
Received: from zimbra.cs.ucla.edu (zimbra.cs.ucla.edu [131.179.128.68])
	by sourceware.org (Postfix) with ESMTPS id D4D843852215
	for <libc-alpha@sourceware.org>; Thu, 17 Nov 2022 21:39:10 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org D4D843852215
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=cs.ucla.edu
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=cs.ucla.edu
Received: from localhost (localhost [127.0.0.1])
	by zimbra.cs.ucla.edu (Postfix) with ESMTP id 48AF0160037;
	Thu, 17 Nov 2022 13:39:10 -0800 (PST)
Received: from zimbra.cs.ucla.edu ([127.0.0.1])
	by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032)
	with ESMTP id zO7QK7XQG-la; Thu, 17 Nov 2022 13:39:09 -0800 (PST)
Received: from localhost (localhost [127.0.0.1])
	by zimbra.cs.ucla.edu (Postfix) with ESMTP id 5CE4E160043;
	Thu, 17 Nov 2022 13:39:09 -0800 (PST)
DKIM-Filter: OpenDKIM Filter v2.9.2 zimbra.cs.ucla.edu 5CE4E160043
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cs.ucla.edu;
	s=78364E5A-2AF3-11ED-87FA-8298ECA2D365; t=1668721149;
	bh=tqqu3r66m3iU18KNOIZqXVw23Fe2gL6XKphix9Imo88=;
	h=Message-ID:Date:MIME-Version:Subject:To:From:Content-Type:
	 Content-Transfer-Encoding;
	b=Peavge+d1CCC0K66H1bLAGc6hK+UjonuwGPNsIq9wzRYmwuF2YlPgwuXa99BIb6pM
	 JXTLd8sQWMnaLkdMeSGJVHKmEoPjVOdYWfYA3OVj4lh2KmGhp9M5G8VDnrqtOFMN8J
	 9vbbiPr4rB4FxP+RTrzHeuqtEjtYIqGtOXADsgA8=
X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu
Received: from zimbra.cs.ucla.edu ([127.0.0.1])
	by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026)
	with ESMTP id zqZzDqsII43k; Thu, 17 Nov 2022 13:39:09 -0800 (PST)
Received: from [131.179.64.200] (Penguin.CS.UCLA.EDU [131.179.64.200])
	by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 41C52160037;
	Thu, 17 Nov 2022 13:39:09 -0800 (PST)
Message-ID: <27229b18-673b-d038-9a4c-c32c50ca547c@cs.ucla.edu>
Date: Thu, 17 Nov 2022 13:39:08 -0800
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.1
Subject: Re: size_t vs long.
Content-Language: en-US
To: Alejandro Colomar <alx.manpages@gmail.com>, A
 <amit234234234234@gmail.com>, libc-alpha@sourceware.org
References: <CAOM0=dac1uM+96yT7f_GDp1f0bF=oW2JoDPFPPOgd4OjzRR+mA@mail.gmail.com>
 <c139d396-4084-f8c0-44e8-31d20d171056@gmail.com>
 <dd16db9e-bdfe-901d-9b9f-c0aa2836e55e@cs.ucla.edu>
 <380b196e-b78e-3b0e-7399-ee106b0e716c@gmail.com>
From: Paul Eggert <eggert@cs.ucla.edu>
Organization: UCLA Computer Science Department
In-Reply-To: <380b196e-b78e-3b0e-7399-ee106b0e716c@gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=-2.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,JMQ_SPF_NEUTRAL,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <libc-alpha.sourceware.org>

>> Second and more important, that code is bogus. Nobody should ever write code like that. If I wrote code like that, I'd *want* a trap.
> 
> for (size_t i = 41; i < sizeof A / sizeof A[0]; --i) {
>    A[i] = something_nice;
> }
> 
> The code above seems a bug by not being used to it.  Once you get used to it, it can become natural, but let's go for the more natural:
> 
> 
> for (size_t i = 0; i < sizeof A / sizeof A[0]; ++i) {
>    A[i] = something_nice;
> } 

Those loops do not mean the same thing. The first is bogus; the second 
one is OK (notice, the bogus loop has a "41", the OK loop doesn't).

I'm not surprised you didn't notice how bogus the first loop was - most 
people wouldn't notice it either. And it's Gustedt's main point! I don't 
know why he went off the rails with that overly-clever code, but he did.
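
To make the difference concrete, here is a minimal sketch (not 
Gustedt's code). It assumes the array length is passed in as a 
hypothetical parameter n rather than computed with sizeof, and it only 
counts iterations instead of storing into A. The function names are 
made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Iterations of:  for (size_t i = 41; i < n; --i)
   n stands in for sizeof A / sizeof A[0].  */
size_t downward_iterations(size_t n)
{
    size_t count = 0;
    for (size_t i = 41; i < n; --i) {
        /* i only ever takes the values 41, 40, ..., 0.  After i == 0,
           --i wraps around to SIZE_MAX, "i < n" becomes false, and
           the loop stops.  */
        assert(i <= 41);
        ++count;
    }
    return count;
}

/* Iterations of:  for (size_t i = 0; i < n; ++i)  */
size_t upward_iterations(size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        ++count;
    return count;
}
```

With n equal to 42 both loops run 42 times. With n equal to 100 the 
downward loop still runs only 42 times, and with n equal to 10 it runs 
zero times, because 41 < 10 is false on entry. The upward loop runs n 
times in every case.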


> The main advantage of this code compared to the equivalent ssize_t or ptrdiff_t or idx_t code is that if you somehow write an off-by-one error, and manage to access the array at [-1], if i is unsigned you'll access [SIZE_MAX], which will definitely crash your program.

That's not true on the vast majority of today's platforms, which don't 
have subscript checking, and for which a[-1] is treated the same way 
a[SIZE_MAX] is. On my platform (Fedora 36 x86-64) the same machine code 
is generated for 'a' and 'b' for the following C code.

   #include <stdint.h>
   int a(int *p) { return p[-1]; }
   int b(int *p) { return p[SIZE_MAX]; }

Yes, debugging implementations might catch p[SIZE_MAX], but the ones 
that do will likely catch p[-1] as well.
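
A rough way to see why the two subscripts collide is to do the address 
arithmetic by hand. The sketch below assumes a flat address space in 
which uintptr_t and size_t have the same width, which holds on x86-64 
but is not guaranteed by the C standard. It computes addresses in 
unsigned arithmetic only and never dereferences anything out of 
bounds. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address that p[-1] names, computed in unsigned arithmetic.  */
uintptr_t addr_minus_one(const int *p)
{
    return (uintptr_t) p - sizeof *p;
}

/* Address that p[SIZE_MAX] would name if the multiply and add simply
   wrap around, as they do in the generated machine code.  */
uintptr_t addr_size_max(const int *p)
{
    /* Assumption: pointers and size_t are the same width.  */
    assert(sizeof (uintptr_t) == sizeof (size_t));
    return (uintptr_t) p + (uintptr_t) SIZE_MAX * sizeof *p;
}

/* Returns 1 if both computations give the same address.  */
int same_address_demo(void)
{
    static int arr[2];
    return addr_minus_one(&arr[1]) == addr_size_max(&arr[1]);
}
```

Under that assumption SIZE_MAX * sizeof (int) is congruent to 
-sizeof (int) modulo 2**N, so the two addresses are identical and 
same_address_demo returns 1.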

In short, there's little advantage to using size_t for indexes, and 
there are real disadvantages due to comparison confusion and lack of 
signed integer overflow checking.
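
As an illustration of the comparison confusion, here is a small 
sketch. It assumes the usual case where size_t is at least as wide as 
unsigned int, so that an int operand is converted to size_t before the 
comparison. The function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Mixed comparison: i is converted to size_t first, so a negative i
   becomes a huge unsigned value.  */
int less_mixed(int i, size_t n)
{
    return i < n;
}

/* All-signed comparison: behaves the way the arithmetic reads.  */
int less_signed(ptrdiff_t i, ptrdiff_t n)
{
    return i < n;
}

/* Self-check; holds under the assumption stated above.  */
void comparison_demo(void)
{
    assert(!less_mixed(-1, 10));   /* -1 < 10 comes out false */
    assert(less_signed(-1, 10));   /* -1 < 10 comes out true */
}
```

On the overflow side, GCC and Clang can instrument signed arithmetic 
with -fsanitize=signed-integer-overflow, whereas unsigned wraparound 
is well defined and so is not flagged by that option.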


>> First, Gustedt is technically incorrect, because the code *can* trap on 
>> platforms where SIZE_MAX <= INT_MAX,

> I honestly don't know of any existing platforms where that is true

They're a dying breed. The main problem from my point of view is that C 
and POSIX allow these oddballs, so if you want to write really portable 
code you have to worry about them - and this understandably discourages 
people from writing really portable code. (What's the point of coding to 
the standards if it's just a bunch of make-work?)

Anyway, one example is Unisys Clearpath C, in which INT_MAX and SIZE_MAX 
both equal 2**39 - 1. This is allowed by the current POSIX and C 
standards, and this compiler is still for sale and supported. (I doubt 
whether they'll port it to C23, so there's that....)


> C23 will require that signed integers are 2's complement, which I guess 
> removes the possibility of a trap

It doesn't remove the possibility, since signed integers can have trap 
representations. But we are straying from the more important point.