Message-ID: <27229b18-673b-d038-9a4c-c32c50ca547c@cs.ucla.edu>
Date: Thu, 17 Nov 2022 13:39:08 -0800
Subject: Re: size_t vs long.
To: Alejandro Colomar, A, libc-alpha@sourceware.org
References: <380b196e-b78e-3b0e-7399-ee106b0e716c@gmail.com>
From: Paul Eggert
Organization: UCLA Computer Science Department
In-Reply-To: <380b196e-b78e-3b0e-7399-ee106b0e716c@gmail.com>

>> Second and more important, that code is bogus. Nobody should ever write code like that. If I wrote code like that, I'd *want* a trap.
>
>     for (size_t i = 41; i < sizeof A / sizeof A[0]; --i) {
>         A[i] = something_nice;
>     }
>
> The code above seems a bug by not being used to it. Once you get used to it, it can become natural, but let's go for the more natural:
>
>
>     for (size_t i = 0; i < sizeof A / sizeof A[0]; ++i) {
>         A[i] = something_nice;
>     }

Those loops do not mean the same thing. The first is bogus; the second one is OK (notice, the bogus loop has a "41", the OK loop doesn't).

I'm not surprised you didn't notice how bogus the first loop was - most people wouldn't notice it either. And it's Gustedt's main point! I don't know why he went off the rails with that overly-clever code, but he did.

> The main advantage of this code compared to the equivalent ssize_t or ptrdiff_t or idx_t code is that if you somehow write an off-by-one error, and manage to access the array at [-1], if i is unsigned you'll access [SIZE_MAX], which will definitely crash your program.

That's not true on the vast majority of today's platforms, which don't have subscript checking, and for which a[-1] is treated the same way a[SIZE_MAX] is.
On my platform (Fedora 36 x86-64) the same machine code is generated for 'a' and 'b' for the following C code.

   #include <stdint.h>
   int a(int *p) { return p[-1]; }
   int b(int *p) { return p[SIZE_MAX]; }

Yes, debugging implementations might catch p[SIZE_MAX], but the ones that do will likely catch p[-1] as well.

In short, there's little advantage to using size_t for indexes, and there are real disadvantages due to comparison confusion and lack of signed integer overflow checking.

>> First, Gustedt technically incorrect, because the code *can* trap on
>> platforms where SIZE_MAX <= INT_MAX,
> I honestly don't know of any existing platforms where that is true

They're a dying breed. The main problem from my point of view is that C and POSIX allow these oddballs, so if you want to write really portable code you have to worry about them - and this understandably discourages people from writing really portable code. (What's the point of coding to the standards if it's just a bunch of make-work?)

Anyway, one example is Unisys Clearpath C, in which INT_MAX and SIZE_MAX both equal 2**39 - 1. This is allowed by the current POSIX and C standards, and this compiler is still for sale and supported. (I doubt whether they'll port it to C23, so there's that....)

> C23 will require that signed integers are 2's complement, which I guess
> removes the possibility of a trap

It doesn't remove the possibility, since signed integers can have trap representations. But we are straying from the more important point.
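
For readers who want to see the "comparison confusion" and the wraparound behaviour of the quoted count-down loop in concrete terms, here is a minimal sketch. It is not from the thread: the function names are hypothetical, and it assumes a mainstream platform where size_t is wider than int (so an int operand is converted to size_t in a mixed comparison). On an oddball where SIZE_MAX <= INT_MAX the first function could behave differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compare a signed index with an unsigned size.
   Assumption: size_t is wider than int, so i is converted to size_t
   before the comparison. A value of -1 then becomes SIZE_MAX, and the
   result is 0 even though -1 is mathematically below n.
   (GCC and Clang can flag this with -Wsign-compare.) */
int signed_index_is_below(int i, size_t n)
{
    return i < n;
}

/* Hypothetical helper: count the iterations of a loop shaped like the
   quoted "for (size_t i = 41; i < N; --i)". The loop ends only because
   decrementing 0 wraps around to SIZE_MAX, which is not below n.
   If start is already >= n, the body never runs at all. */
size_t count_down_iterations(size_t start, size_t n)
{
    size_t count = 0;
    for (size_t i = start; i < n; --i)
        ++count;
    return count;
}
```

Under the stated assumption, signed_index_is_below(-1, 10) yields 0, and count_down_iterations(41, 42) visits indexes 41 down to 0 before the wraparound stops it, whereas count_down_iterations(41, 10) silently does nothing. That dependence on the literal "41" matching the array size is the kind of thing that is easy to miss when reading such a loop.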