Re: size_t vs long. - Alejandro Colomar

public inbox for libc-alpha@sourceware.org

From: Alejandro Colomar <alx.manpages@gmail.com>
To: A <amit234234234234@gmail.com>
Cc: libc-alpha@sourceware.org
Subject: Re: size_t vs long.
Date: Thu, 17 Nov 2022 12:00:29 +0100
Message-ID: <7729eea1-d160-9148-f556-a0128d4a2e0e@gmail.com>
In-Reply-To: <CAOM0=dappbzwOTo2yKU7vE0eHzSfK9f4PFmo84zOBwJGb2Hk+w@mail.gmail.com>


Hello,

On 11/17/22 10:48, A wrote:
>>>
>>> But if size_t is used, then most probably, it will result in a crash -
>>
>> And I love that.  Crashing is the best thing you can do.  That tells me
>> immediately that I wrote a bug.  Isn't that what we wanted in the first place?
> 
> No, I don't want a crash if I can get an error value returned back and
> errno being set properly.
> 
>>
>>> like malloc(-1) will crash the program because unsigned -1 is
>>> 0xFFFFFFFF and this much memory is not available on today's computers
>>> and probably may not be available at all in future also (RAM size of
>>> 2^64 bits is really really huge).
>>
>> We're not so lucky with malloc(3), since it's virtual memory, and you won't get
>> it all at once.  But yes, sooner or later, if you passed -1 to malloc(3), you'll
>> see a crash, which is a Good Thing (tm).
>>
> 
> I am programming since last 35 years and I haven't heard this kind of
> logic before that crash is better than getting an error value returned
> back and errno being set properly. And I have worked for companies
> like Cisco Systems and Juniper Networks.

Returning error codes is for when the input is wrong but the program logic is
OK.  When the program logic is wrong, the behaviour of the program is by
necessity undefined.  And ISO C (in its optional Annex L, Analyzability)
defines two types of Undefined Behaviour: bounded UB, and critical UB.
Bounded UB means basically that you don't overwrite any files in your system,
or otherwise modify the state of your system.  Critical UB means anything
else, including wiping your hard drive, and demons flying out of your nose [1].

[1]: (nasal demons)
  <https://www.catb.org/jargon/html/N/nasal-demons.html>

<https://stackoverflow.com/questions/32132574/does-undefined-behavior-really-permit-anything-to-happen>

<https://stackoverflow.com/questions/13444690/is-passing-additional-parameters-through-function-pointer-legal-defined-in-c/13444785#13444785>

Crashing your program on bounded UB means that you stop before the broken
program logic can go any further.  Continuing with broken program logic would
very likely result in critical undefined behaviour, and it could cost millions
of dollars, depending on how unlucky you are.
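
The distinction drawn above (error codes for bad input, a crash for bad logic) can be sketched in C roughly as follows.  This is only an illustration under stated assumptions: parse_count() and get_elem() are hypothetical names invented for this example, not part of glibc or of any library discussed in this thread.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse untrusted user input.
 * Bad input is an expected condition, so it is reported to the caller
 * with a -1 return value and errno.
 */
int
parse_count(const char *s, size_t *count)
{
	char                *end;
	unsigned long long  n;

	/* Require a leading digit: rejects "", "-1", " 5", "abc". */
	if (s[0] < '0' || s[0] > '9') {
		errno = EINVAL;
		return -1;
	}

	errno = 0;
	n = strtoull(s, &end, 10);
	if (errno != 0)
		return -1;              /* ERANGE set by strtoull(3) */
	if (*end != '\0') {
		errno = EINVAL;         /* trailing garbage, e.g. "12x" */
		return -1;
	}
	if (n > SIZE_MAX) {
		errno = ERANGE;
		return -1;
	}

	*count = n;
	return 0;
}

/*
 * Hypothetical helper: internal array access.
 * An out-of-range index here can only come from a bug in the program
 * itself, so there is nothing sensible to return: abort(3).
 */
int
get_elem(const int *a, size_t nmemb, size_t i)
{
	if (i >= nmemb)
		abort();
	return a[i];
}
```

With these definitions, parse_count("-1", &n) fails with EINVAL and the program can carry on (for example by asking the user again), while get_elem(a, 3, 7) terminates the process instead of returning a made-up value.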

> 
> Crash is not a good thing otherwise things like checking the
> validity/sanity of arguments passed in a function would not have
> existed.

Checking validity of arguments is for user input.  It also makes sense for
static analysis.  Checking at runtime also makes sense if you just want to
debug an existing program, but you shouldn't modify the program for that (or
you would be debugging a different program).

Adding runtime checks whose only job is to verify that your own program logic
is correct makes no sense at all.

> If we all loved crashes then we would never check any
> argument passed to a function.

strlcpy(3) is designed to crash your program if the input is not
NUL-terminated.  The function was designed in OpenBSD, which takes security
very seriously.  Just an example.
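
Why strlcpy(3) behaves this way follows from its contract: it returns the length of the source string, so it has to read the source up to its terminating NUL.  The sketch below is an illustrative re-implementation written for this example, named sketch_strlcpy() to make clear that it is not the OpenBSD or libc code.  On unterminated input the strlen(3) call runs off the end of the array, which is undefined behaviour and in practice often ends in a crash.

```c
#include <stddef.h>
#include <string.h>

/*
 * Illustrative sketch of the strlcpy(3) contract (not the real
 * implementation): copy at most dsize-1 bytes, always terminate dst
 * when dsize > 0, and return strlen(src).
 */
size_t
sketch_strlcpy(char *dst, const char *src, size_t dsize)
{
	/* Walks src until a NUL byte; there is no bound on this read. */
	size_t  srclen = strlen(src);

	if (dsize != 0) {
		size_t  n = (srclen < dsize - 1) ? srclen : dsize - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}

	return srclen;
}
```

A return value greater than or equal to dsize tells the caller that the copy was truncated; that case is reported, not crashed on.  Only the missing terminator, which is a bug in the caller, is left undefined.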

> We would simply go ahead without
> checking the arguments and let the function crash and then let the
> user debug it. Getting an error value returned back and errno being
> set properly is way more easier than debugging the crash. Debugging a
> crash can take several man hours, but  by getting an error value back
> and then checking errno can solve the issue very quickly.

And then what?  Continue with a program that has already proven to have
dubious logic?  If I detect that error, the next thing I'll call is abort(3).
I prefer that the kernel kills my program, so that I write less code.

> 
>>>
>>> Another thing is that if size_t is used an array index then array[-1]
>>> will result in wrong behavior or program crash. But with long, the
>>> developer can check whether the index is negative, thus avoiding
>>> program crash.
>>
>> And what do you plan to do when you detect -1 in your code?  Set errno to
>> EPROGRAMMERNOTSMARTENOUGH and return -1 from your function?
> 
> Looks like you are trying  to make fun of me.

Not really.  I was trying to be explanatory, while being a bit funny, to make
it more entertaining to read.  But as Brian Kernighan said, no one is smart
enough to debug their own code:

[
Debugging is twice as hard as writing the code in the first place. Therefore, if 
you write the code as cleverly as possible, you are, by definition, not smart 
enough to debug it.
]
         -- Brian W. Kernighan and P. J. Plauger,
            The Elements of Programming Style.

<http://quotes.cat-v.org/programming/#bwk>

> I don't appreciate this.
> However, you can set errno to something like - "ENEGATIVESUBSCRIPT".

And then, abort(3).

> 
> Anyways, to shorten the discussion and making it to the point, I would
> like to know why is size_t used in malloc() when a negative value
> (passed by user by mistake) can crash the program. Using long and
> checking for negative values can prevent the program from crashing.

I already said it's probably a historic accident.

> 
> Some people might say that user should check for value being negative
> but for checking for negative values long is required, not size_t. So,
> then the user will end up using long instead of size_t.

You could check against an arbitrary high value with size_t; some call it
RSIZE_MAX.  It's the same thing in the end.
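
As a rough sketch of that idea: a negative value converted to size_t wraps around to a huge number, so a single upper-bound comparison catches it without needing a signed type.  SKETCH_SIZE_LIMIT and size_is_sane() are hypothetical names chosen for this example; the limit of SIZE_MAX >> 1 is an assumption that merely plays the role RSIZE_MAX would play.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit: half of the size_t range (an assumption). */
#define SKETCH_SIZE_LIMIT  (SIZE_MAX >> 1)

/*
 * Returns true if n is a plausible object size.
 * Any negative integer that was converted to size_t lands in the
 * upper half of the range and is therefore rejected.
 */
bool
size_is_sane(size_t n)
{
	return n <= SKETCH_SIZE_LIMIT;
}
```

For instance, size_is_sane((size_t)-1) is false, since (size_t)-1 equals SIZE_MAX, while size_is_sane(1024) is true.  What the caller does on a false result (return an error or call abort(3)) is a separate decision.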

> So, in effect
> using size_t in malloc is not correct (unless we get further insight
> that explains why size_t is correct in malloc()).
> 
> Just saying that size_t is good and long is bad (in malloc()) without
> any reasons will not make any sense.

I pointed to an excellent article by Jens Gustedt.  Please read it.

> 
> Amit

Alex

-- 
<http://www.alejandro-colomar.es/>

