public inbox for gcc-help@gcc.gnu.org
 help / color / mirror / Atom feed
* Question about declaring an array in the stack on runtime
@ 2023-07-15 10:43 James R T
  2023-07-15 10:49 ` John Scott
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: James R T @ 2023-07-15 10:43 UTC (permalink / raw)
  To: gcc-help

Hi folks,

I hope that this is the correct mailing list to ask this question.

I have the following C code snippet:

```c
#include <stdio.h>

int main() {
    unsigned int* arr;
    int some_var = 7;

    if (some_var == 7) {
        arr = (unsigned int[7]){9, 10, 11, 12, 13, 14, 15};
    }

    printf("Value of arr:\n");
    for (unsigned int i = 0; i < 7; i++) {
        printf("%u ", arr[i]);
    }

    return 0;
}
```

I have included the relevant Godbolt link here: https://godbolt.org/z/b4rbn6eGT

I have a few questions related to this code snippet:

1. Is the conditional assignment to `arr` considered undefined
behavior? If it is, which exact clause of the C standard does this
code snippet violate and why? As seen in the Godbolt link, there is
different behavior between GCC and Clang (only GCC `-O1` and above
prints garbage values) which made me suspect that this is UB.

2. Regardless of whether this is UB or not, is it possible for GCC to
also output a warning in `-O0` as in `-O2`? If the behavior changes
across different optimization levels, it seems that it's worth a
warning or two. It can be a different warning instead of
`-Wdangling-pointer` since looking at the produced assembly code, GCC
seems to simply optimize out the whole conditional assignment block in
`-O2`. If it is UB, I understand that it is impossible to catch all
UB, but I am just checking on whether it is possible to catch this
specific one from GCC's perspective. Just FYI, I have also tried using
`-fsanitize=address` and `-fsanitize=undefined` and it seems that
AddressSanitizer would throw a `stack-use-after-scope` error in GCC if
`-fsanitize=address` is specified for both `-O0` and `-O2`, but not in
Clang. `-fsanitize=undefined` does not seem to be able to detect
anything.

If the GCC maintainers consider this an acceptable proposal to add the
warning, I am also willing to post a bug report and develop the
corresponding patch for it, although I would appreciate some guidance
since I am not very familiar with GCC's codebase.

Looking forward to your reply and have a great day ahead!

Best regards,
James Raphael Tiovalen

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Question about declaring an array in the stack on runtime
  2023-07-15 10:43 Question about declaring an array in the stack on runtime James R T
@ 2023-07-15 10:49 ` John Scott
  2023-07-15 11:19 ` Xi Ruoyao
  2023-07-17 12:45 ` David Brown
  2 siblings, 0 replies; 5+ messages in thread
From: John Scott @ 2023-07-15 10:49 UTC (permalink / raw)
  To: gcc-help


[-- Attachment #1.1: Type: text/plain, Size: 756 bytes --]

On Sat, 2023-07-15 at 18:43 +0800, James R T via Gcc-help wrote:
> Is the conditional assignment to `arr` considered undefined
> behavior?
It's not the assignment that's the problem. The problem is that you're
using a compound literal, and a compound literal, like typical
automatically allocated variables, has a lifetime determined by its
scope.

In other words, the following:
if (some_var == 7) {
	arr = (unsigned int[7]){9, 10, 11, 12, 13, 14, 15};
}

is the same as
if (some_var == 7) {
	unsigned int baz[7] = {9, 10, 11, 12, 13, 14, 15};
	arr = baz;
}

I hope this makes the problem a little more obvious: after you leave the
'}', the array doesn't exist anymore, and arr is a dangling pointer to
an object that doesn't exist.

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5880 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Question about declaring an array in the stack on runtime
  2023-07-15 10:43 Question about declaring an array in the stack on runtime James R T
  2023-07-15 10:49 ` John Scott
@ 2023-07-15 11:19 ` Xi Ruoyao
  2023-07-15 12:57   ` James R T
  2023-07-17 12:45 ` David Brown
  2 siblings, 1 reply; 5+ messages in thread
From: Xi Ruoyao @ 2023-07-15 11:19 UTC (permalink / raw)
  To: James R T, gcc-help

On Sat, 2023-07-15 at 18:43 +0800, James R T via Gcc-help wrote:
> Hi folks,
> 
> I hope that this is the correct mailing list to ask this question.
> 
> I have the following C code snippet:
> 
> ```c
> #include <stdio.h>
> 
> int main() {
>     unsigned int* arr;
>     int some_var = 7;
> 
>     if (some_var == 7) {
>         arr = (unsigned int[7]){9, 10, 11, 12, 13, 14, 15};
>     }
> 
>     printf("Value of arr:\n");
>     for (unsigned int i = 0; i < 7; i++) {
>         printf("%u ", arr[i]);
>     }
> 
>     return 0;
> }
> ```
> 
> I have included the relevant Godbolt link here:
> https://godbolt.org/z/b4rbn6eGT
> 
> I have a few questions related to this code snippet:
> 
> 1. Is the conditional assignment to `arr` considered undefined
> behavior?

No.  But the usage of arr[i] in the printf call is considered undefined
behavior.

> If it is, which exact clause of the C standard does this
> code snippet violate and why?

From C23 6.5.2.5p5:

A compound literal provides an unnamed object whose value, type, storage
duration and other properties are as if given by the definition syntax
in the constraints; if the storage duration is automatic, the lifetime
of the instance of the unnamed object is the current execution of the
enclosing block.

And 6.2.4p2:

If an object is referred to outside of its lifetime, the behavior is
undefined.

And 6.8p1:

primary-block:
  compound-statement
  selection-statement
  iteration-statement

And 6.8p3:

A block is either a primary block, a secondary block, or the block
associated with a function definition;

So here the "enclosing block" is the if statement, after the if
statement the lifetime of the unnamed object provided by the compound
literal has ended, thus referring it in the printf call is undefined.

> 2. Regardless of whether this is UB or not, is it possible for GCC to
> also output a warning in `-O0` as in `-O2`? If the behavior changes
> across different optimization levels, it seems that it's worth a
> warning or two. It can be a different warning instead of
> `-Wdangling-pointer` since looking at the produced assembly code, GCC
> seems to simply optimize out the whole conditional assignment block in
> `-O2`. If it is UB, I understand that it is impossible to catch all
> UB, but I am just checking on whether it is possible to catch this
> specific one from GCC's perspective.

It's very difficult.  The emit of the warning depends on the optimizing
of the loop.  Without optimization the compiler doesn't even know the
loop will be iterated at least once, so there will be no warning is
produced at -O0.

> Just FYI, I have also tried using
> `-fsanitize=address` and `-fsanitize=undefined` and it seems that
> AddressSanitizer would throw a `stack-use-after-scope` error in GCC if
> `-fsanitize=address` is specified for both `-O0` and `-O2`, but not in
> Clang. `-fsanitize=undefined` does not seem to be able to detect
> anything.

It seems Clang completely unroll the loop and produced a series of
printf calls at -O2:

	leaq	.str.1(%rip), %rbx
	movq	%rbx, %rdi
	movl	$9, %esi
	xorl	%eax, %eax
	callq	printf@PLT
	movq	%rbx, %rdi
	movl	$10, %esi
	xorl	%eax, %eax
	callq	printf@PLT

... ...

So obviously -fsanitize=address won't work because there is no memory
access at all.  On the contrary, clang -fsanitize=address -O0 is able to
detect the issue.

And this is just completely out of the capability of -
fsanitize=undefined.  Despite the naming, -fsanitize=undefined can only
catch a *small* subset of undefined behaviors.

Generally sanitizers are useful tools, but not silver bullets.

> If the GCC maintainers consider this an acceptable proposal to add the
> warning, I am also willing to post a bug report and develop the
> corresponding patch for it, although I would appreciate some guidance
> since I am not very familiar with GCC's codebase.

I believe there has been a lot of duplicates about "no warnings at -O0".
And as I've said, these are very difficult to fix.  A lot of warnings
just inherently needs some information from the optimizer, or there will
be either too many false positives, or too many false negatives.  If you
need such warnings, you should enable the optimization.

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Question about declaring an array in the stack on runtime
  2023-07-15 11:19 ` Xi Ruoyao
@ 2023-07-15 12:57   ` James R T
  0 siblings, 0 replies; 5+ messages in thread
From: James R T @ 2023-07-15 12:57 UTC (permalink / raw)
  To: Xi Ruoyao, jscott; +Cc: gcc-help

On Sat, Jul 15, 2023 at 6:49 PM John Scott <jscott@posteo.net> wrote:
> It's not the assignment that's the problem. The problem is that you're
> using a compound literal, and a compound literal, like typical
> automatically allocated variables, has a lifetime determined by its
> scope.
>
> In other words, the following:
> if (some_var == 7) {
> arr = (unsigned int[7]){9, 10, 11, 12, 13, 14, 15};
> }
>
> is the same as
> if (some_var == 7) {
> unsigned int baz[7] = {9, 10, 11, 12, 13, 14, 15};
> arr = baz;
> }
>
> I hope this makes the problem a little more obvious: after you leave the
> '}', the array doesn't exist anymore, and arr is a dangling pointer to
> an object that doesn't exist.

Ah thank you for this John, the rearranged code is particularly illuminating.

On Sat, Jul 15, 2023 at 7:19 PM Xi Ruoyao <xry111@xry111.site> wrote:
> No.  But the usage of arr[i] in the printf call is considered undefined
> behavior.

Got it.

> From C23 6.5.2.5p5:
>
> A compound literal provides an unnamed object whose value, type, storage
> duration and other properties are as if given by the definition syntax
> in the constraints; if the storage duration is automatic, the lifetime
> of the instance of the unnamed object is the current execution of the
> enclosing block.
>
> And 6.2.4p2:
>
> If an object is referred to outside of its lifetime, the behavior is
> undefined.
>
> And 6.8p1:
>
> primary-block:
>   compound-statement
>   selection-statement
>   iteration-statement
>
> And 6.8p3:
>
> A block is either a primary block, a secondary block, or the block
> associated with a function definition;
>
> So here the "enclosing block" is the if statement, after the if
> statement the lifetime of the unnamed object provided by the compound
> literal has ended, thus referring it in the printf call is undefined.

Understood, thank you for the references to the specific clauses. Will
personally study these notes on the lifetimes of compound literals
more for my understanding.

> It's very difficult.  The emit of the warning depends on the optimizing
> of the loop.  Without optimization the compiler doesn't even know the
> loop will be iterated at least once, so there will be no warning is
> produced at -O0.

Ah alright, understood.

> It seems Clang completely unroll the loop and produced a series of
> printf calls at -O2:
>
>         leaq    .str.1(%rip), %rbx
>         movq    %rbx, %rdi
>         movl    $9, %esi
>         xorl    %eax, %eax
>         callq   printf@PLT
>         movq    %rbx, %rdi
>         movl    $10, %esi
>         xorl    %eax, %eax
>         callq   printf@PLT
>
> ... ...

Yes, I noticed that. I would assume that this is more of an artifact
of me hardcoding the value of `some_var`. If the value of `some_var`
cannot be determined at compile time, then I would assume that Clang
would not be able to unroll the loop and all bets are off, especially
since it is UB.

> So obviously -fsanitize=address won't work because there is no memory
> access at all.  On the contrary, clang -fsanitize=address -O0 is able to
> detect the issue.

Oh, interesting. Attempting to specify `-fsanitize=address` on Godbolt
for Clang `-O0` does not seem to detect the issue. I have not tried it
locally though.

> And this is just completely out of the capability of -
> fsanitize=undefined.  Despite the naming, -fsanitize=undefined can only
> catch a *small* subset of undefined behaviors.
>
> Generally sanitizers are useful tools, but not silver bullets.

Noted the part on `-fsanitize=undefined`. Yes, of course, I understand
that sanitizers would not be able to catch all issues.

> I believe there has been a lot of duplicates about "no warnings at -O0".
> And as I've said, these are very difficult to fix.  A lot of warnings
> just inherently needs some information from the optimizer, or there will
> be either too many false positives, or too many false negatives.  If you
> need such warnings, you should enable the optimization.

Understood, no worries. I was just checking on this specific issue.

Thank you for your responses, Xi and John. I hope that both of you
have a great day ahead.

Best regards,
James Raphael Tiovalen

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Question about declaring an array in the stack on runtime
  2023-07-15 10:43 Question about declaring an array in the stack on runtime James R T
  2023-07-15 10:49 ` John Scott
  2023-07-15 11:19 ` Xi Ruoyao
@ 2023-07-17 12:45 ` David Brown
  2 siblings, 0 replies; 5+ messages in thread
From: David Brown @ 2023-07-17 12:45 UTC (permalink / raw)
  To: gcc-help

On 15/07/2023 12:43, James R T via Gcc-help wrote:
> Hi folks,
> 
> 2. Regardless of whether this is UB or not, is it possible for GCC to
> also output a warning in `-O0` as in `-O2`? If the behavior changes
> across different optimization levels, it seems that it's worth a
> warning or two. It can be a different warning instead of
> `-Wdangling-pointer` since looking at the produced assembly code, GCC
> seems to simply optimize out the whole conditional assignment block in
> `-O2`. If it is UB, I understand that it is impossible to catch all
> UB, but I am just checking on whether it is possible to catch this
> specific one from GCC's perspective. Just FYI, I have also tried using
> `-fsanitize=address` and `-fsanitize=undefined` and it seems that
> AddressSanitizer would throw a `stack-use-after-scope` error in GCC if
> `-fsanitize=address` is specified for both `-O0` and `-O2`, but not in
> Clang. `-fsanitize=undefined` does not seem to be able to detect
> anything.
> 

I think you've had solid answers to your other points already, so you 
can see why it is UB.

When you use the -O flags, you are not just enabling a selection of 
optimisation passes - you are also changing the code analysis passes. 
With -O0, relatively little analysis is done.  This means that many 
warnings are not active at -O0.  So it is quite normal for warnings like 
this to require -O1 or above to work, and some may need -O2 to work 
fully (I don't have details off-hand - I'm a long-term gcc user, not a 
gcc developer).

Personally, I've never had any use for -O0 - I use -O2 usually and -O1 
at a minimum, even for early checking of code, precisely because it 
allows far better static warnings.



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-07-17 12:45 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-15 10:43 Question about declaring an array in the stack on runtime James R T
2023-07-15 10:49 ` John Scott
2023-07-15 11:19 ` Xi Ruoyao
2023-07-15 12:57   ` James R T
2023-07-17 12:45 ` David Brown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).