public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/103343] New: Invalid codegen when comparing pointer to one past the end and then dereferencing that pointer
@ 2021-11-21  3:01 gabravier at gmail dot com
  2021-11-21  4:46 ` [Bug c/103343] " pinskia at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: gabravier at gmail dot com @ 2021-11-21  3:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103343

            Bug ID: 103343
           Summary: Invalid codegen when comparing pointer to one past the
                    end and then dereferencing that pointer
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

extern int x[1], y;

int f(int *p, int *q) {
    *q = y;
    if (p == (x + 1)) {
        *p = 2;
        return y;
    }
    return 0;
}

GCC trunk currently outputs the following code with -O3:

f:
        mov     eax, DWORD PTR y[rip]
        mov     DWORD PTR [rsi], eax
        cmp     rdi, OFFSET FLAT:x+4
        je      .L5
        xor     eax, eax
        ret
.L5:
        mov     DWORD PTR x[rip+4], 2
        ret

Which is incorrect because `p` could point to `y`, for example if `f` was
called as such:

int whatever;
f(&y, &whatever);

and `y` could happen to be located in memory right after `x`.

Also, although the comparison invokes unspecified behavior, this still means
only two results are possible according to the standard:
- if `p == (x + 1)` results in `false`, then the result of `f` is 0
- if `p == (x + 1)` results in `true`, then the result of `f` is 2 since we do
`*p = 2` and `p` points to `y`.

GCC's optimization makes it so the result can also be the previous value of
`y`, which could be something else than 0 or 2.

It seems that GCC assumes that because `p == (x + 1)` it can replace all
occurrences of `p` with `x + 1` without any regard to provenance, and doing
that change manually would indeed mean the `return y;` could be optimized to
use the previous store (and the store to `x + 1` would be UB, too...), but this
isn't the case here: `p` could simultaneously validly point to `y` and be equal
to `x + 1`.

PS: This also results in plenty of invalid warnings when compiling with -Wall:

<source>: In function 'f':
<source>:6:9: warning: array subscript 1 is outside array bounds of 'int[1]'
[-Warray-bounds]
    6 |         *p = 2;
      |         ^~
<source>:1:12: note: at offset 4 into object 'x' of size 4
    1 | extern int x[1], y;
      |            ^

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug c/103343] Invalid codegen when comparing pointer to one past the end and then dereferencing that pointer
  2021-11-21  3:01 [Bug c/103343] New: Invalid codegen when comparing pointer to one past the end and then dereferencing that pointer gabravier at gmail dot com
@ 2021-11-21  4:46 ` pinskia at gcc dot gnu.org
  2021-11-22 16:28 ` msebor at gcc dot gnu.org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-11-21  4:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103343

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Dup of bug 61502. (or a dup of bug 93052 or many others).

*** This bug has been marked as a duplicate of bug 61502 ***

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug c/103343] Invalid codegen when comparing pointer to one past the end and then dereferencing that pointer
  2021-11-21  3:01 [Bug c/103343] New: Invalid codegen when comparing pointer to one past the end and then dereferencing that pointer gabravier at gmail dot com
  2021-11-21  4:46 ` [Bug c/103343] " pinskia at gcc dot gnu.org
@ 2021-11-22 16:28 ` msebor at gcc dot gnu.org
  2021-11-22 20:54 ` gabravier at gmail dot com
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: msebor at gcc dot gnu.org @ 2021-11-22 16:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103343

Martin Sebor <msebor at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |msebor at gcc dot gnu.org

--- Comment #2 from Martin Sebor <msebor at gcc dot gnu.org> ---
(In reply to Gabriel Ravier from comment #0)
> PS: This also results in plenty of invalid warnings when compiling with
> -Wall:
> 
> <source>: In function 'f':
> <source>:6:9: warning: array subscript 1 is outside array bounds of 'int[1]'
> [-Warray-bounds]
>     6 |         *p = 2;
>       |         ^~
> <source>:1:12: note: at offset 4 into object 'x' of size 4
>     1 | extern int x[1], y;
>       |            ^

The warning in this case is valid and helpful: it's undefined to attempt to
access an object using a pointer derived from a pointer to an unrelated object
(the equality between pointers to adjacent objects notwithstanding).

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug c/103343] Invalid codegen when comparing pointer to one past the end and then dereferencing that pointer
  2021-11-21  3:01 [Bug c/103343] New: Invalid codegen when comparing pointer to one past the end and then dereferencing that pointer gabravier at gmail dot com
  2021-11-21  4:46 ` [Bug c/103343] " pinskia at gcc dot gnu.org
  2021-11-22 16:28 ` msebor at gcc dot gnu.org
@ 2021-11-22 20:54 ` gabravier at gmail dot com
  2021-11-22 23:07 ` leni536 at gmail dot com
  2021-11-22 23:13 ` pinskia at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: gabravier at gmail dot com @ 2021-11-22 20:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103343

--- Comment #3 from Gabriel Ravier <gabravier at gmail dot com> ---
Well the code does not invoke undefined behavior here, it just so happens that
`p == (x + 1)` because `y` happens to be laid out in memory after `x` (note:
this isn't a guarantee, of course, but GCC can't prove this isn't the case as
it's defined in another TU and it's quite easy to make this happen). The
comparison doesn't imply the pointers have the same provenance, and the
standard has a specific provision for this exact comparison:

"If one pointer represents the address of a complete object, and another
pointer represents the address one past the last element of a different
complete object,72 the result of the comparison is unspecified."
- [expr.eq] (https://eel.is/c++draft/expr.eq#3.1)

Also, `y` isn't accessed through a pointer to `x`: I've already said the case
where the function is incorrect is when `f` is called with `&y` as the first
argument. If doing `p == (x + 1)` implied they derived from the same object,
then that would imply after doing `&y == (x + 1)` doing `*&y` would invoke
undefined behavior which is obviously ridiculous.

Although there is a case to be made that this code is stupid and deserves a
warning, though... I won't argue with that, this code is just something I wrote
to test things after a 3 hour long conversation about DR 260
(<http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm>) and a lot of
standardese lawyering, so it's not intended to be real life code. I'd say,
though, that the warning is quite inaccurate in the details of what it's
saying, as `p` isn't actually equivalent to `(x + 1)` just because `p == (x +
1)`.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug c/103343] Invalid codegen when comparing pointer to one past the end and then dereferencing that pointer
  2021-11-21  3:01 [Bug c/103343] New: Invalid codegen when comparing pointer to one past the end and then dereferencing that pointer gabravier at gmail dot com
                   ` (2 preceding siblings ...)
  2021-11-22 20:54 ` gabravier at gmail dot com
@ 2021-11-22 23:07 ` leni536 at gmail dot com
  2021-11-22 23:13 ` pinskia at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: leni536 at gmail dot com @ 2021-11-22 23:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103343

--- Comment #4 from Lénárd Szolnoki <leni536 at gmail dot com> ---
A complete program example:

f.h:
```
#pragma once
extern int x[1];
extern int y;

int f(int* p, int* q);
```

f.cpp:
```
#include "f.h"

int f(int* p, int* q) {
    *q = y;
    if (p == (x + 1)) {
        *p = 2;
        return y;
    }
    return 0;
}
```

x_y.cpp:
```
#include "f.h"

int y;
int x[1];
```

main.cpp:
```
#include "f.h"

int main() {
    y=1;
    int i;
    return f(&y, &i);
}
```

Compile with `g++ -o main main.cpp f.cpp x_y.cpp`.
https://godbolt.org/z/G4KTKc7hE

The well-formed program above has two possible evaluations, due to the
unspecified comparison. In one evaluation `main` returns 0, in the other it
returns 2. Compiled with g++ the program returns 1.

Within the single invocation of `f` `p` is pointer to an object, namely `y`.
Even after the unspecified comparison evaluates to true, `p` remains a pointer
to `y`. Therefore dereferencing `p` is still valid in that branch.

I don't think that it is a duplicate of bug 61502. The program does not rely on
the object representation of the pointer objects, their printed value or their
value converted to uintptr_t. The only thing that is questionable is the
comparison with pointer past the end of an object, which is merely unspecified.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug c/103343] Invalid codegen when comparing pointer to one past the end and then dereferencing that pointer
  2021-11-21  3:01 [Bug c/103343] New: Invalid codegen when comparing pointer to one past the end and then dereferencing that pointer gabravier at gmail dot com
                   ` (3 preceding siblings ...)
  2021-11-22 23:07 ` leni536 at gmail dot com
@ 2021-11-22 23:13 ` pinskia at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-11-22 23:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103343

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
> The only thing that is questionable is the comparison with pointer past the end of an object, which is merely unspecified.

Ok, it is a dup of bug 93051.

*** This bug has been marked as a duplicate of bug 93051 ***

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-11-22 23:13 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-21  3:01 [Bug c/103343] New: Invalid codegen when comparing pointer to one past the end and then dereferencing that pointer gabravier at gmail dot com
2021-11-21  4:46 ` [Bug c/103343] " pinskia at gcc dot gnu.org
2021-11-22 16:28 ` msebor at gcc dot gnu.org
2021-11-22 20:54 ` gabravier at gmail dot com
2021-11-22 23:07 ` leni536 at gmail dot com
2021-11-22 23:13 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).