public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* GIMPLE undefined behavior
@ 2022-08-31 23:55 Krister Walfridsson
  2022-09-01  6:54 ` Richard Biener
  0 siblings, 1 reply; 4+ messages in thread
From: Krister Walfridsson @ 2022-08-31 23:55 UTC (permalink / raw)
  To: gcc

I'm implementing a tool for translation validation (similar to Alive2 for 
LLVM). The tool uses an SMT solver to verify for each GIMPLE pass that the 
output IR is a refinement of the input IR:
  * That each compiled function returns an identical result before/after
    the pass (for input that does not invoke UB)
  * That each function does not have additional UB after the pass
  * That values in global memory are identical after the two versions of
    the function are run

I have reported three bugs (106523, 106613, 106744) where the tool has 
found differences in the result, but it is a bit unclear to me what is 
considered undefined behavior in GIMPLE, so I have not reported any such 
cases yet...

For example, ifcombine optimizes

int foo (int x, int a, int b)
{
   int c;
   int _1;
   int _2;
   int _3;
   int _4;

   <bb 2>:
   c_6 = 1 << a_5(D);
   _1 = c_6 & x_7(D);
   if (_1 != 0)
     goto <bb 3>;
   else
     goto <bb 5>;

   <bb 3>:
   _2 = x_7(D) >> b_8(D);
   _3 = _2 & 1;
   if (_3 != 0)
     goto <bb 4>;
   else
     goto <bb 5>;

   <bb 4>:

   <bb 5>:
   # _4 = PHI <2(4), 0(2), 0(3)>
   return _4;
}

to

int foo (int x, int a, int b)
{
   int c;
   int _4;
   int _10;
   int _11;
   int _12;
   int _13;

   <bb 2>:
   _10 = 1 << b_8(D);
   _11 = 1 << a_5(D);
   _12 = _10 | _11;
   _13 = x_7(D) & _12;
   if (_12 == _13)
     goto <bb 3>;
   else
     goto <bb 4>;

   <bb 3>:

   <bb 4>:
   # _4 = PHI <2(3), 0(2)>
   return _4;
}

Both return the same value for foo(8, 1, 34), but the optimized version 
shifts more than 31 bits when calculating _10. tree.def says for 
LSHIFT_EXPR that "the result is undefined" if the number of bits to shift 
by is larger than the type size, which I interpret as it just should be 
considered to return an arbitrary value. I.e., this does not invoke 
undefined behavior, so the optimization is allowed. Is my understanding 
correct?

What about signed integer wrapping for PLUS_EXPR? This happens for

int f (int c, int s)
{
   int y2;
   int y1;
   int x2;
   int x1;
   int _7;

   <bb 2>:
   y1_2 = c_1(D) + 2;
   x1_4 = y1_2 * s_3(D);
   y2_5 = c_1(D) + 4;
   x2_6 = s_3(D) * y2_5;
   _7 = x1_4 + x2_6;
   return _7;
}

where slsr optimizes this to

int f (int c, int s)
{
   int y1;
   int x2;
   int x1;
   int _7;
   int slsr_9;

   <bb 2>:
   y1_2 = c_1(D) + 2;
   x1_4 = y1_2 * s_3(D);
   slsr_9 = s_3(D) * 2;
   x2_6 = x1_4 + slsr_9;
   _7 = x1_4 + x2_6;
   return _7;

Calling f(-3, 0x75181005) makes slsr_9 overflow in the optimized code, 
even though the original did not overflow. My understanding is that signed 
overflow invokes undefined behavior in GIMPLE, so this is a bug in 
ifcombine. Is my understanding correct?

I would appreciate some comments on which non-memory-related operations I 
should treat as invoking undefined behavior (memory operations are more 
complicated, and I'll be back with more concrete questions later...).

    /Krister

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: GIMPLE undefined behavior
  2022-08-31 23:55 GIMPLE undefined behavior Krister Walfridsson
@ 2022-09-01  6:54 ` Richard Biener
  2022-09-02  0:03   ` Krister Walfridsson
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Biener @ 2022-09-01  6:54 UTC (permalink / raw)
  To: Krister Walfridsson, Andrew MacLeod; +Cc: GCC Development

On Thu, Sep 1, 2022 at 1:57 AM Krister Walfridsson via Gcc
<gcc@gcc.gnu.org> wrote:
>
> I'm implementing a tool for translation validation (similar to Alive2 for
> LLVM). The tool uses an SMT solver to verify for each GIMPLE pass that the
> output IR is a refinement of the input IR:
>   * That each compiled function returns an identical result before/after
>     the pass (for input that does not invoke UB)
>   * That each function does not have additional UB after the pass
>   * That values in global memory are identical after the two versions of
>     the function are run

Nice.

> I have reported three bugs (106523, 106613, 106744) where the tool has
> found differences in the result, but it is a bit unclear to me what is
> considered undefined behavior in GIMPLE, so I have not reported any such
> cases yet...
>
> For example, ifcombine optimizes
>
> int foo (int x, int a, int b)
> {
>    int c;
>    int _1;
>    int _2;
>    int _3;
>    int _4;
>
>    <bb 2>:
>    c_6 = 1 << a_5(D);
>    _1 = c_6 & x_7(D);
>    if (_1 != 0)
>      goto <bb 3>;
>    else
>      goto <bb 5>;
>
>    <bb 3>:
>    _2 = x_7(D) >> b_8(D);
>    _3 = _2 & 1;
>    if (_3 != 0)
>      goto <bb 4>;
>    else
>      goto <bb 5>;
>
>    <bb 4>:
>
>    <bb 5>:
>    # _4 = PHI <2(4), 0(2), 0(3)>
>    return _4;
> }
>
> to
>
> int foo (int x, int a, int b)
> {
>    int c;
>    int _4;
>    int _10;
>    int _11;
>    int _12;
>    int _13;
>
>    <bb 2>:
>    _10 = 1 << b_8(D);
>    _11 = 1 << a_5(D);
>    _12 = _10 | _11;
>    _13 = x_7(D) & _12;
>    if (_12 == _13)
>      goto <bb 3>;
>    else
>      goto <bb 4>;
>
>    <bb 3>:
>
>    <bb 4>:
>    # _4 = PHI <2(3), 0(2)>
>    return _4;
> }
>
> Both return the same value for foo(8, 1, 34), but the optimized version
> shifts more than 31 bits when calculating _10. tree.def says for
> LSHIFT_EXPR that "the result is undefined" if the number of bits to shift
> by is larger than the type size, which I interpret as it just should be
> considered to return an arbitrary value. I.e., this does not invoke
> undefined behavior, so the optimization is allowed. Is my understanding
> correct?

It's generally poorly documented what is considered 'undefined behavior'.
We desparately need a section in the internals manual for this.
For the {L,R}SHIFT_EXPR case we assume the shift operand is
in range of [0, precision - 1], so in theory value-range propagation could
infer that b_8(D) < 32 after it "executed".  But it seems that
range-on-exit doesn't do that yet.

int x;
int foo (int i, int n)
{
  x = i << n;
  if (n > 32)
    return 7;
  return 0;
}

isn't optimized.

The problem with shifts is that there's not a "do it anway, but without
undefined behavior" operation to substitute.

As I said, we lack clear documentation here :/

> What about signed integer wrapping for PLUS_EXPR? This happens for
>
> int f (int c, int s)
> {
>    int y2;
>    int y1;
>    int x2;
>    int x1;
>    int _7;
>
>    <bb 2>:
>    y1_2 = c_1(D) + 2;
>    x1_4 = y1_2 * s_3(D);
>    y2_5 = c_1(D) + 4;
>    x2_6 = s_3(D) * y2_5;
>    _7 = x1_4 + x2_6;
>    return _7;
> }
>
> where slsr optimizes this to
>
> int f (int c, int s)
> {
>    int y1;
>    int x2;
>    int x1;
>    int _7;
>    int slsr_9;
>
>    <bb 2>:
>    y1_2 = c_1(D) + 2;
>    x1_4 = y1_2 * s_3(D);
>    slsr_9 = s_3(D) * 2;
>    x2_6 = x1_4 + slsr_9;
>    _7 = x1_4 + x2_6;
>    return _7;
>
> Calling f(-3, 0x75181005) makes slsr_9 overflow in the optimized code,
> even though the original did not overflow. My understanding is that signed
> overflow invokes undefined behavior in GIMPLE, so this is a bug in
> ifcombine. Is my understanding correct?

Yes, the above would be a bug - again value-range propagation might be
leveraged to produce a wrong-code testcase.

> I would appreciate some comments on which non-memory-related operations I
> should treat as invoking undefined behavior (memory operations are more
> complicated, and I'll be back with more concrete questions later...).

The more "interesting" cases are uninitialized values (registers or memory).

In general what we should worry about most is introducing undefined
behavior that, when a later pass can assume it doesn't happen, causes
wrong code to be generated.  Likewise when we have late instrumentation
that would flag such undefined behavior as a user error.  I think only
-ftrapv would cause "late" instrumentation (but please don't test -ftrapv,
it's known to be broken in ways).

Thanks,
Richard.

>     /Krister

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: GIMPLE undefined behavior
  2022-09-01  6:54 ` Richard Biener
@ 2022-09-02  0:03   ` Krister Walfridsson
  2022-09-02  6:19     ` Richard Biener
  0 siblings, 1 reply; 4+ messages in thread
From: Krister Walfridsson @ 2022-09-02  0:03 UTC (permalink / raw)
  To: Richard Biener; +Cc: Krister Walfridsson, Andrew MacLeod, GCC Development

On Thu, 1 Sep 2022, Richard Biener wrote:

> It's generally poorly documented what is considered 'undefined behavior'.
> We desparately need a section in the internals manual for this.
> For the {L,R}SHIFT_EXPR case we assume the shift operand is
> in range of [0, precision - 1], so in theory value-range propagation could
> infer that b_8(D) < 32 after it "executed".  But it seems that
> range-on-exit doesn't do that yet.
[...]
> The problem with shifts is that there's not a "do it anway, but without
> undefined behavior" operation to substitute.

I read this as I should not report these as bugs for now. But I'll 
probably keep this as UB in my tool to get an idea of how often this 
happens...


>> Calling f(-3, 0x75181005) makes slsr_9 overflow in the optimized code,
>> even though the original did not overflow. My understanding is that signed
>> overflow invokes undefined behavior in GIMPLE, so this is a bug in
>> ifcombine. Is my understanding correct?
>
> Yes, the above would be a bug - again value-range propagation might be
> leveraged to produce a wrong-code testcase.

OK. I'll open bugs for the signed overflow issues the tool finds.


>> I would appreciate some comments on which non-memory-related operations I
>> should treat as invoking undefined behavior (memory operations are more
>> complicated, and I'll be back with more concrete questions later...).
>
> The more "interesting" cases are uninitialized values (registers or memory).

Yes, this is the next thing I was planning to implement. :)


> In general what we should worry about most is introducing undefined
> behavior that, when a later pass can assume it doesn't happen, causes
> wrong code to be generated.  Likewise when we have late instrumentation
> that would flag such undefined behavior as a user error.

Agreed. But that comes back to the issue of lacking documentation... :(

    /Krister

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: GIMPLE undefined behavior
  2022-09-02  0:03   ` Krister Walfridsson
@ 2022-09-02  6:19     ` Richard Biener
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Biener @ 2022-09-02  6:19 UTC (permalink / raw)
  To: Krister Walfridsson; +Cc: Andrew MacLeod, GCC Development

On Fri, Sep 2, 2022 at 2:03 AM Krister Walfridsson
<krister.walfridsson@gmail.com> wrote:
>
> On Thu, 1 Sep 2022, Richard Biener wrote:
>
> > It's generally poorly documented what is considered 'undefined behavior'.
> > We desparately need a section in the internals manual for this.
> > For the {L,R}SHIFT_EXPR case we assume the shift operand is
> > in range of [0, precision - 1], so in theory value-range propagation could
> > infer that b_8(D) < 32 after it "executed".  But it seems that
> > range-on-exit doesn't do that yet.
> [...]
> > The problem with shifts is that there's not a "do it anway, but without
> > undefined behavior" operation to substitute.
>
> I read this as I should not report these as bugs for now. But I'll
> probably keep this as UB in my tool to get an idea of how often this
> happens...

Can you report a single case as kind-of meta-bug for this particular issue?
That would be nice.

>
> >> Calling f(-3, 0x75181005) makes slsr_9 overflow in the optimized code,
> >> even though the original did not overflow. My understanding is that signed
> >> overflow invokes undefined behavior in GIMPLE, so this is a bug in
> >> ifcombine. Is my understanding correct?
> >
> > Yes, the above would be a bug - again value-range propagation might be
> > leveraged to produce a wrong-code testcase.
>
> OK. I'll open bugs for the signed overflow issues the tool finds.

Thanks.

> >> I would appreciate some comments on which non-memory-related operations I
> >> should treat as invoking undefined behavior (memory operations are more
> >> complicated, and I'll be back with more concrete questions later...).
> >
> > The more "interesting" cases are uninitialized values (registers or memory).
>
> Yes, this is the next thing I was planning to implement. :)
>
>
> > In general what we should worry about most is introducing undefined
> > behavior that, when a later pass can assume it doesn't happen, causes
> > wrong code to be generated.  Likewise when we have late instrumentation
> > that would flag such undefined behavior as a user error.
>
> Agreed. But that comes back to the issue of lacking documentation... :(

Yes.  There's some "unknowns" in what GIMPLE treats as undefined given
the lack of a strict specification but there's also implementation facts for
which cases optimization passes already exploit undefined behavior (but even
that subset is not documented).

Richard.

>     /Krister

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-09-02  6:20 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-31 23:55 GIMPLE undefined behavior Krister Walfridsson
2022-09-01  6:54 ` Richard Biener
2022-09-02  0:03   ` Krister Walfridsson
2022-09-02  6:19     ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).