types in GIMPLE IR

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* types in GIMPLE IR
@ 2023-06-28  9:15 Krister Walfridsson
  2023-06-28 15:46 ` Michael Matz
  0 siblings, 1 reply; 8+ messages in thread
From: Krister Walfridsson @ 2023-06-28  9:15 UTC (permalink / raw)
  To: gcc

I have some random questions concerning types in the GIMPLE IR that I 
would appreciate some clarification on.


Type safety
-----------
Some transformations treat 1-bit types as a synonym of _Bool and mix the 
types in expressions, such as:

   <unnamed-unsigned:1> _2;
   _Bool _3;
   _Bool _4;
   ...
   _4 = _2 ^ _3;

and similarly mixing _Bool and enum

   enum E:bool { E0, E1 };

in one operation.

I had expected this to be invalid... What are the type safety rules in the 
GIMPLE IR?


Somewhat related, gcc.c-torture/compile/pr96796.c performs a 
VIEW_CONVERT_EXPR from

   struct S1 {
     long f3;
     char f4;
   } g_3_4;

to an int

   p_51_9 = VIEW_CONVERT_EXPR<int>(g_3_4);

That must be wrong?


Semantics of <signed-boolean:32>
--------------------------------
"Wide" Booleans, such as <signed-boolean:32>, seems to allow more values 
than 0 and 1. For example, I've seen some IR operations like:

   _66 = _16 ? _Literal (<signed-boolean:32>) -1 : 0;

But I guess there must be some semantic difference between 
<signed-boolean:32> and a 32-bit int, otherwise the wide Boolean type 
would not be needed... So what are the difference?

    /Krister

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: types in GIMPLE IR
  2023-06-28  9:15 types in GIMPLE IR Krister Walfridsson
@ 2023-06-28 15:46 ` Michael Matz
  2023-06-29  6:21   ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Matz @ 2023-06-28 15:46 UTC (permalink / raw)
  To: Krister Walfridsson; +Cc: gcc

Hello,

On Wed, 28 Jun 2023, Krister Walfridsson via Gcc wrote:

> Type safety
> -----------
> Some transformations treat 1-bit types as a synonym of _Bool and mix the types
> in expressions, such as:
> 
>   <unnamed-unsigned:1> _2;
>   _Bool _3;
>   _Bool _4;
>   ...
>   _4 = _2 ^ _3;
> 
> and similarly mixing _Bool and enum
> 
>   enum E:bool { E0, E1 };
> 
> in one operation.
> 
> I had expected this to be invalid... What are the type safety rules in the
> GIMPLE IR?

Type safety in gimple is defined in terms of type compatiblity, with 
_many_ exceptions for specific types of statements.  Generally stuff is 
verified in verify_gimple_seq., in this case of a binary assign statement 
in verify_gimple_assign_binary.  As you can see there the normal rules for 
bread-and-butter binary assigns is simply that RHS, LHS1 and LHS2 must 
all be type-compatible.

T1 and T2 are compatible if conversions from T1 to T2 are useless and 
conversion from T2 to T1 are also useless.  (types_compatible_p)  The meat 
for that is all in gimple-expr.cc:useless_type_conversion_p.  For this 
specific case again we have:

      /* Preserve conversions to/from BOOLEAN_TYPE if types are not
         of precision one.  */
      if (((TREE_CODE (inner_type) == BOOLEAN_TYPE)
           != (TREE_CODE (outer_type) == BOOLEAN_TYPE))
          && TYPE_PRECISION (outer_type) != 1)
        return false;

So, yes, booleans and 1-bit types can be compatible (under certain other 
conditions, see the function).

> Somewhat related, gcc.c-torture/compile/pr96796.c performs a VIEW_CONVERT_EXPR
> from
> 
>   struct S1 {
>     long f3;
>     char f4;
>   } g_3_4;
> 
> to an int
> 
>   p_51_9 = VIEW_CONVERT_EXPR<int>(g_3_4);
> 
> That must be wrong?

VIEW_CONVERT_EXPR is _very_ generous.  See 
verify_types_in_gimple_reference: 

      if (TREE_CODE (expr) == VIEW_CONVERT_EXPR)
        {
          /* For VIEW_CONVERT_EXPRs which are allowed here too, we only check
             that their operand is not a register an invariant when
             requiring an lvalue (this usually means there is a SRA or IPA-SRA
             bug).  Otherwise there is nothing to verify, gross mismatches at
             most invoke undefined behavior.  */
          if (require_lvalue
              && (is_gimple_reg (op) || is_gimple_min_invariant (op)))
            {
              error ("conversion of %qs on the left hand side of %qs",
                     get_tree_code_name (TREE_CODE (op)), code_name);
              debug_generic_stmt (expr);
              return true;
            }
          else if (is_gimple_reg (op)
                   && TYPE_SIZE (TREE_TYPE (expr)) != TYPE_SIZE (TREE_TYPE (op)))
            {
              error ("conversion of register to a different size in %qs",
                     code_name);
              debug_generic_stmt (expr);
              return true;
            }
        }

Here the operand is not a register (but a global memory object), so 
everything goes.

It should be said that over the years gimples type system became stricter 
and stricter, but it started as mostly everything-goes, so making it 
stricter is a bumpy road that isn't fully travelled yet, because checking 
types often results in testcase regressions :-)

> Semantics of <signed-boolean:32>
> --------------------------------
> "Wide" Booleans, such as <signed-boolean:32>, seems to allow more values than
> 0 and 1. For example, I've seen some IR operations like:
> 
>   _66 = _16 ? _Literal (<signed-boolean:32>) -1 : 0;
> 
> But I guess there must be some semantic difference between 
> <signed-boolean:32> and a 32-bit int, otherwise the wide Boolean type 
> would not be needed... So what are the difference?

See above, normally conversions to booleans that are wider than 1 bit are 
_not_ useless (because they require booleanization to true/false).  In the 
above case the not-useless cast is within a COND_EXPR, so it's quite 
possible that the gimplifier didn't look hard enough to split this out 
into a proper conversion statement.  (The verifier doesn't look inside 
the expressions of the COND_EXPR, so also doesn't catch this one)

If that turns out to be true and the above still happens when -1 is 
replaced by (say) 42, then it might be possible to construct a 
wrong-code testcase based on the fact that _66 as boolean should contain 
only two observable values (true/false), but could then contain 42.  OTOH, 
it might also not be possible to create such testcase, namely when GCC is 
internally too conservative in handling wide bools :-)  In that case we 
probably have a missed optimization somewhere, which when implemented 
would enable construction of such wrong-code testcase ;)

(I'm saying that -1 should be replaced by something else for a wrong-code 
testcase, because -1 is special and could justifieably be special-cased in 
GCC: -1 is the proper arithmetic value for a signed boolean that is 
"true").

Ciao,
Michael.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: types in GIMPLE IR
  2023-06-28 15:46 ` Michael Matz
@ 2023-06-29  6:21   ` Richard Biener
  2023-06-29 12:06     ` Krister Walfridsson
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2023-06-29  6:21 UTC (permalink / raw)
  To: Michael Matz; +Cc: Krister Walfridsson, gcc

On Wed, Jun 28, 2023 at 5:47 PM Michael Matz via Gcc <gcc@gcc.gnu.org> wrote:
>
> Hello,
>
> On Wed, 28 Jun 2023, Krister Walfridsson via Gcc wrote:
>
> > Type safety
> > -----------
> > Some transformations treat 1-bit types as a synonym of _Bool and mix the types
> > in expressions, such as:
> >
> >   <unnamed-unsigned:1> _2;
> >   _Bool _3;
> >   _Bool _4;
> >   ...
> >   _4 = _2 ^ _3;
> >
> > and similarly mixing _Bool and enum
> >
> >   enum E:bool { E0, E1 };
> >
> > in one operation.
> >
> > I had expected this to be invalid... What are the type safety rules in the
> > GIMPLE IR?
>
> Type safety in gimple is defined in terms of type compatiblity, with
> _many_ exceptions for specific types of statements.  Generally stuff is
> verified in verify_gimple_seq., in this case of a binary assign statement
> in verify_gimple_assign_binary.  As you can see there the normal rules for
> bread-and-butter binary assigns is simply that RHS, LHS1 and LHS2 must
> all be type-compatible.
>
> T1 and T2 are compatible if conversions from T1 to T2 are useless and
> conversion from T2 to T1 are also useless.  (types_compatible_p)  The meat
> for that is all in gimple-expr.cc:useless_type_conversion_p.  For this
> specific case again we have:
>
>       /* Preserve conversions to/from BOOLEAN_TYPE if types are not
>          of precision one.  */
>       if (((TREE_CODE (inner_type) == BOOLEAN_TYPE)
>            != (TREE_CODE (outer_type) == BOOLEAN_TYPE))
>           && TYPE_PRECISION (outer_type) != 1)
>         return false;
>
> So, yes, booleans and 1-bit types can be compatible (under certain other
> conditions, see the function).
>
> > Somewhat related, gcc.c-torture/compile/pr96796.c performs a VIEW_CONVERT_EXPR
> > from
> >
> >   struct S1 {
> >     long f3;
> >     char f4;
> >   } g_3_4;
> >
> > to an int
> >
> >   p_51_9 = VIEW_CONVERT_EXPR<int>(g_3_4);
> >
> > That must be wrong?
>
> VIEW_CONVERT_EXPR is _very_ generous.  See
> verify_types_in_gimple_reference:

Yep.  In general these cases should rather use a BIT_FIELD_REF to select
a same sized subpart and only then do the rvalue conversion.  But as Micha says
below making the IL stricter isn't an easy task.

>       if (TREE_CODE (expr) == VIEW_CONVERT_EXPR)
>         {
>           /* For VIEW_CONVERT_EXPRs which are allowed here too, we only check
>              that their operand is not a register an invariant when
>              requiring an lvalue (this usually means there is a SRA or IPA-SRA
>              bug).  Otherwise there is nothing to verify, gross mismatches at
>              most invoke undefined behavior.  */
>           if (require_lvalue
>               && (is_gimple_reg (op) || is_gimple_min_invariant (op)))
>             {
>               error ("conversion of %qs on the left hand side of %qs",
>                      get_tree_code_name (TREE_CODE (op)), code_name);
>               debug_generic_stmt (expr);
>               return true;
>             }
>           else if (is_gimple_reg (op)
>                    && TYPE_SIZE (TREE_TYPE (expr)) != TYPE_SIZE (TREE_TYPE (op)))
>             {
>               error ("conversion of register to a different size in %qs",
>                      code_name);
>               debug_generic_stmt (expr);
>               return true;
>             }
>         }
>
> Here the operand is not a register (but a global memory object), so
> everything goes.
>
> It should be said that over the years gimples type system became stricter
> and stricter, but it started as mostly everything-goes, so making it
> stricter is a bumpy road that isn't fully travelled yet, because checking
> types often results in testcase regressions :-)
>
> > Semantics of <signed-boolean:32>
> > --------------------------------
> > "Wide" Booleans, such as <signed-boolean:32>, seems to allow more values than
> > 0 and 1. For example, I've seen some IR operations like:
> >
> >   _66 = _16 ? _Literal (<signed-boolean:32>) -1 : 0;
> >
> > But I guess there must be some semantic difference between
> > <signed-boolean:32> and a 32-bit int, otherwise the wide Boolean type
> > would not be needed... So what are the difference?
>
> See above, normally conversions to booleans that are wider than 1 bit are
> _not_ useless (because they require booleanization to true/false).  In the
> above case the not-useless cast is within a COND_EXPR, so it's quite
> possible that the gimplifier didn't look hard enough to split this out
> into a proper conversion statement.  (The verifier doesn't look inside
> the expressions of the COND_EXPR, so also doesn't catch this one)

Note the above is GIMPLE FE syntax for a constant of a specific type,
a regular dump would just show _16 ? -1 : 0;

The thing with signed bools is that the two relevant values are -1 (true)
and 0 (false), those are used for vector bool components where we also
need them to be of wider type (32bits in this case).

Richard.

> If that turns out to be true and the above still happens when -1 is
> replaced by (say) 42, then it might be possible to construct a
> wrong-code testcase based on the fact that _66 as boolean should contain
> only two observable values (true/false), but could then contain 42.  OTOH,
> it might also not be possible to create such testcase, namely when GCC is
> internally too conservative in handling wide bools :-)  In that case we
> probably have a missed optimization somewhere, which when implemented
> would enable construction of such wrong-code testcase ;)
>
> (I'm saying that -1 should be replaced by something else for a wrong-code
> testcase, because -1 is special and could justifieably be special-cased in
> GCC: -1 is the proper arithmetic value for a signed boolean that is
> "true").
>
>
> Ciao,
> Michael.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: types in GIMPLE IR
  2023-06-29  6:21   ` Richard Biener
@ 2023-06-29 12:06     ` Krister Walfridsson
  2023-06-29 12:24       ` Michael Matz
  2023-06-29 13:19       ` Richard Biener
  0 siblings, 2 replies; 8+ messages in thread
From: Krister Walfridsson @ 2023-06-29 12:06 UTC (permalink / raw)
  To: Richard Biener; +Cc: Michael Matz, Krister Walfridsson, gcc

On Thu, 29 Jun 2023, Richard Biener wrote:

> The thing with signed bools is that the two relevant values are -1 (true)
> and 0 (false), those are used for vector bool components where we also
> need them to be of wider type (32bits in this case).

My main confusion comes from seeing IR doing arithmetic such as

   <signed-boolean:32> _127;
   <signed-boolean:32> _169;
   ...
   _169 = _127 + -1;

or

   <signed-boolean:32> _127;
   <signed-boolean:32> _169;
   ...
   _169 = -_127;

and it was unclear to me what kind of arithmetic is allowed.

I have now verified that all cases seems to be just one operation of this 
form (where _127 has the value 0 or 1), so it cannot construct values 
such as 42. But the wide signed Boolean can have the three different 
values 1, 0, and -1, which I still think is at least one too many. :)

I'll update my tool to complain if the value is outside the range [-1, 1].

   /Krister

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: types in GIMPLE IR
  2023-06-29 12:06     ` Krister Walfridsson
@ 2023-06-29 12:24       ` Michael Matz
  2023-06-29 13:19       ` Richard Biener
  1 sibling, 0 replies; 8+ messages in thread
From: Michael Matz @ 2023-06-29 12:24 UTC (permalink / raw)
  To: Krister Walfridsson; +Cc: Richard Biener, gcc

Hello,

On Thu, 29 Jun 2023, Krister Walfridsson wrote:

> > The thing with signed bools is that the two relevant values are -1 (true)
> > and 0 (false), those are used for vector bool components where we also
> > need them to be of wider type (32bits in this case).
> 
> My main confusion comes from seeing IR doing arithmetic such as
> 
>   <signed-boolean:32> _127;
>   <signed-boolean:32> _169;
>   ...
>   _169 = _127 + -1;
> 
> or
> 
>   <signed-boolean:32> _127;
>   <signed-boolean:32> _169;
>   ...
>   _169 = -_127;
> 
> and it was unclear to me what kind of arithmetic is allowed.
> 
> I have now verified that all cases seems to be just one operation of this form
> (where _127 has the value 0 or 1), so it cannot construct values such as 42.
> But the wide signed Boolean can have the three different values 1, 0, and -1,
> which I still think is at least one too many. :)

It definitely is.  For signed bool it should be -1 and 0, for unsigned 
bool 1 and 0.  And of course, arithmetic on bools is always dubious, that  
should all be logical operations.  Modulo-arithmetic (mod 2) could be 
made to work, but then we would have to give up the idea of signed bools 
and always use conversions to signed int to get a bitmaks of all-ones.  
And as mod-2-arithmetic is equivalent to logical ops it seems a bit futile 
to go that way.

Of course, enforcing this all might lead to a surprising heap of errors, 
but one has to start somewhere, so ...

> I'll update my tool to complain if the value is outside the range [-1, 
> 1].

... maybe not do that, at least optionally, that maybe somewhen someone 
can look into fixing that all up? :-)  -fdubious-bools?

Ciao,
Michael.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: types in GIMPLE IR
  2023-06-29 12:06     ` Krister Walfridsson
  2023-06-29 12:24       ` Michael Matz
@ 2023-06-29 13:19       ` Richard Biener
  2023-06-29 19:08         ` Krister Walfridsson
  1 sibling, 1 reply; 8+ messages in thread
From: Richard Biener @ 2023-06-29 13:19 UTC (permalink / raw)
  To: Krister Walfridsson; +Cc: Michael Matz, gcc

On Thu, Jun 29, 2023 at 2:06 PM Krister Walfridsson
<krister.walfridsson@gmail.com> wrote:
>
> On Thu, 29 Jun 2023, Richard Biener wrote:
>
> > The thing with signed bools is that the two relevant values are -1 (true)
> > and 0 (false), those are used for vector bool components where we also
> > need them to be of wider type (32bits in this case).
>
> My main confusion comes from seeing IR doing arithmetic such as
>
>    <signed-boolean:32> _127;
>    <signed-boolean:32> _169;
>    ...
>    _169 = _127 + -1;
>
> or
>
>    <signed-boolean:32> _127;
>    <signed-boolean:32> _169;
>    ...
>    _169 = -_127;
>
> and it was unclear to me what kind of arithmetic is allowed.

IIRC we have some simplification rules that turn bit operations into
arithmetics.  Arithmetic is allowed if it keeps the values inside
[-1,0] for signed bools or [0, 1] for unsigned bools.

> I have now verified that all cases seems to be just one operation of this
> form (where _127 has the value 0 or 1), so it cannot construct values
> such as 42. But the wide signed Boolean can have the three different
> values 1, 0, and -1, which I still think is at least one too many. :)

Yeah, I'd be interested in a testcase that shows this behavior.

Richard.


> I'll update my tool to complain if the value is outside the range [-1, 1].
>
>    /Krister

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: types in GIMPLE IR
  2023-06-29 13:19       ` Richard Biener
@ 2023-06-29 19:08         ` Krister Walfridsson
  2023-06-29 19:32           ` Andrew Pinski
  0 siblings, 1 reply; 8+ messages in thread
From: Krister Walfridsson @ 2023-06-29 19:08 UTC (permalink / raw)
  To: Richard Biener; +Cc: Krister Walfridsson, Michael Matz, gcc

On Thu, 29 Jun 2023, Richard Biener wrote:

> IIRC we have some simplification rules that turn bit operations into
> arithmetics.  Arithmetic is allowed if it keeps the values inside
> [-1,0] for signed bools or [0, 1] for unsigned bools.
>
>> I have now verified that all cases seems to be just one operation of this
>> form (where _127 has the value 0 or 1), so it cannot construct values
>> such as 42. But the wide signed Boolean can have the three different
>> values 1, 0, and -1, which I still think is at least one too many. :)
>
> Yeah, I'd be interested in a testcase that shows this behavior.

I created PR 110487 with one example.


>> I'll update my tool to complain if the value is outside the range [-1, 1].

It is likely that all issues I have seen so far are due to PR 110487, so 
I'll keep the current behavior that complains if the value is outside the 
range [-1, 0].

    /Krister

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: types in GIMPLE IR
  2023-06-29 19:08         ` Krister Walfridsson
@ 2023-06-29 19:32           ` Andrew Pinski
  0 siblings, 0 replies; 8+ messages in thread
From: Andrew Pinski @ 2023-06-29 19:32 UTC (permalink / raw)
  To: Krister Walfridsson; +Cc: Richard Biener, Michael Matz, gcc

On Thu, Jun 29, 2023 at 12:10 PM Krister Walfridsson via Gcc
<gcc@gcc.gnu.org> wrote:
>
> On Thu, 29 Jun 2023, Richard Biener wrote:
>
> > IIRC we have some simplification rules that turn bit operations into
> > arithmetics.  Arithmetic is allowed if it keeps the values inside
> > [-1,0] for signed bools or [0, 1] for unsigned bools.
> >
> >> I have now verified that all cases seems to be just one operation of this
> >> form (where _127 has the value 0 or 1), so it cannot construct values
> >> such as 42. But the wide signed Boolean can have the three different
> >> values 1, 0, and -1, which I still think is at least one too many. :)
> >
> > Yeah, I'd be interested in a testcase that shows this behavior.
>
> I created PR 110487 with one example.
>
>
> >> I'll update my tool to complain if the value is outside the range [-1, 1].
>
> It is likely that all issues I have seen so far are due to PR 110487, so
> I'll keep the current behavior that complains if the value is outside the
> range [-1, 0].

Yes there are many similar to this all over GCC's folding.
In this case checking TYPE_PRECISION as described in the match.pd is
not even the right approach.
The whole TYPE_PRECISION on boolean types is definitely a big can of worms.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102622 is related but
that was signed boolean:1.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106053 is another related case.

For this weekend, I am going to audit some of the match patterns for
these issues.


>
>     /Krister

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-06-29 19:32 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-28  9:15 types in GIMPLE IR Krister Walfridsson
2023-06-28 15:46 ` Michael Matz
2023-06-29  6:21   ` Richard Biener
2023-06-29 12:06     ` Krister Walfridsson
2023-06-29 12:24       ` Michael Matz
2023-06-29 13:19       ` Richard Biener
2023-06-29 19:08         ` Krister Walfridsson
2023-06-29 19:32           ` Andrew Pinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).