[Bug c/106138] New: Inefficient code generation for cases when results can be deduced at compile time

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug c/106138] New: Inefficient code generation for cases when results can be deduced at compile time
@ 2022-06-29 13:28 pavel.morozkin at gmail dot com
  2022-06-29 14:27 ` [Bug tree-optimization/106138] " pinskia at gcc dot gnu.org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: pavel.morozkin at gmail dot com @ 2022-06-29 13:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106138

            Bug ID: 106138
           Summary: Inefficient code generation for cases when results can
                    be deduced at compile time
           Product: gcc
           Version: 12.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pavel.morozkin at gmail dot com
  Target Milestone: ---

_Bool f(char x)
{
    _Bool b1 = x == 4;
    _Bool b2 = x & 0x3;
    return b1 && b2;
}

Invocation: gcc -O3

Actual generated code:
f:
        test    dil, 3
        setne   al
        cmp     dil, 4
        sete    dl
        and     eax, edx
        ret

Expected generated code:
f:
        xor     eax, eax
        ret

The Summary needs to be adjusted perhaps.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/106138] Inefficient code generation for cases when results can be deduced at compile time
  2022-06-29 13:28 [Bug c/106138] New: Inefficient code generation for cases when results can be deduced at compile time pavel.morozkin at gmail dot com
@ 2022-06-29 14:27 ` pinskia at gcc dot gnu.org
  2022-06-30  7:26 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-06-29 14:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106138

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/106138] Inefficient code generation for cases when results can be deduced at compile time
  2022-06-29 13:28 [Bug c/106138] New: Inefficient code generation for cases when results can be deduced at compile time pavel.morozkin at gmail dot com
  2022-06-29 14:27 ` [Bug tree-optimization/106138] " pinskia at gcc dot gnu.org
@ 2022-06-30  7:26 ` rguenth at gcc dot gnu.org
  2022-06-30  7:45 ` jakub at gcc dot gnu.org
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-06-30  7:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106138

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amacleod at redhat dot com

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
So we're seeing

  b1_8 = x_7(D) == 4;
  # RANGE [0, 3] NONZERO 3
  _3 = x_7(D) & 3;
  b2_9 = _3 != 0;
  _5 = b1_8 & b2_9;

and fail to optimize that.  It somewhat looks like "relations" (on x_7), but
I'm not sure if suitable to handle for ranger.

Adding something in match.pd might be possible - the key is of course that
both b1 and b2 are derived from the same value.  I think we already have
a pattern looking at not intersecting nonzero bits.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/106138] Inefficient code generation for cases when results can be deduced at compile time
  2022-06-29 13:28 [Bug c/106138] New: Inefficient code generation for cases when results can be deduced at compile time pavel.morozkin at gmail dot com
  2022-06-29 14:27 ` [Bug tree-optimization/106138] " pinskia at gcc dot gnu.org
  2022-06-30  7:26 ` rguenth at gcc dot gnu.org
@ 2022-06-30  7:45 ` jakub at gcc dot gnu.org
  2022-07-01  3:02 ` [Bug tree-optimization/106138] Inefficient code generation: logical AND of disjoint booleans from equal and bitwise AND not optimized to constant false peter at cordes dot ca
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-06-30  7:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106138

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I'd say reassoc would be a better place to handle it, so that we handle both
the logical op short circuit && and x == 4 && y == 25 && z != 16 && (x & 3) !=
0.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/106138] Inefficient code generation: logical AND of disjoint booleans from equal and bitwise AND not optimized to constant false
  2022-06-29 13:28 [Bug c/106138] New: Inefficient code generation for cases when results can be deduced at compile time pavel.morozkin at gmail dot com
                   ` (2 preceding siblings ...)
  2022-06-30  7:45 ` jakub at gcc dot gnu.org
@ 2022-07-01  3:02 ` peter at cordes dot ca
  2022-07-04 14:47 ` amacleod at redhat dot com
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: peter at cordes dot ca @ 2022-07-01  3:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106138

Peter Cordes <peter at cordes dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #3 from Peter Cordes <peter at cordes dot ca> ---
Ideally, bitwise & of booleans should also be handled, not just &&.
A testcase (https://godbolt.org/z/qvosv8q7c) makes it easy to check both.

//#define LOGIC_AND 
_Bool f2(char x)
{
    _Bool b1 = x == 2;
    _Bool b2 = x & 1;

    #ifdef LOGIC_AND
      return b1 && b2;
    #else
      return b1 & b2;
    #endif
}

(Clang optimized it to return false for the && version, but not bitwise.  GCC
currently doesn't optimize either way.)
This was originally posted on Stack Overflow
(https://stackoverflow.com/q/72802469/224132), BTW.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/106138] Inefficient code generation: logical AND of disjoint booleans from equal and bitwise AND not optimized to constant false
  2022-06-29 13:28 [Bug c/106138] New: Inefficient code generation for cases when results can be deduced at compile time pavel.morozkin at gmail dot com
                   ` (3 preceding siblings ...)
  2022-07-01  3:02 ` [Bug tree-optimization/106138] Inefficient code generation: logical AND of disjoint booleans from equal and bitwise AND not optimized to constant false peter at cordes dot ca
@ 2022-07-04 14:47 ` amacleod at redhat dot com
  2022-07-07 20:39 ` pinskia at gcc dot gnu.org
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: amacleod at redhat dot com @ 2022-07-04 14:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106138

--- Comment #4 from Andrew Macleod <amacleod at redhat dot com> ---
(In reply to Richard Biener from comment #1)
> So we're seeing
> 
>   b1_8 = x_7(D) == 4;
>   # RANGE [0, 3] NONZERO 3
>   _3 = x_7(D) & 3;
>   b2_9 = _3 != 0;
>   _5 = b1_8 & b2_9;
> 
> and fail to optimize that.  It somewhat looks like "relations" (on x_7), but
> I'm not sure if suitable to handle for ranger.
> 
> Adding something in match.pd might be possible - the key is of course that
> both b1 and b2 are derived from the same value.  I think we already have
> a pattern looking at not intersecting nonzero bits.

Its not suitable yet anyway. theres an order of indirection in that any
relation is based on whether b1_8 and b2_9 are both 1 or both 0...  If there
was  a branch we start to pick up information on the edges...  but without an
edge using _5, we don't yet try to paw back the same way.

perhaps we could optimize &/&& and |/|| to check if one of the operands doesnt
matter..

ie, if b1_8 is 1, then check if b2_9 is always 1 with the range of x_7([4,4]), 
likewise if b1_8 is 0, then b2_9 doesnt matter.  so we can simplify this to _5
= b1_8

it seems like something we could do with simplify_using_ranges quite easily by
utilizing gori perhaps?  easy enough to pick up a range for b2_9 based on the
range of x_7 from b1_8 == [1,1]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/106138] Inefficient code generation: logical AND of disjoint booleans from equal and bitwise AND not optimized to constant false
  2022-06-29 13:28 [Bug c/106138] New: Inefficient code generation for cases when results can be deduced at compile time pavel.morozkin at gmail dot com
                   ` (4 preceding siblings ...)
  2022-07-04 14:47 ` amacleod at redhat dot com
@ 2022-07-07 20:39 ` pinskia at gcc dot gnu.org
  2022-07-07 21:18 ` pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-07-07 20:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106138

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2022-07-07

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #2)
> I'd say reassoc would be a better place to handle it, so that we handle both
> the logical op short circuit && and x == 4 && y == 25 && z != 16 && (x & 3)
> != 0.

Right now reassoc calls maybe_fold_and_comparisons and
maybe_fold_or_comparisons which are places where this could be added. Or even
match-and-simplify gets called (via those two functions).

You could add it either in combine_comparisons or match.pd.

Maybe something like (psedu code as I didn't add the wi part and it can be
extended and all):
(simplify
 (bit_and
  (eq @0 INTEGER_CST@1)
  (ne (bit_and @0 INTEGER_CST@2) zero_p))
 (if ((@1 & @2) == 0))
 ({true_bool_value;}))

You can change @1 and @2 to
with_possible_nonzero_bits/with_possible_nonzero_bits2 and then compare the
nonzero bits to see if they overlap.
Also extend it to handle ior too.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/106138] Inefficient code generation: logical AND of disjoint booleans from equal and bitwise AND not optimized to constant false
  2022-06-29 13:28 [Bug c/106138] New: Inefficient code generation for cases when results can be deduced at compile time pavel.morozkin at gmail dot com
                   ` (5 preceding siblings ...)
  2022-07-07 20:39 ` pinskia at gcc dot gnu.org
@ 2022-07-07 21:18 ` pinskia at gcc dot gnu.org
  2022-07-08  9:47 ` pavel.morozkin at gmail dot com
  2022-07-08 19:35 ` pinskia at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-07-07 21:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106138

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note for:
bool f(char x)
{
    bool b1 = x == 4;
    bool b2 = x & 4;
    return b1 && b2;
}


We should just get:
bool f(char x)
{
    return x == 4;
}

bit_ior should be handled too.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/106138] Inefficient code generation: logical AND of disjoint booleans from equal and bitwise AND not optimized to constant false
  2022-06-29 13:28 [Bug c/106138] New: Inefficient code generation for cases when results can be deduced at compile time pavel.morozkin at gmail dot com
                   ` (6 preceding siblings ...)
  2022-07-07 21:18 ` pinskia at gcc dot gnu.org
@ 2022-07-08  9:47 ` pavel.morozkin at gmail dot com
  2022-07-08 19:35 ` pinskia at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: pavel.morozkin at gmail dot com @ 2022-07-08  9:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106138

--- Comment #7 from Pavel M <pavel.morozkin at gmail dot com> ---
May be useful: https://devblogs.microsoft.com/cppblog/new-code-optimizer.
Search for "Bit Estimator" section containing "Folding comparisons and
branches".

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/106138] Inefficient code generation: logical AND of disjoint booleans from equal and bitwise AND not optimized to constant false
  2022-06-29 13:28 [Bug c/106138] New: Inefficient code generation for cases when results can be deduced at compile time pavel.morozkin at gmail dot com
                   ` (7 preceding siblings ...)
  2022-07-08  9:47 ` pavel.morozkin at gmail dot com
@ 2022-07-08 19:35 ` pinskia at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-07-08 19:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106138

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Pavel M from comment #7)
> May be useful: https://devblogs.microsoft.com/cppblog/new-code-optimizer.
> Search for "Bit Estimator" section containing "Folding comparisons and
> branches".

All of the infrastructure for the above referenced is already done really. This
is what with_possible_nonzero_bits/get_nonzero_bits is used for inside
match.pd. My original example match.pd used a constant but does not need to.

In fact GCC already handles:
bool f(int x)
{
    if (x == 4)
      if (x & 0x3)
        return true;
    return false;
}
It is the & case where GCC does not handle (&& in the original code is
converted into & early on as both sides do NOT have side effects and such).

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-07-08 19:35 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-29 13:28 [Bug c/106138] New: Inefficient code generation for cases when results can be deduced at compile time pavel.morozkin at gmail dot com
2022-06-29 14:27 ` [Bug tree-optimization/106138] " pinskia at gcc dot gnu.org
2022-06-30  7:26 ` rguenth at gcc dot gnu.org
2022-06-30  7:45 ` jakub at gcc dot gnu.org
2022-07-01  3:02 ` [Bug tree-optimization/106138] Inefficient code generation: logical AND of disjoint booleans from equal and bitwise AND not optimized to constant false peter at cordes dot ca
2022-07-04 14:47 ` amacleod at redhat dot com
2022-07-07 20:39 ` pinskia at gcc dot gnu.org
2022-07-07 21:18 ` pinskia at gcc dot gnu.org
2022-07-08  9:47 ` pavel.morozkin at gmail dot com
2022-07-08 19:35 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).