[Bug middle-end/114086] New: Boolean switches could have a lot better codegen, possibly utilizing bit-vectors

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug middle-end/114086] New: Boolean switches could have a lot better codegen, possibly utilizing bit-vectors
@ 2024-02-24  9:58 janschultke at googlemail dot com
  2024-02-24 10:14 ` [Bug middle-end/114086] " jakub at gcc dot gnu.org
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: janschultke at googlemail dot com @ 2024-02-24  9:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

            Bug ID: 114086
           Summary: Boolean switches could have a lot better codegen,
                    possibly utilizing bit-vectors
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: janschultke at googlemail dot com
  Target Milestone: ---

https://godbolt.org/z/3acqbbn3E

enum struct E { a, b, c, d, e, f, g, h };

bool test_switch(E e) {
    switch (e) {
    case E::a:
    case E::c:
    case E::e:
    case E::g: return true;
    default: return false;
    }
}


Expected output
===============

test_switch(E):
  mov eax, edi
  and eax, 1
  ret



Actual output (-O3)
===================

test_switch(E):
  xor eax, eax
  cmp edi, 6
  ja .L1
  mov eax, 85
  bt rax, rdi
  setc al
.L1:
  ret


Explanation
===========

Boolean switches in general can be optimized a lot better than what GCC
currently does. Clang does find the optimization to a bitwise AND, although
this may be a big ask.

Generally, contiguous boolean switches (that is, switch statements where all
cases yield a boolean value and the labels are contiguous) can be optimized to
accessing a bit vector.

That switch could have been transformed into:

> return 0b01010101 >> int(e);

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug middle-end/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors
  2024-02-24  9:58 [Bug middle-end/114086] New: Boolean switches could have a lot better codegen, possibly utilizing bit-vectors janschultke at googlemail dot com
@ 2024-02-24 10:14 ` jakub at gcc dot gnu.org
  2024-02-24 10:49 ` janschultke at googlemail dot com
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-02-24 10:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
  mov eax, edi
  and eax, 1
  ret
seems wrong without -fstrict-enums, one could call test_switch(static_cast <E>
(9))
and it should return false in that case.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug middle-end/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors
  2024-02-24  9:58 [Bug middle-end/114086] New: Boolean switches could have a lot better codegen, possibly utilizing bit-vectors janschultke at googlemail dot com
  2024-02-24 10:14 ` [Bug middle-end/114086] " jakub at gcc dot gnu.org
@ 2024-02-24 10:49 ` janschultke at googlemail dot com
  2024-02-24 11:02 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: janschultke at googlemail dot com @ 2024-02-24 10:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

--- Comment #2 from Jan Schultke <janschultke at googlemail dot com> ---
Yeah right, the actual optimal output (which clang finds) is:

> test_switch(E):
>   test edi, -7
>   sete al
>   ret


Testing with -7 also makes sure that the 8-bit and greater are all zero.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug middle-end/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors
  2024-02-24  9:58 [Bug middle-end/114086] New: Boolean switches could have a lot better codegen, possibly utilizing bit-vectors janschultke at googlemail dot com
  2024-02-24 10:14 ` [Bug middle-end/114086] " jakub at gcc dot gnu.org
  2024-02-24 10:49 ` janschultke at googlemail dot com
@ 2024-02-24 11:02 ` jakub at gcc dot gnu.org
  2024-02-24 11:22 ` jakub at gcc dot gnu.org
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-02-24 11:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
And the rest boils down to what code to generate for
bool
foo (int x)
{
  return ((682 >> x) & 1);
}
Both that and switch from the #c0 testcase boil down to
  _1 = 682 >> x_2(D);
  _3 = (_Bool) _1;
or
  _6 = 682 >> _4;
  _8 = (_Bool) _6;
in GIMPLE dump.  Now, for the foo above, gcc emits
        movl    $682, %eax
        btl     %edi, %eax
        setc    %al
        ret
and clang emits the same:
        movl    $682, %eax                      # imm = 0x2AA
        btl     %edi, %eax
        setb    %al
        retq
Though, e.g. clang 14 emitted
        movl    %edi, %ecx
        movl    $682, %eax                      # imm = 0x2AA
        shrl    %cl, %eax
        andb    $1, %al
        retq
which is longer, dunno what is faster.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug middle-end/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors
  2024-02-24  9:58 [Bug middle-end/114086] New: Boolean switches could have a lot better codegen, possibly utilizing bit-vectors janschultke at googlemail dot com
                   ` (2 preceding siblings ...)
  2024-02-24 11:02 ` jakub at gcc dot gnu.org
@ 2024-02-24 11:22 ` jakub at gcc dot gnu.org
  2024-02-24 11:27 ` janschultke at googlemail dot com
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-02-24 11:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
But sure, confirmed for both:

int
foo (int e)
{
  switch (e)
    {
    case 1:
    case 3:
    case 5:
    case 7:
    case 9:
    case 11:
    case 13:
      return 1;
    default:
      return 0;
    }
}

int
bar (int e)
{
  switch (e)
    {
    case 1:
    case 3:
    case 5:
    case 7:
    case 9:
    case 11:
    case 13:
    case 15:
      return 1;
    default:
      return 0;
    }
}

where in foo because we emit the guarding
        cmpl    $13, %edi
        ja      .L1
we could just simplify it to andl $1 when <= 13, and the bar case indeed can be
done
by (e & -15) != 0;
Now, the question is if either of these optimizations should be done in the
switch lowering, or if we should do it elsewhere where it would optimize also
hand written code like that, if user writes it as
int
foo2 (int e)
{
  if (e <= 13U)
    return (10922 >> e) & 1;
  else
    return 0;
}

int
bar2 (int e)
{
  if (e <= 15U)
    return (43690 >> e) & 1;
  else
    return 0;
}
Looking at clang, it can optimize bar, it can't optimize foo (uses switch table
rather than shift, that is worse than what gcc emits).  And emits pretty much
what gcc emits for foo2/bar2.
Perhaps phiopt could handle this for the bar2 case and match.pd using range
info for foo2?
Next question is what should be done if the 2 values aren't 1 and 0, but 0 and
1, or
some cst and cst + 1 or cst and cst - 1 for some arbitrary constant cst, or cst
and 0,
or 0 and cst or cst1 and cst2, whether to emit e.g. a conditional move etc.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug middle-end/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors
  2024-02-24  9:58 [Bug middle-end/114086] New: Boolean switches could have a lot better codegen, possibly utilizing bit-vectors janschultke at googlemail dot com
                   ` (3 preceding siblings ...)
  2024-02-24 11:22 ` jakub at gcc dot gnu.org
@ 2024-02-24 11:27 ` janschultke at googlemail dot com
  2024-02-24 11:52 ` [Bug tree-optimization/114086] " jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: janschultke at googlemail dot com @ 2024-02-24 11:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

--- Comment #5 from Jan Schultke <janschultke at googlemail dot com> ---
Well, it's not quite equivalent to either of the bit-shifts we've posted. To
account for shifting more than the operand size, it would be:

bool foo (int x)
{
  return x > 6 ? 0 : ((85 >> x) & 1);
}


This is exactly what GCC does and the branch can be explained by this range
check.

So I guess GCC already does optimize this to a bit-vector, it just doesn't find
the optimization to:

bool foo(int x)
{
    return (x & -7) == 0;
}


This is very specific to this particular switch statement though. You could do
better than having a branch if the hardware supported a saturating shift, but
probably not on x86_64.

Nevermind that; if anything, this isn't middle-end.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug tree-optimization/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors
  2024-02-24  9:58 [Bug middle-end/114086] New: Boolean switches could have a lot better codegen, possibly utilizing bit-vectors janschultke at googlemail dot com
                   ` (4 preceding siblings ...)
  2024-02-24 11:27 ` janschultke at googlemail dot com
@ 2024-02-24 11:52 ` jakub at gcc dot gnu.org
  2024-02-24 12:13 ` jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-02-24 11:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|middle-end                  |tree-optimization

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Jan Schultke from comment #5)
> Well, it's not quite equivalent to either of the bit-shifts we've posted.

The #c4 foo2/bar2 are functionally equivalent to #c4 foo/bar, it is what gcc
actually emits for the latter.
x > 6 ? 0 : ((85 >> x) & 1)
isn't functionally equivalent to anything mentioned so far here, as it handles
negative values differently.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug tree-optimization/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors
  2024-02-24  9:58 [Bug middle-end/114086] New: Boolean switches could have a lot better codegen, possibly utilizing bit-vectors janschultke at googlemail dot com
                   ` (5 preceding siblings ...)
  2024-02-24 11:52 ` [Bug tree-optimization/114086] " jakub at gcc dot gnu.org
@ 2024-02-24 12:13 ` jakub at gcc dot gnu.org
  2024-02-26  8:38 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-02-24 12:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Now, suppose we optimize the (0xaaaaaaaa >> x) & 1 case etc. provided suitable
range
of x to x & 1.
For
int
bar3 (int e)
{
  if (e <= 15U)
    return e & 1;
  else
    return 0;
}
phiopt optimizes this into
  return e & 1 & (e <= 15U);
so, guess we want another match.pd optimization which would turn that into e &
-15.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug tree-optimization/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors
  2024-02-24  9:58 [Bug middle-end/114086] New: Boolean switches could have a lot better codegen, possibly utilizing bit-vectors janschultke at googlemail dot com
                   ` (6 preceding siblings ...)
  2024-02-24 12:13 ` jakub at gcc dot gnu.org
@ 2024-02-26  8:38 ` rguenth at gcc dot gnu.org
  2024-02-26 12:04 ` jakub at gcc dot gnu.org
  2024-02-26 14:58 ` amacleod at redhat dot com
  9 siblings, 0 replies; 11+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-26  8:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-02-26

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug tree-optimization/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors
  2024-02-24  9:58 [Bug middle-end/114086] New: Boolean switches could have a lot better codegen, possibly utilizing bit-vectors janschultke at googlemail dot com
                   ` (7 preceding siblings ...)
  2024-02-26  8:38 ` rguenth at gcc dot gnu.org
@ 2024-02-26 12:04 ` jakub at gcc dot gnu.org
  2024-02-26 14:58 ` amacleod at redhat dot com
  9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-02-26 12:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aldyh at gcc dot gnu.org,
                   |                            |amacleod at redhat dot com

--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Unfortunately doing the ((682 >> x) & 1) to x & 1 optimization in match.pd
isn't possible, we can only use global ranges there and we need path specific
range here.
Can it be done in VRP pass?  Though, I'm afraid I'm quite lost where it
actually has
the statement optimizations (rather than mere computing of ranges),
Aldy/Andrew, any hints?  I mean like what old tree-vrp.c was doing in
simplify_stmt_using_ranges.
Guess we could duplicate that in match.pd for the case which can use global
range or
doesn't need any range at all.
I mean
unsigned int
foo (int x)
{
  return (0xaaaaaaaaU >> x) & 1;
}

unsigned int
bar (int x)
{
  return (0x55555555U >> x) & 1;
}

unsigned int
baz (int x)
{
  if (x >= 22) __builtin_unreachable ();
  return (0x5aaaaaU >> x) & 1;
}
can be optimized even with global ranges (or the first one with no ranges).
foo and baz equivalent is x & 1, while bar is (~x) & 1 or (x & 1) ^ 1, dunno
what is more canonical.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug tree-optimization/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors
  2024-02-24  9:58 [Bug middle-end/114086] New: Boolean switches could have a lot better codegen, possibly utilizing bit-vectors janschultke at googlemail dot com
                   ` (8 preceding siblings ...)
  2024-02-26 12:04 ` jakub at gcc dot gnu.org
@ 2024-02-26 14:58 ` amacleod at redhat dot com
  9 siblings, 0 replies; 11+ messages in thread
From: amacleod at redhat dot com @ 2024-02-26 14:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

--- Comment #9 from Andrew Macleod <amacleod at redhat dot com> ---
(In reply to Jakub Jelinek from comment #8)
> Unfortunately doing the ((682 >> x) & 1) to x & 1 optimization in match.pd
> isn't possible, we can only use global ranges there and we need path
> specific range here.
> Can it be done in VRP pass?  Though, I'm afraid I'm quite lost where it
> actually has
> the statement optimizations (rather than mere computing of ranges),
> Aldy/Andrew, any hints?  I mean like what old tree-vrp.c was doing in
> simplify_stmt_using_ranges.

I don't think much has changed there... We still call into all the code in
vr-values.cc to do simplifications.  I think Aldy changed it all to be
contained in class 'simplify_using_ranges'.. but those routines are all still
in vr-values.cc.   tree-vrp calls into the top level simplfy() routine.

  bool fold_stmt (gimple_stmt_iterator *gsi) override
  {
    bool ret = m_simplifier.simplify (gsi);
    if (!ret)
      ret = ::fold_stmt (gsi, follow_single_use_edges);
    return ret;
  }


If that fails, then rangers fold_stmt() is invoked.  That is merely a
contextual wrapper around a call to gimple-fold::fold_stmt to see if normal
folding can find anything.  Under the covers I believe that invokes match.pd
which, if it was using the current range_query, would get contextual info.



> Guess we could duplicate that in match.pd for the case which can use global
> range or
> doesn't need any range at all.
> I mean
> unsigned int
> foo (int x)
> {
>   return (0xaaaaaaaaU >> x) & 1;
> }
> 
> unsigned int
> bar (int x)
> {
>   return (0x55555555U >> x) & 1;
> }
> 
> unsigned int
> baz (int x)
> {
>   if (x >= 22) __builtin_unreachable ();
>   return (0x5aaaaaU >> x) & 1;
> }
> can be optimized even with global ranges (or the first one with no ranges).
> foo and baz equivalent is x & 1, while bar is (~x) & 1 or (x & 1) ^ 1, dunno
> what is more canonical.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-02-26 14:58 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-24  9:58 [Bug middle-end/114086] New: Boolean switches could have a lot better codegen, possibly utilizing bit-vectors janschultke at googlemail dot com
2024-02-24 10:14 ` [Bug middle-end/114086] " jakub at gcc dot gnu.org
2024-02-24 10:49 ` janschultke at googlemail dot com
2024-02-24 11:02 ` jakub at gcc dot gnu.org
2024-02-24 11:22 ` jakub at gcc dot gnu.org
2024-02-24 11:27 ` janschultke at googlemail dot com
2024-02-24 11:52 ` [Bug tree-optimization/114086] " jakub at gcc dot gnu.org
2024-02-24 12:13 ` jakub at gcc dot gnu.org
2024-02-26  8:38 ` rguenth at gcc dot gnu.org
2024-02-26 12:04 ` jakub at gcc dot gnu.org
2024-02-26 14:58 ` amacleod at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).