public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/110481] New: Possible improvements in dense switch statement returning values
@ 2023-06-29  8:18 tkoenig at gcc dot gnu.org
  0 siblings, 0 replies; only message in thread
From: tkoenig at gcc dot gnu.org @ 2023-06-29  8:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110481

            Bug ID: 110481
           Summary: Possible improvements in dense switch statement
                    returning values
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tkoenig at gcc dot gnu.org
  Target Milestone: ---

Putting this provisionally into tree-optimization, although there may
be other aspects.

Consider

unsigned int foo(unsigned int a)
{
  switch ((a >> 10) & 3)
    {
    case 0:
      return 8;
    case 1:
      return 16;
    case 2:
      return 32;
    case 3:
      return 64;
    }
}

unsigned int bar(unsigned int a)
{
  return 1u << (((a >> 10) & 3) + 3);
}

unsigned int baz (unsigned int a)
{
  switch (a & (3 << 10))
    {
    case 0:
      return 8;
    case 1 << 10:
      return 16;
    case 2 << 10:
      return 32;
    case 3 << 10:
      return 64;
    }
}

which all do the same thing.

The code for bar is

bar:
.LFB1:
        .cfi_startproc
        shrl    $10, %edi
        movl    $1, %eax
        movl    %edi, %ecx
        andl    $3, %ecx
        addl    $3, %ecx
        sall    %cl, %eax
        ret

which is optimum except for the register move (submitted as PR110479).

The compiler does not recognize that foo or baz are equivalent to bar,
but that may be too much of a special case to really consider. 

The code for foo is

foo:
.LFB0:
        .cfi_startproc
        shrl    $10, %edi
        movl    $8, %eax
        andl    $3, %edi
        decl    %edi
        cmpl    $2, %edi
        ja      .L1
        movzbl  CSWTCH.1(%rdi), %eax
.L1:
        ret
        .cfi_endproc

[...]

CSWTCH.1:
        .byte   16
        .byte   32
        .byte   64

where it seems strange that there is a comparison and conditional
jump around the load.  A look at *.optimized shows

<bb 2> [local count: 1073741824]:
  _1 = a_4(D) >> 10;
  _2 = _1 & 3;
  _8 = _2 + 4294967295;
  if (_8 <= 2)
    goto <bb 3>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 3> [local count: 536870913]:
  _6 = CSWTCH.1[_8];
  _5 = (unsigned int) _6;

  <bb 4> [local count: 1073741824]:
  # _3 = PHI <_5(3), 8(2)>
  return _3;

which assigns a probability of 50% to (a>>10)& 3 being zero.
Where this comes from is unclear to me.  A straightforward load
from a table which also includes the 8 seems more logical to me
(especially with -Os).

Finally, baz generates

baz:
.LFB2:
        .cfi_startproc
        andl    $3072, %edi
        movl    $32, %eax
        cmpl    $2048, %edi
        je      .L6
        ja      .L8
        movl    $8, %eax
        testl   %edi, %edi
        je      .L6
        movl    $16, %eax
        ret
.L8:
        movl    $64, %eax
.L6:
        ret

when transforming into something equivalent to foo (or even bar)
would seem advantageous.

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2023-06-29  8:18 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-29  8:18 [Bug tree-optimization/110481] New: Possible improvements in dense switch statement returning values tkoenig at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).