public inbox for gcc-help@gcc.gnu.org
 help / color / mirror / Atom feed
* optimization of switch statements on i386
       [not found] <719dced30802081629g19f67f6fi76dfaa0ede35b7aa@mail.gmail.com>
@ 2008-02-09  0:31 ` Godmar Back
  2008-02-09  2:05   ` Ian Lance Taylor
  0 siblings, 1 reply; 11+ messages in thread
From: Godmar Back @ 2008-02-09  0:31 UTC (permalink / raw)
  To: gcc-help

Hi,

I have a question regarding the compilation of switch statements for
i386 targets. What heuristics or algorithm does gcc use to decide
whether to use a binary search or a jump table? I'm specifically
interested in switch statements for a dense range of values and in
which the default: branch is unreachable.

I've tried a number of approaches in order to obtain a jump table, but
it seems gcc 4.x always ends up creating binary searches. For
instance, I've tried casting the switch value to a limited range enum
and placing a gcc_unreachable() in the default: case. A cursory
inspection of the source code also didn't reveal any documentation of
the strategy used. Does gcc have reason to believe that a binary
search is always faster? If so, is this true for all variants of i386?
Would it depend on the number of case arms?

Thanks for any insights you could provide.

 - Godmar

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: optimization of switch statements on i386
  2008-02-09  0:31 ` optimization of switch statements on i386 Godmar Back
@ 2008-02-09  2:05   ` Ian Lance Taylor
  2008-02-09  2:43     ` Godmar Back
  2008-02-09  3:02     ` Godmar Back
  0 siblings, 2 replies; 11+ messages in thread
From: Ian Lance Taylor @ 2008-02-09  2:05 UTC (permalink / raw)
  To: Godmar Back; +Cc: gcc-help

"Godmar Back" <godmar@gmail.com> writes:

> I have a question regarding the compilation of switch statements for
> i386 targets. What heuristics or algorithm does gcc use to decide
> whether to use a binary search or a jump table? I'm specifically
> interested in switch statements for a dense range of values and in
> which the default: branch is unreachable.

Look at expand_case in gcc/stmt.c.

> I've tried a number of approaches in order to obtain a jump table, but
> it seems gcc 4.x always ends up creating binary searches. For
> instance, I've tried casting the switch value to a limited range enum
> and placing a gcc_unreachable() in the default: case. A cursory
> inspection of the source code also didn't reveal any documentation of
> the strategy used. Does gcc have reason to believe that a binary
> search is always faster?

No, it uses a heuristic to choose.  Probably the most relevant one
here is that if the difference between the maximum and minimum case
labels is more than 10 * the number of case labels, gcc will use
comparisons rather than a table.

> If so, is this true for all variants of i386?

I believe that all i386 variants are handled in the same way.

> Would it depend on the number of case arms?

Yes.

Ian

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: optimization of switch statements on i386
  2008-02-09  2:05   ` Ian Lance Taylor
@ 2008-02-09  2:43     ` Godmar Back
  2008-02-09  2:52       ` Ian Lance Taylor
  2008-02-09  3:02     ` Godmar Back
  1 sibling, 1 reply; 11+ messages in thread
From: Godmar Back @ 2008-02-09  2:43 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc-help

Thanks. I received two answers that suggest that gcc uses a jump table
for consecutive values. I'm unable to reproduce this with gcc4.1.2.
(My problem is that it appears to always use a binary search.)

Example:

gback@top [311](~/tmp) > gcc -v
Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-libgcj-multifile
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada
--enable-java-awt=gtk --disable-dssi --enable-plugin
--with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre
--with-cpu=generic --host=i386-redhat-linux
Thread model: posix
gcc version 4.1.2 20070626 (Red Hat 4.1.2-13)
gback@top [324](~/tmp) > cat switch.c
enum four { A, B, C, D };
extern void g(int);

void f(enum four x)
{
  switch (x)
    {
      case A: g(1); break;
      case B: g(2); break;
      case C: g(3); break;
      case D: g(4); break;
      default: gcc_unreachable();
    }
}
gback@top [325](~/tmp) > gcc -S -O4 switch.c
gback@top [326](~/tmp) > cat switch.s
        .file   "switch.c"
        .text
        .p2align 4,,15
.globl f
        .type   f, @function
f:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %eax
        cmpl    $1, %eax
        je      .L4
        jb      .L3
        cmpl    $2, %eax
        je      .L5
        cmpl    $3, %eax
        .p2align 4,,7
        je      .L11
        popl    %ebp
        .p2align 4,,6
        jmp     gcc_unreachable
        .p2align 4,,7
.L3:
        movl    $1, 8(%ebp)
        popl    %ebp
        jmp     g
        .p2align 4,,7
.L4:
        movl    $2, 8(%ebp)
        popl    %ebp
        jmp     g
.L11:
        movl    $4, 8(%ebp)
        popl    %ebp
        jmp     g
        .p2align 4,,7
.L5:
        movl    $3, 8(%ebp)
        popl    %ebp
        jmp     g
        .size   f, .-f
        .ident  "GCC: (GNU) 4.1.2 20070626 (Red Hat 4.1.2-13)"
        .section        .note.GNU-stack,"",@progbits

---

which is a binary search, not a jump table (unless I'm missing something.)

Thanks!

 - Godmar

On 08 Feb 2008 18:04:23 -0800, Ian Lance Taylor <iant@google.com> wrote:
> "Godmar Back" <godmar@gmail.com> writes:
>
> > I have a question regarding the compilation of switch statements for
> > i386 targets. What heuristics or algorithm does gcc use to decide
> > whether to use a binary search or a jump table? I'm specifically
> > interested in switch statements for a dense range of values and in
> > which the default: branch is unreachable.
>
> Look at expand_case in gcc/stmt.c.
>
> > I've tried a number of approaches in order to obtain a jump table, but
> > it seems gcc 4.x always ends up creating binary searches. For
> > instance, I've tried casting the switch value to a limited range enum
> > and placing a gcc_unreachable() in the default: case. A cursory
> > inspection of the source code also didn't reveal any documentation of
> > the strategy used. Does gcc have reason to believe that a binary
> > search is always faster?
>
> No, it uses a heuristic to choose.  Probably the most relevant one
> here is that if the difference between the maximum and minimum case
> labels is more than 10 * the number of case labels, gcc will use
> comparisons rather than a table.
>
> > If so, is this true for all variants of i386?
>
> I believe that all i386 variants are handled in the same way.
>
> > Would it depend on the number of case arms?
>
> Yes.
>
> Ian
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: optimization of switch statements on i386
  2008-02-09  2:43     ` Godmar Back
@ 2008-02-09  2:52       ` Ian Lance Taylor
  0 siblings, 0 replies; 11+ messages in thread
From: Ian Lance Taylor @ 2008-02-09  2:52 UTC (permalink / raw)
  Cc: gcc-help

"Godmar Back" <godmar@gmail.com> writes:

> gback@top [324](~/tmp) > cat switch.c
> enum four { A, B, C, D };
> extern void g(int);
> 
> void f(enum four x)
> {
>   switch (x)
>     {
>       case A: g(1); break;
>       case B: g(2); break;
>       case C: g(3); break;
>       case D: g(4); break;
>       default: gcc_unreachable();
>     }
> }

You don't have enough cases.  gcc will always uses a binary search if
there are fewer than five cases.

Try this one.

Ian

enum four { A, B, C, D, E, F, G, H, I, J, K, L, M, N };
extern void g(int);

void f(enum four x)
{
  switch (x)
    {
      case A: g(1); break;
      case B: g(2); break;
      case C: g(3); break;
      case D: g(4); break;
      case E: g(5); break;
      case F: g(6); break;
      case G: g(7); break;
      case H: g(8); break;
      case I: g(9); break;
      case J: g(10); break;
      case K: g(11); break;
      case L: g(12); break;
      case M: g(13); break;
      case N: g(14); break;
      default: gcc_unreachable();
    }
}

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: optimization of switch statements on i386
  2008-02-09  2:05   ` Ian Lance Taylor
  2008-02-09  2:43     ` Godmar Back
@ 2008-02-09  3:02     ` Godmar Back
  2008-02-11 17:55       ` Ian Lance Taylor
  1 sibling, 1 reply; 11+ messages in thread
From: Godmar Back @ 2008-02-09  3:02 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc-help

On 08 Feb 2008 18:04:23 -0800, Ian Lance Taylor <iant@google.com> wrote:
>
> Look at expand_case in gcc/stmt.c.
>

Thanks. So I did that, and that explains why my example with 4 cases
does not use a table.
gcc/expr.c says:

/* If the machine does not have a case insn that compares the bounds,
   this means extra overhead for dispatch tables, which raises the
   threshold for using them.  */
#ifndef CASE_VALUES_THRESHOLD
#define CASE_VALUES_THRESHOLD (HAVE_casesi ? 4 : 5)
#endif /* CASE_VALUES_THRESHOLD */

and a binary search is used if #arms < CASE_VALUES_THRESHOLD.
This leads to my next question.

I increased the number of arms to 5, and retained the "default:
gcc_unreachable()".
Now gcc generates a bounds check, followed by a table jump. Good. Now
how do I get rid of the bounds checks?

Is there a way to inform gcc that the default branch is never taken?
(Something along the lines of MSVC's __assert(0)?   I had thought
gcc_unreachable() would do that - but I may be misinformed here.)

Thanks,

 - Godmar

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: optimization of switch statements on i386
  2008-02-09  3:02     ` Godmar Back
@ 2008-02-11 17:55       ` Ian Lance Taylor
  2008-02-11 18:13         ` Godmar Back
  2008-02-11 18:47         ` Diego Novillo
  0 siblings, 2 replies; 11+ messages in thread
From: Ian Lance Taylor @ 2008-02-11 17:55 UTC (permalink / raw)
  To: Godmar Back; +Cc: gcc-help

"Godmar Back" <godmar@gmail.com> writes:

> I increased the number of arms to 5, and retained the "default:
> gcc_unreachable()".
> Now gcc generates a bounds check, followed by a table jump. Good. Now
> how do I get rid of the bounds checks?
> 
> Is there a way to inform gcc that the default branch is never taken?
> (Something along the lines of MSVC's __assert(0)?   I had thought
> gcc_unreachable() would do that - but I may be misinformed here.)

gcc_unreachable means nothing special to gcc.  If you look at the
resulting code, you will see that it just compiles into a function
call.  If you use -Wall you will see that the function was not
declared before being called.

gcc_unreachable is used in the gcc source code itself, but that is
different from being recognized in gcc.  It is defined as a macro in
gcc/system.h for use when building gcc itself:
#define gcc_unreachable() (fancy_abort (__FILE__, __LINE__, __FUNCTION__))

There is no way to tell gcc that the default branch is never taken,
although it will omit the default branch if it can prove that the
value is never out of range based on earlier conditional checks.

Ian

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: optimization of switch statements on i386
  2008-02-11 17:55       ` Ian Lance Taylor
@ 2008-02-11 18:13         ` Godmar Back
  2008-02-11 19:31           ` Ian Lance Taylor
  2008-02-11 18:47         ` Diego Novillo
  1 sibling, 1 reply; 11+ messages in thread
From: Godmar Back @ 2008-02-11 18:13 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc-help

On 11 Feb 2008 09:54:19 -0800, Ian Lance Taylor <iant@google.com> wrote:
> "Godmar Back" <godmar@gmail.com> writes:
>
> > I increased the number of arms to 5, and retained the "default:
> > gcc_unreachable()".
> > Now gcc generates a bounds check, followed by a table jump. Good. Now
> > how do I get rid of the bounds checks?
> >
> > Is there a way to inform gcc that the default branch is never taken?
> > (Something along the lines of MSVC's __assert(0)?   I had thought
> > gcc_unreachable() would do that - but I may be misinformed here.)
>
> gcc_unreachable means nothing special to gcc.  If you look at the
> resulting code, you will see that it just compiles into a function
> call.  If you use -Wall you will see that the function was not
> declared before being called.
>
> gcc_unreachable is used in the gcc source code itself, but that is
> different from being recognized in gcc.  It is defined as a macro in
> gcc/system.h for use when building gcc itself:
> #define gcc_unreachable() (fancy_abort (__FILE__, __LINE__, __FUNCTION__))
>

Thanks. I did see that gcc_unreachable() showed up as a symbol in the
assembly (consistent with it not meaning anything special); I suppose
I was misled by this comment in the gcc coding conventions at
http://gcc.gnu.org/codingconventions.html

"Use gcc_unreachable() to mark places that should never be reachable
(such as an unreachable default case of a switch). Do not use
gcc_assert(0) for such purposes, as gcc_unreachable gives the compiler
more information."

Apparently, this discussion refers to the (currently executing)
compiler, not the compiler used to compile the gcc code.

> There is no way to tell gcc that the default branch is never taken,
> although it will omit the default branch if it can prove that the
> value is never out of range based on earlier conditional checks.
>

Good to know that I'm not looking for something that doesn't exist.

I assume a corollary of that statement is that there is no way to
trick the compiler into omitting the default branch without incurring
runtime checks (or is there a clever way I'm not realizing)?

 - Godmar

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: optimization of switch statements on i386
  2008-02-11 17:55       ` Ian Lance Taylor
  2008-02-11 18:13         ` Godmar Back
@ 2008-02-11 18:47         ` Diego Novillo
  2008-02-11 19:52           ` Godmar Back
  1 sibling, 1 reply; 11+ messages in thread
From: Diego Novillo @ 2008-02-11 18:47 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Godmar Back, gcc-help

On 11 Feb 2008 09:54:19 -0800, Ian Lance Taylor <iant@google.com> wrote:

>  gcc_unreachable means nothing special to gcc.

But it does have control-flow connotations.  The function ultimately
expands into fancy_abort, which is marked noreturn. We do use that
fact in the optimizer.


Diego.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: optimization of switch statements on i386
  2008-02-11 18:13         ` Godmar Back
@ 2008-02-11 19:31           ` Ian Lance Taylor
  0 siblings, 0 replies; 11+ messages in thread
From: Ian Lance Taylor @ 2008-02-11 19:31 UTC (permalink / raw)
  To: Godmar Back; +Cc: gcc-help

"Godmar Back" <godmar@gmail.com> writes:

> Thanks. I did see that gcc_unreachable() showed up as a symbol in the
> assembly (consistent with it not meaning anything special); I suppose
> I was misled by this comment in the gcc coding conventions at
> http://gcc.gnu.org/codingconventions.html
> 
> "Use gcc_unreachable() to mark places that should never be reachable
> (such as an unreachable default case of a switch). Do not use
> gcc_assert(0) for such purposes, as gcc_unreachable gives the compiler
> more information."
> 
> Apparently, this discussion refers to the (currently executing)
> compiler, not the compiler used to compile the gcc code.

Those coding convention are for people working on gcc itself, not for
people using gcc.


> I assume a corollary of that statement is that there is no way to
> trick the compiler into omitting the default branch without incurring
> runtime checks (or is there a clever way I'm not realizing)?

In general, yes.

There has been some discussion of implementing __builtin_unreachable()
which would direct the optimizers to assume that the code path was
never taken.  Howver, as far as I know no actual woek has been done on
this.

Ian

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: optimization of switch statements on i386
  2008-02-11 18:47         ` Diego Novillo
@ 2008-02-11 19:52           ` Godmar Back
  2008-02-11 20:01             ` Ian Lance Taylor
  0 siblings, 1 reply; 11+ messages in thread
From: Godmar Back @ 2008-02-11 19:52 UTC (permalink / raw)
  To: Diego Novillo; +Cc: Ian Lance Taylor, gcc-help

Even if I placed a call to a 'noreturn' function in the default: case,
the compiler couldn't optimize the default branch away. (It won't
produce any code to continue after the function call, but my desire is
to eliminate the test leading to the default: branch.)

Following on the original topic of discussion, how about if I use
computed gotos?

If the reasoning of gcc's developers is correct, and it's the lack of
a casesi instruction on x86 that set the minimum of arm to use a table
jump to 5, then I may be able to produce faster code for 4 arms using
computed gotos; especially since I cannot convince gcc to eliminate
the check for the default case. Do you agree with that?

 - Godmar

On Feb 11, 2008 1:47 PM, Diego Novillo <dnovillo@google.com> wrote:
> On 11 Feb 2008 09:54:19 -0800, Ian Lance Taylor <iant@google.com> wrote:
>
> >  gcc_unreachable means nothing special to gcc.
>
> But it does have control-flow connotations.  The function ultimately
> expands into fancy_abort, which is marked noreturn. We do use that
> fact in the optimizer.
>
>
> Diego.
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: optimization of switch statements on i386
  2008-02-11 19:52           ` Godmar Back
@ 2008-02-11 20:01             ` Ian Lance Taylor
  0 siblings, 0 replies; 11+ messages in thread
From: Ian Lance Taylor @ 2008-02-11 20:01 UTC (permalink / raw)
  To: Godmar Back; +Cc: Diego Novillo, gcc-help

"Godmar Back" <godmar@gmail.com> writes:

> If the reasoning of gcc's developers is correct, and it's the lack of
> a casesi instruction on x86 that set the minimum of arm to use a table
> jump to 5, then I may be able to produce faster code for 4 arms using
> computed gotos; especially since I cannot convince gcc to eliminate
> the check for the default case. Do you agree with that?

Given the branch prediction cache and out-of-order execution of modern
x86 processors, it's unlikely that you'll see much or any speedup.
But the only way to know for sure is to measure the execution time of
the code.

Ian

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2008-02-11 20:01 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <719dced30802081629g19f67f6fi76dfaa0ede35b7aa@mail.gmail.com>
2008-02-09  0:31 ` optimization of switch statements on i386 Godmar Back
2008-02-09  2:05   ` Ian Lance Taylor
2008-02-09  2:43     ` Godmar Back
2008-02-09  2:52       ` Ian Lance Taylor
2008-02-09  3:02     ` Godmar Back
2008-02-11 17:55       ` Ian Lance Taylor
2008-02-11 18:13         ` Godmar Back
2008-02-11 19:31           ` Ian Lance Taylor
2008-02-11 18:47         ` Diego Novillo
2008-02-11 19:52           ` Godmar Back
2008-02-11 20:01             ` Ian Lance Taylor

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).