public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug inline-asm/40124]  New: Inline asm should support limited control flow
@ 2009-05-12 16:01 scovich at gmail dot com
  2009-05-12 16:05 ` [Bug inline-asm/40124] " pinskia at gcc dot gnu dot org
                   ` (13 more replies)
  0 siblings, 14 replies; 15+ messages in thread
From: scovich at gmail dot com @ 2009-05-12 16:01 UTC (permalink / raw)
  To: gcc-bugs

Right now gcc does not officially support local control flow of any kind. The
user is free to jump around within an asm block using local labels but this is
cumbersome and bug-prone. Jumping out of an asm block in any way is
(un?)officially verboten. 

This RFE is for support of jumping from an asm block to a label defined
somewhere in the enclosing function. It does not deal with jumps between asm
blocks or non-local control flow like function calls (both of which are
understandably nasty).

Rationale:

1. There is demand for this feature. MSVC supports it (see
http://msdn.microsoft.com/en-us/library/aa279405(VS.60).aspx), and people
currently kludge it in gcc by defining assembler labels in other asm blocks and
jumping to them, even though the gcc docs explicitly forbid it (e.g.
http://stackoverflow.com/questions/744055/gcc-inline-assembly-jump-to-label-outside-block).
Jumping to a C label is less error-prone, more general, and significantly
easier to implement than asm-asm jumps (see #4).

2. It's already implemented. The assembler label corresponding to a C label is
accessible like this -- asm("jmp %l0" ::"i"(&&label)) -- where "%l" emits the
label "with the syntax expected of a jump target" (Brennan's inline asm guide).
All that's left is for the compiler to mark such labels as
used-by-computed-goto so the optimizer plays nice; even that can be faked with
judicious use of computed gotos (see example code, below).

3. Its effect is basically the same as for a computed goto, which is already
supported. I suspect that every argument that jumping from asm to C is
inherently unsafe or difficult to implement applies equally well to computed
gotos (e.g. Bug #37722).

4. It would encourage smaller asm blocks because internal control flow would
seldom be necessary any more. The resulting code would be easier to write and
debug, more portable, and less fragile. Further, the example below suggests
that the compiler may generate better code as well. 

Example: Efficient use of condition codes

Currently the only way to use condition codes generated within an asm block is
to either convert them to an output value (which requires extra instructions to
create and test) or to embed the if/else inside the asm block (forcing the user
to write extra assembly code even if it was otherwise portable). Support for
jumping from asm to local labels would allow a better idiom:

void handle_overflow(int a, int b, int* dest);

int ideal_foo(int a, int b) {
    int rval = a;
    asm("adc    %[src], %[dest]\n\t"
        "jno    %l[no_overflow]"
        : [dest] "+r"(rval)
        : [src] "r"(b), [no_overflow] "i"(&&done));
        handle_overflow(a, b, &rval);
 done:
    return rval;
}

int current_foo(int a, int b) {
    int overflow = 0;
    int rval = a;
    asm("adc    %[src], %[dest]\n\t"
        "cmovo  %[true], %[overflow]"
        : [dest] "+r"(rval), [overflow] "+r"(overflow)
        : [src] "r"(b), [true] "r"(1));
    if(overflow)
        handle_overflow(a, b, &rval);
    return rval;
}

int hacked_foo(int a, int b) {
    static void* volatile labels[] = {&&overflow, &&no_overflow};
    int rval=a;
    asm("adc    %[src], %[dest]\n\t"
        "jno    %l[no_overflow]\n\t"
        "jmp    %l[overflow]"
        : [dest] "=r"(rval)
        : [src] "r"(b), "[dest]"(rval),
        [no_overflow] "i"(&&no_overflow), [overflow] "i"( &&overflow));
    goto *labels[0];
 overflow:
        handle_overflow(a, b, &rval);
 no_overflow:
    return rval;
}


Extracts of x86_64 assembler output for "x86_64-unknown-linux-gnu-gcc-4.2.2 -S
-O3" follows. Notice that ideal_foo is clearly the best, *except* that the
optimizer broke it by moving .L9 to the top of the loop (it should be just
above the addq). If it were optimized better it would take only four
instructions on the short path. hacked_foo would also beat current_foo by quite
a bit, except the compiler decided to set up a stack frame for it:


ideal_foo:
.L9:
        subq    $16, %rsp
        movl    %edi, %eax
        leaq    12(%rsp), %rdx
        movl    %edi, 12(%rsp)
#APP
        adc     %esi, %eax
        jno     .L9
#NO_APP
        movl    %eax, 12(%rsp)
        call    handle_overflow
        movl    12(%rsp), %eax
        addq    $16, %rsp
        ret


current_foo:
        pushq   %rbx
        xorl    %eax, %eax
        movl    $1, %edx
        movl    %eax, %ebx
        movl    %edi, %ecx
        subq    $16, %rsp
#APP
        adc     %esi, %ecx
        cmovo   %edx, %ebx
#NO_APP
        testl   %ebx, %ebx
        movl    %edi, 12(%rsp)
        movl    %ecx, %eax
        movl    %ecx, 12(%rsp)
        je      .L12
        leaq    12(%rsp), %rdx
        call    handle_overflow
        movl    12(%rsp), %eax
.L12:
        addq    $16, %rsp
        popq    %rbx
        ret


hacked_foo:
        movq    %rbx, -16(%rsp)
        movq    %rbp, -8(%rsp)
        subq    $32, %rsp
        movl    %edi, 12(%rsp)
        movl    %edi, %eax
#APP
        adc     %esi, %eax
        jno     .L4
        jmp     .L5
#NO_APP
        movl    %eax, 12(%rsp)
        movq    labels.1894(%rip), %rax
        jmp     *%rax
        .p2align 4,,7
.L5:
        leaq    12(%rsp), %rdx
        call    handle_overflow
.L4:
        movl    12(%rsp), %eax
        movq    16(%rsp), %rbx
        movq    24(%rsp), %rbp
        addq    $32, %rsp
        ret


-- 
           Summary: Inline asm should support limited control flow
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: inline-asm
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: scovich at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug inline-asm/40124] Inline asm should support limited control flow
  2009-05-12 16:01 [Bug inline-asm/40124] New: Inline asm should support limited control flow scovich at gmail dot com
@ 2009-05-12 16:05 ` pinskia at gcc dot gnu dot org
  2009-05-12 16:13 ` scovich at gmail dot com
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2009-05-12 16:05 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from pinskia at gcc dot gnu dot org  2009-05-12 16:05 -------
GCC can already produce good code for overflow catching on x86.  I think it is
better if you don't use inline-asm for this case at all and just write the
overflow detection in C90/C99 (which means don't cause an overflow).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug inline-asm/40124] Inline asm should support limited control flow
  2009-05-12 16:01 [Bug inline-asm/40124] New: Inline asm should support limited control flow scovich at gmail dot com
  2009-05-12 16:05 ` [Bug inline-asm/40124] " pinskia at gcc dot gnu dot org
@ 2009-05-12 16:13 ` scovich at gmail dot com
  2009-05-12 16:23 ` hjl dot tools at gmail dot com
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: scovich at gmail dot com @ 2009-05-12 16:13 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from scovich at gmail dot com  2009-05-12 16:13 -------
Overflow and adc were only examples. Other instructions that set cc, or other
conditions (e.g. parity) would not have that optimization.

Another use is the ability to jump out of an inline asm to handle an uncommon
case (if writing hand-tuned asm for speed, for example).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug inline-asm/40124] Inline asm should support limited control flow
  2009-05-12 16:01 [Bug inline-asm/40124] New: Inline asm should support limited control flow scovich at gmail dot com
  2009-05-12 16:05 ` [Bug inline-asm/40124] " pinskia at gcc dot gnu dot org
  2009-05-12 16:13 ` scovich at gmail dot com
@ 2009-05-12 16:23 ` hjl dot tools at gmail dot com
  2009-05-12 16:36 ` scovich at gmail dot com
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: hjl dot tools at gmail dot com @ 2009-05-12 16:23 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from hjl dot tools at gmail dot com  2009-05-12 16:23 -------
We are investigating if we need to add non-SSE intrinsics into gcc.
If you can show me an example where gcc can't generate similar
code automatically vs inline asm statement, I will look into if
we should add some appropriate intrinsics.


-- 

hjl dot tools at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl dot tools at gmail dot
                   |                            |com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug inline-asm/40124] Inline asm should support limited control flow
  2009-05-12 16:01 [Bug inline-asm/40124] New: Inline asm should support limited control flow scovich at gmail dot com
                   ` (2 preceding siblings ...)
  2009-05-12 16:23 ` hjl dot tools at gmail dot com
@ 2009-05-12 16:36 ` scovich at gmail dot com
  2009-05-12 16:42 ` hjl dot tools at gmail dot com
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: scovich at gmail dot com @ 2009-05-12 16:36 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from scovich at gmail dot com  2009-05-12 16:36 -------
I'm actually running sparcv9-sun-solaris2.10 (the examples used x86 because
more people know it and its asm is easier to read).

My use case is the following: I'm implementing high-performance synchronization
primitives and the compiler isn't generating good enough code -- partly because
it doesn't pipeline spinloops, and partly because it has no way to know what
stuff is truly critical path and what just needs to happen "eventually."

Here's a basic idea of what I've been looking at:

long mcs_lock_acquire(mcs_lock* lock, mcs_qnode* me) {
 again:
    /* initialize qnode, etc */
    membar_producer();
    mcs_qnode* pred = atomic_swap(&lock->tail, me);
    if(pred) {
        pred->next = me;
        while(int flags=me->wait_flags) {
            if(flags & ERROR) {
                /* recovery code */
                goto again;
            }
        }
    }
    membar_enter();
    return (long) pred;
}

This code is absolutely performance-critical because every instruction on the
critical path delays O(N) other threads -- even a single extra load or store
causes noticeable delays. I was trying to rewrite just the while loop above in
asm to be more efficient, but it is hard because of that goto inside.
Basically, the error isn't going anywhere once it shows up, so we don't have to
check it nearly as often as the flags==0 case, and it can be interleaved across
as many loop iterations as needed to make its overhead disappear. Manually
unrolling and pipelining the loop helped a bit, but the compiler still tended
to cluster things together more than was strictly necessary (leading to bursts
of saturated pipeline alternating with slack).

For CC stuff, especially x86-related, I bet places like fftw and gmp are good
sources of frustration to mine.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug inline-asm/40124] Inline asm should support limited control flow
  2009-05-12 16:01 [Bug inline-asm/40124] New: Inline asm should support limited control flow scovich at gmail dot com
                   ` (3 preceding siblings ...)
  2009-05-12 16:36 ` scovich at gmail dot com
@ 2009-05-12 16:42 ` hjl dot tools at gmail dot com
  2009-05-12 16:48 ` pinskia at gcc dot gnu dot org
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: hjl dot tools at gmail dot com @ 2009-05-12 16:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from hjl dot tools at gmail dot com  2009-05-12 16:42 -------
(In reply to comment #4)
> I'm actually running sparcv9-sun-solaris2.10 (the examples used x86 because
> more people know it and its asm is easier to read).
> 
> My use case is the following: I'm implementing high-performance synchronization
> primitives and the compiler isn't generating good enough code -- partly because
> it doesn't pipeline spinloops, and partly because it has no way to know what
> stuff is truly critical path and what just needs to happen "eventually."
> 

I think intrinsic is a better solution than inline asm statement for
this. If you find any x86 intrinsics are missing, please let me know.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug inline-asm/40124] Inline asm should support limited control flow
  2009-05-12 16:01 [Bug inline-asm/40124] New: Inline asm should support limited control flow scovich at gmail dot com
                   ` (4 preceding siblings ...)
  2009-05-12 16:42 ` hjl dot tools at gmail dot com
@ 2009-05-12 16:48 ` pinskia at gcc dot gnu dot org
  2009-05-12 17:01 ` scovich at gmail dot com
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2009-05-12 16:48 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from pinskia at gcc dot gnu dot org  2009-05-12 16:48 -------
Sounds like you want to use __builtin_expect.

Also GCC uses heuristics to figure out:
    if(pred)
is taken most of the time because it is comparing a pointer to NULL.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug inline-asm/40124] Inline asm should support limited control flow
  2009-05-12 16:01 [Bug inline-asm/40124] New: Inline asm should support limited control flow scovich at gmail dot com
                   ` (5 preceding siblings ...)
  2009-05-12 16:48 ` pinskia at gcc dot gnu dot org
@ 2009-05-12 17:01 ` scovich at gmail dot com
  2009-05-12 17:08 ` pinskia at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: scovich at gmail dot com @ 2009-05-12 17:01 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from scovich at gmail dot com  2009-05-12 17:01 -------
Isn't __builtin_expect a way to send branch prediction hints? I'm not having
trouble with that AFAIK. 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug inline-asm/40124] Inline asm should support limited control flow
  2009-05-12 16:01 [Bug inline-asm/40124] New: Inline asm should support limited control flow scovich at gmail dot com
                   ` (6 preceding siblings ...)
  2009-05-12 17:01 ` scovich at gmail dot com
@ 2009-05-12 17:08 ` pinskia at gcc dot gnu dot org
  2009-05-13  7:56 ` scovich at gmail dot com
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2009-05-12 17:08 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from pinskia at gcc dot gnu dot org  2009-05-12 17:08 -------
(In reply to comment #7)
> Isn't __builtin_expect a way to send branch prediction hints?

Not just prediction hints, it helps other optimizations of not moving code
around inter block which seems like what you need.

Maybe it is better to file a bug about the code you want optimized and the way
you want it optimized instead of asking for another feature.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug inline-asm/40124] Inline asm should support limited control flow
  2009-05-12 16:01 [Bug inline-asm/40124] New: Inline asm should support limited control flow scovich at gmail dot com
                   ` (7 preceding siblings ...)
  2009-05-12 17:08 ` pinskia at gcc dot gnu dot org
@ 2009-05-13  7:56 ` scovich at gmail dot com
  2009-05-13  8:19 ` jakub at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: scovich at gmail dot com @ 2009-05-13  7:56 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from scovich at gmail dot com  2009-05-13 07:55 -------
RE: __builtin_expect -- Thanks! It did help quite a bit, even though the
compiler was already emitting not-taken branch hints on its own.

RE: Filing bugs -- I have. This RFE arose out of Bug #40078, which was
triggered by attempts to work around Bug #40067. I still have some issues with
overconservative use of branch delay slots and possibly loop pipelining, but
haven't gotten to filing them yet. I've also filed other bugs in the past where
it would have been nice to work around using inline asm but control flow was a
pain.

In the end, is there any particular reason *not* to make inline asm easier to
use and more transparent to the compiler, given points #1 and #2? Invoking
point #3, what significant uses of computed gotos exist, other than to work
around switch statements that compile suboptimally? The docs don't mention any,
and yet we have them instead of (or in addition to) bug reports. 

I'd take a stab at implementing this myself -- it's probably a one-liner -- but
I've never hacked gcc before and have no clue where that one line might lurk. 

BTW, how does one exploit the compiler's overflow catching? I tried testing a+b
< a and a+b < b (for unsigned ints) with no luck, and there's no __builtin test
for overflow or carry. 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug inline-asm/40124] Inline asm should support limited control flow
  2009-05-12 16:01 [Bug inline-asm/40124] New: Inline asm should support limited control flow scovich at gmail dot com
                   ` (8 preceding siblings ...)
  2009-05-13  7:56 ` scovich at gmail dot com
@ 2009-05-13  8:19 ` jakub at gcc dot gnu dot org
  2009-05-13  9:51 ` scovich at gmail dot com
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: jakub at gcc dot gnu dot org @ 2009-05-13  8:19 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from jakub at gcc dot gnu dot org  2009-05-13 08:19 -------
Of course there is a very important reason.  If you allow inline asms to change
control flow, even just to labels whose address has been taken through &&label,
you penalize a lot of code which doesn't change the control flow, as the
compiler will have to assume each inline asm which could possibly get at an
label address (not just directly, but through global variables, pointers etc.)
can jump to it.  That's an awful lot of extra flow graph edges and optimization
limitations.  Unless you'd invent some new syntax to say if an inline asm can
change control flow and by default assume it can't.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug inline-asm/40124] Inline asm should support limited control flow
  2009-05-12 16:01 [Bug inline-asm/40124] New: Inline asm should support limited control flow scovich at gmail dot com
                   ` (9 preceding siblings ...)
  2009-05-13  8:19 ` jakub at gcc dot gnu dot org
@ 2009-05-13  9:51 ` scovich at gmail dot com
  2009-05-13  9:58 ` rguenth at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: scovich at gmail dot com @ 2009-05-13  9:51 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from scovich at gmail dot com  2009-05-13 09:51 -------
> If you allow inline asms to change control flow, even just 
> to labels whose address has been taken through &&label, you 
> penalize a lot of code which doesn't change the control 
> flow, as the compiler will have to assume each inline asm 
> which could possibly get at an label address (not just 
> directly, but through global variables, pointers etc.) can
> jump to it.

I'm going to invoke #3 again to respond to these concerns:

a. This RFE is specifically limited to local control flow only, so the compiler
can safely ignore any label not in the asm's enclosing function, as well as
labels whose addresses are never taken (or provably never used). Computed gotos
appear to make the same assumptions, based on the docs' strong warning not
allow labels to leak out of their enclosing function in any way. 

b. While it's always possible that an asm could jump to a value loaded from an
arbitrary, dynamically-generated address, the same is true for computed gotos.
Either way, compiler analysis or not, doing so would almost certainly send you
to la-la land because label values aren't known until assembler time or later
and have no guaranteed relationship with each other. The only way to get a
valid label address is using one directly, or computing it with some sort of
base+(label-base). Either way requires taking the address of the desired label
at some point and tipping off the compiler.

c. It's pretty easy to write functions whose computed gotos defy static
analysis, but most of the time the compiler does pretty well. Well-written asm
blocks should access memory via "m" constraints -- which the compiler can
analyze -- rather than manually dereferencing a pointer passed in with an "r"
constraint. This is especially true for asm blocks with no internal control
flow, which this RFE encourages. 

d. If a code path is short/simple enough that incoming jumps penalize it
heavily (whether from computed gotos or jumps from asm), it's probably also
small enough that the compiler (or programmer, if need be) can duplicate it for
the short path. A big, ugly code path probably wouldn't even notice an extra
control flow arc or two.

In the end, a big goal of this RFE is to allow programmers to make the compiler
aware of control flow arcs they're already adding (or tempted to add) behind
its back. It therefore wouldn't strike me as much of a limitation if "jumps to
labels not explicitly passed to the asm are unsupported and may lead to
undefined behavior."


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug inline-asm/40124] Inline asm should support limited control flow
  2009-05-12 16:01 [Bug inline-asm/40124] New: Inline asm should support limited control flow scovich at gmail dot com
                   ` (10 preceding siblings ...)
  2009-05-13  9:51 ` scovich at gmail dot com
@ 2009-05-13  9:58 ` rguenth at gcc dot gnu dot org
  2009-11-21  9:12 ` pinskia at gcc dot gnu dot org
  2009-11-21  9:13 ` pinskia at gcc dot gnu dot org
  13 siblings, 0 replies; 15+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-05-13  9:58 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from rguenth at gcc dot gnu dot org  2009-05-13 09:58 -------
I strongly oppose to making asm handling even more complex.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |WONTFIX


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug inline-asm/40124] Inline asm should support limited control flow
  2009-05-12 16:01 [Bug inline-asm/40124] New: Inline asm should support limited control flow scovich at gmail dot com
                   ` (11 preceding siblings ...)
  2009-05-13  9:58 ` rguenth at gcc dot gnu dot org
@ 2009-11-21  9:12 ` pinskia at gcc dot gnu dot org
  2009-11-21  9:13 ` pinskia at gcc dot gnu dot org
  13 siblings, 0 replies; 15+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2009-11-21  9:12 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from pinskia at gcc dot gnu dot org  2009-11-21 09:12 -------
Reopening to ...


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|WONTFIX                     |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug inline-asm/40124] Inline asm should support limited control flow
  2009-05-12 16:01 [Bug inline-asm/40124] New: Inline asm should support limited control flow scovich at gmail dot com
                   ` (12 preceding siblings ...)
  2009-11-21  9:12 ` pinskia at gcc dot gnu dot org
@ 2009-11-21  9:13 ` pinskia at gcc dot gnu dot org
  13 siblings, 0 replies; 15+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2009-11-21  9:13 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #14 from pinskia at gcc dot gnu dot org  2009-11-21 09:13 -------
Mark this as fixed for 4.5:
http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Extended%20asm%20with%20gotoAs
of GCC version 4.5, asm goto may be used to have the assembly jump to one or
more C labels. In this form, a fifth section after the clobber list contains a
list of all C labels to which the assembly may jump. Each label operand is
implicitly self-named. The asm is also assumed to fall through to the next
statement.

This form of asm is restricted to not have outputs. This is due to a internal
restriction in the compiler that control transfer instructions cannot have
outputs. This restriction on asm goto may be lifted in some future version of
the compiler. In the mean time, asm goto may include a memory clobber, and so
leave outputs in memory.

     int frob(int x)
     {
       int y;
       asm goto ("frob %%r5, %1; jc %l[error]; mov (%2), %%r5"
                 : : "r"(x), "r"(&y) : "r5", "memory" : error);
       return y;
      error:
       return -1;
     }
In this (inefficient) example, the frob instruction sets the carry bit to
indicate an error. The jc instruction detects this and branches to the error
label. Finally, the output of the frob instruction (%r5) is stored into the
memory for variable y, which is later read by the return statement.

     void doit(void)
     {
       int i = 0;
       asm goto ("mfsr %%r1, 123; jmp %%r1;"
                 ".pushsection doit_table;"
            ".long %l0, %l1, %l2, %l3;"
            ".popsection"
            : : : "r1" : label1, label2, label3, label4);
       __builtin_unreachable ();

      label1:
       f1();
       return;
      label2:
       f2();
       return;
      label3:
       i = 1;
      label4:
       f3(i);
     }
In this (also inefficient) example, the mfsr instruction reads an address from
some out-of-band machine register, and the following jmp instruction branches
to that address. The address read by the mfsr instruction is assumed to have
been previously set via some application-specific mechanism to be one of the
four values stored in the doit_table section. Finally, the asm is followed by a
call to __builtin_unreachable to indicate that the asm does not in fact fall
through.

     #define TRACE1(NUM)                         \
       do {                                      \
         asm goto ("0: nop;"                     \
                   ".pushsection trace_table;"   \
                   ".long 0b, %l0;"              \
                   ".popsection"                 \
                   : : : : trace#NUM);           \
         if (0) { trace#NUM: trace(); }          \
       } while (0)
     #define TRACE  TRACE1(__COUNTER__)
In this example (which in fact inspired the asm goto feature) we want on rare
occasions to call the trace function; on other occasions we'd like to keep the
overhead to the absolute minimum. The normal code path consists of a single nop
instruction. However, we record the address of this nop together with the
address of a label that calls the trace function. This allows the nop
instruction to be patched at runtime to be an unconditional branch to the
stored label. It is assumed that an optimizing compiler will move the labeled
block out of line, to optimize the fall through path from the asm.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |FIXED
   Target Milestone|---                         |4.5.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2009-11-21  9:13 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-05-12 16:01 [Bug inline-asm/40124] New: Inline asm should support limited control flow scovich at gmail dot com
2009-05-12 16:05 ` [Bug inline-asm/40124] " pinskia at gcc dot gnu dot org
2009-05-12 16:13 ` scovich at gmail dot com
2009-05-12 16:23 ` hjl dot tools at gmail dot com
2009-05-12 16:36 ` scovich at gmail dot com
2009-05-12 16:42 ` hjl dot tools at gmail dot com
2009-05-12 16:48 ` pinskia at gcc dot gnu dot org
2009-05-12 17:01 ` scovich at gmail dot com
2009-05-12 17:08 ` pinskia at gcc dot gnu dot org
2009-05-13  7:56 ` scovich at gmail dot com
2009-05-13  8:19 ` jakub at gcc dot gnu dot org
2009-05-13  9:51 ` scovich at gmail dot com
2009-05-13  9:58 ` rguenth at gcc dot gnu dot org
2009-11-21  9:12 ` pinskia at gcc dot gnu dot org
2009-11-21  9:13 ` pinskia at gcc dot gnu dot org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).