public inbox for gcc-help@gcc.gnu.org
* Re: [BUG?] GCC inline assembler optimization issue
@ 2009-12-27  0:28 Dan Kruchinin
  2009-12-27  0:32 ` Andrew Haley
  0 siblings, 1 reply; 5+ messages in thread
From: Dan Kruchinin @ 2009-12-27  0:28 UTC
  To: gcc-help

> You need a memory clobber here.
> You need to add a clobber to Operand 1.

I added "memory" to the clobber list of spin_unlock, but it doesn't help.
I still have the same problem.

* Re: [BUG?] GCC inline assembler optimization issue
  2009-12-27  0:28 [BUG?] GCC inline assembler optimization issue Dan Kruchinin
@ 2009-12-27  0:32 ` Andrew Haley
  0 siblings, 0 replies; 5+ messages in thread
From: Andrew Haley @ 2009-12-27  0:32 UTC
  To: gcc-help

On 12/26/2009 11:11 PM, Dan Kruchinin wrote:
>> You need a memory clobber here.
>> You need to add a clobber to Operand 1.
> 
> I added "memory" to the clobber list of spin_unlock, but it doesn't help.
> I still have the same problem.

You need to add a clobber to Operand 1 too.

Andrew.

* Re: [BUG?] GCC inline assembler optimization issue
  2009-12-27  0:28 Dan Kruchinin
@ 2009-12-28  3:22 ` Andrew Haley
  0 siblings, 0 replies; 5+ messages in thread
From: Andrew Haley @ 2009-12-28  3:22 UTC
  To: gcc-help

On 12/26/2009 11:56 PM, Dan Kruchinin wrote:
> As I wrote earlier, the memory clobber didn't help, but there is an interesting thing:
> I changed the constraint on the third operand of the asm in spin_lock from "rI" to "r", and that fixed my problem.
> AFAIK the only thing "I" does is describe my argument as a constant in the range 0 - 31.
> I don't understand why it changes the generated code so radically.

It doesn't.  Your problem is the missing clobber on Operand 1
in the asm in spin_unlock.

Andrew.
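
For comparison, here is a minimal sketch of the same lock written without
hand-written asm, so there are no operand constraints to get wrong. This is
a different technique from the one discussed in the thread: it uses GCC's
__sync_lock_test_and_set and __sync_lock_release builtins, which to my
knowledge are available from GCC 4.1 onward (so in 4.3.4 but not in 3.4.6).
The names spin_lock_builtin and spin_unlock_builtin are hypothetical.

```c
struct spin {
    unsigned long val;
};

/* Acquire: atomically store 1 and fetch the previous value.
 * Keep retrying until the previous value was 0 (unlocked). */
static inline void spin_lock_builtin(struct spin *lock)
{
    while (__sync_lock_test_and_set(&lock->val, 1UL))
        ;  /* busy-wait: someone else holds the lock */
}

/* Release: the builtin stores 0 with release semantics. */
static inline void spin_unlock_builtin(struct spin *lock)
{
    __sync_lock_release(&lock->val);
}
```

Because the compiler generates the exchange itself, it knows which values
are overwritten and cannot be misled by an incomplete operand list.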

* Re: [BUG?] GCC inline assembler optimization issue
  2009-12-26 23:06 Dan Kruchinin
@ 2009-12-26 23:43 ` Andrew Haley
  0 siblings, 0 replies; 5+ messages in thread
From: Andrew Haley @ 2009-12-26 23:43 UTC
  To: gcc-help

On 12/26/2009 09:56 PM, Dan Kruchinin wrote:

> I'm not sure if I found out a bug or I just made a mistake in gcc inline assembler.
> I think that more probably it's a bug, because my code works well when I don't use compile time optimization.
> I have the following code for x86_64 target:
> ===
> #include <stdio.h>
> 
> #define SPIN_LOCKED_V   1
> #define SPIN_UNLOCKED_V 0
> 
> struct spin {
>     unsigned long val;
> };
> 
> static inline void spin_lock(struct spin *lock)
> {
>     __asm__ volatile ("movq %2, %%rax\n\t"
>                       "1: lock cmpxchg %1, %0\n\t"
>                       "cmpq %2, %%rax\n\t"
>                       "jnz 1b\n"
>                       : "+m" (lock->val)
>                       : "r" ((volatile long)SPIN_LOCKED_V),
>                         "rI" ((volatile long)SPIN_UNLOCKED_V)
>                       : "%rax", "memory");
> }

> static inline void spin_unlock(struct spin *lock)
> {
>   __asm__ volatile ("lock xchgq %1, %0\n"
>                     : "+m" (lock->val)
>                     : "r" ((volatile long)SPIN_UNLOCKED_V));
> }

You need a memory clobber here.
You need to add a clobber to Operand 1.

Andrew.
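
A minimal sketch of what that change to spin_unlock could look like,
assuming an x86_64 target. GCC does not allow a register that is bound to
an operand to appear in the clobber list, so the usual way to say "this
operand is overwritten" is to declare it read-write with "+r". The names
spin_unlock_fixed and tmp are hypothetical and do not come from the
original post.

```c
struct spin {
    unsigned long val;
};

/* xchg swaps its two operands, so after the instruction the register
 * holds the OLD lock value. Declaring it "+r" tells GCC the register is
 * modified; the local temporary is reloaded with 0 on every call. */
static inline void spin_unlock_fixed(struct spin *lock)
{
    unsigned long tmp = 0;  /* SPIN_UNLOCKED_V */

    __asm__ volatile ("lock xchgq %1, %0\n"
                      : "+m" (lock->val), "+r" (tmp)
                      :
                      : "memory");
}
```

With the register declared as modified, the compiler can no longer assume
that it still contains 0 on the next loop iteration.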

* [BUG?] GCC inline assembler optimization issue
@ 2009-12-26 23:06 Dan Kruchinin
  2009-12-26 23:43 ` Andrew Haley
  0 siblings, 1 reply; 5+ messages in thread
From: Dan Kruchinin @ 2009-12-26 23:06 UTC
  To: gcc-help

Hi, list.

I'm not sure whether I have found a bug or simply made a mistake in my gcc inline assembler.
I think a bug is more likely, because my code works well when I compile without optimization.
I have the following code for x86_64 target:
===
#include <stdio.h>

#define SPIN_LOCKED_V   1
#define SPIN_UNLOCKED_V 0

struct spin {
    unsigned long val;
};

static inline void spin_lock(struct spin *lock)
{
    __asm__ volatile ("movq %2, %%rax\n\t"
                      "1: lock cmpxchg %1, %0\n\t"
                      "cmpq %2, %%rax\n\t"
                      "jnz 1b\n"
                      : "+m" (lock->val)
                      : "r" ((volatile long)SPIN_LOCKED_V),
                        "rI" ((volatile long)SPIN_UNLOCKED_V)
                      : "%rax", "memory");
}

static inline void spin_unlock(struct spin *lock)
{
  __asm__ volatile ("lock xchgq %1, %0\n"
                    : "+m" (lock->val)
                    : "r" ((volatile long)SPIN_UNLOCKED_V));
}

int main(void)
{
    struct spin spin;
    int i = 0;

    spin.val = SPIN_UNLOCKED_V;
    for (i = 0; i < 10; i++) {
        printf("[0][UNLOCKED] spin_val = %#lx\n", spin.val);
        spin_lock(&spin);
        printf("[1][LOCKED] spin_val = %#lx\n", spin.val);
        spin_unlock(&spin);
        printf("[2][UNLOCKED] spin_val = %#lx\n", spin.val);
    }

    return 0;
}
===

The code above works well when I compile it with -O0 or -g, but when I ask gcc to optimize it using -O1 or -O2,
my application hangs after the first iteration.
I have the following output:
[0][UNLOCKED] spin_val = 0
[1][LOCKED] spin_val = 0x1
[2][UNLOCKED] spin_val = 0
[0][UNLOCKED] spin_val = 0
[1][LOCKED] spin_val = 0x1
[2][UNLOCKED] spin_val = 0x1
[0][UNLOCKED] spin_val = 0x1

It hangs because gcc generates the following assembler code:
===
  4004f0:       41 54                   push   %r12
  4004f2:       41 bc 01 00 00 00       mov    $0x1,%r12d
  4004f8:       55                      push   %rbp
  4004f9:       31 ed                   xor    %ebp,%ebp
  4004fb:       53                      push   %rbx
  4004fc:       31 db                   xor    %ebx,%ebx
  4004fe:       48 83 ec 10             sub    $0x10,%rsp
  400502:       48 c7 04 24 00 00 00    movq   $0x0,(%rsp)
  400509:       00 
  40050a:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
  400510:       48 8b 34 24             mov    (%rsp),%rsi        // <- Here is my for(i = 0; i < 10; i++) loop
  400514:       bf 5c 06 40 00          mov    $0x40065c,%edi
  400519:       31 c0                   xor    %eax,%eax
  40051b:       e8 c0 fe ff ff          callq  4003e0 <printf@plt>
  400520:       48 c7 c0 00 00 00 00    mov    $0x0,%rax
  400527:       f0 4c 0f b1 24 24       lock cmpxchg %r12,(%rsp)
  40052d:       48 83 f8 00             cmp    $0x0,%rax
  400531:       75 f4                   jne    400527 <main+0x37>
  400533:       48 8b 34 24             mov    (%rsp),%rsi
  400537:       bf 7a 06 40 00          mov    $0x40067a,%edi
  40053c:       31 c0                   xor    %eax,%eax
  40053e:       e8 9d fe ff ff          callq  4003e0 <printf@plt>
  400543:       f0 48 87 2c 24          lock xchg %rbp,(%rsp)
  400548:       48 8b 34 24             mov    (%rsp),%rsi
  40054c:       31 c0                   xor    %eax,%eax
  40054e:       bf 96 06 40 00          mov    $0x400696,%edi
  400553:       83 c3 01                add    $0x1,%ebx
  400556:       e8 85 fe ff ff          callq  4003e0 <printf@plt>
  40055b:       83 fb 0a                cmp    $0xa,%ebx
  40055e:       75 b0                   jne    400510 <main+0x20> // <- And here it ends
===

As you can see, gcc initializes registers %r12 and %rbp, which are used by spin_lock and spin_unlock respectively,
only once, before entering the loop. Looking at this assembler code, I can say that after my spinlock has been locked once it will never be unlocked,
because %r12 and %rbp aren't reinitialized on each iteration with the values I explicitly specified in spin_lock and spin_unlock.
Is this abnormal compiler behavior, or did I make a mistake in my gcc inline assembler?
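
One way to see why a register that is loaded only once can go stale: xchg
swaps its two operands, so the register operand comes back holding the
previous contents of memory. In the listing above, %rbp is zeroed once at
4004f9 and then used by the lock xchg at 400543, which would leave the old
lock value (1) in %rbp for the next iteration. The following is a minimal
sketch of that effect, assuming an x86_64 target; the helper name
xchg_old_value is hypothetical and not part of the original program.

```c
/* Swap newval into *mem and return what was there before.
 * The register operand is declared "+r" because xchg writes to it. */
static inline unsigned long xchg_old_value(unsigned long *mem,
                                           unsigned long newval)
{
    __asm__ volatile ("xchgq %1, %0"
                      : "+m" (*mem), "+r" (newval)
                      :
                      : "memory");
    return newval;  /* now holds the previous contents of *mem */
}
```

Calling xchg_old_value(&v, 0) on a variable v that holds 1 sets v to 0 and
returns 1, which is the value the register ends up with.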

Extra information:
1) GCC: I tried both 4.3.4 and 3.4.6
1.1) gcc-3.4 --version
gcc-3.4 (GCC) 3.4.6 (Debian 3.4.6-10)
1.2) gcc --version
gcc (Debian 4.3.4-6) 4.3.4

2) uname -a
Linux godel 2.6.30-2-amd64 #1 SMP Mon Dec 7 05:21:45 UTC 2009 x86_64 GNU/Linux 

3) cat /proc/cpuinfo | grep 'model ' | head -1
model name	: Intel(R) Core(TM)2 Duo CPU     L7500  @ 1.60GH

Kind regards.
