From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-help-return-39657-listarch-gcc-help=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 24540 invoked by alias); 26 Dec 2009 22:29:35 -0000
Received: (qmail 24529 invoked by uid 22791); 26 Dec 2009 22:29:34 -0000
X-SWARE-Spam-Status: No, hits=-2.6 required=5.0 	tests=BAYES_00,SPF_PASS
X-Spam-Check-By: sourceware.org
Received: from mail-fx0-f161.google.com (HELO mail-fx0-f161.google.com) (209.85.220.161)     by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Sat, 26 Dec 2009 22:29:29 +0000
Received: by fxm1 with SMTP id 1so3735806fxm.16         for <gcc-help@gcc.gnu.org>; Sat, 26 Dec 2009 14:29:27 -0800 (PST)
Received: by 10.103.126.21 with SMTP id d21mr750847mun.47.1261866565870;         Sat, 26 Dec 2009 14:29:25 -0800 (PST)
Received: from godel (m83-188-0-173.cust.tele2.ru [83.188.0.173])         by mx.google.com with ESMTPS id j2sm12510780mue.5.2009.12.26.14.29.21         (version=TLSv1/SSLv3 cipher=RC4-MD5);         Sat, 26 Dec 2009 14:29:25 -0800 (PST)
Date: Sat, 26 Dec 2009 23:06:00 -0000
From: Dan Kruchinin <dan.kruchinin@gmail.com>
To: gcc-help@gcc.gnu.org
Subject: [BUG?] GCC inline assembler optimization issue
Message-ID: <20091227005655.560be13d@godel>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Mailing-List: contact gcc-help-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-help.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-help/>
List-Post: <mailto:gcc-help@gcc.gnu.org>
List-Help: <mailto:gcc-help-help@gcc.gnu.org>
Sender: gcc-help-owner@gcc.gnu.org
X-SW-Source: 2009-12/txt/msg00344.txt.bz2

Hi, list.

I'm not sure whether I have found a bug or just made a mistake in my GCC inline assembly.
I think a bug is more likely, because my code works correctly when I compile it without optimization.
I have the following code for the x86_64 target:
===
#include <stdio.h>

#define SPIN_LOCKED_V   1
#define SPIN_UNLOCKED_V 0

struct spin {
    unsigned long val;
};

static inline void spin_lock(struct spin *lock)
{
    __asm__ volatile ("movq %2, %%rax\n\t"
                      "1: lock cmpxchg %1, %0\n\t"
                      "cmpq %2, %%rax\n\t"
                      "jnz 1b\n"
                      : "+m" (lock->val)
                      : "r" ((volatile long)SPIN_LOCKED_V),
                        "rI" ((volatile long)SPIN_UNLOCKED_V)
                      : "%rax", "memory");
}

static inline void spin_unlock(struct spin *lock)
{
  __asm__ volatile ("lock xchgq %1, %0\n"
                    : "+m" (lock->val)
                    : "r" ((volatile long)SPIN_UNLOCKED_V));
}

int main(void)
{
    struct spin spin;
    int i = 0;

    spin.val = SPIN_UNLOCKED_V;
    for (i = 0; i < 10; i++) {
        printf("[0][UNLOCKED] spin_val = %#lx\n", spin.val);
        spin_lock(&spin);
        printf("[1][LOCKED] spin_val = %#lx\n", spin.val);
        spin_unlock(&spin);
        printf("[2][UNLOCKED] spin_val = %#lx\n", spin.val);
    }

    return 0;
}
===

The code above works correctly when I compile it with -O0 or -g, but when I ask gcc to optimize it with -O1 or -O2,
the second iteration leaves the lock held and my application hangs in spin_lock() on the third iteration.
I get the following output:
[0][UNLOCKED] spin_val = 0
[1][LOCKED] spin_val = 0x1
[2][UNLOCKED] spin_val = 0
[0][UNLOCKED] spin_val = 0
[1][LOCKED] spin_val = 0x1
[2][UNLOCKED] spin_val = 0x1
[0][UNLOCKED] spin_val = 0x1

It hangs because gcc generates the following assembly code:
===
  4004f0:       41 54                   push   %r12
  4004f2:       41 bc 01 00 00 00       mov    $0x1,%r12d
  4004f8:       55                      push   %rbp
  4004f9:       31 ed                   xor    %ebp,%ebp
  4004fb:       53                      push   %rbx
  4004fc:       31 db                   xor    %ebx,%ebx
  4004fe:       48 83 ec 10             sub    $0x10,%rsp
  400502:       48 c7 04 24 00 00 00    movq   $0x0,(%rsp)
  400509:       00 
  40050a:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
  400510:       48 8b 34 24             mov    (%rsp),%rsi        // <- Here is my for(i = 0; i < 10; i++) loop
  400514:       bf 5c 06 40 00          mov    $0x40065c,%edi
  400519:       31 c0                   xor    %eax,%eax
  40051b:       e8 c0 fe ff ff          callq  4003e0 <printf@plt>
  400520:       48 c7 c0 00 00 00 00    mov    $0x0,%rax
  400527:       f0 4c 0f b1 24 24       lock cmpxchg %r12,(%rsp)
  40052d:       48 83 f8 00             cmp    $0x0,%rax
  400531:       75 f4                   jne    400527 <main+0x37>
  400533:       48 8b 34 24             mov    (%rsp),%rsi
  400537:       bf 7a 06 40 00          mov    $0x40067a,%edi
  40053c:       31 c0                   xor    %eax,%eax
  40053e:       e8 9d fe ff ff          callq  4003e0 <printf@plt>
  400543:       f0 48 87 2c 24          lock xchg %rbp,(%rsp)
  400548:       48 8b 34 24             mov    (%rsp),%rsi
  40054c:       31 c0                   xor    %eax,%eax
  40054e:       bf 96 06 40 00          mov    $0x400696,%edi
  400553:       83 c3 01                add    $0x1,%ebx
  400556:       e8 85 fe ff ff          callq  4003e0 <printf@plt>
  40055b:       83 fb 0a                cmp    $0xa,%ebx
  40055e:       75 b0                   jne    400510 <main+0x20> // <- And here it ends
===

As you can see, gcc initializes the registers %r12 and %rbp, which are used by spin_lock() and spin_unlock() respectively,
only once, before entering the loop. Looking at this assembly code, it seems that once my spinlock has been locked it can never be unlocked again,
because %r12 and %rbp are not reinitialized on each iteration with the values I explicitly specified in spin_lock() and spin_unlock().
Is this abnormal compiler behavior, or did I make a mistake in my inline assembly?
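
One possible explanation, offered as an assumption rather than a confirmed diagnosis: xchg swaps its two operands, so after spin_unlock() the register holds the old lock value (1), not 0. In the code above that register is declared as an input-only "r" operand, so gcc may assume it is unchanged and load it once outside the loop. This would match the output, where the second unlock writes 1 back into the lock. The sketch below declares every register the instructions modify as a read-write operand ("+r" for xchg, "+a" for the %rax value that cmpxchg updates). The helper run_iterations() and the local names prev and tmp are hypothetical and exist only for illustration.

```c
#include <assert.h>

#define SPIN_LOCKED_V   1UL
#define SPIN_UNLOCKED_V 0UL

struct spin {
    unsigned long val;
};

static inline void spin_lock(struct spin *lock)
{
    unsigned long prev;

    do {
        /* cmpxchg compares %rax with the memory operand; on a mismatch
         * it loads the current memory value into %rax, so %rax is
         * declared read-write via "+a". */
        prev = SPIN_UNLOCKED_V;
        __asm__ volatile ("lock cmpxchgq %2, %0"
                          : "+m" (lock->val), "+a" (prev)
                          : "r" (SPIN_LOCKED_V)
                          : "memory", "cc");
    } while (prev != SPIN_UNLOCKED_V);  /* retry until we saw "unlocked" */
}

static inline void spin_unlock(struct spin *lock)
{
    /* xchg writes the old memory value back into the register, so the
     * register is a read-write operand ("+r"), not a plain input. */
    unsigned long tmp = SPIN_UNLOCKED_V;

    __asm__ volatile ("lock xchgq %1, %0"
                      : "+m" (lock->val), "+r" (tmp)
                      :
                      : "memory");
}

/* Hypothetical helper: lock and unlock n times, return the final lock
 * word, or ~0UL if the lock word was not set while the lock was held. */
static unsigned long run_iterations(int n)
{
    struct spin spin = { SPIN_UNLOCKED_V };
    int i;

    for (i = 0; i < n; i++) {
        spin_lock(&spin);
        if (spin.val != SPIN_LOCKED_V)
            return ~0UL;
        spin_unlock(&spin);
    }
    return spin.val;
}
```

Calling run_iterations(10) from main() should return 0 at any optimization level if the assumption above is right, because gcc now has to reload tmp with 0 before every xchg.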

Extra information:
1) GCC: I tried both 4.3.4 and 3.4.6
1.1) gcc-3.4 --version
gcc-3.4 (GCC) 3.4.6 (Debian 3.4.6-10)
1.2) gcc --version
gcc (Debian 4.3.4-6) 4.3.4

2) uname -a
Linux godel 2.6.30-2-amd64 #1 SMP Mon Dec 7 05:21:45 UTC 2009 x86_64 GNU/Linux 

3) cat /proc/cpuinfo | grep 'model ' | head -1
model name	: Intel(R) Core(TM)2 Duo CPU     L7500  @ 1.60GH

Kind regards.