Date: Sat, 26 Dec 2009 23:06:00 -0000
From: Dan Kruchinin
To: gcc-help@gcc.gnu.org
Subject: [BUG?] GCC inline assembler optimization issue
Message-ID: <20091227005655.560be13d@godel>

Hi, list.

I'm not sure whether I've found a bug or just made a mistake in my GCC inline assembler. I think a bug is more likely, because my code works well when I compile it without optimization.
I have the following code for an x86_64 target:

===
#include <stdio.h>

#define SPIN_LOCKED_V 1
#define SPIN_UNLOCKED_V 0

struct spin {
    unsigned long val;
};

static inline void spin_lock(struct spin *lock)
{
    __asm__ volatile ("movq %2, %%rax\n\t"
                      "1: lock cmpxchg %1, %0\n\t"
                      "cmpq %2, %%rax\n\t"
                      "jnz 1b\n"
                      : "+m" (lock->val)
                      : "r" ((volatile long)SPIN_LOCKED_V),
                        "rI" ((volatile long)SPIN_UNLOCKED_V)
                      : "%rax", "memory");
}

static inline void spin_unlock(struct spin *lock)
{
    __asm__ volatile ("lock xchgq %1, %0\n"
                      : "+m" (lock->val)
                      : "r" ((volatile long)SPIN_UNLOCKED_V));
}

int main(void)
{
    struct spin spin;
    int i = 0;

    spin.val = SPIN_UNLOCKED_V;
    for (i = 0; i < 10; i++) {
        printf("[0][UNLOCKED] spin_val = %#x\n", spin.val);
        spin_lock(&spin);
        printf("[1][LOCKED] spin_val = %#x\n", spin.val);
        spin_unlock(&spin);
        printf("[2][UNLOCKED] spin_val = %#x\n", spin.val);
    }

    return 0;
}
===

The code above works fine when I compile it with -O0 or -g, but when I ask gcc to optimize it with -O1 or -O2, my application hangs after the first iteration.
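For comparison, here is a minimal sketch of the same lock built on GCC's __sync atomic builtins instead of hand-written asm, so the compiler itself knows which values each operation reads and writes. It assumes GCC 4.1 or later (the builtins are not available in 3.4.6), and spin_lock_builtin, spin_unlock_builtin and run_loop are hypothetical names, not part of the original program:

```c
#include <assert.h>

#define SPIN_LOCKED_V 1UL
#define SPIN_UNLOCKED_V 0UL

struct spin {
    unsigned long val;
};

/* Hypothetical builtin-based spin_lock(): keep retrying until the lock
 * was seen as UNLOCKED and atomically replaced with LOCKED. The builtin
 * is a full barrier and returns the value *ptr held before the call. */
static inline void spin_lock_builtin(struct spin *lock)
{
    while (__sync_val_compare_and_swap(&lock->val, SPIN_UNLOCKED_V,
                                       SPIN_LOCKED_V) != SPIN_UNLOCKED_V)
        ; /* busy-wait */
}

/* Hypothetical builtin-based spin_unlock(): __sync_lock_release() stores 0
 * (== SPIN_UNLOCKED_V) with release semantics. */
static inline void spin_unlock_builtin(struct spin *lock)
{
    __sync_lock_release(&lock->val);
}

/* Same lock/unlock sequence as main() in the report, minus the printf()s.
 * Returns the number of completed iterations; a broken lock would hang
 * or trip an assert instead. */
static int run_loop(int iterations)
{
    struct spin spin = { SPIN_UNLOCKED_V };
    int i;

    for (i = 0; i < iterations; i++) {
        spin_lock_builtin(&spin);
        assert(spin.val == SPIN_LOCKED_V);
        spin_unlock_builtin(&spin);
        assert(spin.val == SPIN_UNLOCKED_V);
    }
    return i;
}
```

If this version runs all 10 iterations at -O2 while the asm version hangs, that would point at the asm operand constraints rather than at the optimizer itself.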
I get the following output:

[0][UNLOCKED] spin_val = 0
[1][LOCKED] spin_val = 0x1
[2][UNLOCKED] spin_val = 0
[0][UNLOCKED] spin_val = 0
[1][LOCKED] spin_val = 0x1
[2][UNLOCKED] spin_val = 0x1
[0][UNLOCKED] spin_val = 0x1

It hangs because gcc generates the following assembler code:

===
  4004f0:   41 54                   push   %r12
  4004f2:   41 bc 01 00 00 00       mov    $0x1,%r12d
  4004f8:   55                      push   %rbp
  4004f9:   31 ed                   xor    %ebp,%ebp
  4004fb:   53                      push   %rbx
  4004fc:   31 db                   xor    %ebx,%ebx
  4004fe:   48 83 ec 10             sub    $0x10,%rsp
  400502:   48 c7 04 24 00 00 00    movq   $0x0,(%rsp)
  400509:   00
  40050a:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
  400510:   48 8b 34 24             mov    (%rsp),%rsi   // <- Here is my for(i = 0; i < 10; i++) loop
  400514:   bf 5c 06 40 00          mov    $0x40065c,%edi
  400519:   31 c0                   xor    %eax,%eax
  40051b:   e8 c0 fe ff ff          callq  4003e0
  400520:   48 c7 c0 00 00 00 00    mov    $0x0,%rax
  400527:   f0 4c 0f b1 24 24       lock cmpxchg %r12,(%rsp)
  40052d:   48 83 f8 00             cmp    $0x0,%rax
  400531:   75 f4                   jne    400527
  400533:   48 8b 34 24             mov    (%rsp),%rsi
  400537:   bf 7a 06 40 00          mov    $0x40067a,%edi
  40053c:   31 c0                   xor    %eax,%eax
  40053e:   e8 9d fe ff ff          callq  4003e0
  400543:   f0 48 87 2c 24          lock xchg %rbp,(%rsp)
  400548:   48 8b 34 24             mov    (%rsp),%rsi
  40054c:   31 c0                   xor    %eax,%eax
  40054e:   bf 96 06 40 00          mov    $0x400696,%edi
  400553:   83 c3 01                add    $0x1,%ebx
  400556:   e8 85 fe ff ff          callq  4003e0
  40055b:   83 fb 0a                cmp    $0xa,%ebx
  40055e:   75 b0                   jne    400510   // <- And here it ends
===

As you can see, gcc initializes %r12 and %rbp, the registers used by spin_lock and spin_unlock respectively, only once, before entering the loop. Looking at this code, I conclude that after the first iteration my spinlock is never unlocked again, because %r12 and %rbp are not reloaded on each iteration with the values I explicitly specified in spin_lock and spin_unlock.

Is this abnormal compiler behavior, or did I make a mistake in my inline assembler?
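One possible reading of the listing above, offered as a hypothesis to check rather than a confirmed diagnosis: XCHG exchanges both of its operands, so after `lock xchg %rbp,(%rsp)` the register %rbp holds the old lock value (1), while an "r" input constraint tells GCC the register is left unchanged. That would match the second "[2][UNLOCKED]" line printing 0x1. Below is a sketch of spin_unlock that declares the register read-write ("+r") instead; it is x86_64 and GCC only, and spin_unlock_rw and unlock_loop are hypothetical names:

```c
#include <assert.h>

#define SPIN_LOCKED_V 1UL
#define SPIN_UNLOCKED_V 0UL

struct spin {
    unsigned long val;
};

/* Hypothetical variant of spin_unlock() (x86_64, GCC inline asm).
 * XCHG swaps both operands, so the register is declared "+r"
 * (read-write) rather than "r" (input only). GCC then knows the
 * register's value changes and must reload SPIN_UNLOCKED_V on every call. */
static inline unsigned long spin_unlock_rw(struct spin *lock)
{
    unsigned long tmp = SPIN_UNLOCKED_V;

    __asm__ volatile ("lock xchgq %1, %0"
                      : "+m" (lock->val), "+r" (tmp)
                      :
                      : "memory");
    return tmp; /* the value the lock held before the exchange */
}

/* Mirrors the report's loop: set the lock (plain store, single-threaded
 * illustration only), unlock it via XCHG, and check both sides. */
static int unlock_loop(int iterations)
{
    struct spin s = { SPIN_UNLOCKED_V };
    unsigned long prev;
    int i;

    for (i = 0; i < iterations; i++) {
        s.val = SPIN_LOCKED_V;
        prev = spin_unlock_rw(&s);
        assert(prev == SPIN_LOCKED_V);
        assert(s.val == SPIN_UNLOCKED_V);
    }
    return i;
}
```

Returning tmp is optional; the change that matters in this sketch is the "+r" constraint, which stops GCC from assuming the register still holds 0 on the next iteration.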
Extra information:

1) GCC: I tried both 4.3.4 and 3.4.6
   1.1) gcc-3.4 --version
        gcc-3.4 (GCC) 3.4.6 (Debian 3.4.6-10)
   1.2) gcc --version
        gcc (Debian 4.3.4-6) 4.3.4
2) uname -a
   Linux godel 2.6.30-2-amd64 #1 SMP Mon Dec 7 05:21:45 UTC 2009 x86_64 GNU/Linux
3) cat /proc/cpuinfo | grep 'model ' | head -1
   model name : Intel(R) Core(TM)2 Duo CPU L7500 @ 1.60GH

Kind regards.