From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 31981 invoked by alias); 12 Sep 2011 08:10:58 -0000 Received: (qmail 31970 invoked by uid 22791); 12 Sep 2011 08:10:56 -0000 X-SWARE-Spam-Status: No, hits=-2.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,T_TO_NO_BRKTS_FREEMAIL X-Spam-Check-By: sourceware.org Received: from mail-wy0-f175.google.com (HELO mail-wy0-f175.google.com) (74.125.82.175) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Mon, 12 Sep 2011 08:10:35 +0000 Received: by wyf19 with SMTP id 19so3258538wyf.20 for ; Mon, 12 Sep 2011 01:10:34 -0700 (PDT) MIME-Version: 1.0 Received: by 10.216.131.67 with SMTP id l45mr860771wei.26.1315815034310; Mon, 12 Sep 2011 01:10:34 -0700 (PDT) Received: by 10.216.27.8 with HTTP; Mon, 12 Sep 2011 01:10:34 -0700 (PDT) Date: Mon, 12 Sep 2011 08:10:00 -0000 Message-ID: Subject: Incorrect optimized (-O2) linked list code with 4.3.2 From: pavan tc To: gcc@gcc.gnu.org Content-Type: text/plain; charset=ISO-8859-1 Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-owner@gcc.gnu.org X-SW-Source: 2011-09/txt/msg00120.txt.bz2 Hi, I would like to know if there have been issues with optimized linked list code with GCC 4.3.2. [optiimization flag : -O2] The following is the inlined code that has the problem: static inline void list_add_tail (struct list_head *new, struct list_head *head) { new->next = head; new->prev = head->prev; new->prev->next = new; new->next->prev = new; } The above code has been used in the loop as below: pool = GF_CALLOC (count, padded_sizeof_type, gf_common_mt_long); if (!pool) { GF_FREE (mem_pool); return NULL; } for (i = 0; i < count; i++) { list = pool + (i * (padded_sizeof_type)); INIT_LIST_HEAD (list); list_add_tail (list, &mem_pool->list); ^^^^^^^^^^^^ } '&mem_pool-> list' is used as the list head. mem_pool is a pointer to type : struct mem_pool { struct list_head list; int hot_count; int cold_count; gf_lock_t lock; unsigned long padded_sizeof_type; void *pool; void *pool_end; int real_sizeof_type; }; 'list' is the new member being added to the tail of the list pointed to by head. It is a pointer to type: struct list_head { struct list_head *next; struct list_head *prev; }; The generated assembly for the loop (with the linined list_add_tail()) is as below: 40a1c: e8 0f 03 fd ff callq 10d30 <__gf_calloc@plt> 40a21: 48 85 c0 test %rax,%rax 40a24: 48 89 c7 mov %rax,%rdi 40a27: 0f 84 bf 00 00 00 je 40aec 40a2d: 48 8b 73 08 mov 0x8(%rbx),%rsi 40a31: 4d 8d 44 24 01 lea 0x1(%r12),%r8 40a36: 31 c0 xor %eax,%eax 40a38: b9 01 00 00 00 mov $0x1,%ecx 40a3d: 0f 1f 00 nopl (%rax) 40a40: 49 0f af c5 imul %r13,%rax <=== loop start 40a44: 48 8d 04 07 lea (%rdi,%rax,1),%rax 40a48: 48 89 18 mov %rbx,(%rax) # list->next = head 40a4b: 48 89 06 mov %rax,(%rsi) # head->prev->next = list 40a4e: 48 8b 10 mov (%rax),%rdx # rdx holds list->next 40a51: 48 89 70 08 mov %rsi,0x8(%rax) # list->prev = head->prev; 40a55: 48 89 42 08 mov %rax,0x8(%rdx) # list->next->prev = list 40a59: 48 89 c8 mov %rcx,%rax 40a5c: 48 83 c1 01 add $0x1,%rcx 40a60: 4c 39 c1 cmp %r8,%rcx 40a63: 75 db jne 40a40 In the assembly above, %rbx holds the address of 'head'. %rsi holds the value of head->prev. This is assigned outside the loop and the compiler classifies it as a loop invariant, which is where, I think, the problem is. This line of code should have been inside the loop. - %rsi still holds the value of head->prev that was assigned outside the loop. The following experiments eliminate the problem: 1. Using 'volatile' on the address that 'head' points to. 2. Using a function call (logging calls, for example) inside the loop. 3. Using the direct libc calloc instead of the GF_CALLOC. [GF_CALLOC does some accounting when accounting is enabled. Calls vanilla libc calloc() otherwise]. So, anything that necessitates a different usage of %rsi seems to be correcting the behaviour. 4. Using gcc 4.4.3 [ The obvious solution would then be to use 4.4.3, but I would like to understand if this is a known problem with 4.3.2. Small programs written to emulate this problem do not exhibit the erroneous behaviour.] Please let me know if any more details about this behaviour are required. I'll be glad to provide them. TIA, Pavan