* Incorrect optimized (-O2) linked list code with 4.3.2
@ 2011-09-12 8:10 pavan tc
2011-09-23 19:56 ` Richard Guenther
0 siblings, 1 reply; 2+ messages in thread
From: pavan tc @ 2011-09-12 8:10 UTC (permalink / raw)
To: gcc
Hi,
I would like to know if there have been issues with optimized linked
list code with GCC 4.3.2. [optiimization flag : -O2]
The following is the inlined code that has the problem:
static inline void
list_add_tail (struct list_head *new, struct list_head *head)
{
new->next = head;
new->prev = head->prev;
new->prev->next = new;
new->next->prev = new;
}
The above code has been used in the loop as below:
pool = GF_CALLOC (count, padded_sizeof_type, gf_common_mt_long);
if (!pool) {
GF_FREE (mem_pool);
return NULL;
}
for (i = 0; i < count; i++) {
list = pool + (i * (padded_sizeof_type));
INIT_LIST_HEAD (list);
list_add_tail (list, &mem_pool->list);
^^^^^^^^^^^^
}
'&mem_pool-> list' is used as the list head. mem_pool is a pointer to type :
struct mem_pool {
struct list_head list;
int hot_count;
int cold_count;
gf_lock_t lock;
unsigned long padded_sizeof_type;
void *pool;
void *pool_end;
int real_sizeof_type;
};
'list' is the new member being added to the tail of the list pointed to by head.
It is a pointer to type:
struct list_head {
struct list_head *next;
struct list_head *prev;
};
The generated assembly for the loop (with the linined list_add_tail())
is as below:
40a1c: e8 0f 03 fd ff callq 10d30 <__gf_calloc@plt>
40a21: 48 85 c0 test %rax,%rax
40a24: 48 89 c7 mov %rax,%rdi
40a27: 0f 84 bf 00 00 00 je 40aec <mem_pool_new_fn+0x14c>
40a2d: 48 8b 73 08 mov 0x8(%rbx),%rsi
40a31: 4d 8d 44 24 01 lea 0x1(%r12),%r8
40a36: 31 c0 xor %eax,%eax
40a38: b9 01 00 00 00 mov $0x1,%ecx
40a3d: 0f 1f 00 nopl (%rax)
40a40: 49 0f af c5 imul %r13,%rax
<=== loop start
40a44: 48 8d 04 07 lea (%rdi,%rax,1),%rax
40a48: 48 89 18 mov %rbx,(%rax)
# list->next = head
40a4b: 48 89 06 mov %rax,(%rsi)
# <!!> head->prev->next = list
40a4e: 48 8b 10 mov (%rax),%rdx
# rdx holds list->next
40a51: 48 89 70 08 mov %rsi,0x8(%rax) #
list->prev = head->prev;
40a55: 48 89 42 08 mov %rax,0x8(%rdx) #
list->next->prev = list
40a59: 48 89 c8 mov %rcx,%rax
40a5c: 48 83 c1 01 add $0x1,%rcx
40a60: 4c 39 c1 cmp %r8,%rcx
40a63: 75 db jne 40a40 <mem_pool_new_fn+0xa0>
In the assembly above, %rbx holds the address of 'head'.
%rsi holds the value of head->prev. This is assigned outside the loop and the
compiler classifies it as a loop invariant, which is where, I think,
the problem is.
This line of code should have been inside the loop.
<!!> - %rsi still holds the value of head->prev that was assigned
outside the loop.
The following experiments eliminate the problem:
1. Using 'volatile' on the address that 'head' points to.
2. Using a function call (logging calls, for example) inside the loop.
3. Using the direct libc calloc instead of the GF_CALLOC.
[GF_CALLOC does some accounting when accounting is enabled. Calls vanilla
libc calloc() otherwise].
So, anything that necessitates a different usage of %rsi seems to be correcting
the behaviour.
4. Using gcc 4.4.3 [ The obvious solution would then be to use 4.4.3,
but I would
like to understand if this is a known problem with 4.3.2. Small
programs written to
emulate this problem do not exhibit the erroneous behaviour.]
Please let me know if any more details about this behaviour are required.
I'll be glad to provide them.
TIA,
Pavan
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: Incorrect optimized (-O2) linked list code with 4.3.2
2011-09-12 8:10 Incorrect optimized (-O2) linked list code with 4.3.2 pavan tc
@ 2011-09-23 19:56 ` Richard Guenther
0 siblings, 0 replies; 2+ messages in thread
From: Richard Guenther @ 2011-09-23 19:56 UTC (permalink / raw)
To: pavan tc; +Cc: gcc
On Mon, Sep 12, 2011 at 10:10 AM, pavan tc <pavan.tc@gmail.com> wrote:
> Hi,
>
> I would like to know if there have been issues with optimized linked
> list code with GCC 4.3.2. [optiimization flag : -O2]
>
> The following is the inlined code that has the problem:
>
> static inline void
> list_add_tail (struct list_head *new, struct list_head *head)
> {
> new->next = head;
> new->prev = head->prev;
>
> new->prev->next = new;
> new->next->prev = new;
> }
>
> The above code has been used in the loop as below:
>
> pool = GF_CALLOC (count, padded_sizeof_type, gf_common_mt_long);
> if (!pool) {
> GF_FREE (mem_pool);
> return NULL;
> }
>
> for (i = 0; i < count; i++) {
> list = pool + (i * (padded_sizeof_type));
> INIT_LIST_HEAD (list);
> list_add_tail (list, &mem_pool->list);
> ^^^^^^^^^^^^
> }
>
> '&mem_pool-> list' is used as the list head. mem_pool is a pointer to type :
> struct mem_pool {
> struct list_head list;
> int hot_count;
> int cold_count;
> gf_lock_t lock;
> unsigned long padded_sizeof_type;
> void *pool;
> void *pool_end;
> int real_sizeof_type;
> };
>
> 'list' is the new member being added to the tail of the list pointed to by head.
> It is a pointer to type:
> struct list_head {
> struct list_head *next;
> struct list_head *prev;
> };
>
> The generated assembly for the loop (with the linined list_add_tail())
> is as below:
>
> 40a1c: e8 0f 03 fd ff callq 10d30 <__gf_calloc@plt>
> 40a21: 48 85 c0 test %rax,%rax
> 40a24: 48 89 c7 mov %rax,%rdi
> 40a27: 0f 84 bf 00 00 00 je 40aec <mem_pool_new_fn+0x14c>
> 40a2d: 48 8b 73 08 mov 0x8(%rbx),%rsi
> 40a31: 4d 8d 44 24 01 lea 0x1(%r12),%r8
> 40a36: 31 c0 xor %eax,%eax
> 40a38: b9 01 00 00 00 mov $0x1,%ecx
> 40a3d: 0f 1f 00 nopl (%rax)
> 40a40: 49 0f af c5 imul %r13,%rax
> <=== loop start
> 40a44: 48 8d 04 07 lea (%rdi,%rax,1),%rax
> 40a48: 48 89 18 mov %rbx,(%rax)
> # list->next = head
> 40a4b: 48 89 06 mov %rax,(%rsi)
> # <!!> head->prev->next = list
> 40a4e: 48 8b 10 mov (%rax),%rdx
> # rdx holds list->next
> 40a51: 48 89 70 08 mov %rsi,0x8(%rax) #
> list->prev = head->prev;
> 40a55: 48 89 42 08 mov %rax,0x8(%rdx) #
> list->next->prev = list
> 40a59: 48 89 c8 mov %rcx,%rax
> 40a5c: 48 83 c1 01 add $0x1,%rcx
> 40a60: 4c 39 c1 cmp %r8,%rcx
> 40a63: 75 db jne 40a40 <mem_pool_new_fn+0xa0>
>
> In the assembly above, %rbx holds the address of 'head'.
> %rsi holds the value of head->prev. This is assigned outside the loop and the
> compiler classifies it as a loop invariant, which is where, I think,
> the problem is.
> This line of code should have been inside the loop.
> <!!> - %rsi still holds the value of head->prev that was assigned
> outside the loop.
>
> The following experiments eliminate the problem:
>
> 1. Using 'volatile' on the address that 'head' points to.
> 2. Using a function call (logging calls, for example) inside the loop.
> 3. Using the direct libc calloc instead of the GF_CALLOC.
> [GF_CALLOC does some accounting when accounting is enabled. Calls vanilla
> libc calloc() otherwise].
>
> So, anything that necessitates a different usage of %rsi seems to be correcting
> the behaviour.
>
> 4. Using gcc 4.4.3 [ The obvious solution would then be to use 4.4.3,
> but I would
> like to understand if this is a known problem with 4.3.2. Small
> programs written to
> emulate this problem do not exhibit the erroneous behaviour.]
>
> Please let me know if any more details about this behaviour are required.
> I'll be glad to provide them.
Use -fno-strict-aliasing. Your code invokes undefined behavior.
> TIA,
> Pavan
>
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2011-09-23 19:56 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-12 8:10 Incorrect optimized (-O2) linked list code with 4.3.2 pavan tc
2011-09-23 19:56 ` Richard Guenther
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).