From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-170313-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 17298 invoked by alias); 23 Sep 2011 19:56:52 -0000
Received: (qmail 17289 invoked by uid 22791); 23 Sep 2011 19:56:51 -0000
X-SWARE-Spam-Status: No, hits=-2.3 required=5.0	tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW
X-Spam-Check-By: sourceware.org
Received: from mail-yx0-f175.google.com (HELO mail-yx0-f175.google.com) (209.85.213.175)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 23 Sep 2011 19:56:37 +0000
Received: by yxj17 with SMTP id 17so3195158yxj.20        for <gcc@gcc.gnu.org>; Fri, 23 Sep 2011 12:56:36 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.150.208.15 with SMTP id f15mr4105277ybg.401.1316807796466; Fri, 23 Sep 2011 12:56:36 -0700 (PDT)
Received: by 10.151.51.5 with HTTP; Fri, 23 Sep 2011 12:56:36 -0700 (PDT)
In-Reply-To: <CAJE_GZn0k6=Pm_SoPHUY=C1w1djFnbOPzT6x2Ey7jRYpv0XVvA@mail.gmail.com>
References: <CAJE_GZn0k6=Pm_SoPHUY=C1w1djFnbOPzT6x2Ey7jRYpv0XVvA@mail.gmail.com>
Date: Fri, 23 Sep 2011 19:56:00 -0000
Message-ID: <CAFiYyc12q_=kcKj38bT8yHH3boet-+tFv7A-qzjJZFXS7QG17g@mail.gmail.com>
Subject: Re: Incorrect optimized (-O2) linked list code with 4.3.2
From: Richard Guenther <richard.guenther@gmail.com>
To: pavan tc <pavan.tc@gmail.com>
Cc: gcc@gcc.gnu.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
X-SW-Source: 2011-09/txt/msg00261.txt.bz2

On Mon, Sep 12, 2011 at 10:10 AM, pavan tc <pavan.tc@gmail.com> wrote:
> Hi,
>
> I would like to know if there have been issues with optimized linked
> list code with GCC 4.3.2. [optiimization flag : -O2]
>
> The following is the inlined code that has the problem:
>
> static inline void
> list_add_tail (struct list_head *new, struct list_head *head)
> {
> =A0 =A0 =A0 =A0new->next =3D head;
> =A0 =A0 =A0 =A0new->prev =3D head->prev;
>
> =A0 =A0 =A0 =A0new->prev->next =3D new;
> =A0 =A0 =A0 =A0new->next->prev =3D new;
> }
>
> The above code has been used in the loop as below:
>
> =A0 =A0 =A0 =A0pool =3D GF_CALLOC (count, padded_sizeof_type, gf_common_m=
t_long);
> =A0 =A0 =A0 =A0if (!pool) {
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0GF_FREE (mem_pool);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0return NULL;
> =A0 =A0 =A0 =A0}
>
> =A0 =A0 =A0 =A0for (i =3D 0; i < count; i++) {
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0list =3D pool + (i * (padded_sizeof_type));
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0INIT_LIST_HEAD (list);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0list_add_tail (list, &mem_pool->list);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0^^^^^^^^^^^^
> =A0 =A0 =A0 =A0}
>
> '&mem_pool-> list' is used as the list head. mem_pool is a pointer to typ=
e :
> struct mem_pool {
> =A0 =A0 =A0 =A0struct list_head =A0list;
> =A0 =A0 =A0 =A0int =A0 =A0 =A0 =A0 =A0 =A0 =A0 hot_count;
> =A0 =A0 =A0 =A0int =A0 =A0 =A0 =A0 =A0 =A0 =A0 cold_count;
> =A0 =A0 =A0 =A0gf_lock_t =A0 =A0 =A0 =A0 lock;
> =A0 =A0 =A0 =A0unsigned long =A0 =A0 padded_sizeof_type;
> =A0 =A0 =A0 =A0void =A0 =A0 =A0 =A0 =A0 =A0 *pool;
> =A0 =A0 =A0 =A0void =A0 =A0 =A0 =A0 =A0 =A0 *pool_end;
> =A0 =A0 =A0 =A0int =A0 =A0 =A0 =A0 =A0 =A0 =A0 real_sizeof_type;
> };
>
> 'list' is the new member being added to the tail of the list pointed to b=
y head.
> It is a pointer to type:
> struct list_head {
> =A0 =A0 =A0 =A0struct list_head *next;
> =A0 =A0 =A0 =A0struct list_head *prev;
> };
>
> The generated assembly for the loop (with the linined list_add_tail())
> is as below:
>
> =A0 40a1c: =A0 =A0 =A0 e8 0f 03 fd ff =A0 =A0 =A0 =A0 callq =A010d30 <__g=
f_calloc@plt>
> =A0 40a21: =A0 =A0 =A0 48 85 c0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 test =A0 %rax=
,%rax
> =A0 40a24: =A0 =A0 =A0 48 89 c7 =A0 =A0 =A0 =A0 =A0 =A0 =A0 mov =A0 =A0%r=
ax,%rdi
> =A0 40a27: =A0 =A0 =A0 0f 84 bf 00 00 00 =A0 je =A0 =A0 40aec <mem_pool_n=
ew_fn+0x14c>
> =A0 40a2d: =A0 =A0 =A0 48 8b 73 08 =A0 =A0 =A0 =A0 =A0mov =A0 =A00x8(%rbx=
),%rsi
> =A0 40a31: =A0 =A0 =A0 4d 8d 44 24 01 =A0 =A0 =A0lea =A0 =A00x1(%r12),%r8
> =A0 40a36: =A0 =A0 =A0 31 c0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 xor =A0 =
=A0%eax,%eax
> =A0 40a38: =A0 =A0 =A0 b9 01 00 00 00 =A0 =A0 mov =A0 =A0$0x1,%ecx
> =A0 40a3d: =A0 =A0 =A0 0f 1f 00 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0nopl =A0 (=
%rax)
> =A0 40a40: =A0 =A0 =A0 49 0f af c5 =A0 =A0 =A0 =A0 =A0 =A0imul =A0 %r13,%=
rax
> =A0 =A0 =A0 <=3D=3D=3D loop start
> =A0 40a44: =A0 =A0 =A0 48 8d 04 07 =A0 =A0 =A0 =A0 =A0lea =A0 =A0(%rdi,%r=
ax,1),%rax
> =A0 40a48: =A0 =A0 =A0 48 89 18 =A0 =A0 =A0 =A0 =A0 =A0 =A0mov =A0 =A0%rb=
x,(%rax)
> # list->next =3D head
> =A0 40a4b: =A0 =A0 =A0 48 89 06 =A0 =A0 =A0 =A0 =A0 =A0 =A0mov =A0 =A0%ra=
x,(%rsi)
> # =A0<!!> head->prev->next =3D list
> =A0 40a4e: =A0 =A0 =A0 48 8b 10 =A0 =A0 =A0 =A0 =A0 =A0 =A0mov =A0 =A0(%r=
ax),%rdx
> # rdx holds list->next
> =A0 40a51: =A0 =A0 =A0 48 89 70 08 =A0 =A0 =A0 =A0 mov =A0 =A0%rsi,0x8(%r=
ax) =A0 =A0 =A0 =A0 =A0#
> list->prev =3D head->prev;
> =A0 40a55: =A0 =A0 =A0 48 89 42 08 =A0 =A0 =A0 =A0 mov =A0 =A0%rax,0x8(%r=
dx) =A0 =A0 =A0 =A0 #
> list->next->prev =3D list
> =A0 40a59: =A0 =A0 =A0 48 89 c8 =A0 =A0 =A0 =A0 =A0 =A0 =A0mov =A0 =A0%rc=
x,%rax
> =A0 40a5c: =A0 =A0 =A0 48 83 c1 01 =A0 =A0 =A0 =A0 =A0add =A0 =A0$0x1,%rcx
> =A0 40a60: =A0 =A0 =A0 4c 39 c1 =A0 =A0 =A0 =A0 =A0 =A0 =A0 cmp =A0 =A0%r=
8,%rcx
> =A0 40a63: =A0 =A0 =A0 75 db =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 jne =A0 =
=A040a40 <mem_pool_new_fn+0xa0>
>
> In the assembly above, %rbx =A0holds the address of 'head'.
> %rsi holds the value of head->prev. This is assigned outside the loop and=
 the
> compiler classifies it as a loop invariant, which is where, I think,
> the problem is.
> This line of code should have been inside the loop.
> <!!> - %rsi still holds the value of head->prev that was assigned
> outside the loop.
>
> The following experiments eliminate the problem:
>
> 1. Using 'volatile' on the address that 'head' points to.
> 2. Using a function call (logging calls, for example) inside the loop.
> 3. Using the direct libc calloc instead of the GF_CALLOC.
> [GF_CALLOC does some accounting when accounting is enabled. Calls vanilla
> libc calloc() otherwise].
>
> So, anything that necessitates a different usage of %rsi seems to be corr=
ecting
> the behaviour.
>
> 4. Using gcc 4.4.3 [ The obvious solution would then be to use 4.4.3,
> but I would
> like to understand if this is a known problem with 4.3.2. Small
> programs written to
> emulate this problem do not exhibit the erroneous behaviour.]
>
> Please let me know if any more details about this behaviour are required.
> I'll be glad to provide them.

Use -fno-strict-aliasing.  Your code invokes undefined behavior.

> TIA,
> Pavan
>