From mboxrd@z Thu Jan  1 00:00:00 1970
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
Content-Type: text/plain; charset=iso-8859-1
Mime-Version: 1.0 (Mac OS X Mail 7.0 \(1822\))
Subject: Re: m68k optimisations?
From: Maxim Kuvyrkov <maxim@kugelworks.com>
In-Reply-To: <CAO9OKOO6NbmPwuSPWwoGFRWbxx0DqrZ8neN_wWNhUyiaZWiZmQ@mail.gmail.com>
Date: Thu, 05 Dec 2013 00:48:00 -0000
Cc: gcc <gcc@gcc.gnu.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <F7B4975B-76CC-48BD-9ACE-96C181F0BDF6@kugelworks.com>
References: <CAO9OKOO6NbmPwuSPWwoGFRWbxx0DqrZ8neN_wWNhUyiaZWiZmQ@mail.gmail.com>
To: Fredrik Olsson <peylow@gmail.com>
X-SW-Source: 2013-12/txt/msg00040.txt.bz2

On 9/11/2013, at 12:08 am, Fredrik Olsson <peylow@gmail.com> wrote:

> I have this simple function:
> #include <stdarg.h>
>
> int sum_vec(int c, ...) {
>    va_list argptr;
>    va_start(argptr, c);
>    int sum = 0;
>    while (c--) {
>        int x = va_arg(argptr, int);
>        sum += x;
>    }
>    va_end(argptr);
>    return sum;
> }
>
>
> When compiling with "-fomit-frame-pointer -Os -march=68000 -c -S
> -mshort" I get this assembly (I have manually added comments with
> clock cycles per instruction and a total for a count of 0, 8 and n>0):
>    .even
>    .globl _sum_vec
> _sum_vec:
>    lea (6,%sp),%a0         | 8
>    move.w 4(%sp),%d1       | 12
>    clr.w %d0               | 4
>    jra .L1                 | 12
> .L2:
>    add.w (%a0)+,%d0        | 8
> .L1:
>    dbra %d1,.L2            | 16,12
>    rts                     | 16
> | c==0: 8+12+4+12+12+16=64
> | c==8: 8+12+4+12+(16+8)*8+12+16=256
> | c==n: =64+24n
>
> When instead compiling with "-fomit-frame-pointer -O3 -march=68000 -c
> -S -mshort" I expect to get more aggressive optimisation than -Os, or
> at least just as performant, but instead I get this:
>    .even
>    .globl _sum_vec
> _sum_vec:
>    move.w 4(%sp),%d0       | 12
>    jeq .L2                 | 12,8
>    lea (6,%sp),%a0         | 8
>    subq.w #1,%d0           | 4
>    and.l #65535,%d0        | 16
>    add.l %d0,%d0           | 8
>    lea 8(%sp,%d0.l),%a1    | 16
>    clr.w %d0               | 4
> .L1:
>    add.w (%a0)+,%d0        | 8
>    cmp.l %a0,%a1           | 8
>    jne .L1                 | 12,8
>    rts                     | 16
> .L2:
>    clr.w %d0               | 4
>    rts                     | 16
> | c==0: 12+12+4+16=44
> | c==8: 12+8+8+4+16+8+16+4+(8+8+12)*8-4+16=312
> | c==n: =88+28n
>
> The count==0 case is better. I can see what optimisation has been
> tried for the loop, but it is just not working, since both the init
> for the loop and the loop itself become more costly.
>
> Being a GCC beginner, I would like a few pointers on how I should go
> about fixing this.
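
For reference, the hand-counted totals quoted above can be restated as
a small cost model.  This is only a sketch: the function names are
hypothetical, and the constants are the per-instruction estimates from
the quoted annotations, not measured timings.

```c
/* Cycle-count model taken from the hand annotations quoted above.
   Function names are hypothetical; n is the argument count c. */

/* -Os build: 64 cycles of fixed cost plus 24 cycles per element. */
unsigned long cycles_os(unsigned long n)
{
    return 64UL + 24UL * n;
}

/* -O3 build: the early-exit path for c == 0 costs 44 cycles;
   otherwise 88 cycles of fixed cost plus 28 cycles per element. */
unsigned long cycles_o3(unsigned long n)
{
    return n == 0 ? 44UL : 88UL + 28UL * n;
}
```

Under this model the -O3 code is cheaper only for n == 0; for every
n >= 1 it is 24 + 4n cycles more expensive than the -Os code.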

You investigate such problems by comparing the intermediate debug dumps
of the two compilation scenarios; from the final assembly alone it is
almost impossible to guess where the problem comes from.  Add
-fdump-tree-all and -fdump-rtl-all to the compilation flags and find
which optimization pass makes the wrong decision.  Then either trace
through that pass yourself or file a bug report in the hope that
someone (the maintainer of that optimization) will look at it.
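
A standalone test case keeps the dumps small and easy to compare.  The
sketch below is the quoted function with the one header it needs; the
file name sum_vec.c is an assumption made for illustration.

```c
/* sum_vec.c: hypothetical file name for a standalone reproducer.
   The function is the one quoted above; only the #include is added. */
#include <stdarg.h>

int sum_vec(int c, ...)
{
    va_list argptr;
    int sum = 0;

    va_start(argptr, c);
    while (c--) {
        /* Each variadic argument is read back as an int. */
        sum += va_arg(argptr, int);
    }
    va_end(argptr);
    return sum;
}
```

Compile it once with the -Os flags and once with the -O3 flags from the
original message, each time adding -fdump-tree-all -fdump-rtl-all and
using a separate directory, so that the two sets of dump files can be
compared pass by pass.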

Read through the GCC wiki for information on debugging and
troubleshooting GCC:
- http://gcc.gnu.org/wiki/GettingStarted
- http://gcc.gnu.org/wiki/FAQ
- http://gcc.gnu.org/wiki/

Thanks,

--
Maxim Kuvyrkov
www.kugelworks.com