From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-154615-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 654 invoked by alias); 11 Jun 2009 20:55:58 -0000
Received: (qmail 646 invoked by uid 22791); 11 Jun 2009 20:55:57 -0000
X-SWARE-Spam-Status: No, hits=-1.5 required=5.0 	tests=AWL,BAYES_00,J_CHICKENPOX_41,SARE_MSGID_LONG40,SPF_PASS
X-Spam-Check-By: sourceware.org
Received: from yw-out-1718.google.com (HELO yw-out-1718.google.com) (74.125.46.153)     by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 11 Jun 2009 20:55:48 +0000
Received: by yw-out-1718.google.com with SMTP id 5so842624ywm.26         for <gcc@gcc.gnu.org>; Thu, 11 Jun 2009 13:55:45 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.151.119.3 with SMTP id w3mr5910278ybm.226.1244753745226; Thu,  	11 Jun 2009 13:55:45 -0700 (PDT)
In-Reply-To: <c568a2600905131358y1c369b61u139f3bb987e4c287@mail.gmail.com>
References: <c568a2600905071256x33b2a046v3276b5a7d4fcb63b@mail.gmail.com> 	 <4A03F8EA.5070705@gnu.org> 	 <c568a2600905131358y1c369b61u139f3bb987e4c287@mail.gmail.com>
Date: Thu, 11 Jun 2009 20:55:00 -0000
Message-ID: <c568a2600906111355m41a0801as833a876f944f0c0b@mail.gmail.com>
Subject: Re: Code optimization only in loops
From: Jean Christophe Beyler <jean.christophe.beyler@gmail.com>
To: Paolo Bonzini <bonzini@gnu.org>
Cc: "gcc@gcc.gnu.org" <gcc@gcc.gnu.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
X-SW-Source: 2009-06/txt/msg00269.txt.bz2

I've gone back to this problem (since I've solved another one ;-)).
And I've moved forward a bit:

It seems that if I consider an array of characters, there are no
longer any shifts and therefore I do get my two loads with the use of
an offset:

Code:

#include <stdint.h>

char data[1312];

uint64_t goo (uint64_t i)
{
    return data[i] - data[i+13];
}

generates the right code with two loads with the same base but
different offsets.

If I use anything other than a char type, I get the problem in code
generation. This seems to confirm that somehow, the way the address
calculations are generated prevents the subsequent optimization passes
from seeing that the addresses are linked.
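
To spell out what I mean by "linked", here is a minimal sketch. The
arrays data8 and data64 and the helper names are made up for
illustration; only the char case corresponds to the code above.
Whatever the element type, the two addresses differ by a compile-time
constant (13 * element size), so one base with two offsets should in
principle be enough:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arrays for illustration only. */
static char     data8[1312];
static uint64_t data64[1312];

/* Byte distance between &data8[i + 13] and &data8[i]:
   always 13 * sizeof(char), independent of i. */
ptrdiff_t char_offset(size_t i)
{
    return (char *)&data8[i + 13] - (char *)&data8[i];
}

/* Same distance for 8-byte elements:
   always 13 * sizeof(uint64_t), independent of i. */
ptrdiff_t u64_offset(size_t i)
{
    return (char *)&data64[i + 13] - (char *)&data64[i];
}
```

For any valid i, char_offset(i) is 13 and u64_offset(i) is 104, so the
only difference between the two cases is the scaling of the index.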

Right now, I'm trying to figure out why I'm getting shifts instead of a
multiply, and whether that is the problem, since this was one of the
differences between what I get and what the i386 port gets.
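
As a sanity check on the arithmetic, here is a small sketch (function
names are hypothetical, and plain integers stand in for addresses). It
assumes 8-byte elements as in my DImode dumps. The shift form, the
multiply form and the "shared base plus constant offset" form all
compute the same address, so the shift by itself should not change the
result, only how easily later passes recognize the common part:

```c
#include <stdint.h>

/* Shift form, as in my port's expand output: (i + 13) << 3, then add base. */
uint64_t addr_shift(uint64_t base, uint64_t i)
{
    return base + ((i + 13) << 3);
}

/* Multiply form, as in the i386 dump (there the scale is 4;
   8 is used here to match 8-byte elements). */
uint64_t addr_mult(uint64_t base, uint64_t i)
{
    return base + (i + 13) * 8;
}

/* Form one would hope for: the scaled index shared with data[i],
   plus a constant offset of 13 * 8 = 104. */
uint64_t addr_offset(uint64_t base, uint64_t i)
{
    return (base + (i << 3)) + 104;
}
```

All three functions return the same value for the same base and i.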

If you've got any ideas, thanks again,
Jean Christophe

On Wed, May 13, 2009 at 4:58 PM, Jean Christophe Beyler
<jean.christophe.beyler@gmail.com> wrote:
> Ok, for the i386 port, I use uint32_t instead of uint64_t because
> otherwise the assembly code generated is a bit complicated (I'm on a
> 32 bit machine).
>
> The tree dumps from final_cleanup are the same for the goo function:
> goo (i)
> {
> <bb 2>:
>   return data[i + 13] + data[i];
>
> }
>
>
> However, the first RTL dump from expand gives this for the i386 port:
>
> (insn 6 5 7 3 ld.c:17 (parallel [
>             (set (reg:SI 61)
>                 (plus:SI (reg/v:SI 59 [ i ])
>                     (const_int 13 [0xd])))
>             (clobber (reg:CC 17 flags))
>         ]) -1 (nil))
>
> (insn 7 6 8 3 ld.c:17 (set (reg/f:SI 62)
>         (symbol_ref:SI ("data") <var_decl 0xb7e7ce60 data>)) -1 (nil))
>
> (insn 8 7 9 3 ld.c:17 (set (reg/f:SI 63)
>         (symbol_ref:SI ("data") <var_decl 0xb7e7ce60 data>)) -1 (nil))
>
> (insn 9 8 10 3 ld.c:17 (set (reg:SI 64)
>         (mem/s:SI (plus:SI (mult:SI (reg/v:SI 59 [ i ])
>                     (const_int 4 [0x4]))
>                 (reg/f:SI 63)) [3 data S4 A32])) -1 (nil))
>
> (insn 10 9 11 3 ld.c:17 (set (reg:SI 65)
>         (mem/s:SI (plus:SI (mult:SI (reg:SI 61)
>                     (const_int 4 [0x4]))
>                 (reg/f:SI 62)) [3 data S4 A32])) -1 (nil))
>
> (insn 11 10 12 3 ld.c:17 (parallel [
>             (set (reg:SI 60)
>                 (plus:SI (reg:SI 65)
>                     (reg:SI 64)))
>             (clobber (reg:CC 17 flags))
>         ]) -1 (expr_list:REG_EQUAL (plus:SI (mem/s:SI (plus:SI (mult:SI (reg:SI 61)
>                         (const_int 4 [0x4]))
>                     (reg/f:SI 62)) [3 data S4 A32])
>             (mem/s:SI (plus:SI (mult:SI (reg/v:SI 59 [ i ])
>                         (const_int 4 [0x4]))
>                     (reg/f:SI 63)) [3 data S4 A32]))
>         (nil)))
>
> As we can see, the compiler adds 13 to i, loads the address (@) of
> data, then multiplies each index by 4 to scale it to the element size,
> performs the 2 loads and finally has a plus.
>
> In my port, I get:
>
> (insn 6 5 7 3 ld.c:17 (set (reg:DI 75)
>         (plus:DI (reg/v:DI 73 [ i ])
>             (const_int 13 [0xd]))) -1 (nil))
>
> (insn 7 6 8 3 ld.c:17 (set (reg/f:DI 76)
>         (symbol_ref:DI ("data") <var_decl 0xb7c85bb0 data>)) -1 (nil))
>
> (insn 8 7 9 3 ld.c:17 (set (reg:DI 78)
>         (const_int 3 [0x3])) -1 (nil))
>
> (insn 9 8 10 3 ld.c:17 (set (reg:DI 77)
>         (ashift:DI (reg:DI 75)
>             (reg:DI 78))) -1 (nil))
>
> (insn 10 9 11 3 ld.c:17 (set (reg/f:DI 79)
>         (plus:DI (reg/f:DI 76)
>             (reg:DI 77))) -1 (nil))
>
> (insn 11 10 12 3 ld.c:17 (set (reg/f:DI 80)
>         (symbol_ref:DI ("data") <var_decl 0xb7c85bb0 data>)) -1 (nil))
>
> (insn 12 11 13 3 ld.c:17 (set (reg:DI 82)
>         (const_int 3 [0x3])) -1 (nil))
>
> (insn 13 12 14 3 ld.c:17 (set (reg:DI 81)
>         (ashift:DI (reg/v:DI 73 [ i ])
>             (reg:DI 82))) -1 (nil))
>
> (insn 14 13 15 3 ld.c:17 (set (reg/f:DI 83)
>         (plus:DI (reg/f:DI 80)
>             (reg:DI 81))) -1 (nil))
>
> (insn 15 14 16 3 ld.c:17 (set (reg:DI 84)
>         (mem/s:DI (reg/f:DI 79) [2 data S8 A64])) -1 (nil))
>
> (insn 16 15 17 3 ld.c:17 (set (reg:DI 85)
>         (mem/s:DI (reg/f:DI 83) [2 data S8 A64])) -1 (nil))
>
> (insn 17 16 18 3 ld.c:17 (set (reg:DI 74)
>         (plus:DI (reg:DI 84)
>             (reg:DI 85))) -1 (nil))
>
>
> Which seems to be the same idea, except that the constant 3 gets
> loaded and a shift is performed. Is it possible that this is what is
> causing my problem in code generation?
>
> I'm trying to figure out why my port is generating a shift instead of
> simply a mult. I actually changed the cost of shift to a large value
> and then it uses adds instead of simply a mult. I suspect that this is
> an rtx_cost problem: I'm not telling the compiler that a
> multiplication is the right choice in this case.
>
> I've been playing with rtx_cost but have been unable to really get it
> to generate the right code.
>
> Thanks again for your help and insight,
> Jc
>
> On Fri, May 8, 2009 at 5:18 AM, Paolo Bonzini <bonzini@gnu.org> wrote:
>>
>>> It seems that when set in a loop, the program is able to perform some
>>> type of optimization to actually get the use of the offsets, whereas
>>> in the case of no loop, we have twice the instructions for each
>>> address calculation.
>>
>> I suggest you look at the dumps for i386 to see which pass does the
>> changes, and then see what happens in your port.
>>
>> Paolo
>>
>