Re: Code optimization only in loops - Jean Christophe Beyler

public inbox for gcc@gcc.gnu.org

From: Jean Christophe Beyler <jean.christophe.beyler@gmail.com>
To: Paolo Bonzini <bonzini@gnu.org>
Cc: "gcc@gcc.gnu.org" <gcc@gcc.gnu.org>
Subject: Re: Code optimization only in loops
Date: Thu, 11 Jun 2009 20:55:00 -0000
Message-ID: <c568a2600906111355m41a0801as833a876f944f0c0b@mail.gmail.com>
In-Reply-To: <c568a2600905131358y1c369b61u139f3bb987e4c287@mail.gmail.com>

I've gone back to this problem (since I've solved another one ;-)).
And I've moved forward a bit:

It seems that if I consider an array of characters, there are no
longer any shifts and therefore I do get my two loads with the use of
an offset:

Code:

#include <stdint.h>

char data[1312];

uint64_t goo (uint64_t i)
{
    return data[i] - data[i+13];
}

This generates the right code: two loads with the same base but
different offsets.

If I use anything other than a char type, I get the problem in code
generation. This seems to confirm that the way the address computation
is generated prevents the subsequent optimization passes from seeing
that the two addresses are related.
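
As a rough illustration of why the two addresses are related (a sketch,
not part of the original test case): the byte distance between element i
and element i+13 is a constant, 13 times the element size, whatever i is.
The names data_c, data_q, char_offset and quad_offset are hypothetical
and only serve this example.

```c
#include <stddef.h>
#include <stdint.h>

char     data_c[1312];   /* char array, same length as data[] above */
uint64_t data_q[1312];   /* hypothetical 64-bit variant of the array */

/* Byte distance between data_c[i] and data_c[i + 13]; needs i + 13 < 1312. */
size_t char_offset(size_t i)
{
    return (size_t)((char *)&data_c[i + 13] - (char *)&data_c[i]);
}

/* Byte distance between data_q[i] and data_q[i + 13]; needs i + 13 < 1312. */
size_t quad_offset(size_t i)
{
    return (size_t)((char *)&data_q[i + 13] - (char *)&data_q[i]);
}
```

In both cases the result does not depend on i, which is what would allow
a single base register with two constant offsets.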

Right now, I'm trying to figure out why I'm getting shifts instead of
a multiply, and whether this is the problem, since this was one of the
differences between what I get and what the i386 port gets.
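
A small sketch of the arithmetic involved, assuming 8-byte elements as in
the uint64_t case quoted below: adding 13 and then shifting left by 3
gives the same address as shifting first and adding the constant 104.
The function names and the base parameter are hypothetical; this only
shows the identity, not what any GCC pass actually does.

```c
#include <stdint.h>

/* Address as the expanded RTL computes it: add 13, then shift left by 3. */
uint64_t addr_shift(uint64_t base, uint64_t i)
{
    return base + ((i + 13) << 3);
}

/* Same address written as base + scaled index + constant offset. */
uint64_t addr_offset(uint64_t base, uint64_t i)
{
    return (base + (i << 3)) + 104;   /* 104 = 13 * 8 */
}
```

Because the arithmetic is unsigned, the two forms agree for every value
of base and i, so the shift by itself does not change which address is
computed.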

If you've got any ideas, let me know. Thanks again,
Jean Christophe

On Wed, May 13, 2009 at 4:58 PM, Jean Christophe Beyler
<jean.christophe.beyler@gmail.com> wrote:
> Ok, for the i386 port, I use uint32_t instead of uint64_t because
> otherwise the assembly code generated is a bit complicated (I'm on a
> 32 bit machine).
>
> The tree dumps from final_cleanup are the same for the goo function:
> goo (i)
> {
> <bb 2>:
>  return data[i + 13] + data[i];
>
> }
>
>
> However, the first RTL dump from expand gives this for the i386 port:
>
> (insn 6 5 7 3 ld.c:17 (parallel [
>            (set (reg:SI 61)
>                (plus:SI (reg/v:SI 59 [ i ])
>                    (const_int 13 [0xd])))
>            (clobber (reg:CC 17 flags))
>        ]) -1 (nil))
>
> (insn 7 6 8 3 ld.c:17 (set (reg/f:SI 62)
>        (symbol_ref:SI ("data") <var_decl 0xb7e7ce60 data>)) -1 (nil))
>
> (insn 8 7 9 3 ld.c:17 (set (reg/f:SI 63)
>        (symbol_ref:SI ("data") <var_decl 0xb7e7ce60 data>)) -1 (nil))
>
> (insn 9 8 10 3 ld.c:17 (set (reg:SI 64)
>        (mem/s:SI (plus:SI (mult:SI (reg/v:SI 59 [ i ])
>                    (const_int 4 [0x4]))
>                (reg/f:SI 63)) [3 data S4 A32])) -1 (nil))
>
> (insn 10 9 11 3 ld.c:17 (set (reg:SI 65)
>        (mem/s:SI (plus:SI (mult:SI (reg:SI 61)
>                    (const_int 4 [0x4]))
>                (reg/f:SI 62)) [3 data S4 A32])) -1 (nil))
>
> (insn 11 10 12 3 ld.c:17 (parallel [
>            (set (reg:SI 60)
>                (plus:SI (reg:SI 65)
>                    (reg:SI 64)))
>            (clobber (reg:CC 17 flags))
>        ]) -1 (expr_list:REG_EQUAL (plus:SI (mem/s:SI (plus:SI (mult:SI (reg:SI 61)
>                        (const_int 4 [0x4]))
>                    (reg/f:SI 62)) [3 data S4 A32])
>            (mem/s:SI (plus:SI (mult:SI (reg/v:SI 59 [ i ])
>                        (const_int 4 [0x4]))
>                    (reg/f:SI 63)) [3 data S4 A32]))
>        (nil)))
>
> As we can see, the compiler adds 13 to i, loads the address of data,
> then multiplies each index by 4 to scale it to the element size,
> performs the two loads, and finally adds the results.
>
> In my port, I get:
>
> (insn 6 5 7 3 ld.c:17 (set (reg:DI 75)
>        (plus:DI (reg/v:DI 73 [ i ])
>            (const_int 13 [0xd]))) -1 (nil))
>
> (insn 7 6 8 3 ld.c:17 (set (reg/f:DI 76)
>        (symbol_ref:DI ("data") <var_decl 0xb7c85bb0 data>)) -1 (nil))
>
> (insn 8 7 9 3 ld.c:17 (set (reg:DI 78)
>        (const_int 3 [0x3])) -1 (nil))
>
> (insn 9 8 10 3 ld.c:17 (set (reg:DI 77)
>        (ashift:DI (reg:DI 75)
>            (reg:DI 78))) -1 (nil))
>
> (insn 10 9 11 3 ld.c:17 (set (reg/f:DI 79)
>        (plus:DI (reg/f:DI 76)
>            (reg:DI 77))) -1 (nil))
>
> (insn 11 10 12 3 ld.c:17 (set (reg/f:DI 80)
>        (symbol_ref:DI ("data") <var_decl 0xb7c85bb0 data>)) -1 (nil))
>
> (insn 12 11 13 3 ld.c:17 (set (reg:DI 82)
>        (const_int 3 [0x3])) -1 (nil))
>
> (insn 13 12 14 3 ld.c:17 (set (reg:DI 81)
>        (ashift:DI (reg/v:DI 73 [ i ])
>            (reg:DI 82))) -1 (nil))
>
> (insn 14 13 15 3 ld.c:17 (set (reg/f:DI 83)
>        (plus:DI (reg/f:DI 80)
>            (reg:DI 81))) -1 (nil))
>
> (insn 15 14 16 3 ld.c:17 (set (reg:DI 84)
>        (mem/s:DI (reg/f:DI 79) [2 data S8 A64])) -1 (nil))
>
> (insn 16 15 17 3 ld.c:17 (set (reg:DI 85)
>        (mem/s:DI (reg/f:DI 83) [2 data S8 A64])) -1 (nil))
>
> (insn 17 16 18 3 ld.c:17 (set (reg:DI 74)
>        (plus:DI (reg:DI 84)
>            (reg:DI 85))) -1 (nil))
>
>
> This seems to be the same idea, except that the constant 3 gets loaded
> into a register and a shift is performed. Is it possible that this is
> what is causing my problem in code generation?
>
> I'm trying to figure out why my port is generating a shift instead of
> simply a mult. I actually changed the cost of a shift to a large value,
> and then it uses adds instead of simply a mult. I suspect that this is
> an rtx_cost problem, where I'm not telling the compiler that a
> multiplication is acceptable in this case.
>
> I've been playing with rtx_cost but have been unable to get it to
> generate the right code.
>
> Thanks again for your help and insight,
> Jc
>
> On Fri, May 8, 2009 at 5:18 AM, Paolo Bonzini <bonzini@gnu.org> wrote:
>>
>>> It seems that when the code is in a loop, the compiler is able to
>>> perform some type of optimization that makes use of the offsets,
>>> whereas without a loop the address calculation instructions are
>>> emitted twice, once for each address.
>>
>> I suggest you look at the dumps for i386 to see which pass does the
>> changes, and then see what happens in your port.
>>
>> Paolo
>>
>

Thread overview: 6+ messages
2009-05-07 20:46 Jean Christophe Beyler
2009-05-08 11:40 ` Paolo Bonzini
2009-05-14  2:11   ` Jean Christophe Beyler
2009-06-11 20:55     ` Jean Christophe Beyler [this message]
2009-06-12  7:56       ` Paolo Bonzini
2009-07-13 21:30         ` Jean Christophe Beyler
