public inbox for gcc@gcc.gnu.org
* Where can I put the optimization of got for arm back end at?
@ 2010-03-28 21:00 Carrot Wei
  2010-04-01 18:11 ` Andrew Haley
  0 siblings, 1 reply; 5+ messages in thread
From: Carrot Wei @ 2010-03-28 21:00 UTC (permalink / raw)
  To: gcc, Richard Earnshaw, Paul Brook, nickc

Hi

The detailed description of the optimization is at
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43129. This is an
ARM-specific optimization.

This optimization uses one less register (the register that holds the
GOT base). To get the benefit, the ideal place for it is before
register allocation.

Usually the expand pass generates instructions to load a global
variable's address from its GOT entry for each access of the global
variable. Later cse/gcse passes can remove many of them. In order to
model the cost precisely, this optimization should run after some
cse/gcse passes.

So what is the best place for this optimization? Is there an existing
pass that can be enhanced with this optimization? Or should I add a new
pass?

thanks
Guozhi


* Re: Where can I put the optimization of got for arm back end at?
  2010-03-28 21:00 Where can I put the optimization of got for arm back end at? Carrot Wei
@ 2010-04-01 18:11 ` Andrew Haley
  2010-04-01 20:31   ` Steven Bosscher
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Haley @ 2010-04-01 18:11 UTC (permalink / raw)
  To: Carrot Wei; +Cc: gcc, Richard Earnshaw, Paul Brook, nickc

On 28/03/10 15:45, Carrot Wei wrote:
> So what is the best place for this optimization? Is there any existed
> pass can be enhanced with this optimization? Or should I add a new
> pass?

The obvious place is machine-dependent reorg, which is a very late pass.

Andrew.


* Re: Where can I put the optimization of got for arm back end at?
  2010-04-01 18:11 ` Andrew Haley
@ 2010-04-01 20:31   ` Steven Bosscher
  2010-04-02  4:06     ` Carrot Wei
  0 siblings, 1 reply; 5+ messages in thread
From: Steven Bosscher @ 2010-04-01 20:31 UTC (permalink / raw)
  To: Andrew Haley; +Cc: Carrot Wei, gcc, Richard Earnshaw, Paul Brook, nickc

On Thu, Apr 1, 2010 at 8:10 PM, Andrew Haley <aph@redhat.com> wrote:
> The obvious place is machine-dependent reorg, which is a very late pass.

Yes, and after register allocation, i.e. too late for Guozhi.

Basically there is no place right now to stuff a pass like that.
Question is: Is this optimization really, reallyreallyreally so target
specific that a target-independent pass is not the better option?

Ciao!
Steven


* Re: Where can I put the optimization of got for arm back end at?
  2010-04-01 20:31   ` Steven Bosscher
@ 2010-04-02  4:06     ` Carrot Wei
       [not found]       ` <l2z5885251a1004050624h1237ec10z2a0bf34b7ba91d2c@mail.gmail.com>
  0 siblings, 1 reply; 5+ messages in thread
From: Carrot Wei @ 2010-04-02  4:06 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Andrew Haley, gcc, Richard Earnshaw, Paul Brook, nickc

This is really a good question!

Consider the requirements of this optimization.

1. There should be at least 2 methods to load a global variable's
address from the GOT. Usually this means using different relocation types.

2. By default, all global variable accesses use the same method.

3. In some cases (fewer than X global variable accesses) method A is
better; in other cases method B is better.

With these constraints a simplify_GOT optimization pass is applicable.
But these constraints are too weak. The new optimization pass can do
almost nothing except call a target-specific hook. I doubt such a
pass would be acceptable.

We can also add more constraints:

4. We can restrict method A as follows: first load the base
address of the GOT into a register pic_reg, then load the real global
variable's address as
            load offset_reg, the offset from GOT base to the GOT entry
            load address, [pic_reg + offset_reg]

With this constraint the new pass knows there is a special register
pic_reg, and it can look for and count all uses of pic_reg. If all uses
are method A and the count is more than the target-specific threshold,
then the uses can be rewritten as method B. The method detection and
rewriting should be target specific.

I don't know how other targets handle global address access with
-fpic, or how many targets satisfy these 4 constraints.
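
The decision described under constraint 4 could be sketched roughly as
below. This is only an illustration in plain C, not GCC code: the types
got_use_kind and got_use and the function should_rewrite_to_method_b are
hypothetical names, and a real pass would walk RTL insns rather than an
array of summaries.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of one use of pic_reg in a function. */
enum got_use_kind {
    GOT_USE_METHOD_A,   /* load offset_reg; load address, [pic_reg + offset_reg] */
    GOT_USE_OTHER       /* any other use of pic_reg */
};

/* Hypothetical summary record for a single use of pic_reg. */
struct got_use {
    enum got_use_kind kind;
};

/* Return true when every use of pic_reg is a method A address load and
   the number of such loads is more than the target-specific threshold,
   which is the condition stated above for rewriting to method B. */
static bool
should_rewrite_to_method_b(const struct got_use *uses, size_t n_uses,
                           size_t threshold)
{
    size_t count = 0;

    for (size_t i = 0; i < n_uses; i++) {
        if (uses[i].kind != GOT_USE_METHOD_A)
            return false;   /* pic_reg is still needed by another kind of use */
        count++;
    }
    return count > threshold;
}
```

In this sketch the threshold and the classification of each use are the
parts that would come from the target.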

thanks
Guozhi


* Re: Where can I put the optimization of got for arm back end at?
       [not found]       ` <l2z5885251a1004050624h1237ec10z2a0bf34b7ba91d2c@mail.gmail.com>
@ 2010-04-06 13:38         ` Carrot Wei
  0 siblings, 0 replies; 5+ messages in thread
From: Carrot Wei @ 2010-04-06 13:38 UTC (permalink / raw)
  To: Paul Yuan
  Cc: Steven Bosscher, Andrew Haley, gcc, Richard Earnshaw, Paul Brook, nickc

It is the base of the GOT that is loaded once in the function prologue.
But for each individual global variable access, the address of the
variable is loaded from the GOT every time the variable is accessed,
at least right after the expand pass. For example, compile the
following function with options -Os -fpic -mthumb:

extern int i;
int foo(int j)
{
  int t = i;
  i = j;
  return t;
}

After expand pass I got
...

(insn 6 4 7 2 src/./static_pic.c:5 (set (reg:SI 136)
        (unspec:SI [
                (const:SI (unspec:SI [
                            (const:SI (plus:SI (unspec:SI [
                                            (const_int 0 [0x0])
                                        ] 21)
                                    (const_int 4 [0x4])))
                        ] 24))
            ] 3)) -1 (nil))

(insn 7 6 8 2 src/./static_pic.c:5 (set (reg:SI 136)
        (unspec:SI [
                (reg:SI 136)
                (const_int 4 [0x4])
                (const_int 0 [0x0])
            ] 4)) -1 (nil))

(insn 8 7 2 2 src/./static_pic.c:5 (use (reg:SI 136)) -1 (nil))

(insn 2 8 3 2 src/./static_pic.c:3 (set (reg/v:SI 135 [ j ])
        (reg:SI 0 r0 [ j ])) -1 (nil))

(note 3 2 5 2 NOTE_INSN_FUNCTION_BEG)

(note 5 3 9 3 [bb 3] NOTE_INSN_BASIC_BLOCK)

(insn 9 5 10 3 src/./static_pic.c:5 (set (reg:SI 138)
        (unspec:SI [
                (symbol_ref:SI ("i") [flags 0xc0]  <var_decl 0x7f9e42227000 i>)
            ] 3)) -1 (nil))

(insn 10 9 11 3 src/./static_pic.c:5 (set (reg/f:SI 137)
        (mem/u/c:SI (plus:SI (reg:SI 136)
                (reg:SI 138)) [0 S4 A32])) -1 (expr_list:REG_EQUAL (symbol_ref:SI ("i") [flags 0xc0]  <var_decl 0x7f9e42227000 i>)
        (nil)))

(insn 11 10 12 3 src/./static_pic.c:5 (set (reg/v:SI 133 [ t ])
        (mem/c/i:SI (reg/f:SI 137) [2 i+0 S4 A32])) -1 (nil))

(insn 12 11 13 3 src/./static_pic.c:6 (set (reg:SI 140)
        (unspec:SI [
                (symbol_ref:SI ("i") [flags 0xc0]  <var_decl 0x7f9e42227000 i>)
            ] 3)) -1 (nil))

(insn 13 12 14 3 src/./static_pic.c:6 (set (reg/f:SI 139)
        (mem/u/c:SI (plus:SI (reg:SI 136)
                (reg:SI 140)) [0 S4 A32])) -1 (expr_list:REG_EQUAL (symbol_ref:SI ("i") [flags 0xc0]  <var_decl 0x7f9e42227000 i>)
        (nil)))

(insn 14 13 15 3 src/./static_pic.c:6 (set (mem/c/i:SI (reg/f:SI 139) [2 i+0 S4 A32])
        (reg/v:SI 135 [ j ])) -1 (nil))

(insn 15 14 16 3 src/./static_pic.c:6 (set (reg:SI 134 [ <retval> ])
        (reg/v:SI 133 [ t ])) -1 (nil))

...

Insns 9 and 10 load the address of global variable i for the first
access. Insns 12 and 13 load the address of i for the second access.

After cse1 pass I got

(insn 6 4 7 2 src/./static_pic.c:5 (set (reg:SI 136)
        (unspec:SI [
                (const:SI (unspec:SI [
                            (const:SI (plus:SI (unspec:SI [
                                            (const_int 0 [0x0])
                                        ] 21)
                                    (const_int 4 [0x4])))
                        ] 24))
            ] 3)) 169 {pic_load_addr_thumb1} (nil))

(insn 7 6 8 2 src/./static_pic.c:5 (set (reg:SI 136)
        (unspec:SI [
                (reg:SI 136)
                (const_int 4 [0x4])
                (const_int 0 [0x0])
            ] 4)) 170 {pic_add_dot_plus_four} (nil))

(insn 8 7 2 2 src/./static_pic.c:5 (use (reg:SI 136)) -1 (nil))

(insn 2 8 3 2 src/./static_pic.c:3 (set (reg/v:SI 135 [ j ])
        (reg:SI 0 r0 [ j ])) 167 {*thumb1_movsi_insn} (nil))

(note 3 2 9 2 NOTE_INSN_FUNCTION_BEG)

(insn 9 3 10 2 src/./static_pic.c:5 (set (reg:SI 138)
        (unspec:SI [
                (symbol_ref:SI ("i") [flags 0xc0]  <var_decl 0x7f9e42227000 i>)
            ] 3)) 169 {pic_load_addr_thumb1} (nil))

(insn 10 9 11 2 src/./static_pic.c:5 (set (reg/f:SI 137)
        (mem/u/c:SI (plus:SI (reg:SI 136)
                (reg:SI 138)) [0 S4 A32])) 167 {*thumb1_movsi_insn} (expr_list:REG_EQUAL (symbol_ref:SI ("i") [flags 0xc0]  <var_decl 0x7f9e42227000 i>)
        (nil)))

(insn 11 10 12 2 src/./static_pic.c:5 (set (reg/v:SI 133 [ t ])
        (mem/c/i:SI (reg/f:SI 137) [2 i+0 S4 A32])) 167 {*thumb1_movsi_insn} (nil))

(insn 12 11 13 2 src/./static_pic.c:6 (set (reg:SI 140)
        (reg:SI 138)) 167 {*thumb1_movsi_insn} (nil))

(insn 13 12 14 2 src/./static_pic.c:6 (set (reg/f:SI 139)
        (reg/f:SI 137)) 167 {*thumb1_movsi_insn} (expr_list:REG_EQUAL (symbol_ref:SI ("i") [flags 0xc0]  <var_decl 0x7f9e42227000 i>)
        (nil)))

(insn 14 13 15 2 src/./static_pic.c:6 (set (mem/c/i:SI (reg/f:SI 137) [2 i+0 S4 A32])
        (reg/v:SI 135 [ j ])) 167 {*thumb1_movsi_insn} (nil))

(insn 15 14 19 2 src/./static_pic.c:6 (set (reg:SI 134 [ <retval> ])
        (reg/v:SI 133 [ t ])) 167 {*thumb1_movsi_insn} (nil))

(insn 19 15 22 2 src/./static_pic.c:8 (set (reg/i:SI 0 r0)
        (reg/v:SI 133 [ t ])) 167 {*thumb1_movsi_insn} (nil))

Now the address of global variable i is loaded only once, by insns 9 and 10.
The later access of i (insn 13) reuses the result of insn 10 (reg 137).

So it is better to run this optimization after some cse/gcse passes.
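
As a source-level analogy only (the real transformation happens on RTL,
not on C), the effect of cse1 on foo is roughly that of taking the
address of i once and reusing it. The name foo_after_cse and the local
definition of i are assumptions made so the sketch stands on its own.

```c
/* Defined here only so the sketch links by itself; in the example above
   i is extern and its address comes from the GOT under -fpic. */
int i;

/* Hand-written analogy of the RTL after cse1. */
int foo_after_cse(int j)
{
    int *addr = &i;   /* insns 9 and 10: the single address load (reg 137) */
    int t = *addr;    /* insn 11: read i through reg 137 */
    *addr = j;        /* insn 14: store through the same reg 137 */
    return t;
}
```

Counting GOT loads on the pre-cse RTL would see two address loads for
this function, while the post-cse RTL has one.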

On Mon, Apr 5, 2010 at 9:24 PM, Paul Yuan <yingbo.com@gmail.com> wrote:
> I remember that the GOT address is loaded only once, in the function
> prologue. It is not cse/gcse that removes the two load insns. For ARM,
> the GOT address is loaded into the sl register.
>
> So simplify_GOT should precede register allocation. Otherwise the compiler
> cannot exploit the freed register. I suggest that simplify_GOT be
> integrated into the expand pass, where we can consider different targets
> and the speed/size trade-off.
>
> --
> Regards,
> Paul Yuan (袁鹏)
>
