public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* TREE_CODE mania
@ 2002-09-03 20:18 Devang Patel
  2002-09-03 20:50 ` Daniel Berlin
  2002-09-04  1:51 ` Neil Booth
  0 siblings, 2 replies; 8+ messages in thread
From: Devang Patel @ 2002-09-03 20:18 UTC (permalink / raw)
  To: gcc

Hi All,

Recently there was a discussion about faster compile time. And
memory usage and/or allocation is considered one probable
culprit behind the slow compiler.

To understand actual memory usage pattern, I instrumented GCC
by replacing TREE_CODE macro with function TREE_CODE_read().
I just wanted to see what is the usage pattern and how bad/good is it.

I collected profiled data for following one line program.

int foo() { return 1;}

Now, gprof tells me that to compile this, cc1 calls TREE_CODE_read()
37572 times! I was expecting that number to be in couple of thousands
range but 37k seems high to me.

I think, such a high number of indirect memory references puts
high  pressure on VM and GCC's memory manager to maintain locality.
May be we can do simple code reorganizations using few extra
local variables to reduce this pressure. Or may be I am unnecessarily
surprised by this number.

Here is relevant gprof data...

                 0.00        0.00     694/37572       _convert <cycle 1> 
[219]
                 0.00        0.00     826/37572       _pushdecl [211]
                 0.00        0.00     927/37572       _int_const_binop 
[23]
                 0.00        0.00    1019/37572       _round_type_align 
[171]
                 0.00        0.00    1027/37572       
_darwin_encode_section_info [194]
                 0.00        0.00    1155/37572       _layout_type 
<cycle 3> [10]
                 0.00        0.00    1306/37572       _tree_size [122]
                 0.00        0.00    1499/37572       
_finalize_type_size [28]
                 0.00        0.00    1674/37572       _integer_zerop 
[156]
                 0.00        0.00    2286/37572       _fold <cycle 1> 
[229]
                 0.00        0.00    2304/37572       _make_decl_rtl [16]
                 0.00        0.00    3193/37572       _integer_onep [135]
                 0.00        0.00    4488/37572       _size_binop [22]
                 0.00        0.00    5258/37572       _is_attribute_p 
[101]
                 0.00        0.00    6537/37572       _force_fit_type 
[114]
[91]     0.0    0.00        0.00   37572         _TREE_CODE_read [91]

Now in this data size_binop looks interesting. It is expensive and 
according
to gprof it is consuming 16% of total compile time.

[22]    16.7    0.00        0.01    1496         _size_binop [22]

-Devang

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: TREE_CODE mania
  2002-09-03 20:18 TREE_CODE mania Devang Patel
@ 2002-09-03 20:50 ` Daniel Berlin
  2002-09-04 10:19   ` Devang Patel
  2002-09-04 10:23   ` Devang Patel
  2002-09-04  1:51 ` Neil Booth
  1 sibling, 2 replies; 8+ messages in thread
From: Daniel Berlin @ 2002-09-03 20:50 UTC (permalink / raw)
  To: Devang Patel; +Cc: gcc



On Tue, 3 Sep 2002, Devang Patel wrote:

> Hi All,
> 
> Recently there was a discussion about faster compile time. And
> memory usage and/or allocation is considered one probable
> culprit behind the slow compiler.
> 
> To understand actual memory usage pattern, I instrumented GCC
> by replacing TREE_CODE macro with function TREE_CODE_read().
> I just wanted to see what is the usage pattern and how bad/good is it.
> 
> I collected profiled data for following one line program.
> 
> int foo() { return 1;}
> 
> Now, gprof tells me that to compile this, cc1 calls TREE_CODE_read()
> 37572 times! I was expecting that number to be in couple of thousands
> range but 37k seems high to me.
> 
> I think, such a high number of indirect memory references puts
> high  pressure on VM and GCC's memory manager to maintain locality.

Which it doesn't.
Can't we attack this problem directly?
By maybe using object based bins rather than size based ones, at least for 
trees and RTL?

ggc_alloc_rtx and ggc_alloc_tree are just defined to call ggc_alloc, but 
we could change this to do that, no?
Maybe these types of objects are special enough that even though they may 
have different sizes (IE different RTL objects have different sizes), they 
should be in the same bins anyway.

Then we'd have the locality for our most used objects.
Do we really care about locality for many other things anyway?


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: TREE_CODE mania
  2002-09-03 20:18 TREE_CODE mania Devang Patel
  2002-09-03 20:50 ` Daniel Berlin
@ 2002-09-04  1:51 ` Neil Booth
  2002-09-04 10:15   ` Devang Patel
  1 sibling, 1 reply; 8+ messages in thread
From: Neil Booth @ 2002-09-04  1:51 UTC (permalink / raw)
  To: Devang Patel; +Cc: gcc

Devang Patel wrote:-

> Now in this data size_binop looks interesting. It is expensive and 
> according
> to gprof it is consuming 16% of total compile time.
> 
> [22]    16.7    0.00        0.01    1496         _size_binop [22]

For really short runs, isn't the time bucketing fairly useless?
Or do you have reason to believe it's accurate in your case?

Neil.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: TREE_CODE mania
  2002-09-04  1:51 ` Neil Booth
@ 2002-09-04 10:15   ` Devang Patel
  0 siblings, 0 replies; 8+ messages in thread
From: Devang Patel @ 2002-09-04 10:15 UTC (permalink / raw)
  To: Neil Booth; +Cc: gcc


On Wednesday, September 4, 2002, at 01:50 AM, Neil Booth wrote:

> Devang Patel wrote:-
>
>> Now in this data size_binop looks interesting. It is expensive and
>> according
>> to gprof it is consuming 16% of total compile time.
>>
>> [22]    16.7    0.00        0.01    1496         _size_binop [22]
>
> For really short runs, isn't the time bucketing fairly useless?
> Or do you have reason to believe it's accurate in your case?

I collected data by compiling large source[1] and
size_binop drops to 3.7%.  It counted for 300095 TREE_CODE
references out of total 5391019. But in this new example,
TREE_CODE itself costs 3.2%

-Devang
[1] For this new example, line count for preprocessed source
      collected using -E is 81434.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: TREE_CODE mania
  2002-09-03 20:50 ` Daniel Berlin
@ 2002-09-04 10:19   ` Devang Patel
  2002-09-04 16:04     ` Daniel Berlin
  2002-09-04 10:23   ` Devang Patel
  1 sibling, 1 reply; 8+ messages in thread
From: Devang Patel @ 2002-09-04 10:19 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: gcc

[-- Attachment #1: Type: text/plain, Size: 1414 bytes --]


On Tuesday, September 3, 2002, at 08:51 PM, Daniel Berlin wrote:

> On Tue, 3 Sep 2002, Devang Patel wrote:
>
>> Hi All,
>>
>> Recently there was a discussion about faster compile time. And
>> memory usage and/or allocation is considered one probable
>> culprit behind the slow compiler.
>>
>> To understand actual memory usage pattern, I instrumented GCC
>> by replacing TREE_CODE macro with function TREE_CODE_read().
>> I just wanted to see what is the usage pattern and how bad/good is it.
>>
>> I collected profiled data for following one line program.
>>
>> int foo() { return 1;}
>>
>> Now, gprof tells me that to compile this, cc1 calls TREE_CODE_read()
>> 37572 times! I was expecting that number to be in couple of thousands
>> range but 37k seems high to me.
>>
>> I think, such a high number of indirect memory references puts
>> high  pressure on VM and GCC's memory manager to maintain locality.
>
> Which it doesn't.
> Can't we attack this problem directly?
> By maybe using object based bins rather than size based ones, at least 
> for
> trees and RTL?

Sure, we can try using different allocation schemes to achieve better 
compile
time performance. But this approach is like -- earn more money and 
allocate
funds in better way to meet the budget. I am thinking in terms, can we 
reduce
expenditure ? I think, we need to work in both direction to achieve 
better
compile time speedup.

-Devang

[-- Attachment #2: Type: text/enriched, Size: 1410 bytes --]



On Tuesday, September 3, 2002, at 08:51 PM, Daniel Berlin wrote:


<excerpt>On Tue, 3 Sep 2002, Devang Patel wrote:


<excerpt>Hi All,


Recently there was a discussion about faster compile time. And

memory usage and/or allocation is considered one probable

culprit behind the slow compiler.


To understand actual memory usage pattern, I instrumented GCC

by replacing TREE_CODE macro with function TREE_CODE_read().

I just wanted to see what is the usage pattern and how bad/good is it.


I collected profiled data for following one line program.


int foo() { return 1;}


Now, gprof tells me that to compile this, cc1 calls TREE_CODE_read()

37572 times! I was expecting that number to be in couple of thousands

range but 37k seems high to me.


I think, such a high number of indirect memory references puts

high  pressure on VM and GCC's memory manager to maintain locality.

</excerpt>

Which it doesn't.

Can't we attack this problem directly?

By maybe using object based bins rather than size based ones, at least
for 

trees and RTL?

</excerpt>

Sure, we can try using different allocation schemes to achieve better
compile

time performance. But this approach is like -- earn more money and
allocate 

funds in better way to meet the budget. I am thinking in terms, can we
reduce

expenditure ? I think, we need to work in both direction to achieve
better

compile time speedup.


-Devang


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: TREE_CODE mania
  2002-09-03 20:50 ` Daniel Berlin
  2002-09-04 10:19   ` Devang Patel
@ 2002-09-04 10:23   ` Devang Patel
  1 sibling, 0 replies; 8+ messages in thread
From: Devang Patel @ 2002-09-04 10:23 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: gcc


On Tuesday, September 3, 2002, at 08:51 PM, Daniel Berlin wrote:

> Do we really care about locality for many other things anyway?

I do not have any answer. But to get an answer we need to actually
know/measure memory usage pattern. I think, before using different
memory allocation and memory management schemes we need to
know what are the requirements. And  using TREE_CODE function
is one of many experiments in an attempt to answer that question.

-Devang

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: TREE_CODE mania
  2002-09-04 10:19   ` Devang Patel
@ 2002-09-04 16:04     ` Daniel Berlin
  2002-09-04 19:51       ` Devang Patel
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Berlin @ 2002-09-04 16:04 UTC (permalink / raw)
  To: Devang Patel; +Cc: gcc


On Wednesday, September 4, 2002, at 01:19  PM, Devang Patel wrote:

>
> On Tuesday, September 3, 2002, at 08:51 PM, Daniel Berlin wrote:
>
>> On Tue, 3 Sep 2002, Devang Patel wrote:
>>
>>> Hi All,
>>>
>>> Recently there was a discussion about faster compile time. And
>>> memory usage and/or allocation is considered one probable
>>> culprit behind the slow compiler.
>>>
>>> To understand actual memory usage pattern, I instrumented GCC
>>> by replacing TREE_CODE macro with function TREE_CODE_read().
>>> I just wanted to see what is the usage pattern and how bad/good is 
>>> it.
>>>
>>> I collected profiled data for following one line program.
>>>
>>> int foo() { return 1;}
>>>
>>> Now, gprof tells me that to compile this, cc1 calls TREE_CODE_read()
>>> 37572 times! I was expecting that number to be in couple of thousands
>>> range but 37k seems high to me.
>>>
>>> I think, such a high number of indirect memory references puts
>>> high  pressure on VM and GCC's memory manager to maintain locality.
>>
>> Which it doesn't.
>> Can't we attack this problem directly?
>> By maybe using object based bins rather than size based ones, at 
>> least for
>> trees and RTL?
>
> Sure, we can try using different allocation schemes to achieve better 
> compile
> time performance. But this approach is like -- earn more money and 
> allocate
> funds in better way to meet the budget. I am thinking in terms, can we 
> reduce
> expenditure ?

By the by, did you mark the TREE_CODE_read function as const/pure (i'm 
not sure tree_code's aren't modified in place, if they are, it's both, 
if they aren't, it's at least one of them), so that it accurately 
simulates the macro in terms of actual accesses?

>  I think, we need to work in both direction to achieve better
> compile time speedup.
>
> -Devang

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: TREE_CODE mania
  2002-09-04 16:04     ` Daniel Berlin
@ 2002-09-04 19:51       ` Devang Patel
  0 siblings, 0 replies; 8+ messages in thread
From: Devang Patel @ 2002-09-04 19:51 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: gcc

[-- Attachment #1: Type: text/plain, Size: 716 bytes --]


On Wednesday, September 4, 2002, at 04:04 PM, Daniel Berlin wrote:

>>
>> Sure, we can try using different allocation schemes to achieve better 
>> compile
>> time performance. But this approach is like -- earn more money and 
>> allocate
>> funds in better way to meet the budget. I am thinking in terms, can 
>> we reduce
>> expenditure ?
>
> By the by, did you mark the TREE_CODE_read function as const/pure (i'm 
> not sure tree_code's aren't modified in place, if they are, it's both, 
> if they aren't, it's at least one of them), so that it accurately 
> simulates the macro in terms of actual accesses?
>

Well, TREE_CODE_read() name is misleading. It is recording read as well 
as write accesses.

-Devang

[-- Attachment #2: Type: text/enriched, Size: 726 bytes --]



On Wednesday, September 4, 2002, at 04:04 PM, Daniel Berlin wrote:


<excerpt><excerpt>

Sure, we can try using different allocation schemes to achieve better
compile

time performance. But this approach is like -- earn more money and
allocate

funds in better way to meet the budget. I am thinking in terms, can we
reduce

expenditure ?

</excerpt>

By the by, did you mark the TREE_CODE_read function as const/pure (i'm
not sure tree_code's aren't modified in place, if they are, it's both,
if they aren't, it's at least one of them), so that it accurately
simulates the macro in terms of actual accesses?


</excerpt>

Well, TREE_CODE_read() name is misleading. It is recording read as
well as write accesses.


-Devang


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2002-09-05  2:51 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-09-03 20:18 TREE_CODE mania Devang Patel
2002-09-03 20:50 ` Daniel Berlin
2002-09-04 10:19   ` Devang Patel
2002-09-04 16:04     ` Daniel Berlin
2002-09-04 19:51       ` Devang Patel
2002-09-04 10:23   ` Devang Patel
2002-09-04  1:51 ` Neil Booth
2002-09-04 10:15   ` Devang Patel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).