* TREE_CODE mania
@ 2002-09-03 20:18 Devang Patel
  2002-09-03 20:50 ` Daniel Berlin
  2002-09-04  1:51 ` Neil Booth
  0 siblings, 2 replies; 8+ messages in thread

From: Devang Patel @ 2002-09-03 20:18 UTC (permalink / raw)
To: gcc

Hi All,

Recently there was a discussion about faster compile times, and memory
usage and/or allocation is considered one probable culprit behind the
slow compiler.

To understand the actual memory usage pattern, I instrumented GCC by
replacing the TREE_CODE macro with a function, TREE_CODE_read(). I just
wanted to see what the usage pattern is and how good or bad it is.

I collected profile data for the following one-line program:

int foo() { return 1; }

Now, gprof tells me that to compile this, cc1 calls TREE_CODE_read()
37572 times! I was expecting that number to be in the
couple-of-thousands range, but 37k seems high to me.

I think such a high number of indirect memory references puts high
pressure on the VM and on GCC's memory manager to maintain locality.
Maybe we can do simple code reorganizations, using a few extra local
variables, to reduce this pressure. Or maybe I am unnecessarily
surprised by this number.

Here is the relevant gprof data:

                0.00    0.00     694/37572      _convert <cycle 1> [219]
                0.00    0.00     826/37572      _pushdecl [211]
                0.00    0.00     927/37572      _int_const_binop [23]
                0.00    0.00    1019/37572      _round_type_align [171]
                0.00    0.00    1027/37572      _darwin_encode_section_info [194]
                0.00    0.00    1155/37572      _layout_type <cycle 3> [10]
                0.00    0.00    1306/37572      _tree_size [122]
                0.00    0.00    1499/37572      _finalize_type_size [28]
                0.00    0.00    1674/37572      _integer_zerop [156]
                0.00    0.00    2286/37572      _fold <cycle 1> [229]
                0.00    0.00    2304/37572      _make_decl_rtl [16]
                0.00    0.00    3193/37572      _integer_onep [135]
                0.00    0.00    4488/37572      _size_binop [22]
                0.00    0.00    5258/37572      _is_attribute_p [101]
                0.00    0.00    6537/37572      _force_fit_type [114]
[91]     0.0    0.00    0.00   37572            _TREE_CODE_read [91]

Now in this data size_binop looks interesting. It is expensive:
according to gprof it is consuming 16% of the total compile time.

[22]    16.7    0.00    0.01    1496            _size_binop [22]

-Devang

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: TREE_CODE mania
  2002-09-03 20:18 TREE_CODE mania Devang Patel
@ 2002-09-03 20:50 ` Daniel Berlin
  2002-09-04 10:19   ` Devang Patel
  2002-09-04 10:23   ` Devang Patel
  2002-09-04  1:51 ` Neil Booth
  1 sibling, 2 replies; 8+ messages in thread

From: Daniel Berlin @ 2002-09-03 20:50 UTC (permalink / raw)
To: Devang Patel; +Cc: gcc

On Tue, 3 Sep 2002, Devang Patel wrote:

> Hi All,
>
> Recently there was a discussion about faster compile time. And
> memory usage and/or allocation is considered one probable
> culprit behind the slow compiler.
>
> To understand actual memory usage pattern, I instrumented GCC
> by replacing TREE_CODE macro with function TREE_CODE_read().
> I just wanted to see what is the usage pattern and how bad/good is it.
>
> I collected profiled data for following one line program.
>
> int foo() { return 1;}
>
> Now, gprof tells me that to compile this, cc1 calls TREE_CODE_read()
> 37572 times! I was expecting that number to be in couple of thousands
> range but 37k seems high to me.
>
> I think, such a high number of indirect memory references puts
> high pressure on VM and GCC's memory manager to maintain locality.

Which it doesn't.

Can't we attack this problem directly? By maybe using object-based bins
rather than size-based ones, at least for trees and RTL?

ggc_alloc_rtx and ggc_alloc_tree are just defined to call ggc_alloc, but
we could change this to do that, no? Maybe these types of objects are
special enough that even though they may have different sizes (i.e.
different RTL objects have different sizes), they should be in the same
bins anyway. Then we'd have the locality for our most used objects.

Do we really care about locality for many other things anyway?

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: TREE_CODE mania
  2002-09-03 20:50 ` Daniel Berlin
@ 2002-09-04 10:19 ` Devang Patel
  2002-09-04 16:04   ` Daniel Berlin
  2002-09-04 10:23 ` Devang Patel
  1 sibling, 1 reply; 8+ messages in thread

From: Devang Patel @ 2002-09-04 10:19 UTC (permalink / raw)
To: Daniel Berlin; +Cc: gcc

[-- Attachment #1: Type: text/plain, Size: 1414 bytes --]

On Tuesday, September 3, 2002, at 08:51 PM, Daniel Berlin wrote:

> On Tue, 3 Sep 2002, Devang Patel wrote:
>
>> Hi All,
>>
>> Recently there was a discussion about faster compile time. And
>> memory usage and/or allocation is considered one probable
>> culprit behind the slow compiler.
>>
>> To understand actual memory usage pattern, I instrumented GCC
>> by replacing TREE_CODE macro with function TREE_CODE_read().
>> I just wanted to see what is the usage pattern and how bad/good is it.
>>
>> I collected profiled data for following one line program.
>>
>> int foo() { return 1;}
>>
>> Now, gprof tells me that to compile this, cc1 calls TREE_CODE_read()
>> 37572 times! I was expecting that number to be in couple of thousands
>> range but 37k seems high to me.
>>
>> I think, such a high number of indirect memory references puts
>> high pressure on VM and GCC's memory manager to maintain locality.
>
> Which it doesn't.
> Can't we attack this problem directly?
> By maybe using object based bins rather than size based ones, at least
> for trees and RTL?

Sure, we can try using different allocation schemes to achieve better
compile-time performance. But this approach is like earning more money
and allocating funds in a better way to meet the budget. I am asking
instead: can we reduce expenditure? I think we need to work in both
directions to achieve better compile-time speedup.

-Devang

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: TREE_CODE mania
  2002-09-04 10:19 ` Devang Patel
@ 2002-09-04 16:04 ` Daniel Berlin
  2002-09-04 19:51   ` Devang Patel
  0 siblings, 1 reply; 8+ messages in thread

From: Daniel Berlin @ 2002-09-04 16:04 UTC (permalink / raw)
To: Devang Patel; +Cc: gcc

On Wednesday, September 4, 2002, at 01:19 PM, Devang Patel wrote:

> On Tuesday, September 3, 2002, at 08:51 PM, Daniel Berlin wrote:
>
>> On Tue, 3 Sep 2002, Devang Patel wrote:
>>
>>> Hi All,
>>>
>>> Recently there was a discussion about faster compile time. And
>>> memory usage and/or allocation is considered one probable
>>> culprit behind the slow compiler.
>>>
>>> To understand actual memory usage pattern, I instrumented GCC
>>> by replacing TREE_CODE macro with function TREE_CODE_read().
>>> I just wanted to see what is the usage pattern and how bad/good is it.
>>>
>>> I collected profiled data for following one line program.
>>>
>>> int foo() { return 1;}
>>>
>>> Now, gprof tells me that to compile this, cc1 calls TREE_CODE_read()
>>> 37572 times! I was expecting that number to be in couple of thousands
>>> range but 37k seems high to me.
>>>
>>> I think, such a high number of indirect memory references puts
>>> high pressure on VM and GCC's memory manager to maintain locality.
>>
>> Which it doesn't.
>> Can't we attack this problem directly?
>> By maybe using object based bins rather than size based ones, at least
>> for trees and RTL?
>
> Sure, we can try using different allocation schemes to achieve better
> compile time performance. But this approach is like -- earn more money
> and allocate funds in better way to meet the budget. I am thinking in
> terms, can we reduce expenditure ?

By the by, did you mark the TREE_CODE_read function as const/pure (I'm
not sure tree codes aren't modified in place; if they are, it's both,
if they aren't, it's at least one of them), so that it accurately
simulates the macro in terms of actual accesses?

> I think, we need to work in both direction to achieve better
> compile time speedup.
>
> -Devang

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: TREE_CODE mania
  2002-09-04 16:04 ` Daniel Berlin
@ 2002-09-04 19:51 ` Devang Patel
  0 siblings, 0 replies; 8+ messages in thread

From: Devang Patel @ 2002-09-04 19:51 UTC (permalink / raw)
To: Daniel Berlin; +Cc: gcc

[-- Attachment #1: Type: text/plain, Size: 716 bytes --]

On Wednesday, September 4, 2002, at 04:04 PM, Daniel Berlin wrote:

>> Sure, we can try using different allocation schemes to achieve better
>> compile time performance. But this approach is like -- earn more money
>> and allocate funds in better way to meet the budget. I am thinking in
>> terms, can we reduce expenditure ?
>
> By the by, did you mark the TREE_CODE_read function as const/pure (i'm
> not sure tree_code's aren't modified in place, if they are, it's both,
> if they aren't, it's at least one of them), so that it accurately
> simulates the macro in terms of actual accesses?

Well, the TREE_CODE_read() name is misleading: it records write
accesses as well as reads.

-Devang

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: TREE_CODE mania
  2002-09-03 20:50 ` Daniel Berlin
  2002-09-04 10:19 ` Devang Patel
@ 2002-09-04 10:23 ` Devang Patel
  1 sibling, 0 replies; 8+ messages in thread

From: Devang Patel @ 2002-09-04 10:23 UTC (permalink / raw)
To: Daniel Berlin; +Cc: gcc

On Tuesday, September 3, 2002, at 08:51 PM, Daniel Berlin wrote:

> Do we really care about locality for many other things anyway?

I do not have an answer. But to get one, we need to actually measure
the memory usage pattern. I think, before adopting different memory
allocation and memory management schemes, we need to know what the
requirements are. Using the TREE_CODE function is one of many
experiments in an attempt to answer that question.

-Devang

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: TREE_CODE mania
  2002-09-03 20:18 TREE_CODE mania Devang Patel
  2002-09-03 20:50 ` Daniel Berlin
@ 2002-09-04  1:51 ` Neil Booth
  2002-09-04 10:15   ` Devang Patel
  1 sibling, 1 reply; 8+ messages in thread

From: Neil Booth @ 2002-09-04 1:51 UTC (permalink / raw)
To: Devang Patel; +Cc: gcc

Devang Patel wrote:-

> Now in this data size_binop looks interesting. It is expensive and
> according to gprof it is consuming 16% of total compile time.
>
> [22]    16.7    0.00    0.01    1496    _size_binop [22]

For really short runs, isn't the time bucketing fairly useless?
Or do you have reason to believe it's accurate in your case?

Neil.

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: TREE_CODE mania
  2002-09-04  1:51 ` Neil Booth
@ 2002-09-04 10:15 ` Devang Patel
  0 siblings, 0 replies; 8+ messages in thread

From: Devang Patel @ 2002-09-04 10:15 UTC (permalink / raw)
To: Neil Booth; +Cc: gcc

On Wednesday, September 4, 2002, at 01:50 AM, Neil Booth wrote:

> Devang Patel wrote:-
>
>> Now in this data size_binop looks interesting. It is expensive and
>> according to gprof it is consuming 16% of total compile time.
>>
>> [22]    16.7    0.00    0.01    1496    _size_binop [22]
>
> For really short runs, isn't the time bucketing fairly useless?
> Or do you have reason to believe it's accurate in your case?

I collected data by compiling a large source[1], and size_binop drops
to 3.7%. It accounted for 300095 TREE_CODE references out of a total of
5391019. But in this new example, TREE_CODE itself costs 3.2%.

-Devang

[1] For this new example, the line count of the preprocessed source
collected using -E is 81434.

^ permalink raw reply	[flat|nested] 8+ messages in thread
end of thread, other threads:[~2002-09-05  2:51 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-09-03 20:18 TREE_CODE mania Devang Patel
2002-09-03 20:50 ` Daniel Berlin
2002-09-04 10:19   ` Devang Patel
2002-09-04 16:04     ` Daniel Berlin
2002-09-04 19:51       ` Devang Patel
2002-09-04 10:23   ` Devang Patel
2002-09-04  1:51 ` Neil Booth
2002-09-04 10:15   ` Devang Patel
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).