From: Robert Bowdidge
To: gcc@gcc.gnu.org
Cc: Robert Bowdidge
Subject: cutting the size of CONST_DECLs
Date: Sat, 28 Feb 2004 00:21:00 -0000
Message-Id: <2B448A8E-6977-11D8-B63F-000A957D89DA@apple.com>

Hi, all,

At Apple, the compiler's memory use (and its effect on compile speed) is our biggest concern. One specific worry is declarations. On common Mac programs which include headers for our GUI libraries, declarations fill a significant amount of garbage-collected memory. With -fmem-report, we usually see about 35% of memory filled with declarations.

CONST_DECLs are an obvious offender in our programs; many of the Mac headers make heavy use of enums to define symbolic constants.
In some programs, CONST_DECLs are the third most common kind of declaration after FUNCTION_DECL and PARM_DECL, and can be responsible for 30% of declarations and 10% of gc'd memory.

I've tried one change to cut memory use: creating a new, smaller structure for CONST_DECLs only, and having all other declarations inherit this structure and extend it with the rest of the fields. On a couple of typical applications, this change cuts memory use by 4-6% and compilation time by 0-3%.

I'm interested in hearing whether others believe such a data structure change would be appropriate in gcc, and whether similar tricks could be done in other parts of gcc.

The changes would look something like the following:

    struct tree_super_decl {
      struct tree_common common;
      /* fields needed by CONST_DECL only */
      ...
    };

    struct tree_decl {
      struct tree_super_decl super_decl;
      /* Fields needed by all other declarations */
      ...
    };

Parse tree nodes using tree_super_decl as their structure have a different size from other declarations, so the nodes get a new tree code class ('D' rather than 'd').

Macros accessing fields in tree_super_decl need only a slight change:

    #define DECL_NAME(x) (DECL_CHECK(x)->decl.name)

becomes

    #define DECL_NAME(x) (SUPER_DECL_CHECK(x)->super_decl.name)

In addition, DECL_P() would need to be changed to recognize both 'D' and 'd' as classes. Case statements switching on the tree code class would need the new class added.

Similar approaches have been proposed in the past; the difference is that we're focusing on specific kinds of declarations, and we've seen measurable differences in real code.
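The layout and accessor changes described above can be sketched as a small standalone C file. This is only an illustration under stated assumptions, not the actual patch: the field lists are invented stand-ins, the three tree codes are a toy subset, and make_node(), tree_size(), has_full_decl_fields() and the use of assert() in place of gcc's tree checking are all hypothetical simplifications. The sizes it produces have nothing to do with the 72 and 116 byte figures measured on 3.3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy subset of tree codes; the real compiler has many more. */
enum tree_code { CONST_DECL, VAR_DECL, FUNCTION_DECL, NUM_TREE_CODES };

/* 'D' = small declaration, 'd' = full declaration. */
const char tree_code_class[NUM_TREE_CODES] = { 'D', 'd', 'd' };

struct tree_common { enum tree_code code; };
typedef struct tree_common *tree;

/* Fields every declaration needs, CONST_DECL included (invented subset). */
struct tree_super_decl {
  struct tree_common common;
  const char *name;
  long initial;
  unsigned flags;      /* flag bits kept here so tree_decl adds no padding */
};

/* Fields only the larger declarations need (invented subset). */
struct tree_decl {
  struct tree_super_decl super_decl;
  void *arguments;
  void *result;
  void *saved_tree;
};

#define TREE_CODE(t)       ((t)->code)
#define TREE_CODE_CLASS(c) (tree_code_class[(c)])

/* DECL_P must now accept both declaration classes. */
#define DECL_P(t) (TREE_CODE_CLASS(TREE_CODE(t)) == 'd' \
                   || TREE_CODE_CLASS(TREE_CODE(t)) == 'D')

/* Non-aborting query: does this node carry the full tree_decl fields? */
int has_full_decl_fields(tree t)
{
  return TREE_CODE_CLASS(TREE_CODE(t)) == 'd';
}

/* Allocation size depends on the code class. */
size_t tree_size(enum tree_code code)
{
  return TREE_CODE_CLASS(code) == 'D' ? sizeof(struct tree_super_decl)
                                      : sizeof(struct tree_decl);
}

tree make_node(enum tree_code code)
{
  tree t = calloc(1, tree_size(code));
  if (!t)
    abort();
  t->code = code;
  return t;
}

/* Runtime checks; assert() stands in for a checking-enabled build. */
struct tree_super_decl *super_decl_check(tree t)
{
  assert(DECL_P(t));
  return (struct tree_super_decl *) t;
}

struct tree_decl *decl_check(tree t)
{
  assert(has_full_decl_fields(t));   /* fails for a CONST_DECL */
  return (struct tree_decl *) t;
}

#define DECL_NAME(t)      (super_decl_check(t)->name)
#define DECL_ARGUMENTS(t) (decl_check(t)->arguments)
```

With this arrangement DECL_NAME() works on any declaration, while DECL_ARGUMENTS() on a CONST_DECL trips the check instead of reading past the end of the smaller allocation. Compiling with NDEBUG removes the checks, which is why stray accesses would only show up in a checking build.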
Dan Nicolaescu proposed a similar scheme in February 2003 to take fields only used by FUNCTION_DECL out of the tree_decl structure:

    http://gcc.gnu.org/ml/gcc/2003-02/msg00587.html

Carlo Wood also suggested similar ideas for tree_expr, though he proposed extending tree_expr for CALL_EXPR nodes only:

    http://gcc.gnu.org/ml/gcc/2003-10/msg00170.html

Advantages:

* Cuts the size of CONST_DECL. On 3.3, I've been able to get CONST_DECL down to 72 bytes as compared to 116 bytes. More shrinking may be possible by figuring out which field accesses on CONST_DECLs are accidental or pointless.

* The same technique could be used to shrink other declarations: PARM_DECL, TYPE_DECL, and RESULT_DECL all appear to use a subset of the tree_decl fields. All three can be shrunk to 100 bytes with no changes.

* The change doesn't alter tree.h much, and appears to require only minor fixes elsewhere in the gcc sources: some changes in the tree allocation code, additional case statements or if clauses wherever the tree code class is checked explicitly for 'd', and changing DECL_P() to detect both kinds of declaration structures.

Disadvantages:

* All the bit field and flag variables should be placed in the "tree_super_decl" structure to avoid structure size growth due to padding.

* Deciding which fields should be in the "small" declaration requires a combination of knowledge of the gcc code and runtime tracing to figure out which fields in a declaration are used for constants. I chose the fields by instrumenting the accessor macros to record references to the different fields over several runs, then did more compiles with "--enable-checking" on to catch the last few stragglers. There's always a chance that some bit of code deep in the compiler accesses a small declaration as if it were a large one; the only way to catch these is with --enable-checking.
* Similarly, bug fixes made after the data structure change could introduce accesses to fields that are not available in a particular declaration; again, these will only get caught by runtime checks. This may be a good thing: once folks worry more about what sort of declaration they may be referencing, it may encourage more care about which fields are truly needed for each type of declaration.

I'm not sure there's any good way to statically identify the spots where constant declarations may be accessed. We could use something like the Stanford MC checker to propagate potential types through function calls and if clauses, but I'd guess we'd still get bitten by cases where we couldn't prove that a declaration could never be a constant declaration.

So far, tracking down the last few stragglers by turning on --enable-checking and passing lots of code through the compiler has been pretty quick. I'd feel even more confident after building all of Apple's sources with an instrumented compiler; the only remaining issues would be port-specific field access patterns.

Comments?

Robert