From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-409408-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 57210 invoked by alias); 6 Oct 2015 19:18:41 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 57198 invoked by uid 89); 6 Oct 2015 19:18:40 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.2 required=5.0 tests=AWL,BAYES_50,SPF_HELO_PASS,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2
X-HELO: mx1.redhat.com
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Tue, 06 Oct 2015 19:18:38 +0000
Received: from int-mx13.intmail.prod.int.phx2.redhat.com (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26])	by mx1.redhat.com (Postfix) with ESMTPS id AF625C0B64BE;	Tue,  6 Oct 2015 19:18:37 +0000 (UTC)
Received: from [10.10.57.75] (vpn-57-75.rdu2.redhat.com [10.10.57.75])	by int-mx13.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id t96JIaxa016220;	Tue, 6 Oct 2015 15:18:36 -0400
Subject: Re: [patch 4/3] Header file reduction - Tools for contrib
To: Bernd Schmidt <bschmidt@redhat.com>,        Bernd Schmidt <bernds_cb1@t-online.de>,        gcc-patches <gcc-patches@gcc.gnu.org>
References: <560DEA79.8050709@redhat.com> <56127AA4.9090707@redhat.com> <5612E939.9000701@redhat.com> <5613A1F9.3030407@codesourcery.com> <5613B846.2090107@redhat.com>
From: Andrew MacLeod <amacleod@redhat.com>
Message-ID: <56141E8C.6060200@redhat.com>
Date: Tue, 06 Oct 2015 19:18:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0
MIME-Version: 1.0
In-Reply-To: <5613B846.2090107@redhat.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
X-SW-Source: 2015-10/txt/msg00629.txt.bz2

On 10/06/2015 08:02 AM, Bernd Schmidt wrote:
>
>
> This sounds like the intention is to move recognized core files (I 
> assume these are the ones in the "order" array in the tool) to the 
> start, and leaving everything alone? I was a bit confused about this 
> at first; I see for example "timevar.h" moving around without being 
> present in the list, but it looks like it gets added implicitly 
> through being included by df.h. (Incidentally, this looks like another 
> case like the obstack one - a file using timevars should include 
> timevar.h IMO, even if it also includes df.h).
>

Ordering the includes is perhaps more complex than you realize. It more 
complex than I realized when I first started it. it took a long and very 
frustrating period to get it working properly.

There are implicit dependencies between some include files.  The primary 
ordering list is to provide a canonical order for key files so that 
those dependencies are automatically taken care of.  Until now we've 
managed it by hand.    The problem is that the dependencies are not 
necessary always from the main header file.. they may come from one of 
the headers that were included in it. There are lots of dependencies on 
symtab.h for instance, which comes from tree.h   Some other source files 
don't need tree.h, but they do need symtab.h.   If symtab.h isn't in the 
ordering list and the header which uses it is (like cgraph.h) , the tool 
would move cgraph.h above symtab.h and the result doesn't work.

The solution is to take that initial canonical list, and fully expand it 
to include everything that those headers include. This gives a linear 
canonical list of close to 100 files.    It means things like timevar.h 
(which is included by df.h)  are in this "ordering":
<...>
regset.h
alloc-pool.h
timevar.h
df.h
tm_p.h
gimple-iterator
<...>

A source file which does not include df.h  but includes timevar.h muist 
keep it in this same relative ordering, or some other header from the 
ordering list which uses timevar.h may no longer compile. (timevar.h 
would end up after everything in the canonical list instead of in fromt 
of the other file)

This means the any of those 100 headers files which occur in a source 
file should occur in this order.  The original version of the tool tried 
to spell out this exact order, but I realized that was not maintainable 
as headers change, and it was actually far simply to specify the core 
ones In the tool, and let it do the expansion based on what is in the 
current tree.

This also means that taken as a snapshot, you are going to see things 
like timevar.h move around in apparently random fashion... but it is not 
random.  It will be in front of any and all headers listed after it in 
the ordering.  Any headers which don't appear in the canonical list will 
simply retain their current order in the source file, but AFTER all the 
ones in the canonical list.

This also made it fairly easy to remove redundant includes that have 
been seen already by including some other header... I just build the 
list of headers that have been seen already

There are a couple of specialty cases that are handled..
The 'exclude processing' list are headers which shouldn't be expanded 
like above. They can cause irreconcilable problems when expanded , 
especially the front end file files.  They do need to be ordered since 
diagnostics require them to be included first in order to satisfy the 
requirement that   GCC_DIAG_STYLE  be defined before diagnostic.h is 
included.  Plus most of them include tree.h and/or diagnostic.h 
themselves, but we don't want them to impact  the ordering for the 
backend files.

That list puts those core files in an appropriate place canoncailly, but 
doesn't expand into the file because the order we get for the different 
front ends would be different .  Finally diagnostic*.h and friends are 
removed from the list and put at the end to ensure eveything that might 
be needed by them is available.  Again, the front end files would have 
made it much earlier than we wanted for the backend files.

I also disagree with the assertion that " a file using timevars should 
include timevar.h IMO, even if it also includes df.h"  It could, but I 
don't see the value, and I doubt anyone really cares much.    If someone 
ever removes the only thing that does bring timevar.h, you simply add it 
then. That is just part of updating headers.   I'm sure before I run 
this patch not every file which uses timevar.h actually physically 
includes it.  This process will set us to a somewhat consistent state.

  Its simple enough to remove the ones that are redundant in an 
automated way, and very difficult to determine whether they not 
required, but contain content that is used.

The fully expanded canonical list looks something like this:

safe-ctype.h
filenames.h
libiberty.h
hwint.h
system.h
insn-modes.h
machmode.h
signop.h
wide-int.h
double-int.h
real.h
fixed-value.h
statistics.h
gtype-desc.h
ggc.h
vec.h
hashtab.h
inchash.h
mem-stats-traits.h
hash-traits.h
hash-map-traits.h
mem-stats.h
hash-map.h
hash-table.h
hash-set.h
line-map.h
input.h
is-a.h
memory-block.h
coretypes.h
options.h
tm.h
function.h
obstack.h
bitmap.h
sbitmap.h
basic-block.h
dominance.h
cfg.h
backend.h
insn-codes.h
hard-reg-set.h
target.h
genrtl.h
rtl.h
c-target.h
c-target-def.h
symtab.h
tree-core.h
tree-check.h
tree.h
cp-tree.h
c-common.h
c-tree.h
gfortran.h
tree-ssa-alias.h
gimple-expr.h
gimple.h
predict.h
cfghooks.h
regset.h
alloc-pool.h
timevar.hdf.h
tm_p.h
gimple-iterators.h
stringpool.h
tree-ssa-operands.h
gimple-ssa.h
tree-ssanames.h
tree-phinodes.h
ssa-iterators.h
ssa.h
expmed.h
insn-opinit.h
optabs-query.h
optabs-libfuncs.h
insn-config.h
optabs.h
regs.h
emit-rtl.h
ira.h
recog.h
ira-int.h
streamer-hooks.h
plugin-api.h
gcov-iov.h
gcov-io.h
wide-int-print.h
pretty-print.h
bversion.h
lto-streamer.h
data-streamer.h
tree-streamer.h
gimple-streamer.h


>
> Intentionally commented out?
>
>> +
>> + def process_ii (filen):
>> +   return process_include_info (filen, False, False)
>> +
>> + def process_ii_macro (filen):
>> +   return process_include_info (filen, True, False)
>> +
>> + def process_ii_src (filen):
>> +   return process_include_info (filen, False, True)
>> +
>> + def process_ii_macro_src (filen):
>> +   return process_include_info (filen, True, True)
>> +
>> + def ii_base (iinfo):
>> +   return iinfo[0]
>> +
>> + def ii_path (iinfo):
>> +   return iinfo[1]
>> +
>> + def ii_include_list (iinfo):
>> +   return iinfo[2]
>> +
>> + def ii_include_list_cond (iinfo):
>> +   return iinfo[3]
>> +
>> + def ii_include_list_non_cond (iinfo):
>> +   l = ii_include_list (iinfo)
>> +   for n in ii_include_list_cond (iinfo):
>> +     l.remove (n)
>> +   return l
>> +
>> + def ii_macro_consume (iinfo):
>> +   return iinfo[4]
>> +
>> + def ii_macro_define (iinfo):
>> +   return iinfo[5]
>> +
>> + def ii_src (iinfo):
>> +   return iinfo[6]
>> +
>> + def ii_src_line (iinfo):
>> +   return iinfo[7]
>
> That's a lot of little functions with pretty much no clue for the 
> reader what's going on. It looks like maybe there's an array where a 
> struct should have been used?
>

there once was a large comment at the start of process_include_info 
describing the return value vactor... they simply access it.  Im not 
sure where it went.  I will find and put the big comment back in.

Andrew