From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 23969 invoked by alias); 9 May 2003 23:23:47 -0000 Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Archive: List-Post: List-Help: Sender: gcc-owner@gcc.gnu.org Received: (qmail 23880 invoked from network); 9 May 2003 23:23:46 -0000 Received: from unknown (HELO nikam.ms.mff.cuni.cz) (195.113.18.106) by sources.redhat.com with SMTP; 9 May 2003 23:23:46 -0000 Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202) id E3B6F4E26D; Sat, 10 May 2003 01:23:47 +0200 (CEST) Date: Fri, 09 May 2003 23:23:00 -0000 From: Jan Hubicka To: Richard Kenner Cc: jh@suse.cz, gcc@gcc.gnu.org Subject: Re: An issue for the SC: horrible documentation quality of GCC Message-ID: <20030509232347.GL1924@kam.mff.cuni.cz> References: <10305092317.AA20314@vlsi1.ultra.nyu.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <10305092317.AA20314@vlsi1.ultra.nyu.edu> User-Agent: Mutt/1.3.28i X-SW-Source: 2003-05/txt/msg00973.txt.bz2 > They are tied in allocator, the point is to tie them earlier, so > expression > (plus (reg a) (reg c)) > will look equivalent to > (plus (reg a) (reg a)) > > You don't need to do a replacement: you just record in a table, like > cse.c does, what all the equivalences are. Then you will end up with never ending dataflow problems in the global pass. The GCSE in the simplest definition is already too complicated for us to be implemented correctly (we are on it since 97 and still it is inferrior to what is done by other compilers). Adding more complexity, like CSE has would make it more crazier. CSE does a lot and is powerfull pass that does approximately what is accomplished by multiple passes as described in literature. The problem is that we don't know how to make it global and/or faster and we know that it is too slow right now. > > I think the confusion here is in trying to do two distinct things > simultaneously: one is to track value and the other is to actually make > changes in insns. Perhaps doing GCSE purely on the reg_equal notes would accomplish what you seem to be shooting for (we would have one canonical interpretation of the value as described above), but I am not sure if it is step in right direction (getting us from the CSE pass performance trap we are in). > > We all agree that it is important to have a good table of all the > equivalent values. But the question of what substitutions in the insn > stream should be done is a completely independent question. > We do have heuristics to deal with this but they don't work well either > and I believe that they work worse than the copy propagation mechanizm > (at least that is how do I explain the speedups I measured after > introducing special purpose copy propagation pass) > > Because you are using that pass to not only modify insns, but also track > values with gcse. If you separate those two functions, you may find that > it's better still. Perhaps I can do the test with disabled GCSE (so I won't take advantage of that). I would guess that it would be still a win. > > In the case we can use it each time, the constant propagation as > implemented will do that (replace every occurence of the pseudo by > constant). I tought you are speaking about situation where it is not > good idea to replace the reigsters by constant? > > That's in the case of a reg-reg assignment, not a use of a constant > within an expression. I am still in the dark. Perhaps it is time to go sleep first and then continue in arguing. Could you give me some example of code what you would like to get transfromed? > I man extended basic block like: > > A > / \ > B C > > That's not what cse.c means by an extended basis block. CSE will actually follow the jumps, so it should track it. It is just pure if-then-else (I didn't draw the join edge) > > Now CSE is deciding whether NOP will be in A or B. A is put before B > (this is usual) and B constains some other instructions. Now it will > place the NOP in B and keep copy in A and it can be the case that A is > executed 1000000 times while B not at all. > > You're forgetting that the case in question had *two* branches. If the inner > isn't executed, you're replacing a branch with a copy, which is much faster > on modern machines. How? Assuming that B has some other instructions, I will end up with both, the copy and the branch. Or do I miss something else? Honza