From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 4366 invoked by alias); 6 Oct 2002 22:26:00 -0000 Mailing-List: contact gcc-prs-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Archive: List-Post: List-Help: Sender: gcc-prs-owner@gcc.gnu.org Received: (qmail 4352 invoked by uid 71); 6 Oct 2002 22:26:00 -0000 Date: Sun, 06 Oct 2002 15:26:00 -0000 Message-ID: <20021006222600.4351.qmail@sources.redhat.com> To: nobody@gcc.gnu.org Cc: gcc-prs@gcc.gnu.org, From: Daniel Berlin Subject: Re: optimization/8092: cross-jump triggers too often Reply-To: Daniel Berlin X-SW-Source: 2002-10/txt/msg00225.txt.bz2 List-Id: The following reply was made to PR optimization/8092; it has been noted by GNATS. From: Daniel Berlin To: Bernd Paysan Cc: anton@mips.complang.tuwien.ac.at, Anton Ertl , , , , , Subject: Re: optimization/8092: cross-jump triggers too often Date: Sun, 6 Oct 2002 18:19:44 -0400 (EDT) On Sun, 6 Oct 2002, Bernd Paysan wrote: > > Compile with -O2 -fno-cross-jump to see how the initializing code for "global > expressions" like stdout/stderr gets cluttered all over the place. The dump > obtained with -dG shows which expression is copied: > > GCSE pass 1 > > SET hash table (11 buckets, 3 entries) > Index 0 (hash value 5) > (set (reg/v:SI 60) > (const_int 12 [0xc])) > Index 1 (hash value 6) > (set (reg/v:SI 61) > (const_int 123 [0x7b])) > Index 2 (hash value 7) > (set (reg/v:SI 62) > (const_int 1234 [0x4d2])) > > > LDST list: > Pattern ( 0): (mem/f:SI (symbol_ref:SI ("stderr")) [5 stderr+0 S4 A32]) > Loads : (insn_list 42 (nil)) > Stores : (nil) > > Pattern ( 0): (mem/f:SI (symbol_ref:SI ("stdout")) [5 stdout+0 S4 A32]) > Loads : (insn_list 28 (nil)) > Stores : (nil) > > Pattern ( 0): (mem/f:SI (symbol_ref:SI ("stdin")) [5 stdin+0 S4 A32]) > Loads : (insn_list 14 (nil)) > Stores : (nil) > > > Expression hash table (15 buckets, 5 entries) > Index 0 (hash value 7) > (mem/f:SI (symbol_ref:SI ("stdin")) [5 stdin+0 S4 A32]) > Index 1 (hash value 2) > (mem:SI (reg/v/f:SI 59) [4 S4 A32]) > Index 2 (hash value 12) > (plus:SI (reg/v/f:SI 59) > (const_int 4 [0x4])) > Index 3 (hash value 7) > (mem/f:SI (symbol_ref:SI ("stdout")) [5 stdout+0 S4 A32]) > Index 4 (hash value 8) > (mem/f:SI (symbol_ref:SI ("stderr")) [5 stderr+0 S4 A32]) > > PRE: redundant insn 14 (expression 0) in bb 1, reaching reg is 77 > PRE: redundant insn 28 (expression 3) in bb 2, reaching reg is 78 > PRE: redundant insn 42 (expression 4) in bb 3, reaching reg is 79 > PRE/HOIST: edge (0,1), copy expression 0 > PRE/HOIST: end of bb 1, insn 139, copying expression 3 to reg 78 > PRE/HOIST: edge (1,2), copy expression 3 > PRE/HOIST: end of bb 1, insn 142, copying expression 4 to reg 79 > PRE/HOIST: edge (1,3), copy expression 4 > PRE/HOIST: end of bb 2, insn 145, copying expression 4 to reg 79 > PRE/HOIST: edge (2,3), copy expression 4 > PRE/HOIST: end of bb 3, insn 148, copying expression 3 to reg 78 > PRE/HOIST: edge (3,2), copy expression 3 > PRE/HOIST: end of bb 4, insn 151, copying expression 3 to reg 78 > PRE/HOIST: edge (4,2), copy expression 3 > PRE/HOIST: end of bb 4, insn 154, copying expression 4 to reg 79 > PRE/HOIST: edge (4,3), copy expression 4 > PRE/HOIST: end of bb 5, insn 157, copying expression 3 to reg 78 > PRE/HOIST: edge (5,2), copy expression 3 > PRE/HOIST: end of bb 5, insn 160, copying expression 4 to reg 79 > PRE/HOIST: edge (5,3), copy expression 4 > PRE/HOIST: end of bb 6, insn 163, copying expression 3 to reg 78 > PRE/HOIST: edge (6,2), copy expression 3 > PRE/HOIST: end of bb 6, insn 166, copying expression 4 to reg 79 > PRE/HOIST: edge (6,3), copy expression 4 > > PRE GCSE of foo, pass 1: 4072 bytes needed, 3 substs, 21 insns created > > Apparently, GCSE copies constant expressions that seem to be "sufficiently > complicated" all over the place. Although, in this case, the target > architecture implements (mem/f (symbol_ref xxx)) with the same instruction as > (const_int xxx), which is not copied. If I "poison" symbol_ref expressions in > gcse.c (so that they aren't considdered), I still get a few redundant > expressions spread all over the place, e.g. increments of the stack pointer > (which happen fairly often, but doing so in advance is stupid). > > As far as I understood the code, gcse.c copies "partially redundant" > expressions throughout the entire CFG ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Not the entire CFG, only where it is necessary to make them fully available, so it can remove a later use. In your case, all blocks have abnormal edges to all other blocks. This requires it to end up inserting everywhere because all blocks lead to all other blocks. It's still optimal, given this CFG. You just "know" more than the compiler, so you think it's wrong. > to "make them fully redundant". That's > what's happening here. How can an expression that occurs only once be > "redundant"? It's not, it appears load motion is being stupid, and not checking if a load occurs more than once. > And why doesn't gcse.c try to remove copied expressions > afterwards after it finds out that the attempt to improve the code has > failed? Because it's lifetime optimal, so it can't possibly have made the expression live longer than it used to, and thus, it has always improved the code (in terms of number of calculations). Now, if you want it *cost* optimal, that's a slightly different algorithm. >And why is there no cost calculation? If you copy an expression from > one block to m blocks, the costs are multiplied by m. Not necessarily. You assume the code is completely straight line. GCSE should never make an expression be calculated *more times* than it used to be (probably some corner case i'm forgetting about). That's the whole point. GCSE makes the number of calculations along a given path optimal in number, not in cost. Cost optimal code motion often makes the lifetime of an expression way too long. I suggest you read papers on Lazy Code Motion, and cost optimal code motion. > >