From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 1791 invoked by alias); 14 May 2014 03:46:26 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 1774 invoked by uid 89); 14 May 2014 03:46:25 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-vc0-f175.google.com Received: from mail-vc0-f175.google.com (HELO mail-vc0-f175.google.com) (209.85.220.175) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Wed, 14 May 2014 03:46:23 +0000 Received: by mail-vc0-f175.google.com with SMTP id hu19so1687039vcb.34 for ; Tue, 13 May 2014 20:46:21 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.52.228.134 with SMTP id si6mr697249vdc.5.1400039181039; Tue, 13 May 2014 20:46:21 -0700 (PDT) Received: by 10.220.11.5 with HTTP; Tue, 13 May 2014 20:46:20 -0700 (PDT) In-Reply-To: <20140514030448.GJ5162@bubble.grove.modra.org> References: <20140508014846.GA5162@bubble.grove.modra.org> <20140509024054.GE5162@bubble.grove.modra.org> <20140514030448.GJ5162@bubble.grove.modra.org> Date: Wed, 14 May 2014 03:46:00 -0000 Message-ID: Subject: Re: [RS6000] Fix PR61098, Poor code setting count register From: David Edelsohn To: GCC Patches , Alan Modra Content-Type: text/plain; charset=UTF-8 X-SW-Source: 2014-05/txt/msg01033.txt.bz2 Alan, Danny may have re-organized the code, but I thought that it originally came from Tom Rixx, if not earlier. I seem to remember problems in the past with late creation of TOC entries for constants causing problems, so it was easier to fall back to materializing all integer constants inline. I don't remember the PRs, but I think there were issues with creating a TOC if the late constant were the only TOC reference, or maybe the issue was buying a stack frame to materialize the TOC/GOT for a late constant. And maximum 5 instruction sequence is not really bad relative to a load from the TOC (even with medium model / data in TOC). There are a lot of trade-offs with respect to I$ expansion versus the load hitting in the L1 D$. Alpha emit_set_const does limit the number of instructions, but that is a search for a more efficient sequence than the naive sequence. The Alpha splitters use alpha_split_const_mov(), which tries alpha_emit_set_const() for an efficient sequence and then falls back to alpha_emit_set_long_const() for a naive sequence. Alpha uses PLUS instead of IOR because of the way its ISA works. alpha_emit_set_long_const() always will materialize the constant and does not check for a maximum number of instructions. This is why it's comment says "fall back to straight forward decomposition". However, alpha_emit_set_long_const() and alpha_split_const_mov() can fail, presumably because emit_move_insn() fails, not because of reaching a maximum number of instructions. alpha_legitimate_constant_p() rejecs expensive constants early. Once the splitter is invoked, it always tries to materialize the constant, but the splitter apparently can fail for other reasons. I don't mind exploring the benefits of tighening up rs6000_legitimate_const(), but I'm not sure it's an obvious win, especially with the potential failure corner cases. However, I want to have a better understanding about the part of the patch that removes the FAIL path from the splitters. Thanks, David On Tue, May 13, 2014 at 11:04 PM, Alan Modra wrote: > On Sat, May 10, 2014 at 10:24:34PM -0400, David Edelsohn wrote: >> On Thu, May 8, 2014 at 10:40 PM, Alan Modra wrote: >> >> >> Please do not remove all of the comments from the two functions. The >> >> comments should provide some documentation about the different >> >> purposes of the two functions other than setting DEST to a CONST. >> > >> > I believe my updated comment covers the complete purpose of the >> > function nowadays. The comments I removed are out-dated, and should >> > have been removed a long time ago.. rs6000_emit_set_const does not >> > even look at N, it always returns a non-zero result, and the return is >> > only tested for non-zero. I removed MODE too, because that is always >> > the same as GET_MODE (dest). >> >> It is helpful if the comment expresses more than restating the >> information one can glean from the function name. It's useful to note >> that rs6000_emit_set_long_const is a standard decomposition with a >> bounded number of instructions. >> >> >> I think that the way you rearranged the invocations of copy_rtx() in >> >> rs6000_emit_set_long_const() is okay, but it would be good for someone >> >> else to double check. >> > >> > Yeah, that function is a bit messy. I took the approach of always use >> > a bare "dest" once in the last instruction emitted, with every other >> > use getting hit with copy_rtx. The previous approach was similar, >> > but used the bare "dest" on the first instruction emitted. Obviously >> > you don't need copy_rtx anywhere with the new code when >> > can_create_pseudo_p is true, but I felt it wasn't worth optimising >> > that for the added source complication. >> >> Can you help clarify the removal of the code that tests if the >> splitter failed? The splitters in the Alpha port follow mostly the >> same rhythm, with a little bit of further cleanup and consolidation >> relative to the rs6000 port. alpha_split_const_mov() falls back on >> alpha_emit_set_long_const(), but checks that the target is valid and >> allows the splitter to fail. Either the Alpha port is doing >> unnecessary work or this cleanup patch is too aggressive. Either way, >> a comment seems necessary. > > OK, I've had a good look at the history of this code. > > rs6000_emit_set_const and rs6000_emit_set_long_const were introduced > with revision 44516, a largish patch by Dan Berlin. As you hint > above, it seems the functions were copied from alpha. So the > parameters were unnecessary and the comments just plain wrong for the > rs6000 version of code right from the initial commit. Worse, only > half of necessary infrastructure was copied from alpha.. > > So let me lay out what I believe should be happening with > (set (reg) (constant)) > > At expand time, if the above set can't be implemented in a single > instruction, then it should be decomposed to the equivalent set high > part, ori low part, and possibly shift instructions so long as the > resulting sequence is small. I think we basically do this correctly > in rs6000_emit_move. See the num_insns_constant call there. > Constants that can't be evaluated inline by two (or three) > instructions will be replaced with a load from the TOC. > > The same thing ought to happen in the splitters that use > rs6000_emit_set_const. rs6000_emit_set_const should refuse to expand > to too many instructions (just like alpha). We don't do this, but if > we did, this would leave some (set (reg) (constant)) instructions in > the RTL. Alternatively the splitters could generate loads from the > TOC, but see pr57836, which shows the loads from the TOC we crafted > at expand time being reduced back to (set (reg) (constant)). > > Finally, at reload time, any remaining (set (reg) (constant)) > (ie. those that result in a long inline sequence) should be forced to > the TOC. This is the missing part of the infrastructure that wasn't > copied from alpha. Our legitimate_constant_p needs to reject some > constants.. As it is, reload simply expands to a four or five > inline instruction sequence. > > David, I'd like some help with the legitimate_constant_p > implementation. I have something that seems to work (not yet > regression tested) but there are a number of things that I'm not clear > on (eg. the revision 20229 change) so likely will get it wrong. > > -- > Alan Modra > Australia Development Lab, IBM