From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 14 May 2014 21:27:00 -0000
From: David Edelsohn
To: GCC Patches
Subject: Re: [RS6000] Fix PR61098, Poor code setting count register
In-Reply-To: <20140514095629.GN5162@bubble.grove.modra.org>
References: <20140508014846.GA5162@bubble.grove.modra.org> <20140509024054.GE5162@bubble.grove.modra.org> <20140514030448.GJ5162@bubble.grove.modra.org> <20140514095629.GN5162@bubble.grove.modra.org>
X-SW-Source: 2014-05/txt/msg01121.txt.bz2

On Wed, May 14, 2014 at 5:56 AM, Alan Modra wrote:

>> I seem to remember problems in the past with late creation of TOC
>> entries for constants causing problems, so it was easier to fall back
>> to materializing all integer constants inline.  I don't remember the
>> PRs, but I think there were issues with creating a TOC if the late
>> constant were the only TOC reference, or maybe the issue was buying a
>> stack frame to materialize the TOC/GOT for a late constant.  And a
>> maximum 5 instruction sequence is not really bad relative to a load
>> from the TOC (even with medium model / data in TOC).  There are a lot
>> of trade-offs with respect to I$ expansion versus the load hitting in
>> the L1 D$.
>
> Sure, but Steve will tell you that the 5 instruction sequence is both
> slower due to all the dependent ops, and results in larger code+data
> size.  We definitely want to avoid it if possible, and pr61098 shows a
> case taken from glibc math library code where there should be no
> problem in using the TOC.

I don't necessarily believe this is a win overall.  If the constant
reliably is in the L1 D$ (or maybe L2 D$) and accessed with a direct
load (data in TOC or medium model), then yes.  If it's farther away in
the memory hierarchy, then it's not a win.  I agree about the code
expansion concern, which has its own secondary effects.

If this is a constant in a tight loop, okay, but if it's a unique
constant, it may not occur elsewhere in the code to be shared and may
not be placed in the same cache line as other, recently accessed
constants.  This would push the load to L3 or farther.

Also, remember that this same heuristic is used by AIX, which still
defaults to small TOC model.  So either the constant is in the TOC
anchor constant pool, which hopefully will pre-load the anchor, or it
will be a constant in the TOC, possibly putting more pressure on TOC
size and causing overflow.

I am certain that there are anecdotal examples where it is a win for
PPC64 Linux, but I would want more evidence that it's a general win.

>> alpha_emit_set_long_const() always will materialize the constant and
>> does not check for a maximum number of instructions.  This is why its
>> comment says "fall back to straight forward decomposition".
>
> No, that is wrong.  alpha_emit_set_const does *not* always try to
> materialize the constant inline.  It does so for constants that need
> more than three instructions only when TARGET_BUILD_CONSTANTS.

I said that alpha_emit_set_long_const() always materializes the
constant, but, as you say, it is not always called.
alpha_emit_set_const() may fail if it requires too many instructions or
the search depth is too deep.  You seem to be referring to some of the
logic in alpha_split_const_mov() as well.

Again, this definitely is worth exploring.  And I am confident that
there are cases where loading the constant from memory is a win.  I just
don't have a good instinct for whether it is a win most of the time for
a broad range of real-world applications.  One optimization opportunity
in GLIBC is not a general heuristic.  I don't think that we know enough
about the context in which the constant is used to apply a
finer-grained policy.

I think the original code tried to put the constant in memory if it
appeared before reload, when everything could be calculated correctly
for the prologue and for materializing the TOC, but tried to materialize
any constants that appeared during reload using splitters.  That can
avoid some of the problem corner cases.

The code needs to handle

PPC32
PPC64
eABI
AIX

Thanks, David
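
For readers following the "maximum 5 instruction sequence" point above,
here is a rough sketch of where that bound comes from.  It is not the
rs6000 back end's actual logic; the function names are hypothetical and
the model is deliberately naive (li/lis, optional ori, sldi 32, optional
oris, optional ori), so a real compiler may find shorter sequences for
some values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not GCC code.  W is treated as a sign-extended
   32-bit value; return the instructions needed to build it.  */
static int insns_for_32(uint32_t w)
{
    if (w <= 0x7fffu || w >= 0xffff8000u)
        return 1;                       /* li: fits in signed 16 bits */
    return (w & 0xffffu) ? 2 : 1;       /* lis, plus ori if low half set */
}

/* Hypothetical helper, not GCC code.  Count the instructions a naive
   inline sequence needs to build the 64-bit constant C in a GPR.  */
int naive_const_insns(uint64_t c)
{
    uint32_t hi = (uint32_t)(c >> 32);
    uint32_t lo = (uint32_t)c;
    int sign = (lo & 0x80000000u) != 0;
    int n;

    if ((hi == 0 && !sign) || (hi == 0xffffffffu && sign)) {
        /* C is just a sign-extended 32-bit value.  */
        n = insns_for_32(lo);
    } else {
        n = insns_for_32(hi) + 1;       /* build upper half, then sldi 32 */
        if (lo & 0xffff0000u)
            n++;                        /* oris for bits 16..31 */
        if (lo & 0x0000ffffu)
            n++;                        /* ori for bits 0..15 */
    }

    assert(n >= 1 && n <= 5);           /* the 5-instruction worst case */
    return n;
}
```

Under this model a constant with all four 16-bit pieces populated, such
as 0x123456789abcdef0, costs five dependent instructions, which is the
case being weighed against a single load from the TOC.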