From: Carrot Wei
To: Steven Bosscher
Cc: Andrew Haley, gcc@gcc.gnu.org, Richard Earnshaw, Paul Brook, nickc@redhat.com
Subject: Re: Where can I put the optimization of got for arm back end at?
Date: Fri, 02 Apr 2010 04:06:00 -0000
References: <7587b291003280745o651a9c61ia1abae6707832526@mail.gmail.com> <4BB4E1A1.4050000@redhat.com>

This is really a good question!

Consider the requirements of this optimization.

1. There should be at least 2 methods to load a global variable's
   address from the GOT. Usually this means using different relocation
   types.
2. By default, all global variable accesses use the same method.
3. In some cases (fewer than X global variable accesses) method A is
   better; in other cases method B is better.

With these constraints a simplify_GOT optimization pass is applicable.
But these constraints are too weak. The new optimization pass can do
almost nothing except call a target-specific hook. I suspect such a
pass is acceptable.

We can also add more constraints:

4. If we can restrict method A as follows: first load the base address
   of the GOT into a register pic_reg, then the real global variable's
   address is loaded as

       load offset_reg, the offset from GOT base to the GOT entry
       load address, [pic_reg + offset_reg]

With this constraint the new pass knows there is a special register
pic_reg, and it can look for and count all usages of pic_reg. If all
usages are method A and the count is more than the target-specific
threshold, then the usages can be rewritten as method B. The method
detection and rewriting should be target specific.

I don't know how other targets handle global address access with
-fpic, or how many targets satisfy these 4 constraints.

thanks
Guozhi

On Fri, Apr 2, 2010 at 4:31 AM, Steven Bosscher wrote:
> On Thu, Apr 1, 2010 at 8:10 PM, Andrew Haley wrote:
>> On 28/03/10 15:45, Carrot Wei wrote:
>>> Hi
>>>
>>> The detailed description of the optimization is at
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43129. This is an ARM
>>> specific optimization.
>>>
>>> This optimization uses one less register (the register hold the GOT
>>> base), to get this beneficial the ideal place for it should be before
>>> register allocation.
>>>
>>> Usually expand pass generates instructions to load global variable's
>>> address from GOT entry for each access of the global variable. Later
>>> cse/gcse passes can remove many of them. In order to precisely model
>>> the cost, this optimization should be put after some cse/gcse passes.
>>>
>>> So what is the best place for this optimization? Is there any existed
>>> pass can be enhanced with this optimization? Or should I add a new
>>> pass?
>>
>> The obvious place is machine-dependent reorg, which is a very late pass.
>
> Yes, and after register allocation, i.e. too late for Guozhi.
>
> Basically there is no place right now to stuff a pass like that.
> Question is: Is this optimization really, reallyreallyreally so target
> specific that a target-independent pass is not the better option?
>
> Ciao!
> Steven
>
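
For illustration only, the decision rule described under constraint 4 (count the uses of pic_reg, check that every one is a method A load, compare against a target-specific threshold) might be modelled roughly as below. This is a standalone sketch, not GCC code: the enum pic_use_kind, the function should_rewrite_to_method_b, and the idea of passing the uses as a flat array are all hypothetical names and simplifications introduced here. A real pass would walk RTL insns and ask a target hook for both the classification and the threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of a single use of pic_reg.
   These names are assumptions for this sketch, not GCC types. */
enum pic_use_kind {
    PIC_USE_GOT_LOAD_A, /* "load address, [pic_reg + offset_reg]" (method A) */
    PIC_USE_OTHER       /* any other use of pic_reg */
};

/* Return true when every recorded use of pic_reg is a method A GOT load
   and the number of such uses is more than the target-specific threshold,
   i.e. the condition under which the mail says the uses can be rewritten
   as method B.  The threshold is supplied by the caller because the mail
   leaves its value to the target. */
bool should_rewrite_to_method_b(const enum pic_use_kind *uses,
                                size_t n_uses,
                                size_t threshold)
{
    assert(uses != NULL || n_uses == 0);

    size_t count = 0;
    for (size_t i = 0; i < n_uses; i++) {
        /* A single non-method-A use means pic_reg must stay live,
           so the rewrite is not applicable. */
        if (uses[i] != PIC_USE_GOT_LOAD_A)
            return false;
        count++;
    }
    return count > threshold;
}
```

With three method A uses and a threshold of 2 the function reports that a rewrite is applicable; with a threshold of 3, or with any other kind of pic_reg use mixed in, it does not. Detecting method A and performing the rewrite are left out, since the mail states both should be target specific.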