From: Ilya Enkovich
To: Vladimir Makarov
Cc: gcc@gnu.org, gcc-patches, Evgeny Stupachenko, Richard Biener, Uros Bizjak, Jeff Law
Subject: Re: Enable EBX for x86 in 32bits PIC code
Date: Tue, 26 Aug 2014 07:49:00 -0000
In-Reply-To: <53FB5184.3030500@redhat.com>
References: <20140707114750.GB31640@tucnak.redhat.com> <20140822121151.GA60032@msticlxl57.ims.intel.com> <53FB5184.3030500@redhat.com>
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
X-SW-Source: 2014-08/txt/msg02366.txt.bz2

2014-08-25 19:08 GMT+04:00 Vladimir Makarov:
> On 2014-08-22 8:21 AM, Ilya Enkovich wrote:
>>
>> Hi,
>>
>> At Cauldron 2014 we had a couple of talks about relaxation of ebx usage
>> in 32bit PIC mode.  It was decided that the best approach would be to
>> not fix the ebx register, use a pseudo register for the GOT base address
>> and let the allocator do the rest.
>> This should be similar to how clang and icc work with the GOT base
>> address.  I've been working for some time on such a patch and now want
>> to share my results.
>>
>> The idea of the patch was very simple and included a few things:
>>  1.  Set PIC_OFFSET_TABLE_REGNUM to INVALID_REGNUM to specify that we do
>>      not have any hard reg fixed for PIC.
>>  2.  Initialize pic_offset_table_rtx with a new pseudo register at the
>>      beginning of function expand.
>>  3.  Change the ABI so that there is a possible implicit PIC argument for
>>      calls; pic_offset_table_rtx is used as the arg value if such an
>>      implicit arg exists.
>>
>> This approach worked well on small tests, but when trying to run some
>> benchmarks we faced a problem with reload of address constants.  The
>> problem is that when we try to rematerialize an address constant or some
>> constant memory reference, we have to use pic_offset_table_rtx.  It
>> means we insert new usages of a pseudo register and the allocator cannot
>> handle it correctly.  The same problem also applies to float and vector
>> constants.
>>
>> Rematerialization is not the only case causing new pic_offset_table_rtx
>> usage.  Another case is a split of some instructions that use a constant
>> but do not have proper constraints.  E.g. the pushtf pattern allows a
>> push of a constant, but it has to be replaced with a push of memory in
>> the reload pass, causing an additional usage of pic_offset_table_rtx.
>>
>> There are two ways to fix it.  The first one is to support modifications
>> of the pseudo register live range during reload and correctly allocate
>> hard regs for its new usages (currently we have some hard reg allocated
>> for the new usage of the pseudo reg, but it may contain the value of some
>> other pseudo reg; thus we reveal the problem at runtime only).
>>
>
> I believe there is already code to deal with this situation.  It is code
> for risky transformations (please check flag lra_risky_transformation_p).
> If this flag is set, the next lra assign subpass runs and checks the
> correctness of assignments (e.g. it checks for the situation when two
> different pseudos have intersecting live ranges and the same assigned
> hard reg.  If such a dangerous situation is found, it is fixed).

I tried to remove my restrictions from setup_reg_equiv and initialize
lra_risky_transformation_p with 'true' in lra_constraints instead.  I got
only a 50% pass rate for SPEC2000 at -Ofast with LTO.  I will search for
the reason for the failures.

Ilya

>
>
>> The second way is to avoid all cases when new usages of
>> pic_offset_table_rtx appear in reload.  That is the way I chose because
>> it appeared simpler to me and would allow me to get some performance
>> data faster.  Also, having rematerialization of address and float
>> constants in PIC mode would mean we have higher register pressure, thus
>> having them on the stack should be even more efficient.  To achieve it I
>> had to cut off reg equivs to all exprs using symbol references and all
>> constants living in memory.  I also had to avoid instructions requiring
>> a split in reload that causes a load of a constant from memory
>> (*push[txd]f).
>>
>> The resulting compiler successfully passes make check and compiles the
>> EEMBC and SPEC2000 benchmarks.  I am not confident I covered all cases,
>> and there still may be some templates causing a split in reload with new
>> pic_offset_table_rtx usages.  I think support of reload with a pseudo
>> PIC register would be a better and more general solution, but I don't
>> know how difficult it is to implement.  Any ideas on resolving this
>> reload issue?
>>
>
> Please see what I mentioned above.  Maybe it can fix the degradation.
> Rematerialization is important for performance and switching it off
> completely is not wise.
>
>
>
>> I collected some performance numbers for EEMBC and SPEC2000 benchmarks.
>> Here are patch results for -Ofast optlevel with LTO collected on an
>> Avoton server:
>>   AUTOmark     +1,9%
>>   TELECOMmark  +4,0%
>>   DENmark     +10,0%
>>   SPEC2000     -0,5%
>>
>> There are few degradations on EEMBC benchmarks, but on SPEC2000 the
>> situation is different and we see more performance losses.  Some of them
>> are caused by disabled rematerialization of address constants.  In some
>> cases relaxed ebx causes more spills/fills in places where the GOT is
>> frequently used.  There are also some minor fixes required in the patch
>> to allow a more efficient function prolog (avoid unnecessary GOT register
>> initialization and allow its initialization without ebx usage).  I
>> suppose some performance problems may be resolved, but a good fix for
>> reload should go first.
>>
>>
>
> Ilya, the optimization you are trying to implement is important in many
> cases and should be in some way included in gcc.  If the degradations can
> be solved in the way I mentioned above, we could introduce a
> machine-dependent flag.
>
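
For readers following the thread, here is a toy illustration of the check
Vladimir describes: finding two pseudos whose live ranges intersect while
they share the same assigned hard reg, and fixing it.  This is only a
sketch under stated assumptions, not LRA's code: struct toy_pseudo,
ranges_intersect and fix_conflicts are hypothetical names, each pseudo
gets a single inclusive live range, and the "fix" is simply to spill the
later pseudo of each conflicting pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (hypothetical, not LRA's data structures): one inclusive
   live range [start, end] per pseudo, plus its assigned hard register.  */
struct toy_pseudo {
    int regno;       /* pseudo register number */
    int start;       /* first program point where the pseudo is live */
    int end;         /* last program point where the pseudo is live */
    int hard_regno;  /* assigned hard register, or -1 if spilled */
};

/* Two inclusive ranges intersect when each one starts no later than
   the other one ends.  */
static bool ranges_intersect(const struct toy_pseudo *a,
                             const struct toy_pseudo *b)
{
    assert(a->start <= a->end && b->start <= b->end);
    return a->start <= b->end && b->start <= a->end;
}

/* Scan all pairs.  When two pseudos are live at the same time and hold
   the same hard register, spill the second one (an assumption made for
   this sketch).  Returns the number of pseudos spilled.  */
static int fix_conflicts(struct toy_pseudo *p, size_t n)
{
    int fixed = 0;
    for (size_t i = 0; i < n; i++) {
        if (p[i].hard_regno < 0)
            continue;  /* already spilled, cannot conflict */
        for (size_t j = i + 1; j < n; j++) {
            if (p[j].hard_regno == p[i].hard_regno
                && ranges_intersect(&p[i], &p[j])) {
                p[j].hard_regno = -1;
                fixed++;
            }
        }
    }
    return fixed;
}
```

In this model, a new use of the PIC pseudo inserted during reload would
show up as a longer [start, end] range for that pseudo, which is what can
make it overlap another pseudo already assigned the same hard reg.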