From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 7292 invoked by alias); 26 Aug 2014 15:25:54 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 7267 invoked by uid 89); 26 Aug 2014 15:25:50 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-3.8 required=5.0 tests=AWL,BAYES_00,RP_MATCHES_RCVD,SPF_HELO_PASS,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Tue, 26 Aug 2014 15:25:48 +0000 Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id s7QFPiTJ003026 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 26 Aug 2014 11:25:45 -0400 Received: from topor.usersys.redhat.com ([10.15.16.142]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id s7QFPhlX026351; Tue, 26 Aug 2014 11:25:43 -0400 Message-ID: <53FCA6F5.7020405@redhat.com> Date: Tue, 26 Aug 2014 15:25:00 -0000 From: Vladimir Makarov User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.7.0 MIME-Version: 1.0 To: Ilya Enkovich CC: gcc@gnu.org, gcc-patches , Evgeny Stupachenko , Richard Biener , Uros Bizjak , Jeff Law Subject: Re: Enable EBX for x86 in 32bits PIC code References: <20140707114750.GB31640@tucnak.redhat.com> <20140822121151.GA60032@msticlxl57.ims.intel.com> <53FB5184.3030500@redhat.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-IsSubscribed: yes X-SW-Source: 2014-08/txt/msg02408.txt.bz2 On 08/26/2014 04:57 AM, Ilya Enkovich wrote: > 2014-08-26 11:49 GMT+04:00 Ilya Enkovich : >> 2014-08-25 19:08 GMT+04:00 Vladimir Makarov : >>> On 2014-08-22 8:21 AM, Ilya Enkovich wrote: >>>> Hi, >>>> >>>> On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in >>>> 32bit PIC mode. It was decided that the best approach would be to not fix >>>> ebx register, use speudo register for GOT base address and let allocator do >>>> the rest. This should be similar to how clang and icc work with GOT base >>>> address. I've been working for some time on such patch and now want to >>>> share my results. >>>> >>>> The idea of the patch was very simple and included few things; >>>> 1. Set PIC_OFFSET_TABLE_REGNUM to INVALID_REGNUM to specify that we do >>>> not have any hard reg fixed for PIC. >>>> 2. Initialize pic_offset_table_rtx with a new pseudo register in the >>>> begining of a function expand. >>>> 3. Change ABI so that there is a possible implicit PIC argument for >>>> calls; pic_offset_table_rtx is used as an arg value if such implicit arg >>>> exist. >>>> >>>> Such approach worked well on small tests but trying to run some benchmarks >>>> we faced a problem with reload of address constants. The problem is that >>>> when we try to rematerialize address constant or some constant memory >>>> reference, we have to use pic_offset_table_rtx. It means we insert new >>>> usages of a speudo register and alocator cannot handle it correctly. Same >>>> problem also applies for float and vector constants. >>>> >>>> Rematerialization is not the only case causing new pic_offset_table_rtx >>>> usage. Another case is a split of some instructions using constant but not >>>> having proper constraints. E.g. pushtf pattern allows push of constant but >>>> it has to be replaced with push of memory in reload pass causing additional >>>> usage of pic_offset_table_rtx. >>>> >>>> There are two ways to fix it. The first one is to support modifications >>>> of pseudo register live range during reload and correctly allocate hard regs >>>> for its new usages (currently we have some hard reg allocated for new usage >>>> of pseudo reg but it may contain value of some other pseudo reg; thus we >>>> reveal the problem at runtime only). >>>> >>> I believe there is already code to deal with this situation. It is code for >>> risky transformations (please check flag lra_risky_transformation_p). If >>> this flag is set, next lra assign subpass is running and checking >>> correctness of assignments (e.g. checking situation when two different >>> pseudos have intersected live ranges and the same assigned hard reg. If >>> such dangerous situation is found, it is fixed). >> I tried to remove my restrictions from setup_reg_equiv and initialize >> lra_risky_transformation_p with 'true' in lra_constraints instead. I >> got only 50% pass rate for SPEC2000 on Ofast with LTO. Will search >> for fail reason. > I've looked into one of fails. There is still a problem with > allocation in reload. Here is a piece of code which uses float > constant: > > (insn 1199 1198 1200 96 (set (reg:SI 3 bx) > (reg:SI 1301 [528])) /usr/include/bits/stdlib-float.h:28 90 > {*movsi_internal} > (nil)) > (call_insn 1200 1199 1201 96 (set (reg:DF 8 st) > (call (mem:QI (symbol_ref:SI ("strtod") [flags 0x41] > ) [0 strtod S1 A8]) > (const_int 8 [0x8]))) /usr/include/bits/stdlib-float.h:28 > 661 {*call_value} > (expr_list:REG_DEAD (reg:SI 3 bx) > (expr_list:REG_CALL_DECL (symbol_ref:SI ("strtod") [flags > 0x41] ) > (expr_list:REG_EH_REGION (const_int 0 [0]) > (nil)))) > (expr_list (use (reg:SI 3 bx)) > (expr_list:SI (use (reg:SI 3 bx)) > (expr_list:SI (use (mem/f:SI (reg/f:SI 7 sp) [0 S4 A32])) > (expr_list:SI (use (mem/f:SI (plus:SI (reg/f:SI 7 sp) > (const_int 4 [0x4])) [0 S4 A32])) > (nil)))))) > (insn 1201 1200 1202 96 (set (reg:DF 321 [ D.7817 ]) > (reg:DF 8 st)) /usr/include/bits/stdlib-float.h:28 128 {*movdf_internal} > (expr_list:REG_DEAD (reg:DF 8 st) > (nil))) > (insn 1202 1201 1203 96 (set (reg:SF 322 [ D.7804 ]) > (float_truncate:SF (reg:DF 321 [ D.7817 ]))) read_arch.c:700 > 157 {*truncdfsf_fast_sse} > (expr_list:REG_DEAD (reg:DF 321 [ D.7817 ]) > (nil))) > (insn 1203 1202 1204 96 (set (mem:SF (reg/f:SI 198 [ D.7812 ]) [4 > _130->frequency+0 S4 A32]) > (reg:SF 322 [ D.7804 ])) read_arch.c:700 129 {*movsf_internal} > (nil)) > (insn 1204 1203 1205 96 (set (reg:SF 1209) > (mem/u/c:SF (plus:SI (reg:SI 1301 [528]) > (const:SI (unspec:SI [ > (symbol_ref/u:SI ("*.LC12") [flags 0x2]) > ] UNSPEC_GOTOFF))) [4 S4 A32])) > read_arch.c:701 129 {*movsf_internal} > (expr_list:REG_EQUAL (const_double:SF 0.0 [0x0.0p+0]) > (nil))) > (note 1205 1204 1206 96 NOTE_INSN_DELETED) > (note 1206 1205 1207 96 NOTE_INSN_DELETED) > (insn 1207 1206 1208 96 (set (reg:CCFP 17 flags) > (compare:CCFP (reg:SF 1209) > (reg:SF 322 [ D.7804 ]))) read_arch.c:701 53 {*cmpisf_sse} > (nil)) > (jump_insn 1208 1207 3075 96 (set (pc) > (if_then_else (ge (reg:CCFP 17 flags) > (const_int 0 [0])) > (label_ref:SI 3114) > (pc))) read_arch.c:701 606 {*jcc_1} > (expr_list:REG_DEAD (reg:CCFP 17 flags) > (int_list:REG_BR_PROB 2 (nil))) > -> 3114) > (note 3075 1208 1209 97 [bb 97] NOTE_INSN_BASIC_BLOCK) > (insn 1209 3075 1210 97 (set (reg:SF 1208) > (mem/u/c:SF (plus:SI (reg:SI 1301 [528]) > (const:SI (unspec:SI [ > (symbol_ref/u:SI ("*.LC11") [flags 0x2]) > ] UNSPEC_GOTOFF))) [4 S4 A32])) > read_arch.c:701 129 {*movsf_internal} > (expr_list:REG_EQUIV (const_double:SF 1.0e+0 [0x0.8p+1]) > (nil))) > (note 1210 1209 1211 97 NOTE_INSN_DELETED) > (note 1211 1210 1212 97 NOTE_INSN_DELETED) > (insn 1212 1211 1213 97 (set (reg:CCFP 17 flags) > (compare:CCFP (reg:SF 322 [ D.7804 ]) > (reg:SF 1208))) read_arch.c:701 53 {*cmpisf_sse} > (nil)) > > We have PIC register r1301 (former r528) used for constant load (insn > 1209). This register was actually loaded to bx (insn 1199) and this > hard reg may be used by insn 1209. During reload we have insn 1209 > removed and a new one created instead: > > (insn 3864 1211 1212 104 (set (reg:SI 0 ax [1468]) > (plus:SI (reg:SI 6 bp [528]) > (const:SI (unspec:SI [ > (symbol_ref/u:SI ("*.LC11") [flags 0x2]) > ] UNSPEC_GOTOFF)))) read_arch.c:701 213 {*leasi} > (expr_list:REG_EQUAL (symbol_ref/u:SI ("*.LC11") [flags 0x2]) > (nil))) > (insn 1212 3864 1213 104 (set (reg:CCFP 17 flags) > (compare:CCFP (reg:SF 21 xmm0 [orig:322 D.7804 ] [322]) > (mem/u/c:SF (reg:SI 0 ax [1468]) [4 S4 A32]))) > read_arch.c:701 53 {*cmpisf_sse} > (nil)) > > In this new instruction bp is used which is wrong. We actually have > required value in bx. In debugger I also checked that bp doesn't have > required value. I suppose I enabled flag correctly because found this > in the log: "Spill r1301 after risky transformations". Is it possible > we are still not allowed to use the original PIC register (r528) and > should use a reg copy created for particular region (in this case > r1301)? > It is hard for me to say without the full patch and the test. I can only guess that 1301 gets a wrong class and therefore assigned to the wrong hard ref. Could you send me the patch and the test. I'll look at this and inform you what is going on.