From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-376199-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 7292 invoked by alias); 26 Aug 2014 15:25:54 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 7267 invoked by uid 89); 26 Aug 2014 15:25:50 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-3.8 required=5.0 tests=AWL,BAYES_00,RP_MATCHES_RCVD,SPF_HELO_PASS,SPF_PASS autolearn=ham version=3.3.2
X-HELO: mx1.redhat.com
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Tue, 26 Aug 2014 15:25:48 +0000
Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23])	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id s7QFPiTJ003026	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);	Tue, 26 Aug 2014 11:25:45 -0400
Received: from topor.usersys.redhat.com ([10.15.16.142])	by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id s7QFPhlX026351;	Tue, 26 Aug 2014 11:25:43 -0400
Message-ID: <53FCA6F5.7020405@redhat.com>
Date: Tue, 26 Aug 2014 15:25:00 -0000
From: Vladimir Makarov <vmakarov@redhat.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.7.0
MIME-Version: 1.0
To: Ilya Enkovich <enkovich.gnu@gmail.com>
CC: gcc@gnu.org, gcc-patches <gcc-patches@gcc.gnu.org>,        Evgeny Stupachenko <evstupac@gmail.com>,        Richard Biener <richard.guenther@gmail.com>,        Uros Bizjak <ubizjak@gmail.com>, Jeff Law <law@redhat.com>
Subject: Re: Enable EBX for x86 in 32bits PIC code
References: <CAOvf_xxsQ_oYGqNAVQ1+BW+CuD3mzebZ2xma0jpF=WfyZMCRCA@mail.gmail.com>	<CAFiYyc1mFtTezkTJORmJJq+yht=qPSwiN7KDn19+bSuSdaqvMQ@mail.gmail.com>	<CAOvf_xyeVeg2oB9Xxz8RMEQ6gyfJY5whd9s4ygoAAEaMU9efnA@mail.gmail.com>	<20140707114750.GB31640@tucnak.redhat.com>	<CAMbmDYZV_fx0jxmKHhLsC2pJ7pDzuu6toEAH72izOdpq6KGyfg@mail.gmail.com>	<20140822121151.GA60032@msticlxl57.ims.intel.com>	<53FB5184.3030500@redhat.com>	<CAMbmDYbP77pmkqpJD8cnXbe7_8aRMnYF-tnfKRubULn8-aDCdw@mail.gmail.com> <CAMbmDYacBWjKtCYPB0A2m=fkUTk_Wt5D6f2aEkH7C3paWaR7ag@mail.gmail.com>
In-Reply-To: <CAMbmDYacBWjKtCYPB0A2m=fkUTk_Wt5D6f2aEkH7C3paWaR7ag@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
X-SW-Source: 2014-08/txt/msg02408.txt.bz2

On 08/26/2014 04:57 AM, Ilya Enkovich wrote:
> 2014-08-26 11:49 GMT+04:00 Ilya Enkovich <enkovich.gnu@gmail.com>:
>> 2014-08-25 19:08 GMT+04:00 Vladimir Makarov <vmakarov@redhat.com>:
>>> On 2014-08-22 8:21 AM, Ilya Enkovich wrote:
>>>> Hi,
>>>>
>>>> On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in
>>>> 32bit PIC mode.  It was decided that the best approach would be to not fix
>>>> ebx register, use speudo register for GOT base address and let allocator do
>>>> the rest.  This should be similar to how clang and icc work with GOT base
>>>> address.  I've been working for some time on such patch and now want to
>>>> share my results.
>>>>
>>>> The idea of the patch was very simple and included few things;
>>>>   1.  Set PIC_OFFSET_TABLE_REGNUM to INVALID_REGNUM to specify that we do
>>>> not have any hard reg fixed for PIC.
>>>>   2.  Initialize pic_offset_table_rtx with a new pseudo register in the
>>>> begining of a function expand.
>>>>   3.  Change ABI so that there is a possible implicit PIC argument for
>>>> calls; pic_offset_table_rtx is used as an arg value if such implicit arg
>>>> exist.
>>>>
>>>> Such approach worked well on small tests but trying to run some benchmarks
>>>> we faced a problem with reload of address constants.  The problem is that
>>>> when we try to rematerialize address constant or some constant memory
>>>> reference, we have to use pic_offset_table_rtx.  It means we insert new
>>>> usages of a speudo register and alocator cannot handle it correctly.  Same
>>>> problem also applies for float and vector constants.
>>>>
>>>> Rematerialization is not the only case causing new pic_offset_table_rtx
>>>> usage.  Another case is a split of some instructions using constant but not
>>>> having proper constraints.  E.g. pushtf pattern allows push of constant but
>>>> it has to be replaced with push of memory in reload pass causing additional
>>>> usage of pic_offset_table_rtx.
>>>>
>>>> There are two ways to fix it.  The first one is to support modifications
>>>> of pseudo register live range during reload and correctly allocate hard regs
>>>> for its new usages (currently we have some hard reg allocated for new usage
>>>> of pseudo reg but it may contain value of some other pseudo reg; thus we
>>>> reveal the problem at runtime only).
>>>>
>>> I believe there is already code to deal with this situation.  It is code for
>>> risky transformations (please check flag lra_risky_transformation_p).  If
>>> this flag is set, next lra assign subpass is running and checking
>>> correctness of assignments (e.g. checking situation when two different
>>> pseudos have intersected live ranges and the same assigned hard reg.  If
>>> such dangerous situation is found, it is fixed).
>> I tried to remove my restrictions from setup_reg_equiv and initialize
>> lra_risky_transformation_p with 'true' in lra_constraints instead.  I
>> got only 50% pass rate for SPEC2000 on Ofast with LTO.  Will search
>> for fail reason.
> I've looked into one of fails.  There is still a problem with
> allocation in reload. Here is a piece of code which uses float
> constant:
>
> (insn 1199 1198 1200 96 (set (reg:SI 3 bx)
>         (reg:SI 1301 [528])) /usr/include/bits/stdlib-float.h:28 90
> {*movsi_internal}
>      (nil))
> (call_insn 1200 1199 1201 96 (set (reg:DF 8 st)
>         (call (mem:QI (symbol_ref:SI ("strtod") [flags 0x41]
> <function_decl 0x2b29b8ea8900 strtod>) [0 strtod S1 A8])
>             (const_int 8 [0x8]))) /usr/include/bits/stdlib-float.h:28
> 661 {*call_value}
>      (expr_list:REG_DEAD (reg:SI 3 bx)
>         (expr_list:REG_CALL_DECL (symbol_ref:SI ("strtod") [flags
> 0x41]  <function_decl 0x2b29b8ea8900 strtod>)
>             (expr_list:REG_EH_REGION (const_int 0 [0])
>                 (nil))))
>     (expr_list (use (reg:SI 3 bx))
>         (expr_list:SI (use (reg:SI 3 bx))
>             (expr_list:SI (use (mem/f:SI (reg/f:SI 7 sp) [0  S4 A32]))
>                 (expr_list:SI (use (mem/f:SI (plus:SI (reg/f:SI 7 sp)
>                                 (const_int 4 [0x4])) [0  S4 A32]))
>                     (nil))))))
> (insn 1201 1200 1202 96 (set (reg:DF 321 [ D.7817 ])
>         (reg:DF 8 st)) /usr/include/bits/stdlib-float.h:28 128 {*movdf_internal}
>      (expr_list:REG_DEAD (reg:DF 8 st)
>         (nil)))
> (insn 1202 1201 1203 96 (set (reg:SF 322 [ D.7804 ])
>         (float_truncate:SF (reg:DF 321 [ D.7817 ]))) read_arch.c:700
> 157 {*truncdfsf_fast_sse}
>      (expr_list:REG_DEAD (reg:DF 321 [ D.7817 ])
>         (nil)))
> (insn 1203 1202 1204 96 (set (mem:SF (reg/f:SI 198 [ D.7812 ]) [4
> _130->frequency+0 S4 A32])
>         (reg:SF 322 [ D.7804 ])) read_arch.c:700 129 {*movsf_internal}
>      (nil))
> (insn 1204 1203 1205 96 (set (reg:SF 1209)
>         (mem/u/c:SF (plus:SI (reg:SI 1301 [528])
>                 (const:SI (unspec:SI [
>                             (symbol_ref/u:SI ("*.LC12") [flags 0x2])
>                         ] UNSPEC_GOTOFF))) [4  S4 A32]))
> read_arch.c:701 129 {*movsf_internal}
>      (expr_list:REG_EQUAL (const_double:SF 0.0 [0x0.0p+0])
>         (nil)))
> (note 1205 1204 1206 96 NOTE_INSN_DELETED)
> (note 1206 1205 1207 96 NOTE_INSN_DELETED)
> (insn 1207 1206 1208 96 (set (reg:CCFP 17 flags)
>         (compare:CCFP (reg:SF 1209)
>             (reg:SF 322 [ D.7804 ]))) read_arch.c:701 53 {*cmpisf_sse}
>      (nil))
> (jump_insn 1208 1207 3075 96 (set (pc)
>         (if_then_else (ge (reg:CCFP 17 flags)
>                 (const_int 0 [0]))
>             (label_ref:SI 3114)
>             (pc))) read_arch.c:701 606 {*jcc_1}
>      (expr_list:REG_DEAD (reg:CCFP 17 flags)
>         (int_list:REG_BR_PROB 2 (nil)))
>  -> 3114)
> (note 3075 1208 1209 97 [bb 97] NOTE_INSN_BASIC_BLOCK)
> (insn 1209 3075 1210 97 (set (reg:SF 1208)
>         (mem/u/c:SF (plus:SI (reg:SI 1301 [528])
>                 (const:SI (unspec:SI [
>                             (symbol_ref/u:SI ("*.LC11") [flags 0x2])
>                         ] UNSPEC_GOTOFF))) [4  S4 A32]))
> read_arch.c:701 129 {*movsf_internal}
>      (expr_list:REG_EQUIV (const_double:SF 1.0e+0 [0x0.8p+1])
>         (nil)))
> (note 1210 1209 1211 97 NOTE_INSN_DELETED)
> (note 1211 1210 1212 97 NOTE_INSN_DELETED)
> (insn 1212 1211 1213 97 (set (reg:CCFP 17 flags)
>         (compare:CCFP (reg:SF 322 [ D.7804 ])
>             (reg:SF 1208))) read_arch.c:701 53 {*cmpisf_sse}
>      (nil))
>
> We have PIC register r1301 (former r528) used for constant load (insn
> 1209).  This register was actually loaded to bx (insn 1199) and this
> hard reg may be used by insn 1209.  During reload we have insn 1209
> removed and a new one created instead:
>
> (insn 3864 1211 1212 104 (set (reg:SI 0 ax [1468])
>         (plus:SI (reg:SI 6 bp [528])
>             (const:SI (unspec:SI [
>                         (symbol_ref/u:SI ("*.LC11") [flags 0x2])
>                     ] UNSPEC_GOTOFF)))) read_arch.c:701 213 {*leasi}
>      (expr_list:REG_EQUAL (symbol_ref/u:SI ("*.LC11") [flags 0x2])
>         (nil)))
> (insn 1212 3864 1213 104 (set (reg:CCFP 17 flags)
>         (compare:CCFP (reg:SF 21 xmm0 [orig:322 D.7804 ] [322])
>             (mem/u/c:SF (reg:SI 0 ax [1468]) [4  S4 A32])))
> read_arch.c:701 53 {*cmpisf_sse}
>      (nil))
>
> In this new instruction bp is used which is wrong. We actually have
> required value in bx. In debugger I also checked that bp doesn't have
> required value.  I suppose I enabled flag correctly because found this
> in the log: "Spill r1301 after risky transformations".  Is it possible
> we are still not allowed to use the original PIC register (r528) and
> should use a reg copy created for particular region (in this case
> r1301)?
>
It is hard for me to say without the full patch and the test.  I can
only guess that 1301 gets a wrong class and therefore assigned to the
wrong hard ref.

Could you send me the patch and the test.  I'll look at this and inform
you what is going on.