Message-ID: <508501FB.4060901@arm.com>
Date: Mon, 22 Oct 2012 08:23:00 -0000
From: Richard Earnshaw
To: Julian Brown
CC: "gcc-patches@gcc.gnu.org", Ramana Radhakrishnan
Subject: Re: [PATCH, ARM] Subregs of VFP registers in big-endian mode
References: <20121020123824.4e9251b5@octopus>
In-Reply-To: <20121020123824.4e9251b5@octopus>

On 20/10/12 12:38, Julian Brown wrote:
> Hi,
>
> Quite a few tests fail for big-endian multilibs which use VFP
> instructions at present.
> One reason for many of these is glaringly
> obvious once you notice it: for D registers interpreted as two S
> registers, the lower-numbered register is always the less-significant
> part of the value, and the higher-numbered register the
> more-significant -- regardless of the endianness the processor is
> running in.
>
> However, for big-endian mode, when DFmode values are represented in
> memory (or indeed core registers), the opposite is true. So, a subreg
> expression such as the following will work fine on core registers (or
> e.g. pseudos assigned to stack slots):
>
>   (subreg:SI (reg:DF) 0)
>
> but, when applied to a VFP register Dn, it should be resolved to the
> hard register S(n*2+1). At present though, it resolves to S(n*2) -- i.e.
> the wrong half of the value (for WORDS_BIG_ENDIAN, such a subreg should
> be the most-significant part of the value). For the relatively few cases
> where DFmode values are interpreted as a pair of (integer) words, this
> means that wrong code is generated.
>
> My feeling is that implementing a "proper" solution to this problem is
> probably impractical -- the closest existing macros to control
> behaviour aren't sufficient for this case:
>
> * FLOAT_WORDS_BIG_ENDIAN only refers to memory layout, which is correct
>   as it is.
>
> * REG_WORDS_BIG_ENDIAN controls whether values are stored in big-endian
>   order in registers, but refers to *all* registers. We only want to
>   change the behaviour for the VFP registers. Defining a new macro
>   FLOAT_REG_WORDS_BIG_ENDIAN wouldn't do, because the behaviour would
>   differ depending on the hard register under observation: that seems
>   like too much to ask of generic machinery in the middle-end.
>
> So, the attached patch just avoids the problem, by pretending that
> greater-than-word-size values in VFP registers, in big-endian mode, are
> opaque and cannot be subreg'ed.
> In practice, for at least the test case
> I looked at, this isn't as much of a pessimisation as you might expect
> -- the value in question might already be stored in core registers
> (e.g. for function arguments with -mfloat-abi=softfp), so can be
> retrieved directly from those rather than via memory.
>
> This is the testsuite delta for current FSF mainline, with multilibs
> adjusted to build for little/big-endian, and using options
> "-mbig-endian -mfloat-abi=softfp -mfpu=vfpv3" for testing:
>
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O1 execution test
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O2 execution test
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O3 -fomit-frame-pointer execution test
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O3 -g execution test
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -Os execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/ieee/copysign1.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/ieee/mzero6.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr35456.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O1
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O2
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O2 -flto -fno-use-linker-plugin
>   -flto-partition=none
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O3 -fomit-frame-pointer
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O3 -g
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -Og -g
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -Os
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/compat/scalar-by-value-3 c_compat_x_tst.o-c_compat_y_tst.o execute
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O1 execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O2 execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O3 -fomit-frame-pointer execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O3 -g execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -Os execution test
>
> OK for mainline, or any comments? (I've included the multilib tweaks I
> used in the attached patch for reference, though I'm not proposing to
> apply those.)
>
> Thanks,
>
> Julian
>
> ChangeLog
>
>     gcc/
>     * config/arm/arm.h (CANNOT_CHANGE_MODE_CLASS): Avoid subreg'ing
>     VFP D registers in big-endian mode.
>

The patch to arm.h is OK.  The patch to t-arm-elf is not.  I presume the
latter was just an oversight in patch preparation as there is no
ChangeLog entry for it.

R.
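
For readers following the register-numbering argument in the quoted message, here is a minimal sketch in plain C of which S register a word-sized subreg of Dn is meant to denote. It is an illustration under stated assumptions, not GCC code: the function name vfp_sreg_for_subreg and its parameters are hypothetical, and the model covers only byte offsets 0 and 4 of a DFmode value.

```c
#include <assert.h>

/* Illustrative model only; nothing here is taken from GCC.  Returns the
   index of the S register holding the 32-bit word that
   (subreg:SI (reg:DF Dn) byte_offset) is meant to denote.  */
int
vfp_sreg_for_subreg (int n, int byte_offset, int words_big_endian)
{
  /* A DFmode value has exactly two word-sized halves.  */
  assert (byte_offset == 0 || byte_offset == 4);

  /* With big-endian word order, offset 0 names the most-significant
     word; with little-endian word order, offset 4 does.  */
  int want_high_word = words_big_endian ? (byte_offset == 0)
                                        : (byte_offset == 4);

  /* Within Dn, S(2n+1) always holds the more-significant half and
     S(2n) the less-significant half, regardless of endianness.  */
  return 2 * n + (want_high_word ? 1 : 0);
}
```

With words_big_endian set, offset 0 of D0 maps to S1, which is the S(n*2+1) result the quoted message says is required; resolving it to S(n*2) instead matches the failure described there.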
> vfp-subregs-bigendian-2.diff
>
>
> Index: gcc/config/arm/arm.h
> ===================================================================
> --- gcc/config/arm/arm.h (revision 192576)
> +++ gcc/config/arm/arm.h (working copy)
> @@ -1205,8 +1205,15 @@ enum reg_class
>  /* In VFPv1, VFP registers could only be accessed in the mode they
>     were set, so subregs would be invalid there.  However, we don't
>     support VFPv1 at the moment, and the restriction was lifted in
> -   VFPv2.  */
> -#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) 0
> +   VFPv2.
> +   In big-endian mode, modes greater than word size (i.e. DFmode) are stored in
> +   VFP registers in little-endian order.  We can't describe that accurately to
> +   GCC, so avoid taking subregs of such values.  */
> +#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
> +  (TARGET_VFP && TARGET_BIG_END \
> +   && (GET_MODE_SIZE (FROM) > UNITS_PER_WORD \
> +       || GET_MODE_SIZE (TO) > UNITS_PER_WORD) \
> +   && reg_classes_intersect_p (VFP_REGS, (CLASS)))
>
>  /* The class value for index registers, and the one for base regs.  */
>  #define INDEX_REG_CLASS (TARGET_THUMB1 ? LO_REGS : GENERAL_REGS)
> Index: gcc/config/arm/t-arm-elf
> ===================================================================
> --- gcc/config/arm/t-arm-elf (revision 192576)
> +++ gcc/config/arm/t-arm-elf (working copy)
> @@ -17,8 +17,8 @@
>  # along with GCC; see the file COPYING3.  If not see
>  # .
>
> -MULTILIB_OPTIONS = marm/mthumb
> -MULTILIB_DIRNAMES = arm thumb
> +MULTILIB_OPTIONS = marm
> +MULTILIB_DIRNAMES = arm
>  MULTILIB_EXCEPTIONS =
>  MULTILIB_MATCHES =
>
> @@ -49,9 +49,9 @@ MULTILIB_EXCEPTIONS += *mthumb/*mfloa
>  # MULTILIB_DIRNAMES += ep9312
>  # MULTILIB_EXCEPTIONS += *mthumb/*mcpu=ep9312*
>  #
> -# MULTILIB_OPTIONS += mlittle-endian/mbig-endian
> -# MULTILIB_DIRNAMES += le be
> -# MULTILIB_MATCHES += mbig-endian=mbe mlittle-endian=mle
> +MULTILIB_OPTIONS += mlittle-endian/mbig-endian
> +MULTILIB_DIRNAMES += le be
> +MULTILIB_MATCHES += mbig-endian=mbe mlittle-endian=mle
>  #
>  # MULTILIB_OPTIONS += mfloat-abi=hard/mfloat-abi=soft
>  # MULTILIB_DIRNAMES += fpu soft
>
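
To see which mode changes the patched CANNOT_CHANGE_MODE_CLASS condition rejects, the predicate can be modelled outside the compiler. The sketch below is an assumption-laden stand-in, not the macro itself: cannot_change_mode_class_model and MODEL_UNITS_PER_WORD are hypothetical names, mode sizes are passed as plain byte counts in place of GET_MODE_SIZE, and a boolean replaces the reg_classes_intersect_p (VFP_REGS, CLASS) test.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed word size in bytes for 32-bit ARM; stands in for UNITS_PER_WORD.  */
#define MODEL_UNITS_PER_WORD 4

/* Hypothetical stand-in for the patched macro's condition.  Returns true
   when the mode change would be refused.  */
bool
cannot_change_mode_class_model (bool target_vfp, bool target_big_end,
                                int from_size, int to_size,
                                bool class_intersects_vfp_regs)
{
  /* Mode sizes are byte counts and must be positive.  */
  assert (from_size > 0 && to_size > 0);

  /* Refuse only on big-endian VFP targets, only when either mode is
     wider than a word, and only for classes that overlap VFP_REGS.  */
  return target_vfp && target_big_end
         && (from_size > MODEL_UNITS_PER_WORD
             || to_size > MODEL_UNITS_PER_WORD)
         && class_intersects_vfp_regs;
}
```

Under this model an 8-byte (DFmode) to 4-byte (SImode) change in a VFP class is refused on a big-endian target, while a 4-byte to 4-byte change, any change on a little-endian target, and any change in a class that does not overlap VFP_REGS are still allowed.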