From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-329588-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 7043 invoked by alias); 22 Oct 2012 08:21:31 -0000
Received: (qmail 7027 invoked by uid 22791); 22 Oct 2012 08:21:29 -0000
X-SWARE-Spam-Status: No, hits=-1.5 required=5.0	tests=AWL,BAYES_00,KHOP_RCVD_UNTRUST,KHOP_THREADED,RCVD_IN_DNSWL_LOW,RCVD_IN_HOSTKARMA_YE,TW_VF
X-Spam-Check-By: sourceware.org
Received: from service87.mimecast.com (HELO service87.mimecast.com) (91.220.42.44)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Mon, 22 Oct 2012 08:21:20 +0000
Received: from cam-owa2.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.21]) by service87.mimecast.com; Mon, 22 Oct 2012 09:21:18 +0100
Received: from [10.1.69.67] ([10.1.255.212]) by cam-owa2.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959);	 Mon, 22 Oct 2012 09:21:16 +0100
Message-ID: <508501FB.4060901@arm.com>
Date: Mon, 22 Oct 2012 08:23:00 -0000
From: Richard Earnshaw <rearnsha@arm.com>
User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:15.0) Gecko/20120907 Thunderbird/15.0.1
MIME-Version: 1.0
To: Julian Brown <julian@codesourcery.com>
CC: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,  Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>
Subject: Re: [PATCH, ARM] Subregs of VFP registers in big-endian mode
References: <20121020123824.4e9251b5@octopus>
In-Reply-To: <20121020123824.4e9251b5@octopus>
X-MC-Unique: 112102209211804301
Content-Type: text/plain; charset=WINDOWS-1252; format=flowed
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2012-10/txt/msg01915.txt.bz2

On 20/10/12 12:38, Julian Brown wrote:
> Hi,
>
> Quite a few tests fail for big-endian multilibs which use VFP
> instructions at present. One reason for many of these is glaringly
> obvious once you notice it: for D registers interpreted as two S
> registers, the lower-numbered register is always the less-significant
> part of the value, and the higher-numbered register the
> more-significant -- regardless of the endianness the processor is
> running in.
>
> However, for big-endian mode, when DFmode values are represented in
> memory (or indeed core registers), the opposite is true. So, a subreg
> expression such as the following will work fine on core registers (or
> e.g. pseudos assigned to stack slots):
>
> (subreg:SI (reg:DF) 0)
>
> but, when applied to a VFP register Dn, it should be resolved to the
> hard register S(n*2+1). At present though, it resolves to S(n*2) -- i.e.
> the wrong half of the value (for WORDS_BIG_ENDIAN, such a subreg should
> be the most-significant part of the value). For the relatively few cases
> where DFmode values are interpreted as a pair of (integer) words, this
> means that wrong code is generated.
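
To make the mismatch described above concrete, here is a minimal standalone C sketch. It is not GCC code: the helper names naive_sreg and correct_sreg are hypothetical, and the model assumes only what the text states, namely that Dn overlays S(n*2) and S(n*2+1) with the lower-numbered S register always holding the less-significant word.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: the S register picked when
   (subreg:SI (reg:DF Dn) byte_offset) is resolved purely by register
   numbering, i.e. S(n*2 + byte_offset/4).  */
static unsigned naive_sreg(unsigned dn, unsigned byte_offset)
{
    assert(byte_offset == 0 || byte_offset == 4);
    return dn * 2 + byte_offset / 4;
}

/* Hypothetical helper: the S register that really holds the requested
   word.  With big-endian word order, byte offset 0 names the
   most-significant word, which VFP keeps in the higher-numbered S
   register regardless of the processor's endianness.  */
static unsigned correct_sreg(unsigned dn, unsigned byte_offset,
                             bool words_big_endian)
{
    assert(byte_offset == 0 || byte_offset == 4);
    unsigned word = byte_offset / 4;   /* word index in memory order */
    if (words_big_endian)
        word = 1 - word;               /* swap to significance order */
    return dn * 2 + word;
}
```

Under these assumptions, for D3 at byte offset 0 the naive resolution gives S6, while the register holding the wanted word is S7 in big-endian mode and S6 in little-endian mode.
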
>
> My feeling is that implementing a "proper" solution to this problem is
> probably impractical -- the closest existing macros to control
> behaviour aren't sufficient for this case:
>
> * FLOAT_WORDS_BIG_ENDIAN only refers to memory layout, which is correct
>    as it is.
>
> * REG_WORDS_BIG_ENDIAN controls whether values are stored in big-endian
>    order in registers, but refers to *all* registers. We only want to
>    change the behaviour for the VFP registers. Defining a new macro
>    FLOAT_REG_WORDS_BIG_ENDIAN wouldn't do, because the behaviour would
>    differ depending on the hard register under observation: that seems
>    like too much to ask of generic machinery in the middle-end.
>
> So, the attached patch just avoids the problem, by pretending that
> greater-than-word-size values in VFP registers, in big-endian mode, are
> opaque and cannot be subreg'ed. In practice, for at least the test case
> I looked at, this isn't as much of a pessimisation as you might expect
> -- the value in question might already be stored in core registers
> (e.g. for function arguments with -mfloat-abi=softfp), so can be
> retrieved directly from those rather than via memory.
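
The condition the patch adds can be modelled outside the compiler as a plain predicate. The sketch below is a simplified stand-in, not the macro itself: cannot_change_mode, WORD_BYTES and the boolean parameters are hypothetical names that replace TARGET_VFP, TARGET_BIG_END, GET_MODE_SIZE and reg_classes_intersect_p, and a 4-byte word is assumed.

```c
#include <stdbool.h>

enum { WORD_BYTES = 4 };  /* assumed word size in bytes on 32-bit ARM */

/* Simplified model of the new CANNOT_CHANGE_MODE_CLASS condition:
   refuse a mode change only when VFP is in use, the target is
   big-endian, either mode is wider than one word, and the register
   class overlaps the VFP registers.  */
static bool cannot_change_mode(bool target_vfp, bool big_endian,
                               unsigned from_size, unsigned to_size,
                               bool class_intersects_vfp)
{
    return target_vfp && big_endian
           && (from_size > WORD_BYTES || to_size > WORD_BYTES)
           && class_intersects_vfp;
}
```

In this model an 8-byte to 4-byte change (DFmode to SImode) in a VFP class is rejected only in big-endian mode, while word-sized changes and non-VFP classes are unaffected.
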
>
> This is the testsuite delta for current FSF mainline, with multilibs
> adjusted to build for little/big-endian, and using options
> "-mbig-endian -mfloat-abi=softfp -mfpu=vfpv3" for testing:
>
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C  -O1  execution test
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C  -O2  execution test
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C  -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C  -O3 -fomit-frame-pointer  execution test
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C  -O3 -g  execution test
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C  -Os  execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/ieee/copysign1.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/ieee/mzero6.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr35456.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution,  -O1
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution,  -O2
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution,  -O3 -fomit-frame-pointer
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution,  -O3 -g
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution,  -Og -g
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution,  -Os
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/compat/scalar-by-value-3 c_compat_x_tst.o-c_compat_y_tst.o execute
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c  -O1  execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c  -O2  execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c  -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c  -O3 -fomit-frame-pointer  execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c  -O3 -g  execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c  -Os  execution test
>
> OK for mainline, or any comments? (I've included the multilib tweaks I
> used in the attached patch for reference, though I'm not proposing to
> apply those.)
>
> Thanks,
>
> Julian
>
> ChangeLog
>
>      gcc/
>      * config/arm/arm.h (CANNOT_CHANGE_MODE_CLASS): Avoid subreg'ing
>      VFP D registers in big-endian mode.
>
>

The patch to arm.h is OK.  The patch to t-arm-elf is not.  I presume the
latter was just an oversight in patch preparation as there is no
ChangeLog entry for it.

R.

> vfp-subregs-bigendian-2.diff
>
>
> Index: gcc/config/arm/arm.h
> ===================================================================
> --- gcc/config/arm/arm.h	(revision 192576)
> +++ gcc/config/arm/arm.h	(working copy)
> @@ -1205,8 +1205,15 @@ enum reg_class
>   /* In VFPv1, VFP registers could only be accessed in the mode they
>      were set, so subregs would be invalid there.  However, we don't
>      support VFPv1 at the moment, and the restriction was lifted in
> -   VFPv2.  */
> -#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) 0
> +   VFPv2.
> +   In big-endian mode, modes greater than word size (i.e. DFmode) are stored in
> +   VFP registers in little-endian order.  We can't describe that accurately to
> +   GCC, so avoid taking subregs of such values.  */
> +#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)	\
> +  (TARGET_VFP && TARGET_BIG_END				\
> +   && (GET_MODE_SIZE (FROM) > UNITS_PER_WORD		\
> +       || GET_MODE_SIZE (TO) > UNITS_PER_WORD)		\
> +   && reg_classes_intersect_p (VFP_REGS, (CLASS)))
>
>   /* The class value for index registers, and the one for base regs.  */
>   #define INDEX_REG_CLASS  (TARGET_THUMB1 ? LO_REGS : GENERAL_REGS)
> Index: gcc/config/arm/t-arm-elf
> ===================================================================
> --- gcc/config/arm/t-arm-elf	(revision 192576)
> +++ gcc/config/arm/t-arm-elf	(working copy)
> @@ -17,8 +17,8 @@
>   # along with GCC; see the file COPYING3.  If not see
>   # <http://www.gnu.org/licenses/>.
>
> -MULTILIB_OPTIONS     = marm/mthumb
> -MULTILIB_DIRNAMES    = arm thumb
> +MULTILIB_OPTIONS     = marm
> +MULTILIB_DIRNAMES    = arm
>   MULTILIB_EXCEPTIONS  =
>   MULTILIB_MATCHES     =
>
> @@ -49,9 +49,9 @@ MULTILIB_EXCEPTIONS    += *mthumb/*mfloa
>   # MULTILIB_DIRNAMES   += ep9312
>   # MULTILIB_EXCEPTIONS += *mthumb/*mcpu=ep9312*
>   #
> -# MULTILIB_OPTIONS     += mlittle-endian/mbig-endian
> -# MULTILIB_DIRNAMES    += le be
> -# MULTILIB_MATCHES     += mbig-endian=mbe mlittle-endian=mle
> +MULTILIB_OPTIONS     += mlittle-endian/mbig-endian
> +MULTILIB_DIRNAMES    += le be
> +MULTILIB_MATCHES     += mbig-endian=mbe mlittle-endian=mle
>   #
>   # MULTILIB_OPTIONS    += mfloat-abi=hard/mfloat-abi=soft
>   # MULTILIB_DIRNAMES   += fpu soft
>