From: Xi Ruoyao
To: Lulu Cheng, gcc-patches@gcc.gnu.org
Cc: WANG Xuerui, Chenghua Xu, Yujie Yang
Subject: Pushed: [PATCH v2] LoongArch: Stop -mfpu from silently breaking ABI [PR109000]
Date: Mon, 06 Mar 2023 15:59:51 +0800
Message-ID: <048b45f31fd2dc07ed9859eb189401015be90e53.camel@xry111.site>
In-Reply-To: <88ce32ee-75e1-6a93-df99-3b75b391cd0f@loongson.cn>
References: <20230303081658.6383-1-xry111@xry111.site> <88ce32ee-75e1-6a93-df99-3b75b391cd0f@loongson.cn>

Pushed r13-6500 and r12-9225.

On Mon, 2023-03-06 at 15:21 +0800, Lulu Cheng wrote:
>
> On 2023/3/3 at 4:16 PM, Xi Ruoyao wrote:
> > In the toolchain convention, we describe -mfpu= as:
> >
> > "Selects the allowed set of basic floating-point instructions and
> > registers. This option should not change the FP calling convention
> > unless it's necessary."
> >
> > Though not explicitly stated, the rationale of this rule is to allow
> > combinations like "-mabi=lp64s -mfpu=64".  This will be useful for
> > running applications with LP64S/F ABI on double-float-capable
> > LoongArch hardware and using a math library with LP64S/F ABI but
> > native double float HW instructions, for better performance.
> >
> > And now a case in the Linux kernel has again proven the usefulness
> > of this kind of combination.  The AMDGPU DCN kernel driver needs to
> > perform some floating-point operations, but the entire kernel uses
> > LP64S ABI.  So the translation units of the AMDGPU DCN driver need
> > to be compiled with -mfpu=64 (the kernel lacks soft-FP routines in
> > libgcc), but -mabi=lp64s (or you can't link it with the other part
> > of the kernel).
> >
> > Unfortunately, currently GCC uses TARGET_{HARD,SOFT,DOUBLE}_FLOAT to
> > determine the floating-point calling convention.  This causes
> > "-mfpu=64" to silently allow using $fa* to pass parameters and
> > return values EVEN IF -mabi=lp64s is used.  To make things worse,
> > the generated object file has SOFT-FLOAT set in the eflags field so
> > the linker will happily link it with other LP64S ABI object files,
> > but obviously this will lead to bad results at runtime.  And for now
> > all loongarch64 CPU models (-march settings) imply -mfpu=64 by
> > default, so the issue makes a single "-mabi=lp64s" option basically
> > broken (fortunately most projects, e.g. the Linux kernel, have used
> > -msoft-float, which implies both -mabi=lp64s and -mfpu=none as we've
> > recommended in the toolchain convention doc).
> >
> > The fix is simple: use TARGET_*_FLOAT_ABI instead.
> >
> > I consider this a bug fix: the behavior difference from the
> > toolchain convention doc is a bug, and generating object files with
> > SOFT-FLOAT flag but parameters/return values passed through FPRs is
> > definitely a bug.
> >
> > Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk
> > and release/gcc-12 branch?
>
> LGTM!
>
> Thanks!
>
> >
> > gcc/ChangeLog:
> >
> >         PR target/109000
> >         * config/loongarch/loongarch.h (FP_RETURN): Use
> >         TARGET_*_FLOAT_ABI instead of TARGET_*_FLOAT.
> >         (UNITS_PER_FP_ARG): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         PR target/109000
> >         * gcc.target/loongarch/flt-abi-isa-1.c: New test.
> >         * gcc.target/loongarch/flt-abi-isa-2.c: New test.
> >         * gcc.target/loongarch/flt-abi-isa-3.c: New test.
> >         * gcc.target/loongarch/flt-abi-isa-4.c: New test.
> > ---
> >  gcc/config/loongarch/loongarch.h                   |  4 ++--
> >  gcc/testsuite/gcc.target/loongarch/flt-abi-isa-1.c | 14 ++++++++++++++
> >  gcc/testsuite/gcc.target/loongarch/flt-abi-isa-2.c | 10 ++++++++++
> >  gcc/testsuite/gcc.target/loongarch/flt-abi-isa-3.c |  9 +++++++++
> >  gcc/testsuite/gcc.target/loongarch/flt-abi-isa-4.c | 10 ++++++++++
> >  5 files changed, 45 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-3.c
> >  create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-4.c
> >
> > diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
> > index f4e903d46bb..f8167875646 100644
> > --- a/gcc/config/loongarch/loongarch.h
> > +++ b/gcc/config/loongarch/loongarch.h
> > @@ -676,7 +676,7 @@ enum reg_class
> >     point values.  */
> >  
> >  #define GP_RETURN (GP_REG_FIRST + 4)
> > -#define FP_RETURN ((TARGET_SOFT_FLOAT) ? GP_RETURN : (FP_REG_FIRST + 0))
> > +#define FP_RETURN ((TARGET_SOFT_FLOAT_ABI) ? GP_RETURN : (FP_REG_FIRST + 0))
> >  
> >  #define MAX_ARGS_IN_REGISTERS 8
> >  
> > @@ -1154,6 +1154,6 @@ struct GTY (()) machine_function
> >  /* The largest type that can be passed in floating-point registers.  */
> >  /* TODO: according to mabi.  */
> >  #define UNITS_PER_FP_ARG  \
> > -  (TARGET_HARD_FLOAT ? (TARGET_DOUBLE_FLOAT ? 8 : 4) : 0)
> > +  (TARGET_HARD_FLOAT_ABI ? (TARGET_DOUBLE_FLOAT_ABI ? 8 : 4) : 0)
> >  
> >  #define FUNCTION_VALUE_REGNO_P(N) ((N) == GP_RETURN || (N) == FP_RETURN)
> > diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-1.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-1.c
> > new file mode 100644
> > index 00000000000..1c9490f6a87
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-1.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-mabi=lp64d -mfpu=64 -march=loongarch64 -O2" } */
> > +/* { dg-final { scan-assembler "frecip\\.d" } } */
> > +/* { dg-final { scan-assembler-not "movgr2fr\\.d" } } */
> > +/* { dg-final { scan-assembler-not "movfr2gr\\.d" } } */
> > +
> > +/* FPU is used for calculation and FPR is used for arguments and return
> > +   values.  */
> > +
> > +double
> > +t (double x)
> > +{
> > +  return 1.0 / x;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-2.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-2.c
> > new file mode 100644
> > index 00000000000..0580fd65d3a
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-2.c
> > @@ -0,0 +1,10 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-mabi=lp64s -mfpu=64 -march=loongarch64 -O2" } */
> > +/* { dg-final { scan-assembler "frecip\\.d" } } */
> > +/* { dg-final { scan-assembler "movgr2fr\\.d" } } */
> > +/* { dg-final { scan-assembler "movfr2gr\\.d" } } */
> > +
> > +/* FPU is used for calculation but FPR cannot be used for arguments and
> > +   return values.  */
> > +
> > +#include "flt-abi-isa-1.c"
> > diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-3.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-3.c
> > new file mode 100644
> > index 00000000000..16a926f57a1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-3.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-mabi=lp64s -mfpu=none -march=loongarch64 -O2" } */
> > +/* { dg-final { scan-assembler-not "frecip\\.d" } } */
> > +/* { dg-final { scan-assembler-not "movgr2fr\\.d" } } */
> > +/* { dg-final { scan-assembler-not "movfr2gr\\.d" } } */
> > +
> > +/* FPU cannot be used at all.  */
> > +
> > +#include "flt-abi-isa-1.c"
> > diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-4.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-4.c
> > new file mode 100644
> > index 00000000000..43b579c3fac
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-4.c
> > @@ -0,0 +1,10 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-msoft-float -march=loongarch64 -O2" } */
> > +/* { dg-final { scan-assembler-not "frecip\\.d" } } */
> > +/* { dg-final { scan-assembler-not "movgr2fr\\.d" } } */
> > +/* { dg-final { scan-assembler-not "movfr2gr\\.d" } } */
> > +
> > +/* -msoft-float implies both -mabi=lp64s and -mfpu=none.
> > +   FPU cannot be used at all.  */
> > +
> > +#include "flt-abi-isa-1.c"
>

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
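
The effect of the two-line change to loongarch.h can be modelled in plain C
on any host, as a rough sketch.  Nothing below is GCC source: the enums and
function names are hypothetical stand-ins for the -mabi=/-mfpu= settings, and
the predicate definitions are an assumption about what TARGET_*_FLOAT (ISA
side) and TARGET_*_FLOAT_ABI (ABI side) test.  Only the two ternary
expressions mirror the UNITS_PER_FP_ARG lines quoted in the patch.

```c
/* Illustrative model only, not GCC code.  All names here are hypothetical.  */
#include <assert.h>
#include <stdbool.h>

enum abi { ABI_LP64S, ABI_LP64F, ABI_LP64D };   /* models -mabi=lp64{s,f,d} */
enum fpu { FPU_NONE, FPU_32, FPU_64 };          /* models -mfpu={none,32,64} */

/* ISA side (assumed meaning of TARGET_HARD_FLOAT / TARGET_DOUBLE_FLOAT):
   which FP instructions and registers may be used at all.  */
bool isa_hard_float (enum fpu f)   { return f != FPU_NONE; }
bool isa_double_float (enum fpu f) { return f == FPU_64; }

/* ABI side (assumed meaning of TARGET_HARD_FLOAT_ABI /
   TARGET_DOUBLE_FLOAT_ABI): how arguments and return values are passed.  */
bool abi_hard_float (enum abi a)   { return a != ABI_LP64S; }
bool abi_double_float (enum abi a) { return a == ABI_LP64D; }

/* Shape of UNITS_PER_FP_ARG before the patch: keyed on the ISA setting,
   so the ABI argument is ignored.  */
int units_per_fp_arg_old (enum abi a, enum fpu f)
{
  (void) a;
  return isa_hard_float (f) ? (isa_double_float (f) ? 8 : 4) : 0;
}

/* Shape of UNITS_PER_FP_ARG after the patch: keyed on the ABI setting,
   so the FPU argument is ignored.  */
int units_per_fp_arg_new (enum abi a, enum fpu f)
{
  (void) f;
  return abi_hard_float (a) ? (abi_double_float (a) ? 8 : 4) : 0;
}

/* Walks the option combinations exercised by flt-abi-isa-{1,2,3}.c.  */
void check_model (void)
{
  /* -mabi=lp64s -mfpu=64: the old form lets 8-byte values go in FPRs.  */
  assert (units_per_fp_arg_old (ABI_LP64S, FPU_64) == 8);
  /* The new form passes nothing in FPRs under LP64S.  */
  assert (units_per_fp_arg_new (ABI_LP64S, FPU_64) == 0);
  /* -mabi=lp64d -mfpu=64 and -mabi=lp64s -mfpu=none agree either way.  */
  assert (units_per_fp_arg_new (ABI_LP64D, FPU_64) == 8);
  assert (units_per_fp_arg_old (ABI_LP64D, FPU_64) == 8);
  assert (units_per_fp_arg_new (ABI_LP64S, FPU_NONE) == 0);
  assert (units_per_fp_arg_old (ABI_LP64S, FPU_NONE) == 0);
}
```

Calling check_model () from a host program's main is enough to exercise the
model; the only combination where the two forms disagree among those checked
is "-mabi=lp64s -mfpu=64", which is the case flt-abi-isa-2.c covers.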