From: Xi Ruoyao
To: gcc-patches@gcc.gnu.org
Cc: chenglulu, i@xen0n.name, xuchenghua@loongson.cn
Subject: Pushed: [PATCH v3 0/5] LoongArch: SIMD fixes and optimizations
Date: Wed, 29 Nov 2023 15:12:36 +0800
In-Reply-To: <20231120004728.205167-1-xry111@xry111.site>
References: <20231120004728.205167-1-xry111@xry111.site>

On Mon, 2023-11-20 at 08:47 +0800, Xi Ruoyao wrote:
> The [1/5] patch is the PR112578 fix at
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637097.html.
> It has been changed to remove the nearbyint pattern (because nearbyint
> should not raise FE_INEXACT even with -ffp-int-builtin-inexact).
> As the other patches depend on the simd.md file introduced by this one,
> it is sent as the first of this series.
>
> As many LASX instructions differ from the corresponding LSX
> instructions only in operand length, create a simd.md file to contain
> the RTX templates sharable by LSX and LASX.  This makes the code cleaner
> and easier to maintain.
>
> The [2/5] and [3/5] patches make vector product highpart and rotate
> shift operations available for GNU vectors and auto-vectorization.
>
> The [4/5] patch is a simple code cleanup, with no functional change.
>
> The [5/5] patch uses LSX for FP scalar rounding operations if LSX is
> available and -ffp-int-builtin-inexact is in effect.  We do this because
> the base FP ISA does not have such instructions.  Using LSX is overkill,
> but still much faster than calling libc functions.
>
> Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

Pushed r14-5950 .. r14-5954 with minor changes: an FSF copyright
disclaimer is added to simd.md in the 1st patch, and an unused
match_scratch is removed from 2 in the 5th patch.

> Xi Ruoyao (5):
>   LoongArch: Fix usage of LSX and LASX frint/ftint instructions
>     [PR112578]
>   LoongArch: Use standard pattern name and RTX code for LSX/LASX muh
>     instructions
>   LoongArch: Use standard pattern name and RTX code for LSX/LASX rotate
>     shift
>   LoongArch: Remove lrint_allow_inexact
>   LoongArch: Use LSX for scalar FP rounding with explicit rounding mode
>
>  gcc/config/loongarch/lasx.md                  | 283 -----------------
>  gcc/config/loongarch/loongarch-builtins.cc    |  52 ++--
>  gcc/config/loongarch/loongarch.md             |  12 +-
>  gcc/config/loongarch/lsx.md                   | 293 ------------------
>  gcc/config/loongarch/simd.md                  | 268 ++++++++++++++++
>  .../loongarch/vect-frint-no-inexact.c         |  48 +++
>  .../loongarch/vect-frint-scalar-no-inexact.c  |  23 ++
>  .../gcc.target/loongarch/vect-frint-scalar.c  |  43 +++
>  .../gcc.target/loongarch/vect-frint.c         |  85 +++++
>  .../loongarch/vect-ftint-no-inexact.c         |  44 +++
>  .../gcc.target/loongarch/vect-ftint.c         |  83 +++++
>  gcc/testsuite/gcc.target/loongarch/vect-muh.c |  36 +++
>  .../gcc.target/loongarch/vect-rotr.c          |  36 +++
>  13 files changed, 701 insertions(+), 605 deletions(-)
>  create mode 100644 gcc/config/loongarch/simd.md
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-no-inexact.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-scalar-no-inexact.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint-no-inexact.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-muh.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-rotr.c

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
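
As a rough illustration of what the [2/5] and [3/5] patches are about,
scalar loops of the following shape compute a product highpart and a
rotate right.  This is only a minimal sketch: it is not taken from
vect-muh.c or vect-rotr.c, the function names are hypothetical, and
whether GCC actually emits LSX/LASX instructions for such loops depends
on the optimization level and on options such as -mlsx or -mlasx.

```c
#include <stddef.h>
#include <stdint.h>

/* High 32 bits of the 64-bit product of two unsigned 32-bit values. */
uint32_t mulhi_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Rotate right by n bits; masking keeps both shift counts in 0..31. */
uint32_t rotr_u32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x >> n) | (x << ((32 - n) & 31));
}

/* Element-wise loops of the kind an auto-vectorizer may pick up
   (illustrative names, not from the GCC testsuite). */
void mulhi_array(uint32_t *dst, const uint32_t *a, const uint32_t *b,
                 size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = mulhi_u32(a[i], b[i]);
}

void rotr_array(uint32_t *dst, const uint32_t *src, unsigned n, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = rotr_u32(src[i], n);
}
```

The rotate is written with masked shift counts so that a count of 0 does
not produce an undefined 32-bit shift.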