Subject: Re: [PATCH v3] LoongArch: Add prefetch instructions.
From: Xi Ruoyao
To: Lulu Cheng, gcc-patches@gcc.gnu.org
Cc: i@xen0n.name, xuchenghua@loongson.cn, xujiahao
Date: Thu, 17 Nov 2022 14:28:48 +0800

LGTM.  A minor issue is that "enabling -fprefetch-loop-arrays at -O3" is
not documented, but AArch64 and i386 already do this anyway.  We can add
this to the documentation later.

On Wed, 2022-11-16 at 10:10 +0800, Lulu Cheng wrote:
> v2 -> v3:
> 1. Remove preldx support.
>
> ---------------------------------------
> Enable sw prefetching at -O3 and higher.
>
> Co-Authored-By: xujiahao
>
> gcc/ChangeLog:
>
>         * config/loongarch/constraints.md (ZD): New constraint.
>         * config/loongarch/loongarch-def.c: Initial number of parallel prefetch.
>         * config/loongarch/loongarch-tune.h (struct loongarch_cache):
>         Define number of parallel prefetch.
>         * config/loongarch/loongarch.cc (loongarch_option_override_internal):
>         Set up parameters to be used in prefetching algorithm.
>         * config/loongarch/loongarch.md (prefetch): New template.
> ---
>  gcc/config/loongarch/constraints.md   | 10 ++++++++++
>  gcc/config/loongarch/loongarch-def.c  |  2 ++
>  gcc/config/loongarch/loongarch-tune.h |  1 +
>  gcc/config/loongarch/loongarch.cc     | 28 +++++++++++++++++++++++++++
>  gcc/config/loongarch/loongarch.md     | 14 ++++++++++++++
>  5 files changed, 55 insertions(+)
>
> diff --git a/gcc/config/loongarch/constraints.md b/gcc/config/loongarch/constraints.md
> index 43cb7b5f0f5..46f7f63ae31 100644
> --- a/gcc/config/loongarch/constraints.md
> +++ b/gcc/config/loongarch/constraints.md
> @@ -86,6 +86,10 @@
>  ;;    "ZB"
>  ;;      "An address that is held in a general-purpose register.
>  ;;      The offset is zero"
> +;;    "ZD"
> +;;     "An address operand whose address is formed by a base register
> +;;      and offset that is suitable for use in instructions with the same
> +;;      addressing mode as @code{preld}."
>  ;; "<" "Matches a pre-dec or post-dec operand." (Global non-architectural)
>  ;; ">" "Matches a pre-inc or post-inc operand." (Global non-architectural)
>
> @@ -190,3 +194,9 @@ (define_memory_constraint "ZB"
>    The offset is zero"
>    (and (match_code "mem")
>         (match_test "REG_P (XEXP (op, 0))")))
> +
> +(define_address_constraint "ZD"
> +  "An address operand whose address is formed by a base register
> +   and offset that is suitable for use in instructions with the same
> +   addressing mode as @code{preld}."
> +   (match_test "loongarch_12bit_offset_address_p (op, mode)"))
> diff --git a/gcc/config/loongarch/loongarch-def.c b/gcc/config/loongarch/loongarch-def.c
> index cbf995d81b5..80ab10a52a8 100644
> --- a/gcc/config/loongarch/loongarch-def.c
> +++ b/gcc/config/loongarch/loongarch-def.c
> @@ -62,11 +62,13 @@ loongarch_cpu_cache[N_TUNE_TYPES] = {
>        .l1d_line_size = 64,
>        .l1d_size = 64,
>        .l2d_size = 256,
> +      .simultaneous_prefetches = 4,
>    },
>    [CPU_LA464] = {
>        .l1d_line_size = 64,
>        .l1d_size = 64,
>        .l2d_size = 256,
> +      .simultaneous_prefetches = 4,
>    },
>  };
>
> diff --git a/gcc/config/loongarch/loongarch-tune.h b/gcc/config/loongarch/loongarch-tune.h
> index 6f3530f5c02..8e3eb29472b 100644
> --- a/gcc/config/loongarch/loongarch-tune.h
> +++ b/gcc/config/loongarch/loongarch-tune.h
> @@ -45,6 +45,7 @@ struct loongarch_cache {
>      int l1d_line_size;  /* bytes */
>      int l1d_size;       /* KiB */
>      int l2d_size;       /* kiB */
> +    int simultaneous_prefetches; /* number of parallel prefetch */
>  };
>
>  #endif /* LOONGARCH_TUNE_H */
> diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
> index 8d5d8d965dd..8ee32c90573 100644
> --- a/gcc/config/loongarch/loongarch.cc
> +++ b/gcc/config/loongarch/loongarch.cc
> @@ -63,6 +63,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "context.h"
>  #include "builtins.h"
>  #include "rtl-iter.h"
> +#include "opts.h"
>
>  /* This file should be included last.  */
>  #include "target-def.h"
> @@ -6100,6 +6101,33 @@ loongarch_option_override_internal (struct gcc_options *opts)
>    if (loongarch_branch_cost == 0)
>      loongarch_branch_cost = loongarch_cost->branch_cost;
>
> +  /* Set up parameters to be used in prefetching algorithm.  */
> +  int simultaneous_prefetches
> +    = loongarch_cpu_cache[LARCH_ACTUAL_TUNE].simultaneous_prefetches;
> +
> +  SET_OPTION_IF_UNSET (opts, &global_options_set,
> +                      param_simultaneous_prefetches,
> +                      simultaneous_prefetches);
> +
> +  SET_OPTION_IF_UNSET (opts, &global_options_set,
> +                      param_l1_cache_line_size,
> +                      loongarch_cpu_cache[LARCH_ACTUAL_TUNE].l1d_line_size);
> +
> +  SET_OPTION_IF_UNSET (opts, &global_options_set,
> +                      param_l1_cache_size,
> +                      loongarch_cpu_cache[LARCH_ACTUAL_TUNE].l1d_size);
> +
> +  SET_OPTION_IF_UNSET (opts, &global_options_set,
> +                      param_l2_cache_size,
> +                      loongarch_cpu_cache[LARCH_ACTUAL_TUNE].l2d_size);
> +
> +
> +  /* Enable sw prefetching at -O3 and higher.  */
> +  if (opts->x_flag_prefetch_loop_arrays < 0
> +      && (opts->x_optimize >= 3 || opts->x_flag_profile_use)
> +      && !opts->x_optimize_size)
> +    opts->x_flag_prefetch_loop_arrays = 1;
> +
>    if (TARGET_DIRECT_EXTERN_ACCESS && flag_shlib)
>      error ("%qs cannot be used for compiling a shared library",
>            "-mdirect-extern-access");
> diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
> index 682ab961741..2fda5381904 100644
> --- a/gcc/config/loongarch/loongarch.md
> +++ b/gcc/config/loongarch/loongarch.md
> @@ -3282,6 +3282,20 @@ (define_expand "untyped_call"
>  ;;  ....................
>  ;;
>
> +(define_insn "prefetch"
> +  [(prefetch (match_operand 0 "address_operand" "ZD")
> +            (match_operand 1 "const_int_operand" "n")
> +            (match_operand 2 "const_int_operand" "n"))]
> +  ""
> +{
> +  switch (INTVAL (operands[1]))
> +  {
> +    case 0: return "preld\t0,%a0";
> +    case 1: return "preld\t8,%a0";
> +    default: gcc_unreachable ();
> +  }
> +})
> +
>  (define_insn "nop"
>    [(const_int 0)]
>    ""

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
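
For anyone who wants to experiment with the two decisions the quoted patch
makes without building GCC, the following is a minimal standalone C sketch.
It is not code from the patch: struct opts_sketch,
resolve_prefetch_loop_arrays and preld_hint_for_rw are hypothetical names
introduced only for illustration, and the -1 returned for an unsupported rw
value stands in for the gcc_unreachable () call in the insn template.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few gcc_options fields the patch reads.  */
struct opts_sketch
{
  int flag_prefetch_loop_arrays; /* negative: not set by the user */
  int optimize;                  /* the N in -ON */
  bool optimize_size;            /* true under -Os */
  bool profile_use;              /* true under -fprofile-use */
};

/* Mirrors the "Enable sw prefetching at -O3 and higher" condition:
   only an unset flag is changed, and never when optimizing for size.  */
static int
resolve_prefetch_loop_arrays (const struct opts_sketch *o)
{
  if (o->flag_prefetch_loop_arrays < 0
      && (o->optimize >= 3 || o->profile_use)
      && !o->optimize_size)
    return 1;
  return o->flag_prefetch_loop_arrays;
}

/* Mirrors the switch in the "prefetch" insn: operand 1 (the read/write
   argument) selects the first operand printed for preld.  Returns -1
   where the real pattern would reach gcc_unreachable ().  */
static int
preld_hint_for_rw (int rw)
{
  switch (rw)
    {
    case 0: return 0;
    case 1: return 8;
    default: return -1;
    }
}
```

In this sketch an explicit user setting (0 or 1) is always returned
unchanged, which matches the "< 0" guard in the patch; operand 2 of the
insn (locality) is not modelled because the quoted template does not use
it when choosing the output string.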