From: Kito Cheng
Date: Thu, 2 Jun 2022 10:11:49 +0800
Subject: Re: [PATCH] [PR/target 105666] RISC-V: Inhibit FP <--> int register moves via tune param
To: Vineet Gupta
Cc: Philipp Tomsich, GCC Patches, Andrew Waterman, gnu-toolchain@rivosinc.com
References: <20220523181209.2208136-1-vineetg@rivosinc.com>

I just hesitated for a few days about backporting this, but I think
it's OK to back port because

1. Simple enough
2. Good for general RISC-V core

Committed with your latest testsuite fix.

Thanks!

On Wed, May 25, 2022 at 3:38 AM Vineet Gupta wrote:
>
> On 5/24/22 00:59, Kito Cheng wrote:
> > Committed, thanks!
>
> Thx for the quick action Kito,
> Can this be backported to gcc 12 as well ?
>
> Thx,
> -Vineet
>
> >
> > On Tue, May 24, 2022 at 3:40 AM Philipp Tomsich
> > wrote:
> >> Good catch!
> >>
> >> On Mon, 23 May 2022 at 20:12, Vineet Gupta wrote:
> >>
> >>> Under extreme register pressure, compiler can use FP <--> int
> >>> moves as a cheap alternate to spilling to memory.
> >>> This was seen with SPEC2017 FP benchmark 507.cactu:
> >>> ML_BSSN_Advect.cc:ML_BSSN_Advect_Body()
> >>>
> >>> |     fmv.d.x fa5,s9  # PDupwindNthSymm2Xt1, PDupwindNthSymm2Xt1
> >>> | .LVL325:
> >>> |     ld      s9,184(sp)  # _12469, %sfp
> >>> | ...
> >>> | .LVL339:
> >>> |     fmv.x.d s4,fa5  # PDupwindNthSymm2Xt1, PDupwindNthSymm2Xt1
> >>> |
> >>>
> >>> The FMV instructions could be costlier (than stack spill) on certain
> >>> micro-architectures, thus this needs to be a per-cpu tunable
> >>> (default being to inhibit on all existing RV cpus).
> >>>
> >>> Testsuite run with new test reports 10 failures without the fix
> >>> corresponding to the build variations of pr105666.c
> >>>
> >>> |               === gcc Summary ===
> >>> |
> >>> | # of expected passes          123318 (+10)
> >>> | # of unexpected failures      34 (-10)
> >>> | # of unexpected successes     4
> >>> | # of expected failures        780
> >>> | # of unresolved testcases     4
> >>> | # of unsupported tests        2796
> >>>
> >>> gcc/Changelog:
> >>>
> >>>         * config/riscv/riscv.cc: (struct riscv_tune_param): Add fmv_cost.
> >>>         (rocket_tune_info): Add default fmv_cost 8.
> >>>         (sifive_7_tune_info): Ditto.
> >>>         (thead_c906_tune_info): Ditto.
> >>>         (optimize_size_tune_info): Ditto.
> >>>         (riscv_register_move_cost): Use fmv_cost for int<->fp moves.
> >>>
> >>> gcc/testsuite/Changelog:
> >>>
> >>>         * gcc.target/riscv/pr105666.c: New test.
> >>>
> >>> Signed-off-by: Vineet Gupta
> >>> ---
> >>>  gcc/config/riscv/riscv.cc                 |  9 ++++
> >>>  gcc/testsuite/gcc.target/riscv/pr105666.c | 55 +++++++++++++++++++++++
> >>>  2 files changed, 64 insertions(+)
> >>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr105666.c
> >>>
> >>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> >>> index ee756aab6940..f3ac0d8865f0 100644
> >>> --- a/gcc/config/riscv/riscv.cc
> >>> +++ b/gcc/config/riscv/riscv.cc
> >>> @@ -220,6 +220,7 @@ struct riscv_tune_param
> >>>    unsigned short issue_rate;
> >>>    unsigned short branch_cost;
> >>>    unsigned short memory_cost;
> >>> +  unsigned short fmv_cost;
> >>>    bool slow_unaligned_access;
> >>>  };
> >>>
> >>> @@ -285,6 +286,7 @@ static const struct riscv_tune_param rocket_tune_info = {
> >>>    1,                                            /* issue_rate */
> >>>    3,                                            /* branch_cost */
> >>>    5,                                            /* memory_cost */
> >>> +  8,                                            /* fmv_cost */
> >>>    true,                                         /* slow_unaligned_access */
> >>>  };
> >>>
> >>> @@ -298,6 +300,7 @@ static const struct riscv_tune_param sifive_7_tune_info = {
> >>>    2,                                            /* issue_rate */
> >>>    4,                                            /* branch_cost */
> >>>    3,                                            /* memory_cost */
> >>> +  8,                                            /* fmv_cost */
> >>>    true,                                         /* slow_unaligned_access */
> >>>  };
> >>>
> >>> @@ -311,6 +314,7 @@ static const struct riscv_tune_param thead_c906_tune_info = {
> >>>    1,                                            /* issue_rate */
> >>>    3,                                            /* branch_cost */
> >>>    5,                                            /* memory_cost */
> >>> +  8,                                            /* fmv_cost */
> >>>    false,                                        /* slow_unaligned_access */
> >>>  };
> >>>
> >>> @@ -324,6 +328,7 @@ static const struct riscv_tune_param optimize_size_tune_info = {
> >>>    1,                                            /* issue_rate */
> >>>    1,                                            /* branch_cost */
> >>>    2,                                            /* memory_cost */
> >>> +  8,                                            /* fmv_cost */
> >>>    false,                                        /* slow_unaligned_access */
> >>>  };
> >>>
> >>> @@ -4737,6 +4742,10 @@ static int
> >>>  riscv_register_move_cost (machine_mode mode,
> >>>                            reg_class_t from, reg_class_t to)
> >>>  {
> >>> +  if ((from == FP_REGS && to == GR_REGS) ||
> >>> +      (from == GR_REGS && to == FP_REGS))
> >>> +    return tune_param->fmv_cost;
> >>> +
> >>>    return riscv_secondary_memory_needed (mode, from, to) ? 8 : 2;
> >>>  }
> >>>
> >>> diff --git a/gcc/testsuite/gcc.target/riscv/pr105666.c b/gcc/testsuite/gcc.target/riscv/pr105666.c
> >>> new file mode 100644
> >>> index 000000000000..904f3bc0763f
> >>> --- /dev/null
> >>> +++ b/gcc/testsuite/gcc.target/riscv/pr105666.c
> >>> @@ -0,0 +1,55 @@
> >>> +/* Shamelessly plugged off gcc/testsuite/gcc.c-torture/execute/pr28982a.c.
> >>> +
> >>> +   The idea is to induce high register pressure for both int/fp registers
> >>> +   so that they spill. By default FMV instructions would be used to stash
> >>> +   int reg to a fp reg (and vice-versa) but that could be costlier than
> >>> +   spilling to stack.  */
> >>> +
> >>> +/* { dg-do compile } */
> >>> +/* { dg-options "-march=rv64g -ffast-math" } */
> >>> +
> >>> +#define NITER 4
> >>> +#define NVARS 20
> >>> +#define MULTI(X) \
> >>> +  X( 0), X( 1), X( 2), X( 3), X( 4), X( 5), X( 6), X( 7), X( 8), X( 9), \
> >>> +  X(10), X(11), X(12), X(13), X(14), X(15), X(16), X(17), X(18), X(19)
> >>> +
> >>> +#define DECLAREI(INDEX) inc##INDEX = incs[INDEX]
> >>> +#define DECLAREF(INDEX) *ptr##INDEX = ptrs[INDEX], result##INDEX = 5
> >>> +#define LOOP(INDEX) result##INDEX += result##INDEX * (*ptr##INDEX), ptr##INDEX += inc##INDEX
> >>> +#define COPYOUT(INDEX) results[INDEX] = result##INDEX
> >>> +
> >>> +double *ptrs[NVARS];
> >>> +double results[NVARS];
> >>> +int incs[NVARS];
> >>> +
> >>> +void __attribute__((noinline))
> >>> +foo (int n)
> >>> +{
> >>> +  int MULTI (DECLAREI);
> >>> +  double MULTI (DECLAREF);
> >>> +  while (n--)
> >>> +    MULTI (LOOP);
> >>> +  MULTI (COPYOUT);
> >>> +}
> >>> +
> >>> +double input[NITER * NVARS];
> >>> +
> >>> +int
> >>> +main (void)
> >>> +{
> >>> +  int i;
> >>> +
> >>> +  for (i = 0; i < NVARS; i++)
> >>> +    ptrs[i] = input + i, incs[i] = i;
> >>> +  for (i = 0; i < NITER * NVARS; i++)
> >>> +    input[i] = i;
> >>> +  foo (NITER);
> >>> +  for (i = 0; i < NVARS; i++)
> >>> +    if (results[i] != i * NITER * (NITER + 1) / 2)
> >>> +      return 1;
> >>> +  return 0;
> >>> +}
> >>> +
> >>> +/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
> >>> +/* { dg-final { scan-assembler-not "\tfmv\\.x\\.d\t" } } */
> >>> --
> >>> 2.32.0
> >>>
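
For readers skimming the thread, the cost logic added by the quoted patch
can be modelled outside GCC. What follows is a minimal standalone sketch in
C, not the GCC code itself. Only the fmv_cost field, its default of 8, the
rocket memory_cost of 5, and the FP_REGS/GR_REGS test come from the patch.
The reduced enum, struct tune_param, rocket_like, cheap_fmv_core and
prefers_fmv_over_spill are hypothetical names introduced here, and the
fallback cost of 2 assumes riscv_secondary_memory_needed would return false.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for GCC's register classes: only the two
   classes tested by the patch are modelled.  */
enum reg_class { GR_REGS, FP_REGS };

/* Reduced model of riscv_tune_param: just the two costs compared here.  */
struct tune_param
{
  unsigned short memory_cost;
  unsigned short fmv_cost;
};

/* Costs copied from rocket_tune_info in the patch.  */
static const struct tune_param rocket_like = { 5, 8 };

/* Hypothetical core where FMV is cheap; not part of the patch.  */
static const struct tune_param cheap_fmv_core = { 5, 2 };

/* Model of the patched hook: int <-> fp moves cost fmv_cost, every
   other move is assumed not to need secondary memory and costs 2.  */
static int
register_move_cost (const struct tune_param *tune,
                    enum reg_class from, enum reg_class to)
{
  assert (tune != 0);
  if ((from == FP_REGS && to == GR_REGS)
      || (from == GR_REGS && to == FP_REGS))
    return tune->fmv_cost;
  return 2;
}

/* Toy decision rule (an assumption for illustration only): treat FMV
   as attractive when it is cheaper than memory_cost.  The real register
   allocator weighs many more factors.  */
static bool
prefers_fmv_over_spill (const struct tune_param *tune)
{
  return register_move_cost (tune, GR_REGS, FP_REGS) < tune->memory_cost;
}
```

Under this toy rule, prefers_fmv_over_spill (&rocket_like) is false, while
prefers_fmv_over_spill (&cheap_fmv_core) is true, which is the kind of
per-CPU choice the tunable is meant to allow.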