Date: Tue, 24 May 2022 12:38:02 -0700
Subject: Re: [PATCH] [PR/target 105666] RISC-V: Inhibit FP <--> int register moves via tune param
To: Kito Cheng, Philipp Tomsich
Cc: Andrew Waterman, GCC Patches, gnu-toolchain@rivosinc.com
References: <20220523181209.2208136-1-vineetg@rivosinc.com>
From: Vineet Gupta
List-Id: Gcc-patches mailing list

On 5/24/22 00:59, Kito Cheng wrote:
> Committed, thanks!

Thx for the quick action Kito,
Can this be backported to gcc 12 as well?

Thx,
-Vineet

>
> On Tue, May 24, 2022 at 3:40 AM Philipp Tomsich wrote:
>> Good catch!
>>
>> On Mon, 23 May 2022 at 20:12, Vineet Gupta wrote:
>>
>>> Under extreme register pressure, compiler can use FP <--> int
>>> moves as a cheap alternate to spilling to memory.
>>> This was seen with SPEC2017 FP benchmark 507.cactu:
>>> ML_BSSN_Advect.cc:ML_BSSN_Advect_Body()
>>>
>>> |       fmv.d.x fa5,s9  # PDupwindNthSymm2Xt1, PDupwindNthSymm2Xt1
>>> | .LVL325:
>>> |       ld      s9,184(sp)      # _12469, %sfp
>>> | ...
>>> | .LVL339:
>>> |       fmv.x.d s4,fa5  # PDupwindNthSymm2Xt1, PDupwindNthSymm2Xt1
>>> |
>>>
>>> The FMV instructions could be costlier (than stack spill) on certain
>>> micro-architectures, thus this needs to be a per-cpu tunable
>>> (default being to inhibit on all existing RV cpus).
>>>
>>> Testsuite run with new test reports 10 failures without the fix
>>> corresponding to the build variations of pr105666.c
>>>
>>> |               === gcc Summary ===
>>> |
>>> | # of expected passes          123318   (+10)
>>> | # of unexpected failures      34       (-10)
>>> | # of unexpected successes     4
>>> | # of expected failures        780
>>> | # of unresolved testcases     4
>>> | # of unsupported tests        2796
>>>
>>> gcc/Changelog:
>>>
>>>         * config/riscv/riscv.cc: (struct riscv_tune_param): Add
>>>           fmv_cost.
>>>         (rocket_tune_info): Add default fmv_cost 8.
>>>         (sifive_7_tune_info): Ditto.
>>>         (thead_c906_tune_info): Ditto.
>>>         (optimize_size_tune_info): Ditto.
>>>         (riscv_register_move_cost): Use fmv_cost for int<->fp moves.
>>>
>>> gcc/testsuite/Changelog:
>>>
>>>         * gcc.target/riscv/pr105666.c: New test.
>>>
>>> Signed-off-by: Vineet Gupta
>>> ---
>>>  gcc/config/riscv/riscv.cc                 |  9 ++++
>>>  gcc/testsuite/gcc.target/riscv/pr105666.c | 55 +++++++++++++++++++++++
>>>  2 files changed, 64 insertions(+)
>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr105666.c
>>>
>>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>>> index ee756aab6940..f3ac0d8865f0 100644
>>> --- a/gcc/config/riscv/riscv.cc
>>> +++ b/gcc/config/riscv/riscv.cc
>>> @@ -220,6 +220,7 @@ struct riscv_tune_param
>>>    unsigned short issue_rate;
>>>    unsigned short branch_cost;
>>>    unsigned short memory_cost;
>>> +  unsigned short fmv_cost;
>>>    bool slow_unaligned_access;
>>>  };
>>>
>>> @@ -285,6 +286,7 @@ static const struct riscv_tune_param rocket_tune_info = {
>>>    1,           /* issue_rate */
>>>    3,           /* branch_cost */
>>>    5,           /* memory_cost */
>>> +  8,           /* fmv_cost */
>>>    true,        /* slow_unaligned_access */
>>>  };
>>>
>>> @@ -298,6 +300,7 @@ static const struct riscv_tune_param sifive_7_tune_info = {
>>>    2,           /* issue_rate */
>>>    4,           /* branch_cost */
>>>    3,           /* memory_cost */
>>> +  8,           /* fmv_cost */
>>>    true,        /* slow_unaligned_access */
>>>  };
>>>
>>> @@ -311,6 +314,7 @@ static const struct riscv_tune_param thead_c906_tune_info = {
>>>    1,           /* issue_rate */
>>>    3,           /* branch_cost */
>>>    5,           /* memory_cost */
>>> +  8,           /* fmv_cost */
>>>    false,       /* slow_unaligned_access */
>>>  };
>>>
>>> @@ -324,6 +328,7 @@ static const struct riscv_tune_param optimize_size_tune_info = {
>>>    1,           /* issue_rate */
>>>    1,           /* branch_cost */
>>>    2,           /* memory_cost */
>>> +  8,           /* fmv_cost */
>>>    false,       /* slow_unaligned_access */
>>>  };
>>>
>>> @@ -4737,6 +4742,10 @@ static int
>>>  riscv_register_move_cost (machine_mode mode,
>>>                           reg_class_t from, reg_class_t to)
>>>  {
>>> +  if ((from == FP_REGS && to == GR_REGS) ||
>>> +      (from == GR_REGS && to == FP_REGS))
>>> +    return tune_param->fmv_cost;
>>> +
>>>    return riscv_secondary_memory_needed (mode, from, to) ? 8 : 2;
>>>  }
>>>
>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr105666.c b/gcc/testsuite/gcc.target/riscv/pr105666.c
>>> new file mode 100644
>>> index 000000000000..904f3bc0763f
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/riscv/pr105666.c
>>> @@ -0,0 +1,55 @@
>>> +/* Shamelessly plugged off gcc/testsuite/gcc.c-torture/execute/pr28982a.c.
>>> +
>>> +   The idea is to induce high register pressure for both int/fp registers
>>> +   so that they spill. By default FMV instructions would be used to stash
>>> +   int reg to a fp reg (and vice-versa) but that could be costlier than
>>> +   spilling to stack. */
>>> +
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-march=rv64g -ffast-math" } */
>>> +
>>> +#define NITER 4
>>> +#define NVARS 20
>>> +#define MULTI(X) \
>>> +  X( 0), X( 1), X( 2), X( 3), X( 4), X( 5), X( 6), X( 7), X( 8), X( 9), \
>>> +  X(10), X(11), X(12), X(13), X(14), X(15), X(16), X(17), X(18), X(19)
>>> +
>>> +#define DECLAREI(INDEX) inc##INDEX = incs[INDEX]
>>> +#define DECLAREF(INDEX) *ptr##INDEX = ptrs[INDEX], result##INDEX = 5
>>> +#define LOOP(INDEX) result##INDEX += result##INDEX * (*ptr##INDEX), ptr##INDEX += inc##INDEX
>>> +#define COPYOUT(INDEX) results[INDEX] = result##INDEX
>>> +
>>> +double *ptrs[NVARS];
>>> +double results[NVARS];
>>> +int incs[NVARS];
>>> +
>>> +void __attribute__((noinline))
>>> +foo (int n)
>>> +{
>>> +  int MULTI (DECLAREI);
>>> +  double MULTI (DECLAREF);
>>> +  while (n--)
>>> +    MULTI (LOOP);
>>> +  MULTI (COPYOUT);
>>> +}
>>> +
>>> +double input[NITER * NVARS];
>>> +
>>> +int
>>> +main (void)
>>> +{
>>> +  int i;
>>> +
>>> +  for (i = 0; i < NVARS; i++)
>>> +    ptrs[i] = input + i, incs[i] = i;
>>> +  for (i = 0; i < NITER * NVARS; i++)
>>> +    input[i] = i;
>>> +  foo (NITER);
>>> +  for (i = 0; i < NVARS; i++)
>>> +    if (results[i] != i * NITER * (NITER + 1) / 2)
>>> +      return 1;
>>> +  return 0;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
>>> +/* { dg-final { scan-assembler-not "\tfmv\\.x\\.d\t" } } */
>>> --
>>> 2.32.0
>>>
>>>