From: Jeff Law <jlaw@ventanamicro.com>
To: Palmer Dabbelt <palmer@dabbelt.com>
Cc: xry111@xry111.site, jiawei@iscas.ac.cn, gcc-patches@gcc.gnu.org,
kito.cheng@sifive.com, christoph.muellner@vrull.eu,
wuwei2016@iscas.ac.cn, shihua@iscas.ac.cn, shiyulong@iscas.ac.cn,
chenyixuan@iscas.ac.cn
Subject: Re: TARGET_RTX_COSTS and pipeline latency vs. variable-latency instructions (was Re: [PATCH] RISC-V: Add XiangShan Nanhu microarchitecture.)
Date: Mon, 25 Mar 2024 14:27:34 -0600
Message-ID: <3315b6a9-96df-415d-b82f-806dada10154@ventanamicro.com>
In-Reply-To: <mhng-0c2148c5-3ef5-4480-8989-746d7deee700@palmer-ri-x1c9>

On 3/25/24 2:13 PM, Palmer Dabbelt wrote:
> On Mon, 25 Mar 2024 12:59:14 PDT (-0700), Jeff Law wrote:
>>
>>
>> On 3/25/24 1:48 PM, Xi Ruoyao wrote:
>>> On Mon, 2024-03-18 at 20:54 -0600, Jeff Law wrote:
>>>>> +/* Costs to use when optimizing for xiangshan nanhu. */
>>>>> +static const struct riscv_tune_param xiangshan_nanhu_tune_info = {
>>>>> + {COSTS_N_INSNS (3), COSTS_N_INSNS (3)}, /* fp_add */
>>>>> + {COSTS_N_INSNS (3), COSTS_N_INSNS (3)}, /* fp_mul */
>>>>> + {COSTS_N_INSNS (10), COSTS_N_INSNS (20)}, /* fp_div */
>>>>> + {COSTS_N_INSNS (3), COSTS_N_INSNS (3)}, /* int_mul */
>>>>> + {COSTS_N_INSNS (6), COSTS_N_INSNS (6)}, /* int_div */
>>>>> + 6, /* issue_rate */
>>>>> + 3, /* branch_cost */
>>>>> + 3, /* memory_cost */
>>>>> + 3, /* fmv_cost */
>>>>> + true, /* slow_unaligned_access */
>>>>> + false, /* use_divmod_expansion */
>>>>> + RISCV_FUSE_ZEXTW | RISCV_FUSE_ZEXTH, /* fusible_ops */
>>>>> + NULL, /* vector cost */
>>>
>>>> Is your integer division really that fast? The table above essentially
>>>> says that your cpu can do integer division in 6 cycles.
>>>
>>> Hmm, I just saw that I've coded an even smaller value for LoongArch CPUs,
>>> so forgive me for "hijacking" this thread...
>>>
>>> The problem seems to be that integer division may take a different number
>>> of cycles for different inputs: on LoongArch LA664 I've observed 5 cycles
>>> for some inputs and 39 cycles for other inputs.
>>>
>>> So should we use the minimal value, the maximum value, or something in-
>>> between for TARGET_RTX_COSTS and pipeline descriptions?
>> Yea, early outs are relatively common in the actual hardware
>> implementation.
>>
>> The biggest reason to refine the cost of a division is so that we've got
>> a reasonably accurate cost for division by a constant -- which can often
>> be done with multiplication by reciprocal sequence. The multiplication
>> by reciprocal sequence will use mult, add, sub, shadd insns and you need
>> a reasonable cost model for those so you can compare against the cost of
>> a hardware division.
>>
>> So to answer your question: choose something sensible. You probably
>> don't want the fastest case and you may not want the slowest case.
>
> Maybe we should have some sort of per-bit-set cost hook for mul/div?
> Without that we're kind of just guessing at whether the implementation
> has early outs based on heuristics used to implicitly generate the cost
> models.
>
> Not sure that's really worth the complexity, though...

I'd doubt it's worth the complexity. Picking some reasonable value gets
you the vast majority of the benefit. Something like
COSTS_N_INSNS(6) is enough to get CSE to trigger. So what's left is a
reasonable cost, particularly for the division-by-constant case where we
need a ceiling for synth_mult.
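
To make the division-by-constant comparison concrete, here is a rough
sketch in C. It is a toy model and not what GCC does internally:
struct toy_costs, reciprocal_seq_cost, prefer_hw_div and self_check are
hypothetical names made up for illustration, and the cost values in
self_check are assumptions rather than measurements of any core. Only
COSTS_N_INSNS follows GCC's definition in rtl.h.

```c
#include <assert.h>
#include <stdint.h>

/* Same definition GCC uses in rtl.h. */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical per-operation costs, for illustration only. */
struct toy_costs {
  int int_mul;     /* widening multiply */
  int int_div;     /* hardware divide */
  int simple_alu;  /* add, sub, shift */
};

/* x / 10 for 32-bit unsigned x, done as a multiply by the scaled
   reciprocal ceil(2^35 / 10) = 0xCCCCCCCD followed by a shift.  */
static uint32_t div10_by_reciprocal(uint32_t x) {
  return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* Cost of the sequence above in the toy model: one multiply, one shift. */
static int reciprocal_seq_cost(const struct toy_costs *c) {
  return c->int_mul + c->simple_alu;
}

/* Nonzero if the divide is costed no higher than the sequence. */
static int prefer_hw_div(const struct toy_costs *c) {
  return c->int_div <= reciprocal_seq_cost(c);
}

/* Exercise the model with assumed cost values. */
static void self_check(void) {
  /* A mid-range divide cost: the multiply sequence wins (24 > 16). */
  struct toy_costs sensible = { COSTS_N_INSNS(3), COSTS_N_INSNS(6),
                                COSTS_N_INSNS(1) };
  /* A too-optimistic divide cost: the divide wins (12 <= 16). */
  struct toy_costs too_fast = { COSTS_N_INSNS(3), COSTS_N_INSNS(3),
                                COSTS_N_INSNS(1) };

  assert(!prefer_hw_div(&sensible));
  assert(prefer_hw_div(&too_fast));

  /* The reciprocal sequence agrees with a real division. */
  assert(div10_by_reciprocal(0u) == 0u);
  assert(div10_by_reciprocal(9u) == 0u);
  assert(div10_by_reciprocal(10u) == 1u);
  assert(div10_by_reciprocal(UINT32_MAX) == UINT32_MAX / 10u);
}
```

In this toy model, costing the divide at its best-case latency flips
the decision toward the hardware divide, even though the multiply
sequence is cheaper for most inputs. That is the reason not to pick
the fastest case.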

Jeff