From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <0610a1d2-fa60-7c03-02fa-60320f13cd84@gmail.com>
Date: Sat, 18 Feb 2023 14:57:17 -0700
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.7.1
Subject: Re: RISC-V: Add divmod instruction support
Content-Language: en-US
To: Palmer Dabbelt <palmer@dabbelt.com>
Cc: gcc-patches@gcc.gnu.org
References: <mhng-e6ef3c83-7428-420c-a8dc-428b624c54a3@palmer-ri-x1c9a>
From: Jeff Law <jeffreyalaw@gmail.com>
In-Reply-To: <mhng-e6ef3c83-7428-420c-a8dc-428b624c54a3@palmer-ri-x1c9a>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
List-Id: <gcc-patches.gcc.gnu.org>


On 2/18/23 14:30, Palmer Dabbelt wrote:
> On Sat, 18 Feb 2023 13:06:02 PST (-0800), jeffreyalaw@gmail.com wrote:
>>
>>
>> On 2/18/23 11:26, Palmer Dabbelt wrote:
>>> On Fri, 17 Feb 2023 06:02:40 PST (-0800), gcc-patches@gcc.gnu.org wrote:
>>>> Hi all,
>>>> If we have division and remainder calculations with the same operands:
>>>>
>>>>   a = b / c;
>>>>   d = b % c;
>>>>
>>>> We can replace the calculation of remainder with multiplication +
>>>> subtraction, using the result from the previous division:
>>>>
>>>>   a = b / c;
>>>>   d = a * c;
>>>>   d = b - d;
>>>>
>>>> Which will be faster.
>>>
>>> Do you have any benchmarks that show that performance increase?  The ISA
>>> manual specifically says the suggested sequence is div+mod, and while
>>> those suggestions don't always pan out for real hardware it's likely
>>> that at least some implementations will end up with the ISA-suggested
>>> fusions.
>> It'll almost certainly be visible in mcf.  Been there, done that.  In
>> fact, that's why I asked the team Matevos works on to poke at this case
>> as I went through this issue on another processor.
>>
>> It can also be run through LLVM's MCA to estimate counts if you've got a
>> pipeline description.  The div+rem will come out at around 40c while a
>> div+mul+sub should weigh in around 25c for Veyron v1.
> 
> Do you have a link to the patches somewhere?  I couldn't find them 
> online, just the custom instruction support.  Or even just some docs 
> describing what the pipeline does, as just basing one performance model 
> on another is kind of a double-edged sword.
It is.  But div/rem is pretty simple.  20c each, not pipelined, using a 
shared unit.  There are some early-out paths, but the compiler isn't going 
to be able to model those as they depend on the number of bits set in the 
inputs.  Basically, as long as we can do a mul+sub in < 20c, Matevos's 
sequence is faster.
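
A minimal C sketch of that comparison, not taken from the patch.  The 
function names are made up for illustration, and the latencies are 
caller-supplied assumptions rather than measured numbers.  It shows that 
the remainder rebuilt from the quotient matches `%`, and gives a rough 
cycle estimate for a non-pipelined, shared divider:

```c
#include <assert.h>

/* Remainder computed with a second trip through the divider. */
int rem_via_rem(int b, int c)
{
    assert(c != 0);
    return b % c;
}

/* Remainder rebuilt from the quotient: d = b - (b / c) * c.
   C division truncates toward zero, so this equals b % c whenever
   b / c itself is defined (c != 0 and not INT_MIN / -1). */
int rem_via_mul_sub(int b, int c)
{
    assert(c != 0);
    int a = b / c;
    return b - a * c;
}

/* Rough estimate for div followed by rem on a shared, non-pipelined
   divider: the two operations serialize.  Early-out paths are ignored. */
int cycles_div_rem(int div_latency)
{
    return 2 * div_latency;
}

/* Rough estimate for div followed by a dependent mul and sub. */
int cycles_div_mul_sub(int div_latency, int mul_latency, int sub_latency)
{
    return div_latency + mul_latency + sub_latency;
}
```

With a 20c divide and, purely as assumed placeholder values, a 4c 
multiply and a 1c subtract, the estimates are 40c versus 25c.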

If we have implementations that support fusion at some point, then we 
can twiddle the expander appropriately.  Similarly, we could easily 
consider keeping div+rem under -Os as well, since div+rem is smaller than 
div+mul+sub.  I'm sure Matevos is open to adjustments to that patch.
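
One way such a selection could be structured, as a hedged sketch only. 
The struct, its fields, and `prefer_mul_sub` are hypothetical names and 
do not correspond to anything in GCC or in the patch; the rule that a 
fusing core should keep div+rem is likewise an assumption:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tuning description; none of these names exist in GCC. */
struct divmod_tuning {
    bool optimize_size;   /* compiling for size, e.g. -Os */
    bool fuses_div_rem;   /* core fuses a div+rem pair (assumed flag) */
    int div_latency;      /* cycles for one div or rem */
    int mul_latency;      /* cycles for one mul */
    int sub_latency;      /* cycles for one sub */
};

/* Returns true when the remainder should be rebuilt as b - (b / c) * c
   instead of being computed with a separate rem instruction. */
bool prefer_mul_sub(const struct divmod_tuning *t)
{
    assert(t != NULL);
    if (t->optimize_size)
        return false;     /* div+rem is two insns, div+mul+sub is three */
    if (t->fuses_div_rem)
        return false;     /* assumption: the fused pair is the better choice */
    return t->mul_latency + t->sub_latency < t->div_latency;
}
```

The last line is just the "mul+sub in < 20c" condition with the divider 
latency made a parameter.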

We haven't done a full eval on the pipeline modeling yet, and with gcc in 
stage4 it didn't seem advisable to try to push it through.  Similarly, 
I don't think Matevos's patch should really be a gcc-13 thing; it really 
should be gcc-14.

> 
> That said, I think just knowing the processor doesn't do the div+mod 
> fusion is sufficient to turn something like this on for the mtune for 
> that processor.  That's different than turning it on globally, though -- 
> unless it turns out nobody is actually doing the fusion suggested in the 
> ISA manual, which wouldn't be super surprising.
I'm not aware of anyone doing fusion of divmod in the RISC-V space.

For prior ports I've worked on, the hardware folks made it painfully 
clear that the cost of adding another output port on the unit was a 
non-starter.  That port had a pretty fast divider with at least some 
overlap, and the div + mul + sub sequence was still better in general, 
though the early-out cases made it much harder to evaluate.


> 
> Maybe some of the SiFive and T-Head folks can chime in on whether or not 
> their processors perform the fusion in question -- and if so, do the 
> instructions need to stay back-to-back?  It doesn't look like we're 
> really targeting the code sequences the ISA suggests as it stands, so 
> maybe it's OK to just switch the default over too?
Happy to take in their input.  I suspect they'll ultimately prefer the 
sequence Matevos is generating.


> 
> It also brings up the question of mulh+mul fusions, which I don't think 
> we've really looked at (though maybe they're a lot less important for 
> rv64).
Not on our radar for V1 or V2.
jeff