Date: Tue, 28 Feb 2023 12:54:04 +0000 (GMT)
From: "Maciej W. Rozycki"
To: Alexander Monakov
cc: Richard Biener, Jeff Law, Andrew Pinski, Palmer Dabbelt,
    gcc-patches@gcc.gnu.org
Subject: Re: RISC-V: Add divmod instruction support
In-Reply-To: <72c83a06-6733-a982-04e8-64dbd754cb5e@ispras.ru>
References: <26ca669c-70ce-e475-717e-3c36f1e1c703@gmail.com>
    <72c83a06-6733-a982-04e8-64dbd754cb5e@ispras.ru>

On Mon, 20 Feb 2023, Alexander Monakov wrote:

> > > That's the kind of stuff I'd expect to happen at the tree level though,
> > > before expand.
> >
> > The GIMPLE pass forming divmod could indeed choose to emit the
> > div + mul/sub sequence instead if an actual divmod pattern isn't available.
> > It could even generate some fake mul/sub/mod RTXen to cost the two
> > variants against each other but I seriously doubt any uarch that implements
> > division/modulo has a slower mul/sub.
>
> Making a correct decision requires knowing to which degree the divider is
> pipelined, and costs won't properly reflect that. If the divider accepts
> a new div/mod instruction every couple of cycles, it's faster to just issue
> a div followed by a mod with the same operands.

I guess there exist microarchitectures that have their divider pipelined,
but I haven't come across one.  I think division is not an operation that
is commonly optimised for in hw RTL, whether integer or FP, at least not in
embedded applications, and given the nature of the operation I believe it
would be particularly costly in terms of silicon.  I've seen latencies and
repeat rates of up to 50 quoted in CPU documentation even for 32-bit
integer division, while multiplication had a latency of 5 and a repeat rate
of 1 (i.e. fully pipelined) for the same microarchitecture.

So (taking the data dependency into account) the latency of a DIV + MOD
operation would be 100 for said microarchitecture (and the same repeat
rate), while a DIV + MUL/SUB sequence would have a latency of 56 and a
repeat rate of 50.  Quite an improvement.

> Therefore I think in this case it's fair for GIMPLE level to just check if
> the divmod pattern is available, and let the target do the fine tuning via
> the divmod expander.

Hmm, we have DFA scheduling available that is supposed to give latencies
and repeat rates for individual operations.  Wouldn't it be possible to get
hold of this information?

If we cannot make use of this information at the GIMPLE level (which I'd
consider regrettable), then I think that maybe we need a target hook to say
division is cheap that would prevent a DIV + MOD to DIV + MUL/SUB
transformation from happening, perhaps off by default.
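
For illustration only, the arithmetic above can be put into a minimal C sketch.  It shows that the remainder recovered as a - (a / b) * b matches the one given by a separate modulo operation, and it adds up the quoted cycle counts for both sequences.  The function names are hypothetical, and the SUB latency of 1 and a MOD latency equal to that of DIV are assumptions inferred from the totals of 56 and 100; this is not how GCC itself models costs.

```c
#include <assert.h>

/* Remainder obtained with a second division-class operation (DIV + MOD). */
static int rem_by_mod(int a, int b)
{
    return a % b;
}

/* Remainder recovered from the quotient (DIV + MUL/SUB). */
static int rem_by_mul_sub(int a, int b)
{
    int q = a / b;     /* the only division in the sequence */
    return a - q * b;  /* one multiplication and one subtraction */
}

/* Cycle counts: 50 for DIV and 5 for MUL are the figures quoted in the
   message; MOD == DIV and SUB == 1 are assumptions made for this sketch. */
enum { DIV_LATENCY = 50, MOD_LATENCY = 50, MUL_LATENCY = 5, SUB_LATENCY = 1 };

/* With a non-pipelined divider MOD cannot start until DIV has finished. */
static int latency_div_mod(void)
{
    return DIV_LATENCY + MOD_LATENCY;
}

/* MUL depends on the quotient and SUB depends on the product. */
static int latency_div_mul_sub(void)
{
    return DIV_LATENCY + MUL_LATENCY + SUB_LATENCY;
}

/* Check that both ways of computing the remainder agree, including for
   negative operands (C division truncates towards zero). */
static void check_examples(void)
{
    static const int cases[][2] = {
        {7, 2}, {-7, 2}, {7, -2}, {-7, -2}, {100, 7}, {0, 5}
    };
    for (unsigned i = 0; i < sizeof cases / sizeof cases[0]; i++)
        assert(rem_by_mod(cases[i][0], cases[i][1])
               == rem_by_mul_sub(cases[i][0], cases[i][1]));
}
```

Calling check_examples() from a test program exercises the equivalence; inputs where a / b itself is undefined in C (a zero divisor, or INT_MIN divided by -1) are deliberately left out.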

I think it would be regrettable too if every backend for targets/subtargets
that do not have a hardware DIVMOD operation (e.g. MIPS has one at certain
ISA levels only) or a pipelined division would have to code the
transformation by hand.

> It would make sense for tree-ssa-mathopts to emit div + mul/sub when neither
> 'divmod' nor 'mod' patterns are available, because RTL expansion will do the
> same, just later, and we'll rely on RTL CSE to clean up the redundant div.

That as well.

> But RISC-V has both 'div' and 'mod', so as I tried to explain in the first
> paragraph we should let the target decide.

Still I think it would be best if RISC-V supplied a divmod pattern only
where fused DIV/MOD execution is present in the microarchitecture.

  Maciej