From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 27 Feb 2023 14:53:31 -0600
From: Segher Boessenkool <segher@kernel.crashing.org>
To: Pat Haugen <pthaugen@linux.ibm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>, "Kewen.Lin" <linkw@linux.ibm.com>,
        David Edelsohn <dje.gcc@gmail.com>,
        Peter Bergner <bergner@linux.ibm.com>
Subject: Re: [PATCH, rs6000] Tweak modulo define_insns to eliminate register copy
Message-ID: <20230227205331.GC25951@gate.crashing.org>
References: <3cad2a5e-dd68-2fbe-d52b-e077a7405623@linux.ibm.com> <20230227170835.GA25951@gate.crashing.org> <20578dd1-fba8-858a-a6e5-cdbb3ca0b6c1@linux.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20578dd1-fba8-858a-a6e5-cdbb3ca0b6c1@linux.ibm.com>

Hi!

On Mon, Feb 27, 2023 at 02:12:23PM -0600, Pat Haugen wrote:
> On 2/27/23 11:08 AM, Segher Boessenkool wrote:
> >On Mon, Feb 27, 2023 at 09:11:37AM -0600, Pat Haugen wrote:
> >>The define_insns for the modulo operation currently force the target
> >>register to a distinct reg in preparation for a possible future peephole
> >>combining div/mod. But this can lead to cases of a needless copy being
> >>inserted. Fixed with the following patch.
> >
> >Have you verified those peepholes still match?
> 
> Yes, I verified the peepholes still match and transform the sequence.

Please add the testcases for that then?  Or do we have tests for it
already :-)

> >Do those peepholes actually improve performance?  On new CPUs?  The code
> >here says
> >;; On machines with modulo support, do a combined div/mod the old fashioned
> >;; method, since the multiply/subtract is faster than doing the mod instruction
> >;; after a divide.
> >but that really should not be true: we can do the div and mod in
> >parallel (except in SMT4 perhaps, which we never schedule for anyway),
> >so that should always be strictly faster.
> >
> Since the modulo insns were introduced in Power9, we're just talking 
> Power9/Power10. On paper, I would agree that separate div/mod could be 
> slightly faster to get the mod result,

"Slightly".  It takes 12 cycles for the two in parallel (64-bit, p9),
but 19 cycles for the "cheaper" sequence (divd+mulld+subf, 12+5+2).  It
is even worse if the units are busy of course, or if there are other
problems.
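To make the cycle arithmetic above concrete, here is a minimal C sketch. It encodes the 64-bit POWER9 latencies quoted in this message (treating the mod instruction as also 12 cycles is an assumption read from "12 cycles for the two in parallel") and checks that the multiply/subtract remainder agrees with `%`. All function names are hypothetical, and the sketch does not claim to show what GCC actually emits for either form.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit POWER9 latencies in cycles, as quoted in the message.
   MOD_LAT == DIVD_LAT is an assumption consistent with
   "12 cycles for the two in parallel". */
enum { DIVD_LAT = 12, MOD_LAT = 12, MULLD_LAT = 5, SUBF_LAT = 2 };

/* Separate div and mod depend only on the inputs, so when they
   issue in parallel the critical path is the longer of the two. */
int parallel_div_mod_latency(void) {
    return DIVD_LAT > MOD_LAT ? DIVD_LAT : MOD_LAT;
}

/* divd + mulld + subf: each step needs the previous result, so the
   latencies add up along a single dependency chain. */
int serial_div_mul_sub_latency(void) {
    return DIVD_LAT + MULLD_LAT + SUBF_LAT;
}

/* Remainder computed directly, as a mod instruction would. */
int64_t mod_direct(int64_t a, int64_t b) { return a % b; }

/* Remainder recovered from the quotient (the "old fashioned" way).
   Valid for b != 0, excluding INT64_MIN / -1, which overflows. */
int64_t mod_via_div(int64_t a, int64_t b) {
    int64_t q = a / b;
    return a - q * b;
}

/* Both forms must agree under C's truncating division. */
void check_mod_identity(int64_t a, int64_t b) {
    assert(mod_direct(a, b) == mod_via_div(a, b));
}
```

The results are identical; the difference is only in the dependency chain: mulld cannot start until divd has finished, while a separate mod can start as soon as its inputs are ready.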

> but if you throw in another
> independent div or mod in the insn stream then doing the peephole should
> be a clear win since that 3rd insn can execute in parallel with the
> initial divide as opposed to waiting for one of the first div/mod to
> clear the exclusive stage of the pipe.

That is the SMT4 case, the one we do not optimise for.  SMT2 and ST can
do four in parallel.  This means you can start a div or mod every 2nd
cycle on average, so it is very unlikely you will ever be limited by
this on real code.
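The throughput argument can be illustrated with a toy model, under loudly stated assumptions: a number of identical, non-pipelined divide units, each kept busy for a fixed number of cycles by one independent div or mod, with ops issued greedily to the earliest free unit. This is a hypothetical sketch, not a description of the POWER9 pipeline; the 8-cycle occupancy used in the checks is made up, chosen only so that four units give one issue every 2nd cycle on average.

```c
#include <assert.h>

#define MAX_UNITS 8

/* Toy model (hypothetical): `units` non-pipelined divide units, each
   occupied for `occupancy` cycles per op. All `n_ops` independent ops
   are ready at cycle 0 and each goes to the unit that frees up first.
   Returns the cycle at which the last op issues. */
int last_issue_cycle(int units, int occupancy, int n_ops) {
    int free_at[MAX_UNITS] = {0};
    int issue = 0;
    assert(units >= 1 && units <= MAX_UNITS && n_ops >= 1);
    for (int op = 0; op < n_ops; op++) {
        /* Pick the unit that becomes free earliest. */
        int best = 0;
        for (int u = 1; u < units; u++)
            if (free_at[u] < free_at[best])
                best = u;
        issue = free_at[best];
        free_at[best] = issue + occupancy;
    }
    return issue;
}
```

With four units, a third independent div or mod issues at cycle 0 alongside the first two; only with two units (or fewer) does it have to wait for a unit to clear, which is the situation described in the quoted paragraph. Over 101 ops on four units the last one issues at cycle 200, i.e. one issue every 2 cycles under this made-up occupancy.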


Segher