Date: Mon, 27 Feb 2023 11:08:35 -0600
From: Segher Boessenkool
To: Pat Haugen
Cc: GCC Patches, "Kewen.Lin", David Edelsohn, Peter Bergner
Subject: Re: [PATCH, rs6000] Tweak modulo define_insns to eliminate register copy
Message-ID: <20230227170835.GA25951@gate.crashing.org>
In-Reply-To: <3cad2a5e-dd68-2fbe-d52b-e077a7405623@linux.ibm.com>

Hi!

On Mon, Feb 27, 2023 at 09:11:37AM -0600, Pat Haugen wrote:
> The define_insns for the modulo operation currently force the target register
> to a distinct reg in preparation for a possible future peephole combining
> div/mod. But this can lead to cases of a needless copy being inserted. Fixed
> with the following patch.

Have you verified those peepholes still match?

Do those peepholes actually improve performance?  On new CPUs?  The code
here says
;; On machines with modulo support, do a combined div/mod the old fashioned
;; method, since the multiply/subtract is faster than doing the mod instruction
;; after a divide.
but that really should not be true: we can do the div and mod in
parallel (except in SMT4 perhaps, which we never schedule for anyway),
so that should always be strictly faster.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */

All files in gcc.target/powerpc/ test for this already.  Just leave off
the target clause here?

> +/* { dg-require-effective-target powerpc_p9modulo_ok } */

Leave out this line, because ...

> +/* { dg-options "-mdejagnu-cpu=power9 -O2" } */

... the -mcpu= forces it to true always.

> +/* Verify r3 is used as source and target, no copy inserted.  */
> +/* { dg-final { scan-assembler-not {\mmr\M} } } */

That is probably good enough, yeah, since the test results in only a
handful of insns.


Segher
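
For reference, the "old fashioned" method in the comment quoted above is the usual identity: once the quotient q = a / b is available, the remainder is a - q * b, so one divide plus a multiply/subtract yields both results. A minimal C sketch of that identity follows. The function names are hypothetical and nothing here is taken from the patch or from the machine description; it only illustrates the arithmetic, not what the peephole emits.

```c
#include <assert.h>

/* Remainder computed directly, as a separate modulo operation would.
   Hypothetical helper, for illustration only.  */
long
rem_direct (long a, long b)
{
  return a % b;
}

/* Remainder derived from the quotient by multiply/subtract.
   The quotient is returned through *quot so that one divide serves
   both results.  Assumes b != 0 and that a / b does not overflow.  */
long
rem_from_quotient (long a, long b, long *quot)
{
  long q = a / b;	/* the single divide */
  *quot = q;
  return a - q * b;	/* multiply, then subtract */
}

/* Check that both forms agree for one pair of operands.  */
void
check_rem (long a, long b)
{
  long q;
  assert (rem_from_quotient (a, b, &q) == rem_direct (a, b));
  assert (q == a / b);
}
```

Calling check_rem (-7, 3) exercises the negative case, where C's truncating division keeps the two forms in agreement. Whether this sequence or independent div and mod instructions is faster on a given CPU is exactly the open question above.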