From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <20578dd1-fba8-858a-a6e5-cdbb3ca0b6c1@linux.ibm.com>
Date: Mon, 27 Feb 2023 14:12:23 -0600
Subject: Re: [PATCH, rs6000] Tweak modulo define_insns to eliminate register copy
To: Segher Boessenkool
Cc: GCC Patches, "Kewen.Lin", David Edelsohn, Peter Bergner
References: <3cad2a5e-dd68-2fbe-d52b-e077a7405623@linux.ibm.com> <20230227170835.GA25951@gate.crashing.org>
From: Pat Haugen
In-Reply-To: <20230227170835.GA25951@gate.crashing.org>

On 2/27/23 11:08 AM, Segher Boessenkool wrote:
> Hi!
>
> On Mon, Feb 27, 2023 at 09:11:37AM -0600, Pat Haugen wrote:
>> The define_insns for the modulo operation currently force the target
>> register to a distinct reg in preparation for a possible future
>> peephole combining div/mod. But this can lead to cases of a needless
>> copy being inserted. Fixed with the following patch.
>
> Have you verified those peepholes still match?

Yes, I verified the peepholes still match and transform the sequence.

>
> Do those peepholes actually improve performance?  On new CPUs?  The code
> here says
> ;; On machines with modulo support, do a combined div/mod the old fashioned
> ;; method, since the multiply/subtract is faster than doing the mod instruction
> ;; after a divide.
> but that really should not be true: we can do the div and mod in
> parallel (except in SMT4 perhaps, which we never schedule for anyway),
> so that should always be strictly faster.
>

Since the modulo insns were introduced in Power9, we're just talking
Power9/Power10.
On paper, I would agree that separate div/mod could be slightly faster
to get the mod result, but if you throw in another independent div or
mod in the insn stream then doing the peephole should be a clear win,
since that 3rd insn can execute in parallel with the initial divide as
opposed to waiting for one of the first div/mod to clear the exclusive
stage of the pipe.

>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c
>> @@ -0,0 +1,17 @@
>> +/* { dg-do compile { target { powerpc*-*-* } } } */
>
> All files in gcc.target/powerpc/ test for this already.  Just leave off
> the target clause here?
>
>> +/* { dg-require-effective-target powerpc_p9modulo_ok } */
>
> Leave out this line, because ...
>
>> +/* { dg-options "-mdejagnu-cpu=power9 -O2" } */
>
> ... the -mcpu= forces it to true always.

Will update.

-Pat

>
>> +/* Verify r3 is used as source and target, no copy inserted.  */
>
>> +/* { dg-final { scan-assembler-not {\mmr\M} } } */
>
> That is probably good enough, yeah, since the test results in only a
> handful of insns.
>
>
> Segher
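
For readers following along, the "old fashioned" combined div/mod that the
quoted comment refers to can be sketched in C as below. This is only an
illustration of the arithmetic identity (remainder = dividend - quotient *
divisor), not the peephole itself and not the contents of mod-no_copy.c;
the function name div_then_mulsub is hypothetical.

```c
/* Illustrative only: compute both quotient and remainder with a single
   divide followed by a multiply and a subtract, instead of issuing a
   separate modulo operation after the divide.  The name is made up for
   this sketch and does not appear in the patch.  */
long
div_then_mulsub (long a, long b, long *quot)
{
  long q = a / b;       /* the one divide */
  *quot = q;
  return a - q * b;     /* remainder via multiply + subtract */
}
```

With C's truncating division this yields the same value as a % b for any
nonzero b where a / b does not overflow, e.g. 17 and 5 give quotient 3 and
remainder 2.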