From: "Kewen.Lin"
To: Roger Sayle
Cc: gcc-patches@gcc.gnu.org, Segher Boessenkool, David Edelsohn
Subject: Re: [rs6000 PATCH] Improve constant integer multiply using rldimi.
Date: Mon, 27 Jun 2022 17:04:12 +0800
In-Reply-To: <006101d8899f$295807b0$7c081710$@nextmovesoftware.com>
References: <006101d8899f$295807b0$7c081710$@nextmovesoftware.com>
List-Id: Gcc-patches mailing list

Hi Roger,

on 2022/6/27 04:56, Roger Sayle wrote:
>
> This patch tweaks the code generated on POWER for integer multiplications
> by a constant, by making use of rldimi instructions. Much like x86's
> lea instruction, rldimi can be used to implement a shift and add pair
> in some circumstances. For rldimi this is when the shifted operand
> is known to have no bits in common with the added operand.
>
> Hence for the new testcase below:
>
> int foo(int x)
> {
>   int t = x & 42;
>   return t * 0x2001;
> }
>
> when compiled with -O2, GCC currently generates:
>
>         andi. 3,3,0x2a
>         slwi 9,3,13
>         add 3,9,3
>         extsw 3,3
>         blr
>
> with this patch, we now generate:
>
>         andi. 3,3,0x2a
>         rlwimi 3,3,13,0,31-13
>         extsw 3,3
>         blr
>
> It turns out this optimization already exists in the form of a combine
> splitter in rs6000.md, but the constraints on combine splitters,
> requiring three of four input instructions (and generating one or two
> output instructions) mean it doesn't get applied as often as it could.
> This patch converts the define_split into a define_insn_and_split to
> catch more cases (such as the one above).
>
> The one bit that's tricky/controversial is the use of RTL's
> nonzero_bits which is accurate during the combine pass when this
> pattern is first recognized, but not as advanced (not kept up to
> date) when this pattern is eventually split. To support this,
> I've used a "|| reload_completed" idiom. Does this approach seem
> reasonable? [I've another patch of x86 that uses the same idiom].
>

I tested this patch on powerpc64-linux-gnu, and it caused the ICE below
on test case gcc/testsuite/gcc.c-torture/compile/pr93098.c.

gcc/testsuite/gcc.c-torture/compile/pr93098.c: In function ‘foo’:
gcc/testsuite/gcc.c-torture/compile/pr93098.c:10:1: error: unrecognizable insn:
(insn 104 32 34 2 (set (reg:SI 185 [+4 ])
        (ior:SI (and:SI (reg:SI 200 [+4 ])
                (const_int 4294967295 [0xffffffff]))
            (ashift:SI (reg:SI 140)
                (const_int 32 [0x20])))) "gcc/testsuite/gcc.c-torture/compile/pr93098.c":6:11 -1
     (nil))
during RTL pass: subreg3
dump file: pr93098.c.291r.subreg3
gcc/testsuite/gcc.c-torture/compile/pr93098.c:10:1: internal compiler error: in extract_insn, at recog.cc:2791
0x101f664b _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
        gcc/rtl-error.cc:108
0x101f6697 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
        gcc/rtl-error.cc:116
0x10ae427f extract_insn(rtx_insn*)
        gcc/recog.cc:2791
0x11b239bb decompose_multiword_subregs
        gcc/lower-subreg.cc:1555
0x11b25013 execute
        gcc/lower-subreg.cc:1818

The above trace shows that we fail to recog the pattern again, due to the
inaccurate nonzero_bits information you pointed out above.

There was another patch [1] which wasn't on trunk but touched this same
define_split. I'm not sure whether it can help, or whether we can follow a
similar idea.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-December/585841.html

BR,
Kewen
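
For reference, the condition the quoted description relies on (the shifted operand having no bits in common with the added operand) can be checked with a small C sketch. This only illustrates the arithmetic identity on the quoted testcase and is not code from the patch. The function names are hypothetical, and unsigned 32-bit arithmetic is assumed so that the shifts are well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add form: computes t * ((1 << shift) + 1), as the
   slwi/add pair in the quoted "before" code does for shift == 13. */
static uint32_t mul_shift_add(uint32_t t, unsigned shift)
{
    return (t << shift) + t;
}

/* Shift-and-insert form: an OR of the shifted and unshifted value.
   It equals the add form only when (t << shift) and t share no set bits. */
static uint32_t mul_shift_insert(uint32_t t, unsigned shift)
{
    return (t << shift) | t;
}

/* Returns 1 if both forms agree with t * 0x2001 for every value of
   t = x & 42, mirroring foo() from the quoted testcase. Only the low
   six bits of x affect t, so x < 64 covers every case. */
static int forms_agree_for_mask_42(void)
{
    for (uint32_t x = 0; x < 64; x++) {
        uint32_t t = x & 42u;
        /* Precondition: no carries are possible, so + and | coincide. */
        assert(((t << 13) & t) == 0);
        if (mul_shift_add(t, 13) != mul_shift_insert(t, 13))
            return 0;
        if (mul_shift_add(t, 13) != t * 0x2001u)
            return 0;
    }
    return 1;
}
```

The two forms stop agreeing once the operands overlap: with t = 3 and a shift of 1, the add form gives 9 while the OR form gives 7. That is why the pattern depends on nonzero_bits still being accurate when it is re-recognized.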