From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69]) by sourceware.org (Postfix) with ESMTPS id AE5C5385AFAD for ; Thu, 27 Jul 2023 17:12:54 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org AE5C5385AFAD Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID: Date:Subject:In-Reply-To:References:Cc:To:From:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=K87QbNl2DLkw2MeSDLwrfuEnTFbHWfjtrPr8XLIv7P4=; b=rQFeHy+Jbu3gMcKHXrO2FJBarU BWeXX6kDbwLNbF3Fg5UyGZjKOuVOJLJ3P4fg/zNUvs1c43vG/MCtHmKDkLEmsk2rqaQQeD4P7Q2d+ 1b8OPn7+QqgpZJG+UW6vdAg5zp3rh3xtn8qCGKflta/oA89Px57i82U1PfQ3JLVdZ/lueVLjeDJr3 GCqGKeyvklaUm4d7kdLVT0h12bDoMHet6hTLKooUagemT0xnYdDDeT4RTyVwt+5qyizHiE4bIbxrl NsaogcUOR6GUy28uaFvap0lZZIMCWwzj1C1ZHzV7I319EMO15IS/GyH9mjLTtwRKLnB7ERCL8Kn3s YpeKINMw==; Received: from [185.62.158.67] (port=52592 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1qP4Xk-0003vX-0s; Thu, 27 Jul 2023 13:12:52 -0400 From: "Roger Sayle" To: "'Richard Biener'" Cc: References: <004c01d9beeb$98c6fcf0$ca54f6d0$@nextmovesoftware.com> In-Reply-To: Subject: RE: [PATCH] PR rtl-optimization/110587: Reduce useless moves in compile-time hog. Date: Thu, 27 Jul 2023 18:12:50 +0100 Message-ID: <001601d9c0ad$93934e00$bab9ea00$@nextmovesoftware.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_NextPart_000_0017_01D9C0B5.F557B600" X-Mailer: Microsoft Outlook 16.0 Thread-Index: AQC5Hm1ImX5/tOh1DwSv3xIAcCWIEAJrbeKdsfu+22A= Content-Language: en-gb X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: This is a multipart message in MIME format. ------=_NextPart_000_0017_01D9C0B5.F557B600 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Hi Richard, You're 100% right. It=E2=80=99s possible to significantly clean-up this = code, replacing the body of the conditional with a call to force_reg and simplifying the = conditions under which it is called. These improvements are implemented in the = patch below, which has been tested on x86_64-pc-linux-gnu, with a bootstrap = and make -k check, both with and without -m32, as usual. Interestingly, the CONCAT clause afterwards is still required (I've = learned something new), as calling force_reg (or gen_reg_rtx) with HCmode, actually = returns a CONCAT instead of a REG, so although the code looks dead, it's required to = build libgcc during a bootstrap. But the remaining clean-up is good, reducing the number of = source lines and making the logic easier to understand. Ok for mainline? 2023-07-27 Roger Sayle Richard Biener gcc/ChangeLog PR middle-end/28071 PR rtl-optimization/110587 * expr.cc (emit_group_load_1): Simplify logic for calling force_reg on ORIG_SRC, to avoid making a copy if the source is already in a pseudo register. Roger -- > -----Original Message----- > From: Richard Biener > Sent: 25 July 2023 12:50 >=20 > On Tue, Jul 25, 2023 at 1:31=E2=80=AFPM Roger Sayle = > wrote: > > > > This patch is the third in series of fixes for PR > > rtl-optimization/110587, a compile-time regression with -O0, that > > attempts to address the underlying cause. As noted previously, the > > pathological test case pr28071.c contains a large number of useless > > register-to-register moves that can produce quadratic behaviour (in > > LRA). These move are generated during RTL expansion in > > emit_group_load_1, where the middle-end attempts to simplify the > > source before calling extract_bit_field. This is reasonable if the > > source is a complex expression (from before the tree-ssa = optimizers), > > or a SUBREG, or a hard register, but it's not particularly useful to > > copy a pseudo register into a new pseudo register. This patch = eliminates that > redundancy. > > > > The -fdump-tree-expand for pr28071.c compiled with -O0 currently > > contains 777K lines, with this patch it contains 717K lines, i.e. > > saving about 60K lines (admittedly of debugging text output, but it = makes the > point). > > > > > > This patch has been tested on x86_64-pc-linux-gnu with make = bootstrap > > and make -k check, both with and without --target_board=3Dunix{-m32} > > with no new failures. Ok for mainline? > > > > As always, I'm happy to revert this change quickly if there's a > > problem, and investigate why this additional copy might (still) be > > needed on other > > non-x86 targets. >=20 > @@ -2622,6 +2622,7 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx = orig_src, > tree type, > be loaded directly into the destination. */ > src =3D orig_src; > if (!MEM_P (orig_src) > + && (!REG_P (orig_src) || HARD_REGISTER_P (orig_src)) > && (!CONSTANT_P (orig_src) > || (GET_MODE (orig_src) !=3D mode > && GET_MODE (orig_src) !=3D VOIDmode))) >=20 > so that means the code guarded by the conditional could instead be = transformed > to >=20 > src =3D force_reg (mode, orig_src); >=20 > ? Btw, the || (GET_MODE (orig_src) !=3D mode && GET_MODE (orig_src) = !=3D > VOIDmode) case looks odd as in that case we'd use GET_MODE (orig_src) = for the > move ... that might also mean we have to use force_reg (GET_MODE = (orig_src) =3D=3D > VOIDmode ? mode : GET_MODE (orig_src), orig_src)) >=20 > Otherwise I think this is OK, as said, using force_reg somehow would = improve > readability here I think. >=20 > I also wonder how the >=20 > else if (GET_CODE (src) =3D=3D CONCAT) >=20 > case will ever trigger with the current code. >=20 > Richard. >=20 > > > > 2023-07-25 Roger Sayle > > > > gcc/ChangeLog > > PR middle-end/28071 > > PR rtl-optimization/110587 > > * expr.cc (emit_group_load_1): Avoid copying a pseudo = register into > > a new pseudo register, i.e. only copy hard regs into a new = pseudo. > > > > ------=_NextPart_000_0017_01D9C0B5.F557B600 Content-Type: text/plain; name="patchhg4.txt" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="patchhg4.txt" diff --git a/gcc/expr.cc b/gcc/expr.cc=0A= index fff09dc..174f8ac 100644=0A= --- a/gcc/expr.cc=0A= +++ b/gcc/expr.cc=0A= @@ -2622,16 +2622,11 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx = orig_src, tree type,=0A= be loaded directly into the destination. */=0A= src =3D orig_src;=0A= if (!MEM_P (orig_src)=0A= - && (!CONSTANT_P (orig_src)=0A= - || (GET_MODE (orig_src) !=3D mode=0A= - && GET_MODE (orig_src) !=3D VOIDmode)))=0A= + && (!REG_P (orig_src) || HARD_REGISTER_P (orig_src))=0A= + && !CONSTANT_P (orig_src))=0A= {=0A= - if (GET_MODE (orig_src) =3D=3D VOIDmode)=0A= - src =3D gen_reg_rtx (mode);=0A= - else=0A= - src =3D gen_reg_rtx (GET_MODE (orig_src));=0A= -=0A= - emit_move_insn (src, orig_src);=0A= + gcc_assert (GET_MODE (orig_src) !=3D VOIDmode);=0A= + src =3D force_reg (GET_MODE (orig_src), orig_src);=0A= }=0A= =0A= /* Optimize the access just a bit. */=0A= ------=_NextPart_000_0017_01D9C0B5.F557B600--